In this tutorial, we’ll explore the differences between the UUID and Sequential ID as primary keys.
When designing a database, choosing the appropriate primary key format is critical to the system's performance, scalability, and data integrity.
Tables inside the database must have a primary key column that is both unique and not nullable. That way, a primary key value can uniquely identify each row.
One of the primary decisions in selecting a primary key is whether to use a UUID or a sequential ID. While both approaches have their benefits and drawbacks, the best option depends on the specific use case and the goals of the system.
Defined by the RFC 4122 standard, the UUID (Universally Unique IDentifier) represents a 128 bits value.
Nowadays, most relational databases support UUID type:
- Oracle – the RAW(16) type
- SQL Server – the NEWID() function
- PostgreSQL – the UUID type
- MySQL – the BINARY(16) type or the UUID() function
However, if the database doesn’t support such a type, we should define it as a BINARY(16) type. Using a different type (i.e., CHAR(32) would work as well, but it would acquire additional and unnecessary space to store the value).
Now, let’s go through the advantages and disadvantages of using UUID as the primary key.
2.1. Distributed Systems
UUID can come in handy when working with distributed systems that share databases.
The main purpose of having UUID as the primary key is the ability to share the data across distributed systems.
With the UUID as the primary key, we can guarantee the collision won’t happen, and we don’t have to create a central coordination system for managing the uniqueness of the primary keys.
Additionally, it reduces the complexity of integration between databases.
Distributed and NoSQL databases rely on UUID as keys (i.e., MongoDB or CouchDB) rather than numeric values.
UUIDs are globally unique. In other words, we can identify each record in the database with an ID that is unique across tables, databases, and systems. The latter is especially important in distributed systems where we may add or remove nodes dynamically, and coordination between them can be challenging. The collision is only theoretically possible.
Furthermore, they provide additional security since the next value is hard to predict. Therefore, it makes it almost impossible for a malicious user to guess the ID. On the other hand, such a user can easily guess the next value of sequential IDs.
Moreover, they do not expose information about business data, so we can safely use them as part of the URL path.
2.3. Generating the Value
Another advantage of the UUID is that it can be generated by the application or the database system itself.
Usually, when having the sequential ID, we rely on the database system to generate and increment the primary key value instead of us. Otherwise, keeping track of the next value we should use would be complicated.
However, giving up the responsibility of generating sequential primary keys comes with a downside. We can get the actual value of the new record only after executing the insert statement.
On the contrary, we can generate the UUID in the code by ourselves instead of telling the database to generate it on our behalf. Since the UUID is unique and isn’t sequential, we don’t have to worry about the previous values.
Therefore, we can have the primary key value immediately. We don’t need to wait until the insert query is executed.
2.4. Memory Usage
The UUID is 128-bit long, which is double the size of a BIGINT and four times the size of the INTEGER types.
In relational databases, we usually use a BIGINT to store numerical identifiers, and having UUID instead may not make a big difference since it is only twice bigger.
However, having a big primary key may result in performance issues, especially when it comes to select queries and indexing.
The UUID consists of 32 hex digits separated by four dashes, which makes it quite challenging to remember.
Therefore, the representation of a UUID may not be considered user-friendly and might be hard to verbalize. It isn’t easy to read and memorize.
On the other hand, sequential IDs are easy to read and remember.
However, it’s questionable whether the ID value itself is meant for humans to read or not.
Another disadvantage is the fact that we can’t sort the UUID values in the natural order.
Because of this limitation, we might be forced to use another column, for instance, creational timestamp, to get ordered items. Consequently, this might increase the execution time of a query.
3. Sequential ID
A sequence is a unique alphanumeric value that identifies a sequence record in a database.
Furthermore, we can only use numerical identifiers in the sequence since the system can’t determine the next sequence value for the UUID type.
Sequential IDs are usually preferred over UUIDs since they require less space.
Database engines provide native support for sequential ID:
- PostgreSQL – SERIAL
- SQL Server – IDENTITY
- MySQL – AUTO_INCREMENT
- SQLite – AUTOINCREMENT
Let’s see the advantages and disadvantages of using the sequential ID as the primary key.
Unlike UUID values, numerical identifiers are simpler to read and remember.
With numeric identifiers, we can easily track the order in which the records are inserted into the database.
Additionally, we can quickly identify the relationship between records based on their IDs.
We often use indexes on primary and foreign keys to speed up select queries and joins.
Some databases use the B+Tree structure for indexing, while others, like SQL and MySQL, use clustered indexes instead.
Furthermore, UUID values don’t index well. The longer the key, the more memory the index entry will require.
Moreover, UUIDs have a low indexing factor since their values are random. Every time we modify a table, indexes would have to update. This can impact the performance of our system and use unnecessary memory.
Additionally, we could index foreign keys on joined tables as well, which would take another performance hit when it comes to UUID values.
3.3. Batch Actions
Batch actions refer to the process of performing multiple database operations as a single unit of work.
One of the reasons why batch actions work better with sequential IDs is that sequential IDs are generated in a predictable sequence. Multiple database operations can be performed as a single batch, optimizing the performance of the system.
The primary key values are generated in a predictable order, and new records are added to the end of the sequence. This makes it possible to use certain types of queries, such as range queries, which require the records to be sorted by their primary key.
In addition, sequential IDs are generally smaller than UUIDs, which can improve the system's performance by reducing the amount of storage required.
However, it's important to consider the potential for gaps in the sequence when using sequential IDs in distributed systems. In this scenario, having the UUID as the primary key would be a better option.
Sequence identifiers follow a specific structure which makes them predictable.
This may allow malicious users to read the information they shouldn’t.
We may end up unintentionally exposing some private data and business logic. For example, the last ID could represent the total number of invoices generated by a user, which can reveal the income information.
As mentioned earlier, it would be complicated to generate the sequential ID by the application. To create the ID, we’d need to consult the database to figure out the next available value.
In distributed systems, where more than one system is inserting the data into the database, this often means we might end up having collisions in our data. It’s likely the different systems would produce the same key value.
To overcome this problem, we’d need to create separate services to produce sequential value. Additionally, the same service would become the single point of failure.
3.6. Size Limit
Lastly, sequential IDs limit their size. Although the maximum number of numerical identifiers is quite big, running out of numbers is possible.
If we are using INT type as the primary key, eventually, we can reach the maximum number (2,147,483,647), which will cause silent overflow errors. Therefore, we could end up having negative values as primary keys.
4. Difference Between UUID and Sequential ID
To sum up, the table below shows the differences between UUID and Sequential ID.
|Takes 128 bits
||Takes 64 bits when it comes to BIGINT and 32 bits for INT type
|Collision is only possible in theory
||Collision is possible due to a size limitation
|Works well in distributed systems
||Needs to have a coordination component to avoid duplicate values
|Well supported by NoSQL and distributed databases
||It's not recommended to use it within NoSQL and distributed databases
|Difficult to remember and vocalize
||Easy to read and remember
In this tutorial, we learned the differences between UUID and Sequential IDs as primary keys.
In conclusion, whether to use a UUID or sequential ID as a primary key depends on the system's specific use case and goals
If we are working with distributed systems and we require global uniqueness, a UUID may be the best choice. On the other hand, a sequential ID might be best suited if we have limited memory usage and the performance of executing SQL queries is crucial.
Ultimately, the choice of the primary key should be made based on a careful evaluation of the system's requirements and constraints.
Course – LSD (cat=Persistence) res – Persistence (eBook) (cat=Persistence)