
Last updated: September 10, 2024
The concept of keys is fundamental in database design. Some questions arise: how can we identify data uniquely and efficiently in a database? What’s the best way to identify records: using inherent attributes or artificial identifiers? Keys serve as unique identifiers for records, enabling efficient data retrieval and maintaining relationships between tables. However, database designers often find themselves at a crossroads when choosing between natural and surrogate keys.
In this tutorial, we’ll explore the concepts of natural and surrogate keys and their ideal use cases. This will help us make informed decisions about when to choose one over the other.
Let’s first clarify what we mean by database keys. Essentially, a key is an attribute (or set of attributes) in a table that uniquely identifies each record, so no two records can share the same key value.
Therefore, keys fulfill several vital functions:
Next, let’s compare natural keys and surrogate keys.
Natural keys are attributes or combinations of attributes that already exist in the data and can uniquely pinpoint a record. They’re called “natural” simply because they arise from the inherent properties of the entity we’re modeling.
For instance, in a table of countries, the country code (such as US for the United States) could serve as a natural key. Similarly, in a table of books, the ISBN might be a perfect natural key.
Let’s see how we might define a Student table using the student’s national ID number as the natural key:
```sql
CREATE TABLE Student (
    national_id BIGINT PRIMARY KEY,  -- natural key: the student's national ID number
    name VARCHAR(60),
    birth_date DATE,
    enrollment_date DATE,
    graduation_date DATE,
    gpa FLOAT
);
```
In this scenario, national_id is the primary key. It’s an intrinsic part of the student’s data and distinguishes each student from the others.
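To see the natural key at work, here’s a minimal sketch that runs the same table definition on an in-memory SQLite database through Python’s built-in sqlite3 module. SQLite is an assumption, since the article doesn’t tie the DDL to a specific engine, and the helper names make_natural_key_db and try_insert are hypothetical. The sketch shows the PRIMARY KEY on national_id rejecting a second student with the same ID.

```python
import sqlite3


def make_natural_key_db() -> sqlite3.Connection:
    """Create an in-memory Student table keyed by national_id (a natural key)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE Student (
            national_id BIGINT PRIMARY KEY,  -- natural key taken from the data itself
            name VARCHAR(60),
            birth_date DATE,
            enrollment_date DATE,
            graduation_date DATE,
            gpa FLOAT
        )
    """)
    return conn


def try_insert(conn: sqlite3.Connection, national_id: int, name: str) -> bool:
    """Insert a student; return False if the natural key is already taken."""
    try:
        conn.execute(
            "INSERT INTO Student (national_id, name) VALUES (?, ?)",
            (national_id, name),
        )
        return True
    except sqlite3.IntegrityError:
        # The PRIMARY KEY constraint rejects a second row with the same national_id
        return False


if __name__ == "__main__":
    conn = make_natural_key_db()
    print(try_insert(conn, 123456789, "Alice"))
    print(try_insert(conn, 123456789, "Bob"))
```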
Natural keys come with several advantages:
However, they also present some challenges:
As an example of a natural key’s limitations, let’s assume we’re designing a database for a multinational corporation. We might be tempted to use employee ID numbers as natural keys. However, if the company merges with another firm, both firms may have issued the same employee IDs, and those duplicates would throw the entire system into disarray.
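We can reproduce the merger problem in a few lines. This is a minimal sketch, again using Python’s sqlite3 module with an in-memory database. The function merge_with_natural_keys and the sample employee lists are hypothetical. Both firms happen to have issued employee number 2, so loading the second firm’s staff into a table keyed by employee ID hits the PRIMARY KEY constraint.

```python
import sqlite3
from typing import List, Tuple

Employee = Tuple[int, str]  # (employee_id, name), as issued by each firm


def merge_with_natural_keys(company_a: List[Employee], company_b: List[Employee]) -> List[int]:
    """Load both firms' employees into one table keyed by employee ID.

    Returns the employee IDs from company_b that collided with IDs
    already used by company_a.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Employee (employee_id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO Employee VALUES (?, ?)", company_a)
    collisions = []
    for emp_id, name in company_b:
        try:
            conn.execute("INSERT INTO Employee VALUES (?, ?)", (emp_id, name))
        except sqlite3.IntegrityError:
            # Both firms issued the same employee number: the natural key clashes
            collisions.append(emp_id)
    conn.close()
    return collisions


if __name__ == "__main__":
    firm_a = [(1, "Ana"), (2, "Ben")]
    firm_b = [(2, "Chen"), (3, "Dana")]
    print(merge_with_natural_keys(firm_a, firm_b))
```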
Surrogate keys, conversely, are artificial identifiers, typically auto-generated numbers or globally unique identifiers (UUIDs), created solely to identify records uniquely. They carry no meaning outside the database. They’re “surrogates” in the sense that they stand in for natural keys, often providing a simpler and more stable way to identify records.
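When the surrogate is a globally unique identifier rather than a counter, the application can generate it itself. Here’s a minimal sketch using Python’s standard uuid module and an in-memory SQLite table. The helpers make_uuid_student_table and add_student_with_uuid are hypothetical, and storing the UUID as TEXT is only one option. Two students with the same name still receive distinct keys because the key is unrelated to the data.

```python
import sqlite3
import uuid


def make_uuid_student_table() -> sqlite3.Connection:
    """Create an in-memory Student table whose key is a UUID string."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Student (id TEXT PRIMARY KEY, name VARCHAR(60))")
    return conn


def add_student_with_uuid(conn: sqlite3.Connection, name: str) -> str:
    """Insert a student under a freshly generated random (version 4) UUID."""
    student_id = str(uuid.uuid4())  # generated by the application, unrelated to the data
    conn.execute("INSERT INTO Student (id, name) VALUES (?, ?)", (student_id, name))
    return student_id


if __name__ == "__main__":
    conn = make_uuid_student_table()
    print(add_student_with_uuid(conn, "Alice"))
    print(add_student_with_uuid(conn, "Alice"))
```

Random UUIDs can be created without coordinating with the database, which helps when several systems generate records independently. The trade-off is a wider key (16 bytes, or 36 characters as text) than a 4-byte INT.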
Here’s how we might redefine the Student table with a surrogate key:
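```sql
-- AUTO_INCREMENT is MySQL syntax; other engines use e.g. SERIAL or IDENTITY
CREATE TABLE Student (
    id INT AUTO_INCREMENT PRIMARY KEY,  -- surrogate key generated by the database
    national_id BIGINT UNIQUE,          -- natural attribute, still enforced as unique
    name VARCHAR(60),
    birth_date DATE,
    enrollment_date DATE,
    graduation_date DATE,
    gpa FLOAT
);
```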
In this case, an auto-incrementing id serves as the primary key, while the UNIQUE constraint still keeps national_id distinct for each student.
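We can check both constraints with a minimal sketch on SQLite through Python’s sqlite3 module. SQLite doesn’t accept MySQL’s AUTO_INCREMENT, so the sketch uses SQLite’s INTEGER PRIMARY KEY AUTOINCREMENT instead and trims the table to three columns. The names make_surrogate_key_db and add_student are hypothetical.

```python
import sqlite3
from typing import Optional


def make_surrogate_key_db() -> sqlite3.Connection:
    """Create an in-memory Student table with a surrogate primary key."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE Student (
            id INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key, assigned by SQLite
            national_id BIGINT UNIQUE,             -- natural attribute, still unique
            name VARCHAR(60)
        )
    """)
    return conn


def add_student(conn: sqlite3.Connection, national_id: Optional[int], name: str) -> Optional[int]:
    """Insert a student and return the generated id, or None on a duplicate national_id."""
    try:
        cur = conn.execute(
            "INSERT INTO Student (national_id, name) VALUES (?, ?)",
            (national_id, name),
        )
        return cur.lastrowid  # the surrogate value SQLite just assigned
    except sqlite3.IntegrityError:
        # The UNIQUE constraint on national_id still rejects duplicates
        return None


if __name__ == "__main__":
    conn = make_surrogate_key_db()
    print(add_student(conn, 111, "Alice"))
    print(add_student(conn, 222, "Bob"))
    print(add_student(conn, 111, "Carol"))
```

SQLite also allows several rows with a NULL national_id in a UNIQUE column, so a student without a national ID can still be stored.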
Surrogate keys also provide several benefits:
But they’re not without drawbacks:
To illustrate, let’s return to the multinational corporation example. If we had used surrogate keys (say, an auto-incrementing integer) instead of employee ID numbers as the primary key, the merger wouldn’t cause key collisions. We could simply assign the incoming employees new values by continuing the sequence, while keeping their original employee numbers as ordinary attributes.
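Here’s a minimal sketch of that merger with a surrogate key, under the same SQLite assumption. The source_company column, the composite UNIQUE constraint, and the function merge_with_surrogate_keys are hypothetical additions. The surrogate id keeps increasing across both firms, while each firm’s original employee number survives as ordinary data.

```python
import sqlite3
from typing import List, Tuple


def merge_with_surrogate_keys(
    company_a: List[Tuple[int, str]], company_b: List[Tuple[int, str]]
) -> List[Tuple[int, str, int]]:
    """Load both firms' employees under surrogate ids; return (id, company, employee_no) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE Employee (
            id INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key, no business meaning
            source_company TEXT NOT NULL,          -- hypothetical: which firm issued employee_no
            employee_no INTEGER NOT NULL,          -- original employee ID, kept as data
            name TEXT,
            UNIQUE (source_company, employee_no)   -- employee numbers are unique per firm only
        )
    """)
    for company, rows in (("A", company_a), ("B", company_b)):
        conn.executemany(
            "INSERT INTO Employee (source_company, employee_no, name) VALUES (?, ?, ?)",
            [(company, emp_no, name) for emp_no, name in rows],
        )
    result = conn.execute(
        "SELECT id, source_company, employee_no FROM Employee ORDER BY id"
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in merge_with_surrogate_keys([(1, "Ana"), (2, "Ben")], [(2, "Chen"), (3, "Dana")]):
        print(row)
```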
Now, let’s hold on for a second. We might be losing something by abstracting away the meaningful, natural identifier. So, how do we choose between natural and surrogate keys?
As with many aspects of database design, the answer is context-dependent. Thus, a few key considerations can guide the choice:
Therefore, for the Student table in the University database, a surrogate key might be preferable. National IDs might not work for international students, and they could change over time. Thus, a simple auto-incrementing integer would serve well here.
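A changing natural key becomes costly as soon as other tables reference it. In this minimal SQLite sketch, the Enrollment table and the change_national_id function are hypothetical. Correcting a student’s national ID fails while an enrollment still points at the old value, whereas with a surrogate key the same correction touches a single, unreferenced column.

```python
import sqlite3


def change_national_id(use_surrogate: bool) -> bool:
    """Try to correct a referenced student's national ID; return True if the update succeeds."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
    if use_surrogate:
        # Enrollment points at the surrogate id, so national_id is free to change
        conn.execute("CREATE TABLE Student (id INTEGER PRIMARY KEY, national_id BIGINT UNIQUE)")
        conn.execute(
            "CREATE TABLE Enrollment (student_id INTEGER REFERENCES Student(id), course_code TEXT)"
        )
        conn.execute("INSERT INTO Student VALUES (1, 111)")
        conn.execute("INSERT INTO Enrollment VALUES (1, 'CS101')")
    else:
        # Enrollment points at the natural key itself
        conn.execute("CREATE TABLE Student (national_id BIGINT PRIMARY KEY)")
        conn.execute(
            "CREATE TABLE Enrollment "
            "(student_id BIGINT REFERENCES Student(national_id), course_code TEXT)"
        )
        conn.execute("INSERT INTO Student VALUES (111)")
        conn.execute("INSERT INTO Enrollment VALUES (111, 'CS101')")
    try:
        conn.execute("UPDATE Student SET national_id = 999 WHERE national_id = 111")
        return True
    except sqlite3.IntegrityError:
        # The enrollment row would be left pointing at a key that no longer exists
        return False
    finally:
        conn.close()


if __name__ == "__main__":
    print(change_national_id(use_surrogate=False))
    print(change_national_id(use_surrogate=True))
```

Declaring the foreign key with ON UPDATE CASCADE would let the natural-key update go through, but only by rewriting every referencing row.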
For a Course table, on the other hand, a natural key like the course code could work well, assuming it’s unique and stable. For instance, CS101 for an introductory computer science course is unlikely to change and carries meaning.
Likewise, a surrogate key might be simpler for the Department table, especially if department codes could change over time.
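Putting these recommendations together, here’s one possible University schema, sketched for SQLite through Python’s sqlite3 module. The Enrollment table, the column sizes, and the helpers build_university_db and enroll_demo are assumptions for illustration. Only the choice of key for each table follows the discussion above.

```python
import sqlite3
from typing import List, Tuple

SCHEMA = """
CREATE TABLE Department (
    id INTEGER PRIMARY KEY AUTOINCREMENT,    -- surrogate: department codes may change
    code VARCHAR(10) UNIQUE,
    name VARCHAR(60)
);
CREATE TABLE Course (
    course_code VARCHAR(10) PRIMARY KEY,     -- natural key, e.g. 'CS101'
    title VARCHAR(100),
    department_id INTEGER REFERENCES Department(id)
);
CREATE TABLE Student (
    id INTEGER PRIMARY KEY AUTOINCREMENT,    -- surrogate: national IDs are unreliable here
    national_id BIGINT UNIQUE,
    name VARCHAR(60)
);
CREATE TABLE Enrollment (
    student_id INTEGER REFERENCES Student(id),
    course_code VARCHAR(10) REFERENCES Course(course_code),
    PRIMARY KEY (student_id, course_code)    -- one enrollment per student and course
);
"""


def build_university_db() -> sqlite3.Connection:
    """Create the hypothetical University schema in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def enroll_demo() -> List[Tuple[int, str]]:
    """Enroll one student in CS101 and return the Enrollment rows."""
    conn = build_university_db()
    dept_id = conn.execute(
        "INSERT INTO Department (code, name) VALUES ('CS', 'Computer Science')"
    ).lastrowid
    conn.execute(
        "INSERT INTO Course VALUES ('CS101', 'Intro to Computer Science', ?)", (dept_id,)
    )
    # An international student without a national ID still gets a surrogate id
    student_id = conn.execute(
        "INSERT INTO Student (national_id, name) VALUES (NULL, 'Alice')"
    ).lastrowid
    conn.execute("INSERT INTO Enrollment VALUES (?, 'CS101')", (student_id,))
    # The natural course code is readable directly from Enrollment, with no join
    rows = conn.execute("SELECT student_id, course_code FROM Enrollment").fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(enroll_demo())
```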
In this article, we explored the differences between natural keys and surrogate keys. We looked at their strengths and weaknesses and learned how to choose between the two.
In the end, there’s no universally best choice between the two types of keys. Each has use cases where it fits best, and choosing the right one up front helps avoid problems as the database evolves.