Natural vs. Surrogate Keys in Database | Baeldung on SQL

1. Overview

Keys are fundamental to database design. They serve as unique identifiers for records, enabling efficient data retrieval and maintaining relationships between tables. This raises a practical question: should we identify records by their inherent attributes or by artificial identifiers? Database designers often find themselves at a crossroads when choosing between natural and surrogate keys.

In this tutorial, we’ll explore natural and surrogate keys and their ideal use cases, so that we can make an informed decision when choosing one over the other.

2. What Exactly Are Database Keys?

Let’s first clarify what we mean by database keys. Essentially, a key is an attribute (or set of attributes) whose value uniquely identifies each record in a table, so no two records may share the same key value.

Keys fulfill several vital functions:

allow each record to be distinctly identified and retrieved
facilitate relationships between tables
contribute to maintaining data integrity and consistency

Next, let’s compare natural keys and surrogate keys.

3. Natural Keys: the Inherent Identifiers

Natural keys are attributes or combinations of attributes that already exist in the data and can uniquely pinpoint a record. They’re called “natural” simply because they arise from the inherent properties of the entity we’re modeling.

For instance, in a table of countries, the country code (such as US for United States) could serve as a natural key. Similarly, in a table of books, the ISBN is a good candidate for a natural key.

Let’s see how we might define a Student table using the student’s national ID number as the natural key:

CREATE TABLE Student (
    national_id BIGINT PRIMARY KEY,
    name VARCHAR(60),
    birth_date DATE,
    enrollment_date DATE,
    graduation_date DATE,
    gpa FLOAT
);

In this scenario, we use the national_id as the primary key. It’s also an intrinsic part of the student’s data and distinguishes each student from others.
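
To see the natural key at work, the sketch below looks up a student by national_id and references the same column from a child table. The Enrollment table, its columns, and the sample values are hypothetical and only serve as an illustration:

```sql
-- Hypothetical child table: it references the student by the natural key itself
CREATE TABLE Enrollment (
    national_id BIGINT NOT NULL,
    course_code VARCHAR(10) NOT NULL,
    enrolled_on DATE,
    PRIMARY KEY (national_id, course_code),
    FOREIGN KEY (national_id) REFERENCES Student (national_id)
);

-- Sample data (made-up values)
INSERT INTO Student (national_id, name, birth_date, enrollment_date, graduation_date, gpa)
VALUES (1234567890, 'Jane Doe', '2001-04-12', '2020-09-01', NULL, 3.7);

INSERT INTO Enrollment (national_id, course_code, enrolled_on)
VALUES (1234567890, 'CS101', '2020-09-01');

-- Retrieve a single student directly through the natural key
SELECT name, gpa
FROM Student
WHERE national_id = 1234567890;
```

Notice that the natural key value is copied into every table that references the student, which matters once that value has to change.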

Natural keys come with several advantages:

carry inherent meaning, making them self-documenting
eliminate the need for an additional column just for identification
often reinforce business rules at the database level

However, they also present some challenges:

can be subject to change (what if a student’s national ID is updated?)
might not always be as distinctive as we initially assume
can be lengthy and cumbersome, potentially affecting performance
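
The first challenge is easier to appreciate with a concrete case. As a sketch, let’s assume a hypothetical ExamResult table that references the Student table by national_id. The ON UPDATE CASCADE clause is one possible way to keep child rows in sync when the key value changes, and the ID values are made up:

```sql
-- Hypothetical child table keyed on the student's natural key
CREATE TABLE ExamResult (
    national_id BIGINT NOT NULL,
    exam_code VARCHAR(10) NOT NULL,
    score INT,
    PRIMARY KEY (national_id, exam_code),
    FOREIGN KEY (national_id) REFERENCES Student (national_id)
        ON UPDATE CASCADE
);

-- If the ID is reissued, the primary key itself has to change.
-- The cascade rewrites every referencing ExamResult row as well.
UPDATE Student
SET national_id = 9876543210
WHERE national_id = 1234567890;
```

Without the cascade, the UPDATE would be rejected as long as any ExamResult row still referenced the old value.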

To illustrate the limitations of natural keys, let’s assume we’re designing a database for a multinational corporation. We might be tempted to use employee ID numbers as natural keys. However, if the company merges with another firm that issued its own employee IDs, we could end up with duplicates, throwing the entire system into disarray.
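
A minimal sketch of that collision, assuming a hypothetical Employee table and made-up names and IDs:

```sql
-- Hypothetical table keyed on the company-issued employee ID
CREATE TABLE Employee (
    employee_id INT PRIMARY KEY,
    name VARCHAR(60)
);

-- Existing employee from the original company
INSERT INTO Employee (employee_id, name) VALUES (1001, 'Alice Smith');

-- After the merger: the other firm also issued ID 1001.
-- This statement is rejected with a duplicate-key error on the primary key.
INSERT INTO Employee (employee_id, name) VALUES (1001, 'Bob Jones');
```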

4. Surrogate Keys: the Artificial Alternative

Surrogate keys, conversely, are artificial identifiers, typically auto-generated numbers or globally unique identifiers (UUIDs), created solely to identify records. They carry no meaning outside the database. They’re “surrogates” in the sense that they stand in for natural keys, often providing a simpler and more stable way to identify records.

Here’s how we might redefine the Student table with a surrogate key:

CREATE TABLE Student (
    id INT AUTO_INCREMENT PRIMARY KEY,
    national_id BIGINT UNIQUE,
    name VARCHAR(60),
    birth_date DATE,
    enrollment_date DATE,
    graduation_date DATE,
    gpa FLOAT
);

In this case, we use an auto-incrementing id as the primary key while still maintaining the national_id as a unique identifier.
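
The following sketch shows how this table might be used. It assumes MySQL, since AUTO_INCREMENT and LAST_INSERT_ID() are MySQL features, and the sample values are made up:

```sql
-- The id column is omitted: the database assigns the next value
INSERT INTO Student (national_id, name, birth_date, enrollment_date, graduation_date, gpa)
VALUES (1234567890, 'Jane Doe', '2001-04-12', '2020-09-01', NULL, 3.7);

-- MySQL: returns the id generated by the most recent insert on this connection
SELECT LAST_INSERT_ID();

-- The national ID is still unique, so we can still look a student up by it
SELECT id, name
FROM Student
WHERE national_id = 1234567890;

-- A change to the national ID no longer touches the primary key,
-- so rows in other tables that reference Student.id stay valid
UPDATE Student
SET national_id = 9876543210
WHERE national_id = 1234567890;
```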

Surrogate keys also provide several benefits:

straightforward and stable
remain constant, even if the other attributes change
typically small integers, which can enhance performance
don’t disclose any information about the data they identify

But they’re not without drawbacks:

lack inherent meaning
necessitate extra storage space
can make the data less immediately comprehensible

To illustrate, let’s return to the multinational corporation example. If we had used surrogate keys (say, an auto-incrementing integer) for employee IDs, the merger wouldn’t pose any problems for the database structure. We could simply continue the sequence for new employees from the merged company.
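
One way to sketch the merger scenario with a surrogate key is shown below. The EmployeeRecord table, the source_company and legacy_employee_id columns, and the company names are all hypothetical:

```sql
-- Hypothetical table: surrogate primary key, original IDs kept as plain data
CREATE TABLE EmployeeRecord (
    id INT AUTO_INCREMENT PRIMARY KEY,
    source_company VARCHAR(30) NOT NULL,
    legacy_employee_id INT NOT NULL,
    name VARCHAR(60),
    -- A legacy ID only needs to be unique within the company that issued it
    UNIQUE (source_company, legacy_employee_id)
);

INSERT INTO EmployeeRecord (source_company, legacy_employee_id, name)
VALUES ('OriginalCorp', 1001, 'Alice Smith');

-- Same legacy ID from the merged firm: accepted, it simply gets the next id
INSERT INTO EmployeeRecord (source_company, legacy_employee_id, name)
VALUES ('AcquiredCorp', 1001, 'Bob Jones');
```

Keeping the legacy ID as an ordinary column preserves the meaningful identifier without making the rest of the schema depend on it.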

Now, let’s hold on for a second. We might be losing something by abstracting away the meaningful, natural identifier. So, how do we choose between natural and surrogate keys?

5. Choosing Between Natural and Surrogate Keys

As with many aspects of database design, the answer depends on context. A few considerations can guide the decision:

Performance: Surrogate keys often have an edge here. They’re typically more compact and straightforward, which can accelerate joins and indexing. For instance, joining tables on a 4-byte integer surrogate key is generally faster than joining on a 50-character string natural key.
Data integrity: Natural keys can enforce business rules at the database level. If the natural key is truly immutable, it can be a powerful tool for maintaining data integrity. For example, using an ISBN as a natural key for books ensures that each book is uniquely identified according to a standardized system.
Flexibility: Surrogate keys are better here. Because they carry no business meaning, they’re unaffected when business requirements change. If we’re using a natural key and its structure changes, we’ll have to alter the key column and every table that references it.
Maintenance: Again, surrogate keys often come out on top. They’re simpler to manage and less likely to require changes. By contrast, if a natural key value changes, we’ll need to update foreign key references across the database.

Let’s apply these considerations to a university database. For the Student table, a surrogate key might be preferable: international students may not have a national ID, and national IDs could potentially change. Thus, a simple auto-incrementing integer would serve well here.

For a Course table, on the other hand, a natural key like the course code could work effectively, assuming it’s unique and stable. For instance, CS101 for an introductory computer science course is unlikely to change and carries meaning.

Finally, a surrogate key might be simpler for the Department table, especially if department codes might change over time.
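
Putting the last two choices together, a possible sketch of the schema might look like this. The Course and Department column definitions and the sample data are assumptions made for illustration, and the syntax again assumes MySQL:

```sql
-- Surrogate key: the department code may change, so it's only a unique attribute
CREATE TABLE Department (
    id INT AUTO_INCREMENT PRIMARY KEY,
    code VARCHAR(10) NOT NULL UNIQUE,
    name VARCHAR(60)
);

-- Natural key: the course code is assumed to be unique and stable
CREATE TABLE Course (
    course_code VARCHAR(10) PRIMARY KEY,
    title VARCHAR(100),
    department_id INT NOT NULL,
    FOREIGN KEY (department_id) REFERENCES Department (id)
);

INSERT INTO Department (code, name) VALUES ('CS', 'Computer Science');

-- LAST_INSERT_ID() supplies the id just generated for the department
INSERT INTO Course (course_code, title, department_id)
VALUES ('CS101', 'Introduction to Computer Science', LAST_INSERT_ID());

-- Changing the department code touches one row; Course rows are unaffected
UPDATE Department SET code = 'CSE' WHERE code = 'CS';
```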

6. Conclusion

In this article, we explored the differences between natural keys and surrogate keys. We looked at their strengths and weaknesses and learned how to choose between the two.

In the end, there’s no universally best choice between the two types of keys. Each has use cases where it’s the better fit, and choosing accordingly helps us avoid future issues with database management.
