Learn through the super-clean Baeldung Pro experience:
>> Membership and Baeldung Pro.
No ads, dark-mode and 6 months free of IntelliJ Idea Ultimate to start with.
Last updated: June 30, 2024
Indexes in SQL are critical for enhancing the performance and efficiency of database queries. They provide a structured way to quickly retrieve data without scanning the entire table.
There are two primary types of indexes: clustered and non-clustered. Understanding their differences and use cases can significantly impact database performance and query optimization.
In this tutorial, we’ll explore clustered and non-clustered indexes, discussing their characteristics, practical examples, and key differences. We’ll conclude with a comparative analysis to highlight their distinct features.
A clustered index determines the physical order of data in a table. It sorts and stores the data rows in the table based on the index key. Furthermore, each table can only have one clustered index because it can sort data rows in only one order. We often refer to a table with a clustered index as a clustered table.
Let’s examine some of the characteristics of a clustered index:
Here, we’ll consider the Student table in the Baeldung University database schema. Let’s create a clustered index on the id column:
CREATE TABLE Student (
id INT PRIMARY KEY NOT NULL, -- clustered index
name VARCHAR(60),
national_id BIGINT NOT NULL,
birth_date DATE,
enrollment_date DATE,
graduation_date DATE,
UNIQUE (id)
);
By creating the table with the id column as the primary key, we automatically create a clustered index on the id column. This clustered index maintains the order of the records based on the id column.
Next, let’s perform a SELECT * query to observe the physical order of the data:
SELECT * FROM Student LIMIT 5;
+------+-----------------+-------------+------------+-----------------+-----------------+
| id | name | national_id | birth_date | enrollment_date | graduation_date |
|------+-----------------+-------------+------------+-----------------+-----------------|
| 1001 | John Liu | 123345566 | 2001-04-05 | 2020-01-15 | 2024-06-15 |
| 1003 | Rita Ora | 132345166 | 2001-01-14 | 2020-01-15 | 2024-06-15 |
| 1007 | Philip Lose | 321345566 | 2001-06-15 | 2020-01-15 | 2024-06-15 |
| 1010 | Samantha Prabhu | 3217165566 | 2001-03-21 | 2020-01-15 | 2024-06-15 |
| 1011 | Vikas Jain | 321345662 | 2001-07-18 | 2020-01-15 | <null> |
+------+-----------------+-------------+------------+-----------------+-----------------+
SELECT 5
Time: 0.022s
Notably, the data is in ascending order based on the id column. This change in the physical order of data occurs because the clustered index maintains the order of the records based on the id column.
A non-clustered index, on the other hand, doesn’t alter the physical order of the data. Instead, it creates a separate data structure that contains sorted values of the indexed column(s) along with pointers to the corresponding rows in the original table. A table can have multiple non-clustered indexes, each providing a different way to look up data.
Let’s review some characteristics of a non-clustered index:
Let’s use the Student table again, but this time, let’s create a non-clustered index on the name column:
CREATE INDEX idx_student_name ON Student(name);
This index allows for faster retrieval of records based on the name column without affecting the physical order of the data in the table.
Next, let’s perform a query that utilizes the non-clustered index:
SELECT * FROM Student WHERE name = 'Rita Ora';
+------+----------+-------------+------------+-----------------+-----------------+
| id | name | national_id | birth_date | enrollment_date | graduation_date |
|------+----------+-------------+------------+-----------------+-----------------|
| 1003 | Rita Ora | 132345166 | 2001-01-14 | 2020-01-15 | 2024-06-15 |
+------+----------+-------------+------------+-----------------+-----------------+
SELECT 1
Time: 0.010s
This query uses the non-clustered index idx_student_name to quickly find the record for Rita Ora without scanning the entire table.
Let’s look at the key differences between clustered and non-clustered indexes:
| Feature | Clustered Index | Non-Clustered Index |
|---|---|---|
| Definition | Determines the physical order of data in a table | Creates a separate structure pointing to the data rows |
| Number per Table | Only one per table | Multiple indexes allowed per table |
| Speed for Range Queries | Faster, as data is stored in sorted order | Slower, as it requires additional lookups |
| Use Case | Optimal for columns frequently used in range queries and sorting operations | Ideal for columns frequently used in search conditions but not sorted |
| Performance | Faster retrieval of data | Slower retrieval compared to clustered indexes |
| Memory Usage | Requires less memory for operations | Requires more memory for operations |
| Data Structure | Stores and sorts the data rows | Stores indexed column values and pointers to actual data |
| Storage of Pointers | Stores actual data at leaf level | Stores pointers to data rows |
| Order of Data | Clustered key defines the order of data within a table | Index key defines the order of data within the index |
| Size | Larger primary clustered index | Smaller non-clustered index |
| Example | PRIMARY KEY(id) | CREATE INDEX idx_student_name ON Student(name) |
In this article, we’ve explored the fundamental differences between clustered and non-clustered indexes in SQL. Understanding these differences is crucial for optimizing database performance and making informed decisions about index implementation.
By leveraging the right type of index, we can significantly enhance the efficiency of our database queries and overall data management.