Learn through the super-clean Baeldung Pro experience:
>> Membership and Baeldung Pro.
No ads, dark-mode and 6 months free of IntelliJ Idea Ultimate to start with.
Last updated: November 7, 2024
In this tutorial, we’ll explore the differences between individual and composite database indexes. Firstly, we’ll look at how to create multiple individual secondary indexes for a table and how they can enhance the performance of specific queries.
Next, we’ll learn about composite indexes and how they enhance queries that involve multiple columns in the WHERE or ORDER BY clauses. After that, we’ll talk about mixing individual and composite indexes and see how it can lead to creating redundant indexes.
For the code examples in this article, we’ll use the Baeldung University database.
Additionally, we’ll need a larger data sample to demonstrate the usage of the indexes. Let’s create a simple procedure that generates 10,000 new entries and inserts them into the Student table:
DELIMITER $$
CREATE PROCEDURE insert_students()
BEGIN
DECLARE i INT DEFAULT 3000;
WHILE i <= 13000 DO
INSERT INTO Student (id, name, national_id, birth_date, enrollment_date, graduation_date, gpa)
VALUES (
i,
CONCAT('Student', i),
FLOOR(RAND() * 10000000000),
DATE_ADD('1990-01-01', INTERVAL FLOOR(RAND() * 10000) DAY),
DATE_ADD('2010-01-01', INTERVAL FLOOR(RAND() * 3650) DAY),
DATE_ADD('2020-01-01', INTERVAL FLOOR(RAND() * 3650) DAY),
ROUND(RAND() * 4, 2)
);
SET i = i + 1;
END WHILE;
END$$
DELIMITER ;
After that, we simply need to call this procedure:
CALL insert_students();
Database indexing is a technique for optimizing query performance. Indexes are data structures that enable faster retrieval of rows from a table based on specific key columns. Besides the primary key, any additional indexes created on a table are referred to as secondary indexes.
For example, let’s consider we have a Student table and want to find all the students who enrolled in the past four years and have a grade point average higher than 3.0:
mysql> SELECT * FROM Student WHERE gpa > 3 AND enrollment_date > (NOW() - INTERVAL 4 YEAR);
+------+----------------+-------------+------------+-----------------+-----------------+------+
| id | name | national_id | birth_date | enrollment_date | graduation_date | gpa |
+------+----------------+-------------+------------+-----------------+-----------------+------+
| 1617 | Philip Mohan | 3103455662 | 2002-05-15 | 2021-01-15 | 2025-06-15 | 5 |
| 1619 | Sam Roberts | 9203455662 | 2002-06-21 | 2021-01-15 | 2025-06-15 | 4.3 |
| 1710 | Roni Roto | 2678910631 | 2002-03-11 | 2021-01-15 | 2025-06-15 | 4.44 |
| 1721 | Vini Puh | 1312091343 | 2002-05-13 | 2021-01-15 | 2025-06-15 | 3.64 |
| 2001 | Reena Roy | 9023455613 | 2003-01-13 | 2022-01-15 | 2025-06-15 | 4.15 |
| 2006 | Agatha Christi | 1100245767 | 2003-05-19 | 2022-01-15 | NULL | 4.59 |
| 2008 | Julia Roberts | 1212446677 | 2003-06-12 | 2022-01-15 | 2025-06-15 | 3.04 |
| 2009 | Pollards Grey | 6503453662 | 2003-04-19 | 2022-01-15 | 2025-06-15 | 4.65 |
+------+----------------+-------------+------------+-----------------+-----------------+------+
Additionally, we can add indexes to improve the query performance. For example, we might be tempted to add two individual indexes for the enrollment_date and gpa columns:
CREATE INDEX idx_gpa ON Student (gpa);
CREATE INDEX idx_enrollment ON Student(enrollment_date);
However, if we re-run the query and examine the execution plan, we’ll notice that the DBMS uses only one index:
mysql> EXPLAIN ANALYZE SELECT * FROM Student WHERE gpa > 3 AND enrollment_date > (NOW() - INTERVAL 4 YEAR);
|
| -> Filter: (Student.gpa > 3) (cost=7.01 rows=4.07) (actual time=0.0935..0.102 rows=8 loops=1)
| -> Index range scan on Student using idx_enrollment over ('2020-11-03' < enrollment_date), with index condition: (Student.enrollment_date > <cache>((now() - interval 4 year))) (cost=7.01 rows=15) (actual time=0.0756..0.0827 rows=15 loops=1)
When choosing between the two, the database will favor the index with a higher cardinality. In other words, it uses idx_gpa because the gpa column has more distinct values for this example.
While the additional index on the enrollment_date column may improve performance for other queries, it does not enhance performance for our specific use case of filtering by both columns.
Multi-column indexes—also known as composite indexes—are created on multiple columns of a table to improve performance for queries that use several columns in the WHERE or ORDER BY clauses. For instance, we can use a composite index on the gpa and enrollment_date columns to further optimize our query:
CREATE INDEX idx_gpa_and_enrollment ON Student (gpa, enrollment_date);
If we re-run the query now and examine the execution plan, we’ll notice that the composed index has been used this time:
mysql> EXPLAIN ANALYZE SELECT * FROM Student WHERE gpa > 3 AND enrollment_date > (NOW() - INTERVAL 4 YEAR);
|
| -> Index range scan on Student using idx_gpa_and_enrollment over (3 < gpa), with index condition: ((Student.gpa > 3) and (Student.enrollment_date > <cache>((now() - interval 4 year)))) (cost=107 rows=278) (actual time=0.105..0.109 rows=8 loops=1)
Moreover, the order of columns in an index is crucial for determining how effectively it can be used. If the order doesn’t match the query’s needs, the index may be used sub-optimally, leading to decreased performance.
In terms of resource usage, multiple individual indexes require separate storage for each one. In contrast, a multi-column index combines several columns into a single structure, making it more space-efficient than having many individual indexes.
Another difference is that individual indexes can optimize specific queries well but may not work together efficiently. On the other hand, multi-column indexes can optimize queries that filter on several columns at once, allowing for a single index lookup, which reduces CPU usage during query execution.
Lastly, mixing individual and composite indexes can lead to redundant indexes, which are duplicate or overlapping indexes on a table. For example, if we have a composite index for the gpa and enrollment_date columns, an individual index for the gpa column becomes redundant. This is happening because the DBMS can use the composite index instead:
CREATE INDEX idx_gpa_and_enrollment ON Student (gpa, enrollemnt_date);
CREATE INDEX idx_gpa ON Student (gpa); -- redundant
CREATE INDEX idx_enrollment ON Student (enrollemnt_date); -- NOT redundant
On the other hand, an individual index for enrollemnt_date will not be redundant and will improve the performance of queries using this column. Creating redundant indexes in a database can have a couple of negative implications such as wasting storage, slowing down write operations, and degrading the overall performance.
In this tutorial, we explored database indexes, focusing on the differences between individual and composite indexes. We found that while individual indexes can optimize specific queries, they require maintenance for each index created.
After that, we learned that composite indexes use a single data structure for multiple columns, making overlapping individual indexes redundant. We also discovered that the order of columns in a composite index should match the order of conditions in the WHERE clause to ensure efficient usage of the index.