Baeldung Pro – SQL – NPI EA (cat = Baeldung on SQL)
announcement - icon

Learn through the super-clean Baeldung Pro experience:

>> Membership and Baeldung Pro.

No ads, dark-mode and 6 months free of IntelliJ Idea Ultimate to start with.

1. Overview

Merging two rows in SQL involves combining data from two records into a single record. In particular, this is often necessary for data consolidation and cleanup.

In this article, we’ll use the Student table from the Baeldung University database to illustrate different methods for merging two rows.

2. Updating With Coalesce

One method to merge two rows is by using the UPDATE statement combined with the COALESCE function. The COALESCE function returns the first non-null value in its list of arguments, making it particularly useful for merging data where some fields may be NULL.

For example, let’s consider an example of merging Vikas Jain (id=1011) and Ritu Raj (id=1610). First, let’s display the data we want to merge:

SELECT * FROM Student WHERE id IN (1011, 1610);
+------+------------+-------------+------------+-----------------+-----------------+--------+
| id   | name       | national_id | birth_date | enrollment_date | graduation_date | gpa    |
|------+------------+-------------+------------+-----------------+-----------------+--------|
| 1011 | Vikas Jain | 321345662   | 2001-07-18 | 2020-01-15      | <null>          | 3.3    |
| 1610 | Ritu Raj   | 3203455662  | 2002-02-05 | 2021-01-15      | 2025-06-15      | <null> |
+------+------------+-------------+------------+-----------------+-----------------+--------+
SELECT 2
Time: 0.012s

The query shows the data for Vikas Jain and Ritu Raj before the merge.

Next, we use the UPDATE statement combined with the COALESCE function to merge the rows:

UPDATE Student AS target
SET name = COALESCE(target.name, source.name),
    national_id = COALESCE(target.national_id, source.national_id),
    birth_date = COALESCE(target.birth_date, source.birth_date),
    enrollment_date = COALESCE(target.enrollment_date, source.enrollment_date),
    graduation_date = COALESCE(target.graduation_date, source.graduation_date),
    gpa = COALESCE(target.gpa, source.gpa)
FROM Student AS source
WHERE target.id = 1011 AND source.id = 1610;
You're about to run a destructive command.
Do you want to proceed? (y/n): y
Your call!
UPDATE 1
Time: 0.008s

In this scenario, the target is the row to keep (id = 1011), and the source is the row to merge into the target (id = 1610). The COALESCE function ensures that the first non-null value is retained.

Furthermore, after merging the data, we delete the source row to ensure there are no duplicates:

DELETE FROM Student WHERE id = 1610;
You're about to run a destructive command.
Do you want to proceed? (y/n): y
Your call!
DELETE 1
Time: 0.002s

Finally, let’s display the data to verify the merge:

SELECT * FROM Student WHERE id IN (1011);
+------+------------+-------------+------------+-----------------+-----------------+-----+
| id   | name       | national_id | birth_date | enrollment_date | graduation_date | gpa |
|------+------------+-------------+------------+-----------------+-----------------+-----|
| 1011 | Vikas Jain | 321345662   | 2001-07-18 | 2020-01-15      | 2025-06-15      | 3.3 |
+------+------------+-------------+------------+-----------------+-----------------+-----+
SELECT 1
Time: 0.005s

The result shows that the data has been merged, and the corresponding values from the source row have replaced the NULL values. Vikas Jain’s record now includes the graduation date from Ritu Raj’s record

3. Using INSERT INTO and DELETE

Another method to merge two rows is to create a new row by combining data from the two rows, and then, delete the original rows to ensure there are no duplicates. Furthermore, this approach ensures that a new record is created without directly modifying the existing rows.

For example, let’s create a new merged row by combining data from Param Mohan (id=1717) and Siren Lobo (id=1719):

INSERT INTO Student (id, name, national_id, birth_date, enrollment_date, graduation_date, gpa)
SELECT 
    1000,
    COALESCE(s1.name, s2.name),
    COALESCE(s1.national_id, s2.national_id),
    COALESCE(s1.birth_date, s2.birth_date),
    COALESCE(s1.enrollment_date, s2.enrollment_date),
    COALESCE(s1.graduation_date, s2.graduation_date),
    COALESCE(s1.gpa, s2.gpa)
FROM 
    (SELECT * FROM Student WHERE id = 1717) s1,
    (SELECT * FROM Student WHERE id = 1719) s2;
INSERT 0 1
Time: 0.006s

This query will insert a new row into the Student table, combining the data from both rows.

Next, let’s proceed to delete the original rows to remove redundancy:

DELETE FROM Student WHERE id IN (1717, 1719);
You're about to run a destructive command.
Do you want to proceed? (y/n): y
Your call!
DELETE 2
Time: 0.004s

Finally, let’s verify the merge for the newly created row:

SELECT * FROM Student WHERE id = 1000;
+------+-------------+-------------+------------+-----------------+-----------------+------+
| id   | name        | national_id | birth_date | enrollment_date | graduation_date | gpa  |
|------+-------------+-------------+------------+-----------------+-----------------+------|
| 1000 | Param Mohan | 1023456545  | 2002-05-15 | 2021-01-15      | 2025-06-15      | 2.75 |
+------+-------------+-------------+------------+-----------------+-----------------+------+
SELECT 1
Time: 0.008s

We see the data merged successfully with the NULL values replaced.

4. Common Table Expressions (CTEs)

Using Common Table Expressions (CTEs) can simplify the merging process. In particular, it provides a clear structure for selecting and updating rows.

For example, let’s merge the rows for Potu Singh (id=2017) and Julia Roberts (id=2008):

WITH MergedStudent AS (
    SELECT 
        2017 AS id,
        COALESCE(MAX(name), MIN(name)) AS name,
        COALESCE(MAX(national_id), MIN(national_id)) AS national_id,
        COALESCE(MAX(birth_date), MIN(birth_date)) AS birth_date,
        COALESCE(MAX(enrollment_date), MIN(enrollment_date)) AS enrollment_date,
        COALESCE(MAX(graduation_date), MIN(graduation_date)) AS graduation_date,
        COALESCE(MAX(gpa), MIN(gpa)) AS gpa
    FROM 
        (SELECT * FROM Student WHERE id = 2017
         UNION ALL
         SELECT * FROM Student WHERE id = 2008) s
)
SELECT * FROM MergedStudent;
+------+------------+-------------+------------+-----------------+-----------------+------+
| id   | name       | national_id | birth_date | enrollment_date | graduation_date | gpa  |
|------+------------+-------------+------------+-----------------+-----------------+------|
| 2017 | Potu Singh | 1312445677  | 2003-06-12 | 2022-01-15      | 2025-06-15      | 3.04 |
+------+------------+-------------+------------+-----------------+-----------------+------+
SELECT 1
Time: 0.008s

The query shows the merged data for Potu Singh and Julia Roberts.

First, the query uses a CTE named MergedStudent to select and merge the rows. Within the CTE, it performs a UNION ALL on the two rows identified by their IDs. Then, it applies the COALESCE function on each column to ensure that the non-null values are retained. By using aggregate functions like MAX and MIN, the query effectively consolidates the data, choosing the first non-null value for each field.

Finally, the outer SELECT statement retrieves the merged row from the CTE, showing the consolidated data for Potu Singh and Julia Roberts.

5. Conclusion

In this article, we’ve explored various methods to merge two rows in SQL. By using UPDATE with COALESCE, INSERT INTO SELECT, and Common Table Expressions (CTEs), we can effectively consolidate data and ensure data integrity.

The code backing this article is available on GitHub. Once you're logged in as a Baeldung Pro Member, start learning and coding on the project.