Baeldung Pro – SQL – NPI EA (cat = Baeldung on SQL)
announcement - icon

Learn through the super-clean Baeldung Pro experience:

>> Membership and Baeldung Pro.

No ads, dark-mode and 6 months free of IntelliJ Idea Ultimate to start with.

1. Overview

When working with databases, inserting new records or updating existing ones if they already exist are both common operations. Because of this, an atomic method to perform either exists in database management systems. The general operation is known as upsert (derived from update and insert) and ensures that records remain consistent without duplication. In MySQL, this functionality is available through INSERT … ON DUPLICATE KEY UPDATE while the SQL standard defines it as MERGE.

Before PostgreSQL 9.5, upserting usually required complex workarounds since there was no built-in support. However, this changed with the introduction of the ON CONFLICT clause, and more recently, the MERGE statement in PostgreSQL 15.

In this tutorial, we’ll explore various ways to upsert in PostgreSQL using the Baeldung University Schema.

2. Example Subset and Operation

To illustrate different operations, we use the Student table:

> SELECT * FROM Student;
  id  |      name       | national_id | birth_date | enrollment_date | graduation_date | gpa  
------+-----------------+-------------+------------+-----------------+-----------------+------
 ...
 1707 | Piu Liu         |  2101368101 | 2002-03-14 | 2021-01-15      |                 | 2.99
 1717 | Param Mohan     |  1023456545 | 2002-05-15 | 2021-01-15      | 2025-06-15      | 2.75
 1719 | Siren Lobo      |   189091342 | 2002-06-17 | 2021-01-15      |                 |     
 1721 | Vini Puh        |  1312091343 | 2002-05-13 | 2021-01-15      | 2025-06-15      | 3.64
(25 rows)

In this table, let’s upsert (update or insert) two student records:

(1717, 'Param Mohan', 1023456545, '2002-05-15', '2021-01-15', '2025-06-15', 3.50)
(1722, 'Alice Parker', 1098765432, '2003-02-20', '2022-01-15', NULL, NULL)

After the upsert, the student with an id of 1717 needs to get a new gpa of 3.50, and a new student with an id of 1722 needs to appear in the student records along with their given details.

3. Using INSERT … ON CONFLICT

The INSERT … ON CONFLICT statement offers an efficient way to perform an upsert in PostgreSQL. This approach updates the gpa if a conflict arises on the specified constraint id or inserts a new student otherwise:

> INSERT INTO Student (id, name, national_id, birth_date, enrollment_date, graduation_date, gpa)
VALUES
    (1717, 'Param Mohan', 1023456545, '2002-05-15', '2021-01-15', '2025-06-15', 3.50),
    (1722, 'Alice Parker', 1098765432, '2003-02-20', '2022-01-15', NULL, NULL)
ON CONFLICT (id)
DO UPDATE SET
    name = EXCLUDED.name,
    national_id = EXCLUDED.national_id,
    birth_date = EXCLUDED.birth_date,
    enrollment_date = EXCLUDED.enrollment_date,
    graduation_date = EXCLUDED.graduation_date,
    gpa = EXCLUDED.gpa;

Let’s analyze the query:

  • INSERT INTO Student (…) VALUES (…) inserts new records into the Student table
  • ON CONFLICT (id) instructs that if a conflict occurs on the id column (the primary key) then an update needs to occur on the existing row instead of throwing an error
  • DO UPDATE SET defines the new values to update the existing row in case of a conflict
  • EXCLUDED.column_name enables us to reference the values from the row that was to be inserted (but instead caused a conflict) so that we can use these values to update the existing row

Now, let’s display the updated student records:

> SELECT * FROM Student;
  id  |      name       | national_id | birth_date | enrollment_date | graduation_date | gpa  
------+-----------------+-------------+------------+-----------------+-----------------+------
 ...
 1707 | Piu Liu         |  2101368101 | 2002-03-14 | 2021-01-15      |                 | 2.99
 1719 | Siren Lobo      |   189091342 | 2002-06-17 | 2021-01-15      |                 |     
 1721 | Vini Puh        |  1312091343 | 2002-05-13 | 2021-01-15      | 2025-06-15      | 3.64
 1717 | Param Mohan     |  1023456545 | 2002-05-15 | 2021-01-15      | 2025-06-15      |  3.5
 1722 | Alice Parker    |  1098765432 | 2003-02-20 | 2022-01-15      |                 |     
(26 rows)

So, the query inserted the student with id 1722 as a new record since they didn’t exist and updated the gpa for the student with id 1717 to 3.50 since they already existed.

4. Using MERGE

PostgreSQL 15 introduced the MERGE statement, simplifying inserting new records or updating existing ones in a structured manner. Instead of handling conflicts manually, MERGE enables us to specify conditions for updating or inserting records in a single query.

So, we can see how the MERGE statement works with the query:

> MERGE INTO Student AS s
USING (VALUES
    (1717, 'Param Mohan', 1023456545, CAST('2002-05-15' AS DATE), CAST('2021-01-15' AS DATE), CAST('2025-06-15' AS DATE), 3.50),
    (1722, 'Alice Parker', 1098765432, CAST('2003-02-20' AS DATE), CAST('2022-01-15' AS DATE), NULL, NULL)
) AS v(id, name, national_id, birth_date, enrollment_date, graduation_date, gpa)
ON s.id = v.id
WHEN MATCHED THEN 
    UPDATE SET 
        name = v.name,
        national_id = v.national_id,
        birth_date = v.birth_date,
        enrollment_date = v.enrollment_date,
        graduation_date = v.graduation_date,
        gpa = v.gpa
WHEN NOT MATCHED THEN 
    INSERT (id, name, national_id, birth_date, enrollment_date, graduation_date, gpa)
    VALUES (v.id, v.name, v.national_id, v.birth_date, v.enrollment_date, v.graduation_date, v.gpa);

Now, let’s understand the breakdown of the query:

  • MERGE INTO Student AS s specifies that we’re merging records into the Student table and gives it the alias s
  • USING (VALUES (…)) AS v(…) defines a temporary table v with the new student records whereas the explicit CAST statements ensure the correct DATE data type, avoiding errors
  • ON s.id = v.id utilizes the id column to match existing records in the Student table records to the new records to insert
  • WHEN MATCHED THEN UPDATE SET updates the existing row with new values if a match is found (id already exists in the Student table)
  • WHEN NOT MATCHED THEN INSERT inserts the new record instead if no match is found (i.e., id doesn’t exist in the Student table)

Without explicit casting, PostgreSQL assumes birth_date, enrollment_date, and graduation_date are of the TEXT type instead of DATE. To clarify, CAST(‘YYYY-MM-DD’ AS DATE) ensures proper type conversion, preventing errors:

> SELECT * FROM Student;
  id  |      name       | national_id | birth_date | enrollment_date | graduation_date | gpa  
------+-----------------+-------------+------------+-----------------+-----------------+------
 ...
 1707 | Piu Liu         |  2101368101 | 2002-03-14 | 2021-01-15      |                 | 2.99
 1719 | Siren Lobo      |   189091342 | 2002-06-17 | 2021-01-15      |                 |     
 1721 | Vini Puh        |  1312091343 | 2002-05-13 | 2021-01-15      | 2025-06-15      | 3.64
 1717 | Param Mohan     |  1023456545 | 2002-05-15 | 2021-01-15      | 2025-06-15      |  3.5
 1722 | Alice Parker    |  1098765432 | 2003-02-20 | 2022-01-15      |                 |     
(26 rows)

Above, we display the Student table records showing that the query works as intended.

While ON CONFLICT is straightforward when handling simple upserts, MERGE is better suited for more complex scenarios, for instance, needing multiple conditions before updating or inserting records.

5. Performance Considerations

When deciding between ON CONFLICT and MERGE, performance is an important factor.

ON CONFLICT works well when handling single-row upserts. It’s convenient when updating or inserting one record at a time, such as modifying a user profile (e.g., updating a user email). Meanwhile, MERGE is better for handling bulk updates and inserts. Thus, it’s usually more applicable for processing large datasets. However, additional checks and operations may give rise to more overhead.

Therefore, choosing the right approach depends on whether we need fast individual updates (ON CONFLICT) or batch updates (MERGE).

To improve the upsert speed for both methods, we can ensure proper indexing on primary keys and unique constraints.

6. Conclusion

In this article, we discussed how to upsert (update or insert) in PostgreSQL.

The INSERT … ON CONFLICT statement works well for simple upserts whereas MERGE is more powerful and flexible when handling updates and inserts in complex scenarios. ON CONFLICT and MERGE make updating or inserting in PostgreSQL significantly easier. Additionally, the number of rows to be updated influences how we choose the right approach.

The code backing this article is available on GitHub. Once you're logged in as a Baeldung Pro Member, start learning and coding on the project.