Baeldung Pro – SQL – NPI EA (cat = Baeldung on SQL)
announcement - icon

Learn through the super-clean Baeldung Pro experience:

>> Membership and Baeldung Pro.

No ads, dark-mode and 6 months free of IntelliJ Idea Ultimate to start with.

1. Introduction

As a Data Analyst, comparing two tables to find differences is a common task, especially in processes like data migration, synchronization, and data quality assurance. SQL offers several efficient methods to accomplish this task.

In this tutorial, we’ll explore how to identify differences between tables using EXCEPT and JOIN queries.

Also, we’ll explore practical example use cases for this scenario using the Baeldung University database schema and sample data.

2. Using EXCEPT to Identify Table Differences

Utilizing the EXCEPT operator in SQL provides a straightforward method to pinpoint discrepancies between tables. However, for a head-to-head comparison, it is essential to ensure that both tables have the same schema with all columns.

2.1. Illustrative Sample Data

For illustrative purposes, let’s consider the Department table, which includes three columns – id, name, and code. The output displays a list of departments alongside their respective IDs, names (e.g., Computer Science, Electronics, and Communications), and department codes (CS, EC, ME, CE, MA):

> SELECT * FROM Department;
 id |              name              | code
----+--------------------------------+------
  1 | Computer Science               | CS
  2 | Electronics and Communications | EC
  3 | Mechanical Engineering         | ME
  4 | Civil Engineering              | CE
  5 | Mathematics                    | MA
(5 rows)

Similarly, we’ll have another table called Updated_department with the same schema as the Department table. However, it includes additional data, including some NULL values, for better comprehension:

> SELECT * FROM Updated_department;
 id |              name              | code
----+--------------------------------+------
  2 | Electronics and Communications | EC
  3 | Mechanical Engineering         | ME
  4 | Civil Engineering              | CE
  5 | Mathematics                    | MA
  6 | Biology                        | BO
  6 |                                |
  1 | Computer                       | CS
  5 | Mathematics                    | MT
(8 rows)

Using the Department and Updated_department tables, let’s explore how the EXCEPT operator in SQL can be used to identify differences between them.

2.2. Using EXCEPT Operator

First, the query retrieves all rows from the Updated_department table. Then, it compares each entire row (all columns) between the two tables and identifies rows from Updated_department that do not have corresponding rows in Department.

This operation effectively detects records that have been added or modified in Updated_department compared to Department, provided both tables share the same schema:

> SELECT * FROM Updated_department EXCEPT SELECT * FROM Department;
 id |    name     | code
----+-------------+------
  6 |             |
  5 | Mathematics | MT
  1 | Computer    | CS
  6 | Biology     | BO
(4 rows)

On the other hand, the query below selects all rows from the Department table that do not have matching rows in the Updated_department table:

> SELECT * FROM Department EXCEPT SELECT * FROM Updated_department;
 id |       name       | code
----+------------------+------
  1 | Computer Science | CS
(1 row)

At times, we might encounter the error “ERROR: each EXCEPT query must have the same number of columns“, indicating that the queries do not have identical column counts or schemas.

3. Using JOIN to Identify Table Differences

Now, let’s utilize the JOIN method to find and compare information from two tables. This method can handle tables with different schemas and columns, requiring at least one common column for matching rows.

3.1. Sample Data

In addition to the Department table, we’ll also use the faculty table with a different schema for better illustration:

> SELECT * FROM faculty;
 id  |      name      | national_id |      position       | start_date |  end_date  | department_id | active
-----+----------------+-------------+---------------------+------------+------------+---------------+--------
   1 | Anubha Gupta   |  1018901231 | Professor           | 2010-01-11 | 2027-03-11 |             2 | t
 111 | AV Subramanium |  1340902317 | Assistant Professor | 2011-05-11 |            |             1 | t
 121 | Risa Sodi      |  1409239017 | Associate Professor | 2010-01-11 |            |             1 | t
...
... output truncated ...
...
(7 rows)

3.2. Using JOIN Operator

The SELECT query retrieves all columns from both the Faculty and Department tables, joining them based on the id column in Department and the department_id column in Faculty table. It combines data from both tables, showing matched records and NULL values where there are no matches in Department table:

> SELECT fac.id as faculty_id, fac.name as faculty_name, fac.position, dep.name as department, dep.code as code, dep.id as dept_id, fac.start_date, fac.end_date 
    FROM Faculty fac 
    LEFT JOIN Department dep 
    ON dep.id = fac.department_id;
 faculty_id |  faculty_name  |      position       |           department           | code | dept_id | start_date |  end_date
------------+----------------+---------------------+--------------------------------+------+---------+------------+------------
          1 | Anubha Gupta   | Professor           | Electronics and Communications | EC   |       2 | 2010-01-11 | 2027-03-11
        111 | AV Subramanium | Assistant Professor | Computer Science               | CS   |       1 | 2011-05-11 |
...
... output truncated ...
...
        741 | Sophia Ker     | Teaching Assistant  | Civil Engineering              | CE   |       4 | 2022-08-11 |
(7 rows)

Now, let’s include a WHERE condition to filter the rows where the end_date column is NULL, effectively showing only active faculty members along with their corresponding department information from the department:

> SELECT fac.id as faculty_id, fac.name as faculty_name, fac.position, dep.name as department, dep.code as code, dep.id as dept_id, fac.start_date, fac.end_date 
    FROM Faculty fac 
    LEFT JOIN Department dep 
    ON dep.id = fac.department_id 
        WHERE fac.end_date IS NULL;
 faculty_id |  faculty_name  |      position       |       department       | code | dept_id | start_date | end_date
------------+----------------+---------------------+------------------------+------+---------+------------+----------
        111 | AV Subramanium | Assistant Professor | Computer Science       | CS   |       1 | 2011-05-11 |
        121 | Risa Sodi      | Associate Professor | Computer Science       | CS   |       1 | 2010-01-11 |
        512 | Casper Jones   | Teaching Assistant  | Mechanical Engineering | ME   |       3 | 2021-04-11 |
        740 | Kira Wass      | Teaching Assistant  | Civil Engineering      | CE   |       4 | 2021-09-11 |
        741 | Sophia Ker     | Teaching Assistant  | Civil Engineering      | CE   |       4 | 2022-08-11 |
(5 rows)

On the other hand, we can also include the WHERE condition to filter the rows where the end_date column is not NULL by just adding NOT clause before the NULL keyword. It effectively shows only inactive or former faculty members, along with their corresponding department information from department.

4. Conclusion

In summary, SQL’s JOIN and EXCEPT operators help find and manage differences between tables. The JOIN operation allows for combining related rows from multiple tables based on a related column. On the other hand, the EXCEPT operator identifies rows that exist in one table but not in the other.

The code backing this article is available on GitHub. Once you're logged in as a Baeldung Pro Member, start learning and coding on the project.