Baeldung Pro – SQL – NPI EA (cat = Baeldung on SQL)
announcement - icon

Learn through the super-clean Baeldung Pro experience:

>> Membership and Baeldung Pro.

No ads, dark-mode and 6 months free of IntelliJ Idea Ultimate to start with.

1. Overview

Most times, in SQL, values in one table often have matching or related records in another table. However, in some cases, they do not, leading to the existence of orphaned records.

In this article, we’ll learn how to delete rows in SQL where the IDs do not have a match in another table. Also, we’ll use the Students and Registration table in Baeldung University schema and data to show examples.

2. Understanding Orphan Records

Orphaned records are data that reference non-existent data on a SQL server. Basically, they’re values with IDs that do not have a match from another table.

These types of data can cause inconsistencies in a relational database; hence, we need to clean them up to maintain data integrity. Nevertheless, before we do this, we need to identify the orphan records in the tables.

For example, let’s imagine we’ve got two tables in a database. The first is Students:

SELECT* FROM Students;
+------------+------------+-------------+------------+-----------------+-----------------+------+
| student_id | name       | national_id | birth_date | enrollment_date | graduation_date | gpa  |
+------------+------------+-------------+------------+-----------------+-----------------+------+
|          3 | Tobechukwu |  5230528893 | 1997-07-21 | 2014-10-01      | 2018-07-13      |  3.5 |
|          7 | Faith      |  2345678901 | 1981-12-02 | 2019-06-15      | 2019-06-09      |    2 |
|         11 | Tomiwa     |  3456289812 | 2002-07-03 | 2020-11-11      | 2024-10-01      |  3.9 |
|         19 | Chirstiana |  4567829123 | 1954-04-04 | 2011-09-01      | 2015-10-26      |  1.5 |
|         33 | Ola        |  5178907424 | 1976-05-20 | 2022-08-17      | 2026-06-05      |  4.5 |
|         60 | Abioye     |  6719912835 | 2005-11-06 | 2010-12-13      | 2016-12-01      |    3 |
+------------+------------+-------------+------------+-----------------+-----------------+------+
6 rows in set (0.003 sec)

Meanwhile, the other is Registration:

SELECT* FROM Registration;
+-----------------+----------+------+---------------------+-----------+------------+
| registration_id | semester | year | reg_datetime        | course_id | student_id |
+-----------------+----------+------+---------------------+-----------+------------+
|               1 | Fall     | 2019 | 2019-08-20 11:00:00 | C101      |          3 |
|               2 | Spring   | 2020 | 2020-01-15 19:00:00 | C102      |          7 |
|               3 | Fall     | 2020 | 2020-08-20 10:00:00 | C103      |         11 |
|               4 | Spring   | 2021 | 2021-01-20 13:00:00 | C104      |         30 |
|               5 | Fall     | 2021 | 2021-08-20 14:00:00 | C105      |         90 |
|               6 | Fall     | 2021 | 2021-05-25 12:00:00 | C103      |          2 |
+-----------------+----------+------+---------------------+-----------+------------+
6 rows in set (0.004 sec)

Let’s now show how to identify rows where some IDs don’t have a corresponding value in the other table using the NOT IN clause:

SELECT *
FROM Registration
WHERE student_id NOT IN (
    SELECT student_id
    FROM Students
);
+-----------------+----------+------+---------------------+-----------+------------+
| registration_id | semester | year | reg_datetime        | course_id | student_id |
+-----------------+----------+------+---------------------+-----------+------------+
|               4 | Spring   | 2021 | 2021-01-20 13:00:00 | C104      |         30 |
|               5 | Fall     | 2021 | 2021-08-20 14:00:00 | C105      |         90 |
|               6 | Fall     | 2021 | 2021-05-25 12:00:00 | C103      |          2 |
+-----------------+----------+------+---------------------+-----------+------------+
3 rows in set (0.053 sec)

From the example, we can see that the results shown in the rows from the Registration table are rows with student_id , which don’t have a reference in the Students table.

3. Using LEFT JOIN to Delete Rows Where ID Do Not Match

In SQL, we use LEFT JOIN, which is also known as a left outer join, to retrieve related records from both the left and right tables. In this case, the left table contains all the records being returned, including values without a match. Meanwhile, the right table has only matched records, which are returned.

However, aside from this general function, we can use a LEFT JOIN to delete orphan records.

For example, let’s show how to delete orphaned records from the Registration table through the LEFT JOIN method:

DELETE FROM Registration
WHERE student_id IN (
    SELECT Registration.student_id
    FROM Registration
    LEFT JOIN Students ON Registration.student_id = Students.student_id
    WHERE Students.student_id IS NULL
);
Query OK, 3 rows affected (0.013 sec)

Now, let’s verify that the orphaned records have been deleted:

SELECT* FROM Registration;
+-----------------+----------+------+---------------------+-----------+------------+
| registration_id | semester | year | reg_datetime        | course_id | student_id |
+-----------------+----------+------+---------------------+-----------+------------+
|               1 | Fall     | 2019 | 2019-08-20 11:00:00 | C101      |          3 |
|               2 | Spring   | 2020 | 2020-01-15 19:00:00 | C102      |          7 |
|               3 | Fall     | 2020 | 2020-08-20 10:00:00 | C103      |         11 |
+-----------------+----------+------+---------------------+-----------+------------+
3 rows in set (0.003 sec)

Here, we can see that this operation deletes all the IDs without a match on the Students table.

4. Using NOT IN Clause

Furthermore, in cases where we want a simpler way, we can use the NOT IN clause to delete IDs that do not have a match.

Unlike a LEFT JOIN, which retrieves values with matching records from another table, we can use this to directly identify and delete records with no corresponding value.

For example, let’s use the NOT IN clause to delete noncorresponding IDs:

DELETE FROM Registration
WHERE student_id NOT IN (
    SELECT student_id
    FROM Students
);
Query OK, 3 rows affected (0.011 sec)

Basically, in this example, the subquery selects the student_id values from the Students table, while the main query deletes all records from the Registration table without a corresponding student_id in the Students table.

5. Using NOT EXISTS Clause

Just as it sounds, the NOT EXISTS clause in SQL enables us to confirm the absence of related rows in another table before executing actions like deleting.

Moreover, another standout benefit of using the NOT EXISTS clause is the scalability it offers. This comes into play when handling large datasets or complex queries.

For example, let’s explore how to use the NOT EXISTS clause in SQL to delete IDs without corresponding values:

DELETE FROM Registration
WHERE NOT EXISTS (
    SELECT 1
    FROM Students
    WHERE Students.student_id = Registration.student_id
);
Query OK, 3 rows affected (0.006 sec)

Definitely, the query removes all records in the Registration table with no related entries in the Students table. To confirm that our action was successful, we can use the SELECT statement to check the table.

6. Conclusion

In this article, we’ve explored various ways to delete IDs without corresponding values in another table. However, each of these methods is unique and has different advantages.

To begin, LEFT JOIN and NOT IN are generally preferable when handling smaller datasets. However, in cases where we need efficacy while dealing with larger data, the NOT EXISTS clause is ideal.

The code backing this article is available on GitHub. Once you're logged in as a Baeldung Pro Member, start learning and coding on the project.