Learn through the super-clean Baeldung Pro experience:
>> Membership and Baeldung Pro.
No ads, dark-mode and 6 months free of IntelliJ Idea Ultimate to start with.
Last updated: July 20, 2024
Most times, in SQL, values in one table often have matching or related records in another table. However, in some cases, they do not, leading to the existence of orphaned records.
In this article, we’ll learn how to delete rows in SQL where the IDs do not have a match in another table. Also, we’ll use the Students and Registration table in Baeldung University schema and data to show examples.
Orphaned records are data that reference non-existent data on a SQL server. Basically, they’re values with IDs that do not have a match from another table.
These types of data can cause inconsistencies in a relational database; hence, we need to clean them up to maintain data integrity. Nevertheless, before we do this, we need to identify the orphan records in the tables.
For example, let’s imagine we’ve got two tables in a database. The first is Students:
SELECT* FROM Students;
+------------+------------+-------------+------------+-----------------+-----------------+------+
| student_id | name | national_id | birth_date | enrollment_date | graduation_date | gpa |
+------------+------------+-------------+------------+-----------------+-----------------+------+
| 3 | Tobechukwu | 5230528893 | 1997-07-21 | 2014-10-01 | 2018-07-13 | 3.5 |
| 7 | Faith | 2345678901 | 1981-12-02 | 2019-06-15 | 2019-06-09 | 2 |
| 11 | Tomiwa | 3456289812 | 2002-07-03 | 2020-11-11 | 2024-10-01 | 3.9 |
| 19 | Chirstiana | 4567829123 | 1954-04-04 | 2011-09-01 | 2015-10-26 | 1.5 |
| 33 | Ola | 5178907424 | 1976-05-20 | 2022-08-17 | 2026-06-05 | 4.5 |
| 60 | Abioye | 6719912835 | 2005-11-06 | 2010-12-13 | 2016-12-01 | 3 |
+------------+------------+-------------+------------+-----------------+-----------------+------+
6 rows in set (0.003 sec)
Meanwhile, the other is Registration:
SELECT* FROM Registration;
+-----------------+----------+------+---------------------+-----------+------------+
| registration_id | semester | year | reg_datetime | course_id | student_id |
+-----------------+----------+------+---------------------+-----------+------------+
| 1 | Fall | 2019 | 2019-08-20 11:00:00 | C101 | 3 |
| 2 | Spring | 2020 | 2020-01-15 19:00:00 | C102 | 7 |
| 3 | Fall | 2020 | 2020-08-20 10:00:00 | C103 | 11 |
| 4 | Spring | 2021 | 2021-01-20 13:00:00 | C104 | 30 |
| 5 | Fall | 2021 | 2021-08-20 14:00:00 | C105 | 90 |
| 6 | Fall | 2021 | 2021-05-25 12:00:00 | C103 | 2 |
+-----------------+----------+------+---------------------+-----------+------------+
6 rows in set (0.004 sec)
Let’s now show how to identify rows where some IDs don’t have a corresponding value in the other table using the NOT IN clause:
SELECT *
FROM Registration
WHERE student_id NOT IN (
SELECT student_id
FROM Students
);
+-----------------+----------+------+---------------------+-----------+------------+
| registration_id | semester | year | reg_datetime | course_id | student_id |
+-----------------+----------+------+---------------------+-----------+------------+
| 4 | Spring | 2021 | 2021-01-20 13:00:00 | C104 | 30 |
| 5 | Fall | 2021 | 2021-08-20 14:00:00 | C105 | 90 |
| 6 | Fall | 2021 | 2021-05-25 12:00:00 | C103 | 2 |
+-----------------+----------+------+---------------------+-----------+------------+
3 rows in set (0.053 sec)
From the example, we can see that the results shown in the rows from the Registration table are rows with student_id , which don’t have a reference in the Students table.
In SQL, we use LEFT JOIN, which is also known as a left outer join, to retrieve related records from both the left and right tables. In this case, the left table contains all the records being returned, including values without a match. Meanwhile, the right table has only matched records, which are returned.
However, aside from this general function, we can use a LEFT JOIN to delete orphan records.
For example, let’s show how to delete orphaned records from the Registration table through the LEFT JOIN method:
DELETE FROM Registration
WHERE student_id IN (
SELECT Registration.student_id
FROM Registration
LEFT JOIN Students ON Registration.student_id = Students.student_id
WHERE Students.student_id IS NULL
);
Query OK, 3 rows affected (0.013 sec)
Now, let’s verify that the orphaned records have been deleted:
SELECT* FROM Registration;
+-----------------+----------+------+---------------------+-----------+------------+
| registration_id | semester | year | reg_datetime | course_id | student_id |
+-----------------+----------+------+---------------------+-----------+------------+
| 1 | Fall | 2019 | 2019-08-20 11:00:00 | C101 | 3 |
| 2 | Spring | 2020 | 2020-01-15 19:00:00 | C102 | 7 |
| 3 | Fall | 2020 | 2020-08-20 10:00:00 | C103 | 11 |
+-----------------+----------+------+---------------------+-----------+------------+
3 rows in set (0.003 sec)
Here, we can see that this operation deletes all the IDs without a match on the Students table.
Furthermore, in cases where we want a simpler way, we can use the NOT IN clause to delete IDs that do not have a match.
Unlike a LEFT JOIN, which retrieves values with matching records from another table, we can use this to directly identify and delete records with no corresponding value.
For example, let’s use the NOT IN clause to delete noncorresponding IDs:
DELETE FROM Registration
WHERE student_id NOT IN (
SELECT student_id
FROM Students
);
Query OK, 3 rows affected (0.011 sec)
Basically, in this example, the subquery selects the student_id values from the Students table, while the main query deletes all records from the Registration table without a corresponding student_id in the Students table.
Just as it sounds, the NOT EXISTS clause in SQL enables us to confirm the absence of related rows in another table before executing actions like deleting.
Moreover, another standout benefit of using the NOT EXISTS clause is the scalability it offers. This comes into play when handling large datasets or complex queries.
For example, let’s explore how to use the NOT EXISTS clause in SQL to delete IDs without corresponding values:
DELETE FROM Registration
WHERE NOT EXISTS (
SELECT 1
FROM Students
WHERE Students.student_id = Registration.student_id
);
Query OK, 3 rows affected (0.006 sec)
Definitely, the query removes all records in the Registration table with no related entries in the Students table. To confirm that our action was successful, we can use the SELECT statement to check the table.
In this article, we’ve explored various ways to delete IDs without corresponding values in another table. However, each of these methods is unique and has different advantages.
To begin, LEFT JOIN and NOT IN are generally preferable when handling smaller datasets. However, in cases where we need efficacy while dealing with larger data, the NOT EXISTS clause is ideal.