Learn through the super-clean Baeldung Pro experience:
>> Membership and Baeldung Pro.
No ads, dark-mode and 6 months free of IntelliJ Idea Ultimate to start with.
Last updated: August 13, 2024
In SQL, there are scenarios where retaining only the top or bottom N records from a table is necessary. This task is common when working with large datasets and requires efficient ways to filter and delete records. However, various SQL databases offer different techniques to achieve this.
In this tutorial, we’ll explore how to delete all records except the top or bottom N. Furthermore, we’ll look at practical examples in PostgreSQL, MySQL, and SQL Server. We’ll use the simplified Baeldung University database for illustration.
PostgreSQL provides several methods to handle the deletion of all records except the top or bottom N.
A Common Table Expression (CTE) is a temporary result set defined within the execution scope of a single SELECT, INSERT, UPDATE, or DELETE statement. In particular, it’s useful for organizing and simplifying complex queries. We can use a CTE to select the top or bottom N records and then delete the rest.
For example, let’s see how to delete all students except the top 5 based on their ID. We use a CTE to select the top 5 student IDs and then delete any student whose ID isn’t in this top 5 list:
WITH TopStudents AS (
SELECT id
FROM Student
ORDER BY id
LIMIT 5
)
DELETE FROM Student
WHERE id NOT IN (SELECT id FROM TopStudents);
DELETE 20
Time: 0.003s
In this example, we define a CTE named TopStudents that selects the top 5 student IDs by ordering the students by their ID in ascending order. Then, we use the DELETE statement to remove student records whose IDs aren’t in the TopStudents result set.
Let’s check the new records of the Student table:
SELECT * FROM Student;
+------+-----------------+-------------+------------+-----------------+-----------------+-----+
| id | name | national_id | birth_date | enrollment_date | graduation_date | gpa |
|------+-----------------+-------------+------------+-----------------+-----------------+-----|
| 1001 | John Liu | 123345566 | 2001-04-05 | 2020-01-15 | 2024-06-15 | 4.0 |
| 1003 | Rita Ora | 132345166 | 2001-01-14 | 2020-01-15 | 2024-06-15 | 4.2 |
| 1007 | Philip Lose | 321345566 | 2001-06-15 | 2020-01-15 | 2024-06-15 | 3.8 |
| 1010 | Samantha Prabhu | 3217165566 | 2001-03-21 | 2020-01-15 | 2024-06-15 | 4.9 |
| 1011 | Vikas Jain | 321345662 | 2001-07-18 | 2020-01-15 | <null> | 3.3 |
+------+-----------------+-------------+------------+-----------------+-----------------+-----+
SELECT 5
Time: 0.008s
The result shows the top 5 records left.
Similarly, to delete all students except the bottom 5 based on their ID, we reverse the order:
WITH BottomStudents AS (
SELECT id
FROM Student
ORDER BY id DESC
LIMIT 5
)
DELETE FROM Student
WHERE id NOT IN (SELECT id FROM BottomStudents);
DELETE 20
Time: 0.016s
In this example, the CTE named BottomStudents selects the bottom 5 student IDs by ordering the students by their ID in descending order. Then, the DELETE statement removes student records whose IDs aren’t in the BottomStudents result set.
Again, let’s check the new records of the Student table:
SELECT * FROM Student;
+------+----------------+-------------+------------+-----------------+-----------------+--------+
| id | name | national_id | birth_date | enrollment_date | graduation_date | gpa |
|------+----------------+-------------+------------+-----------------+-----------------+--------|
| 2001 | Reena Roy | 9023455613 | 2003-01-13 | 2022-01-15 | 2025-06-15 | 4.15 |
| 2009 | Pollards Grey | 6503453662 | 2003-04-19 | 2022-01-15 | 2025-06-15 | 4.65 |
| 2017 | Potu Singh | 1312445677 | 2003-03-11 | 2022-01-15 | <null> | <null> |
| 2008 | Julia Roberts | 1212446677 | 2003-06-12 | 2022-01-15 | 2025-06-15 | 3.04 |
| 2006 | Agatha Christi | 1100245767 | 2003-05-19 | 2022-01-15 | <null> | 4.59 |
+------+----------------+-------------+------------+-----------------+-----------------+--------+
SELECT 5
Time: 0.013s
The result shows the bottom 5 records left.
It’s important to note that we can also implement this method in MySQL.
We can also use subqueries to retain the top or bottom N records while deleting the rest. In particular, this method involves selecting the IDs of the top or bottom N records in a subquery and using these IDs in the DELETE statement.
For example, let’s delete all courses except the top 10 based on their ID in the Course table:
DELETE FROM Course
WHERE id NOT IN (
SELECT id
FROM Course
ORDER BY id
LIMIT 10
);
You're about to run a destructive command.
Do you want to proceed? (y/n): y
Your call!
DELETE 71
Time: 0.002s
In this case, the subquery selects the IDs of the top 10 courses ordered by their ID in ascending order. Then, the DELETE statement removes any course records whose IDs aren’t in the result set of the subquery.
Similarly, we can delete all courses except the bottom 5 based on their ID:
DELETE FROM Course
WHERE id NOT IN (
SELECT id
FROM Course
ORDER BY id DESC
LIMIT 5
);
You're about to run a destructive command.
Do you want to proceed? (y/n): y
Your call!
DELETE 5
Time: 0.003s
Again, the query removes any course records whose IDs aren’t in the result set of the subquery.
It’s important to note that we can also implement this method in MySQL.
Window functions provide a powerful way to perform operations across a set of table rows related to the current row. We can use the ROW_NUMBER() window function to assign a unique number to each row. Then, we delete rows based on this numbering.
For example, let’s delete all student records except the top 10 based on their ID:
WITH NumberedStudents AS (
SELECT id, ROW_NUMBER() OVER (ORDER BY id) AS row_num
FROM Student
)
DELETE FROM Student
WHERE id IN (
SELECT id
FROM NumberedStudents
WHERE row_num > 10
);
DELETE 15
Time: 0.005s
In this example, the ROW_NUMBER() function assigns a unique number to each student based on their ID in ascending order. Additionally, the NumberedStudents CTE includes these row numbers.
Similarly, let’s delete students except the bottom 5 based on their ID:
WITH NumberedStudents AS (
SELECT id, ROW_NUMBER() OVER (ORDER BY id DESC) AS row_num
FROM Student
)
DELETE FROM Student
WHERE id IN (
SELECT id
FROM NumberedStudents
WHERE row_num > 5
);
DELETE 5
Time: 0.004s
Here, the ROW_NUMBER() function assigns a unique number to each student based on their ID in descending order.
Importantly, we can also implement this method in SQL Server.
MySQL also provides methods to handle the deletion of all records except the top or bottom N. We can use user-defined variables to assign a sequential integer to each row. In particular, this helps in selecting the top or bottom N records. The method involves creating a variable that increments with each row and using it to filter the rows.
For example, let’s delete all student records except the top 15 based on their ID:
SET @row_number = 0;
DELETE FROM Student
WHERE id NOT IN (
SELECT id FROM (
SELECT id, (@row_number := @row_number + 1) AS row_num
FROM Student
ORDER BY id
) AS numbered
WHERE row_num <= 15
);
Query OK, 10 rows affected, 1 warning (0.006 sec)
In this query, we initialize a user-defined variable @row_number to 0. The subquery selects the IDs of the students along with a row number generated by incrementing @row_number for each row. Then, the main DELETE statement removes any student records whose row number is greater than 15.
Similarly, let’s delete all students except the bottom 5 based on their ID:
SET @row_number = 0;
DELETE FROM Student
WHERE id NOT IN (
SELECT id FROM (
SELECT id, (@row_number := @row_number + 1) AS row_num
FROM Student
ORDER BY id DESC
) AS numbered
WHERE row_num <= 5
);
Query OK, 10 rows affected, 1 warning (0.006 sec)
In this case, we order the students by their ID in descending order and follow the same process to generate row numbers and delete the records whose row number is greater than 5.
SQL Server provides multiple methods to delete all records except the top or bottom N. In particular, we can use the ROW_NUMBER() window function in a way that’s identical to the PostgreSQL implementation we’ve seen earlier. Another approach is to use a CTE. For example, let’s delete all records except for the top 10 students based on their ID:
WITH TopStudents AS (
SELECT id
FROM Student
ORDER BY id
OFFSET 0 ROWS
FETCH NEXT 10 ROWS ONLY
)
DELETE FROM Student
WHERE id NOT IN (SELECT id FROM TopStudents);
Time: 0.002s
This query defines a CTE named TopStudents that selects the top 10 student IDs ordered by their ID. Then, use the DELETE statement to remove student records whose IDs aren’t in the TopStudents result set.
Additionally, we can also delete all records except the bottom 5 students based on their ID:
WITH BottomStudents AS (
SELECT id
FROM Student
ORDER BY id DESC
OFFSET 0 ROWS
FETCH NEXT 5 ROWS ONLY
)
DELETE FROM Student
WHERE id NOT IN (SELECT id FROM BottomStudents);
Time: 0.002s
Here, the CTE named BottomStudents selects the bottom 5 student IDs by ordering the students by their ID in descending order. Similarly, it uses the DELETE statement to remove student records whose IDs aren’t in the BottomStudents result set.
In this article, we explored various methods to delete all records except the top or bottom N in PostgreSQL, MySQL, and SQL Server. We utilized techniques such as Common Table Expressions (CTEs), subqueries, and window functions to achieve this task efficiently. Furthermore, by understanding these methods, we can manage large datasets more effectively and maintain optimal database performance.