Baeldung Pro – SQL – NPI EA (cat = Baeldung on SQL)
announcement - icon

Learn through the super-clean Baeldung Pro experience:

>> Membership and Baeldung Pro.

No ads, dark-mode and 6 months free of IntelliJ Idea Ultimate to start with.

1. Introduction

In SQL, there are scenarios where retaining only the top or bottom N records from a table is necessary. This task is common when working with large datasets and requires efficient ways to filter and delete records. However, various SQL databases offer different techniques to achieve this.

In this tutorial, we’ll explore how to delete all records except the top or bottom N. Furthermore, we’ll look at practical examples in PostgreSQL, MySQL, and SQL Server. We’ll use the simplified Baeldung University database for illustration.

2. Implementation in PostgreSQL

PostgreSQL provides several methods to handle the deletion of all records except the top or bottom N.

2.1. Using a Common Table Expression

A Common Table Expression (CTE) is a temporary result set defined within the execution scope of a single SELECT, INSERT, UPDATE, or DELETE statement. In particular, it’s useful for organizing and simplifying complex queries. We can use a CTE to select the top or bottom N records and then delete the rest.

For example, let’s see how to delete all students except the top 5 based on their ID. We use a CTE to select the top 5 student IDs and then delete any student whose ID isn’t in this top 5 list:

WITH TopStudents AS (
    SELECT id
    FROM Student
    ORDER BY id
    LIMIT 5
)
DELETE FROM Student
WHERE id NOT IN (SELECT id FROM TopStudents);
DELETE 20
Time: 0.003s

In this example, we define a CTE named TopStudents that selects the top 5 student IDs by ordering the students by their ID in ascending order. Then, we use the DELETE statement to remove student records whose IDs aren’t in the TopStudents result set.

Let’s check the new records of the Student table:

SELECT * FROM Student;
+------+-----------------+-------------+------------+-----------------+-----------------+-----+
| id   | name            | national_id | birth_date | enrollment_date | graduation_date | gpa |
|------+-----------------+-------------+------------+-----------------+-----------------+-----|
| 1001 | John Liu        | 123345566   | 2001-04-05 | 2020-01-15      | 2024-06-15      | 4.0 |
| 1003 | Rita Ora        | 132345166   | 2001-01-14 | 2020-01-15      | 2024-06-15      | 4.2 |
| 1007 | Philip Lose     | 321345566   | 2001-06-15 | 2020-01-15      | 2024-06-15      | 3.8 |
| 1010 | Samantha Prabhu | 3217165566  | 2001-03-21 | 2020-01-15      | 2024-06-15      | 4.9 |
| 1011 | Vikas Jain      | 321345662   | 2001-07-18 | 2020-01-15      | <null>          | 3.3 |
+------+-----------------+-------------+------------+-----------------+-----------------+-----+
SELECT 5
Time: 0.008s

The result shows the top 5 records left.

Similarly, to delete all students except the bottom 5 based on their ID, we reverse the order:

WITH BottomStudents AS (
    SELECT id
    FROM Student
    ORDER BY id DESC
    LIMIT 5
)
DELETE FROM Student
WHERE id NOT IN (SELECT id FROM BottomStudents);
DELETE 20
Time: 0.016s

In this example, the CTE named BottomStudents selects the bottom 5 student IDs by ordering the students by their ID in descending order. Then, the DELETE statement removes student records whose IDs aren’t in the BottomStudents result set.

Again, let’s check the new records of the Student table:

SELECT * FROM Student;
+------+----------------+-------------+------------+-----------------+-----------------+--------+
| id   | name           | national_id | birth_date | enrollment_date | graduation_date | gpa    |
|------+----------------+-------------+------------+-----------------+-----------------+--------|
| 2001 | Reena Roy      | 9023455613  | 2003-01-13 | 2022-01-15      | 2025-06-15      | 4.15   |
| 2009 | Pollards Grey  | 6503453662  | 2003-04-19 | 2022-01-15      | 2025-06-15      | 4.65   |
| 2017 | Potu Singh     | 1312445677  | 2003-03-11 | 2022-01-15      | <null>          | <null> |
| 2008 | Julia Roberts  | 1212446677  | 2003-06-12 | 2022-01-15      | 2025-06-15      | 3.04   |
| 2006 | Agatha Christi | 1100245767  | 2003-05-19 | 2022-01-15      | <null>          | 4.59   |
+------+----------------+-------------+------------+-----------------+-----------------+--------+
SELECT 5
Time: 0.013s

The result shows the bottom 5 records left.

It’s important to note that we can also implement this method in MySQL.

2.2. Using Subqueries

We can also use subqueries to retain the top or bottom N records while deleting the rest. In particular, this method involves selecting the IDs of the top or bottom N records in a subquery and using these IDs in the DELETE statement.

For example, let’s delete all courses except the top 10 based on their ID in the Course table:

DELETE FROM Course
WHERE id NOT IN (
    SELECT id
    FROM Course
    ORDER BY id
    LIMIT 10
);
You're about to run a destructive command.
Do you want to proceed? (y/n): y
Your call!
DELETE 71
Time: 0.002s

In this case, the subquery selects the IDs of the top 10 courses ordered by their ID in ascending order. Then, the DELETE statement removes any course records whose IDs aren’t in the result set of the subquery.

Similarly, we can delete all courses except the bottom 5 based on their ID:

DELETE FROM Course
WHERE id NOT IN (
    SELECT id
    FROM Course
    ORDER BY id DESC
    LIMIT 5
);
You're about to run a destructive command.
Do you want to proceed? (y/n): y
Your call!
DELETE 5
Time: 0.003s

Again, the query removes any course records whose IDs aren’t in the result set of the subquery.

It’s important to note that we can also implement this method in MySQL.

2.3. Using a Window Function

Window functions provide a powerful way to perform operations across a set of table rows related to the current row. We can use the ROW_NUMBER() window function to assign a unique number to each row. Then, we delete rows based on this numbering.

For example, let’s delete all student records except the top 10 based on their ID:

WITH NumberedStudents AS (
    SELECT id, ROW_NUMBER() OVER (ORDER BY id) AS row_num
    FROM Student
)
DELETE FROM Student
WHERE id IN (
    SELECT id
    FROM NumberedStudents
    WHERE row_num > 10
);
DELETE 15
Time: 0.005s

In this example, the ROW_NUMBER() function assigns a unique number to each student based on their ID in ascending order. Additionally, the NumberedStudents CTE includes these row numbers.

Similarly, let’s delete students except the bottom 5 based on their ID:

WITH NumberedStudents AS (
    SELECT id, ROW_NUMBER() OVER (ORDER BY id DESC) AS row_num
    FROM Student
)
DELETE FROM Student
WHERE id IN (
    SELECT id
    FROM NumberedStudents
    WHERE row_num > 5
);
DELETE 5
Time: 0.004s

Here, the ROW_NUMBER() function assigns a unique number to each student based on their ID in descending order.

Importantly, we can also implement this method in SQL Server.

3. Implementation in MySQL

MySQL also provides methods to handle the deletion of all records except the top or bottom N. We can use user-defined variables to assign a sequential integer to each row. In particular, this helps in selecting the top or bottom N records. The method involves creating a variable that increments with each row and using it to filter the rows.

For example, let’s delete all student records except the top 15 based on their ID:

SET @row_number = 0;

DELETE FROM Student
WHERE id NOT IN (
    SELECT id FROM (
        SELECT id, (@row_number := @row_number + 1) AS row_num
        FROM Student
        ORDER BY id
    ) AS numbered
    WHERE row_num <= 15
);
Query OK, 10 rows affected, 1 warning (0.006 sec)

In this query, we initialize a user-defined variable @row_number to 0. The subquery selects the IDs of the students along with a row number generated by incrementing @row_number for each row. Then, the main DELETE statement removes any student records whose row number is greater than 15.

Similarly, let’s delete all students except the bottom 5 based on their ID:

SET @row_number = 0;

DELETE FROM Student
WHERE id NOT IN (
    SELECT id FROM (
        SELECT id, (@row_number := @row_number + 1) AS row_num
        FROM Student
        ORDER BY id DESC
    ) AS numbered
    WHERE row_num <= 5
);
Query OK, 10 rows affected, 1 warning (0.006 sec)

In this case, we order the students by their ID in descending order and follow the same process to generate row numbers and delete the records whose row number is greater than 5.

4. Implementation in SQL Server

SQL Server provides multiple methods to delete all records except the top or bottom N. In particular, we can use the ROW_NUMBER() window function in a way that’s identical to the PostgreSQL implementation we’ve seen earlier. Another approach is to use a CTE. For example, let’s delete all records except for the top 10 students based on their ID:

WITH TopStudents AS (
    SELECT id
    FROM Student
    ORDER BY id
    OFFSET 0 ROWS
    FETCH NEXT 10 ROWS ONLY
)
DELETE FROM Student
WHERE id NOT IN (SELECT id FROM TopStudents);
Time: 0.002s

This query defines a CTE named TopStudents that selects the top 10 student IDs ordered by their ID. Then, use the DELETE statement to remove student records whose IDs aren’t in the TopStudents result set.

Additionally, we can also delete all records except the bottom 5 students based on their ID:

WITH BottomStudents AS (
    SELECT id
    FROM Student
    ORDER BY id DESC
    OFFSET 0 ROWS
    FETCH NEXT 5 ROWS ONLY
)
DELETE FROM Student
WHERE id NOT IN (SELECT id FROM BottomStudents);
Time: 0.002s

Here, the CTE named BottomStudents selects the bottom 5 student IDs by ordering the students by their ID in descending order. Similarly, it uses the DELETE statement to remove student records whose IDs aren’t in the BottomStudents result set.

5. Conclusion

In this article, we explored various methods to delete all records except the top or bottom N in PostgreSQL, MySQL, and SQL Server. We utilized techniques such as Common Table Expressions (CTEs), subqueries, and window functions to achieve this task efficiently. Furthermore, by understanding these methods, we can manage large datasets more effectively and maintain optimal database performance.

The code backing this article is available on GitHub. Once you're logged in as a Baeldung Pro Member, start learning and coding on the project.