How to Restrict Results to Top N Rows per Group in SQL Baeldung on SQL

1. Introduction

When working with SQL databases, we may often encounter scenarios where we need to retrieve the top n rows for each group within the dataset. For instance, we might want to find the top highest-selling products in each category, and the top 5 recent transactions for each customer.

In this tutorial, we’ll discuss and implement the logic to retrieve the top n rows per group.

For demo purposes, we will be using the Baeldung University schema.

2. Ranking Rows per Group

To return the top n rows per group, we need a mechanism to rank the records in each group. Using window functions like ROW_NUMBER(), RANK(), or DENSE_RANK() is a common approach to achieve this. All three major databases – MySQL, PostgreSQL, and SQL Server – support these window functions.

2.1. ROW_NUMBER()

ROW_NUMBER() assigns unique sequential integers within a result set or group of records starting at 1.

SELECT id, name, credits, department_id, ROW_NUMBER() OVER(PARTITION BY department_id ORDER BY credits DESC) AS rank FROM Course;

In the query above, we create groups of courses, partitioning by department_id, ordering the rows by course credits, and assigning a unique integer. Let’s execute the query and verify the results:

   id    |               name               | credits | department_id | rank
---------+----------------------------------+----------------------------------
 CS101   | Introduction to Computer Science |       4 |             1 |       1
 CS303   | Algorithm Design                 |       4 |             1 |       2
 CS202   | Database Systems                 |       3 |             1 |       3
 CS404   | Software Engineering             |       3 |             1 |       4
 MATH101 | Calculus I                       |       4 |             2 |       1
 MATH202 | Linear Algebra                   |       3 |             2 |       2
 PHYS101 | Physics Fundamentals             |       4 |             3 |       1
 BIOL101 | Introduction to Biology          |       3 |             4 |       1

As we can see in the result, for each group of department_id we’ve assigned unique integers.

2.2. RANK()

RANK() assigns rank to the rows in the result set or group of records. The rank of a row is one plus the number of ranks that come before it. If the row’s value is equal they receive the same rank and the next rank is skipped.

SELECT id, name, credits, department_id, RANK() OVER(PARTITION BY department_id ORDER BY credits DESC) AS rank FROM Course;

In the query above, we create groups by partitioning courses by department_id, ordering the rows by course credits, and ranking the rows based on credit value.

  id    |               name                | credits | department_id | rank
---------+----------------------------------+-----------------------------------
 CS101   | Introduction to Computer Science |       4  |             1 |       1
 CS303   | Algorithm Design                 |       4  |             1 |       1
 CS404   | Software Engineering             |       3  |             1 |       3
 CS202   | Database Systems                 |       3  |             1 |       3
 MATH101 | Calculus I                       |       4  |             2 |       1
 MATH202 | Linear Algebra                   |       3  |             2 |       2
 PHYS101 | Physics Fundamentals             |       4  |             3 |       1
 BIOL101 | Introduction to Biology          |       3  |             4 |       1

As we can see in the result, for each group of department_id we’ve ranked the rows based on credits. Additionally, rank 2 is skipped for department_id 1 since the two courses have the same credits and receive the same rank.

2.3. DENSE_RANK()

DENSE_RANK() is similar to rank except if the two rows have the same value then both of them receive the same rank and no rank is skipped.

SELECT id, name, credits, department_id, DENSE_RANK() OVER(PARTITION BY department_id ORDER BY credits DESC) AS rank FROM Course;

In the query above, we create groups by partitioning courses by department_id, ordering the rows by course credits, and ranking the rows based on credit value.

   id    |               name               | credits | department_id | rank
---------+----------------------------------+----------------------------------
 CS101   | Introduction to Computer Science |       4 |             1 |       1
 CS303   | Algorithm Design                 |       4 |             1 |       1
 CS404   | Software Engineering             |       3 |             1 |       2
 CS202   | Database Systems                 |       3 |             1 |       2
 MATH101 | Calculus I                       |       4 |             2 |       1
 MATH202 | Linear Algebra                   |       3 |             2 |       2
 PHYS101 | Physics Fundamentals             |       4 |             3 |       1
 BIOL101 | Introduction to Biology          |       3 |             4 |       1

As we can see in the result, for each group of department_id we’ve densely ranked the rows based on credits. Unlike RANK() For department_id, we’ve not skipped rank 2 although the initial two rows are ranked 1.

3. Restricting Results to Top N Rows per Group

To define the top n rows, based on the requirement, we can use ROW_NUMBER(), RANK(), or DENSE_RANK(). For the demo purpose we will use the ROW_NUMBER():

WITH RankedCourses AS (
     SELECT *, ROW_NUMBER() OVER (PARTITION BY department_id ORDER BY credits DESC) AS rank
     FROM Course 
)
SELECT id, name, credits, department_id, rank FROM RankedCourses WHERE rank <= 2;

In the query above, we are grouping the result by department_id and retrieving the top two (rank <= 2) results using the WHERE clause. Let’s execute the query and verify the result:

| id      | name                             | credits |  department_id | rank   |
+---------+----------------------------------+-------------------------------------
| CS101   | Introduction to Computer Science |       4 |              1 |       1 |
| CS303   | Algorithm Design                 |       4 |              1 |       2 |
| MATH101 | Calculus I                       |       4 |              2 |       1 |
| MATH202 | Linear Algebra                   |       3 |              2 |       2 |
| PHYS101 | Physics Fundamentals             |       4 |              3 |       1 |
| BIOL101 | Introduction to Biology          |       3 |              4 |       1 |
+---------+----------------------------------+-------------------------------------

As we can see in the result, we’re only retrieving two courses per department_id group.

4. Conclusion

Restricting results to the top n rows per group is a common requirement in data analysis and reporting. By utilizing window functions like ROW_NUMBER(), RANK(), or DENSE_RANK() along with CTE or subqueries, we can partition and rank our data, ensuring that we only retrieve the most relevant row for each group. This technique is consistent across all three major databases: MySQL, PostgreSQL, and SQL Server.

The code backing this article is available on GitHub. Once you're logged in as a Baeldung Pro Member, start learning and coding on the project.

Full Archive

About Baeldung