Baeldung Pro – SQL – NPI EA (cat = Baeldung on SQL)
announcement - icon

Learn through the super-clean Baeldung Pro experience:

>> Membership and Baeldung Pro.

No ads, dark-mode and 6 months free of IntelliJ Idea Ultimate to start with.

1. Introduction

When we work with SQL queries that involve ranking rows within partitions or assigning unique row numbers, we mostly use the RANK() and ROW_NUMBER() window functions. At first glance, they may seem to produce identical result sets, especially when the data doesn’t contain duplicates on the ordering column. However, they serve different purposes and behave differently in certain scenarios.

In this tutorial, we’ll compare SQL RANK() and ROW_NUMBER() functions. First, we’ll walk through the ROW_NUMBER() definition and syntax. After that, we’ll cover the same for RANK(). Lastly, we’ll discuss the behavior of both functions with duplicates.

We’ll use the Baeldung University database schema for the examples in this tutorial.

2. ROW_NUMBER()

The ROW_NUMBER() function assigns a unique, sequential integer to rows within a partition of a result set. Each row is given a number based on the order specified, and no two rows ever share the same ROW_NUMBER() value, even if there are ties in the ORDER BY clause.

For instance, let’s look at the syntax of ROW_NUMBER():

ROW_NUMBER() OVER(PARTITION BY column ORDER BY column);

Let’s look at the query description:

  • PARTITION BY divides the result set into groups
  • ORDER BY specifies the order in which the row numbers are assigned within each partition

For example, if we want to number student registrations based on the semester, we can use ROW_NUMBER() for this partition on the Registration table:

SELECT student_id, course_id, semester, year,
    ROW_NUMBER() OVER (PARTITION BY student_id 
    ORDER BY year, semester, course_id) AS RowNum
FROM Registration;

When we execute the query, we get the output with row numbers:

+------------+-----------+----------+------+--------+
| student_id | course_id | semester | year | RowNum |
+------------+-----------+----------+------+--------+
|       1001 | CS211     | FALL     | 2022 |      1 |
|       1001 | CS212     | FALL     | 2022 |      2 |
|       1001 | CS531     | FALL     | 2022 |      3 |
|       1001 | MA111     | FALL     | 2022 |      4 |
|       1001 | CS111     | SPRING   | 2022 |      5 |
|       1001 | CS121     | SPRING   | 2022 |      6 |
|       1001 | CS122     | SPRING   | 2022 |      7 |
|       1001 | CS121     | FALL     | 2023 |      8 |
|       1001 | CS122     | FALL     | 2023 |      9 |
|       1001 | MA411     | FALL     | 2023 |     10 |
...

From the above output, we can see that even if the semester and year values are the same, each course gets a unique row number due to the ordering by course_id.

3. RANK()

The RANK() function assigns a rank to each row within a partition of a result set. When two rows share the same value in the ordering column, SQL assigns them the same rank and skips the next ranks accordingly.

Here’s the syntax for the RANK() function:

RANK() OVER(PARTITION BY column ORDER BY column);

Let’s break down the command:

  • PARTITION BY means optional grouping
  • ORDER BY determines how ranks are assigned

For instance, to rank students based on GPA for a report, we can use the RANK() function to assign ranks in the Student table:

SELECT id, name, gpa,
    RANK() OVER (ORDER BY gpa DESC) AS Rank
FROM Student;

Let’s execute the query and check the output:

+------+-----------------+------+------+
| id   | name            | gpa  | Rank |
+------+-----------------+------+------+
| 1617 | Philip Mohan    |    5 |    1 |
| 1010 | Samantha Prabhu |  4.9 |    2 |
| 2009 | Pollards Grey   | 4.65 |    3 |
| 2006 | Agatha Christi  | 4.59 |    4 |
| 1710 | Roni Roto       | 4.44 |    5 |
| 1619 | Sam Roberts     |  4.3 |    6 |
| 1107 | Phellum Luis    | 4.21 |    7 |
| 1003 | Rita Ora        |  4.2 |    8 |
| 2001 | Reena Roy       | 4.15 |    9 |
| 1001 | John Liu        |    4 |   10 |
| 1110 | Albert Decosta  |    4 |   10 |
| 1101 | Jia Grey        | 3.98 |   12 |
...

Based on the output, Philip Mohan has the highest GPA with rank 1. Samantha Prabhu follows with a GPA of 4.9 and ranks second. There are no ties until we reach Albert Decosta and John Liu, both with a GPA of 4.0. They share rank 10. As a result, the next student, Jia Grey, with a GPA of 3.98, is ranked 12, skipping rank 11 due to the tie above.

4. Differences Between RANK() and ROW_NUMBER()

Both RANK() and ROW_NUMBER() are window functions in SQL used for ordering rows, but they serve different purposes. ROW_NUMBER() assigns a unique sequential number to each row within a partition, regardless of ties. On the other hand, RANK() assigns the same rank to rows with equal values, and skips the next numbers accordingly. This distinction is important in scenarios involving ties — RANK() highlights them, while ROW_NUMBER() doesn’t.

To see this in action, let’s run a query that uses both functions side-by-side. We’ll rank students based on their GPA:

SELECT id, name, gpa,
    RANK() OVER (ORDER BY gpa DESC) AS Rank,
    ROW_NUMBER() OVER (ORDER BY gpa DESC) AS RowNum
FROM Student;

Let’s execute the query and check the output:

+------+-----------------+------+------+--------+
| id   | name            | gpa  | Rank | RowNum |
+------+-----------------+------+------+--------+
| 1617 | Philip Mohan    |    5 |    1 |      1 |
| 1010 | Samantha Prabhu |  4.9 |    2 |      2 |
| 2009 | Pollards Grey   | 4.65 |    3 |      3 |
| 2006 | Agatha Christi  | 4.59 |    4 |      4 |
| 1710 | Roni Roto       | 4.44 |    5 |      5 |
| 1619 | Sam Roberts     |  4.3 |    6 |      6 |
| 1107 | Phellum Luis    | 4.21 |    7 |      7 |
| 1003 | Rita Ora        |  4.2 |    8 |      8 |
| 2001 | Reena Roy       | 4.15 |    9 |      9 |
| 1001 | John Liu        |    4 |   10 |     10 |
| 1110 | Albert Decosta  |    4 |   10 |     11 |
| 1101 | Jia Grey        | 3.98 |   12 |     12 |
...

Here, we see that RANK () and ROW_NUMBER () yield the same output until we reach student ID 1001, where RANK() repeats the previous number due to a tie, while ROW_NUMBER() assigns a unique number.

However, both functions yield identical results when the ORDER BY column contains unique values, such as a primary key.

5. Use Cases and Best Practices

To understand when to use RANK() and when to use ROW_NUMBER(), it’s helpful to consider some practical use cases. First, let’s consider a leaderboard where multiple players might have the same score. In that case, we should use RANK(). It assigns the same rank to tied scores and skips the next numbers to keep the ranking fair.

On the other hand, if we are developing a pagination system, for instance, retrieving rows 11 through 20 from a larger dataset, we must use ROW_NUMBER().  ROW_NUMBER() always assigns a unique row index. This makes it easy to filter results using row number ranges.

Another common case involves data deduplication. Let’s suppose we have duplicate entries in a table and we only want to retain the first instance based on some criteria, like the latest timestamp. Here, ROW_NUMBER() combined with a CTE, Common Table Expression, helps isolate and delete duplicates efficiently.

6. Conclusion

In this article, we compared the SQL RANK() and ROW_NUMBER() functions.

Although RANK() and ROW_NUMBER() often seem to behave similarly, their differences are crucial in real-world applications. ROW_NUMBER() always gives a unique row identifier, making it great for pagination or duplicate removal. On the other hand, RANK() works best when the logic treats rows with the same values as equally important, such as in leaderboards or top-N scenarios.

The code backing this article is available on GitHub. Once you're logged in as a Baeldung Pro Member, start learning and coding on the project.