Understanding the SQL OVER() Clause Baeldung on SQL

1. Overview

We can use window functions in SQL to perform calculations across rows related to the current row. Unlike standard aggregate functions like SUM() or AVG() which group data and obscure individual details, window functions retain the context of each row while also performing complex calculations.

In this tutorial, we’ll look at the SQL OVER() clause using various examples. We’ll leverage the Baeldung University Database schema and look at syntax for SQL Server, MySQL, and PostgreSQL.

2. SQL OVER() Clause

We can use the OVER() clause to define a window of rows for performing calculations. It allows us to partition and order rows, enabling targeted calculations on specific subsets of data.

An OVER() clause helps compute running totals, rankings, and lead/lag values. Moreover, we can apply functions like SUM(), AVG(), RANK(), or LEAD() without altering the overall structure of our query.

2.1. Syntax

The basic syntax for using the OVER() clause in SQL is as follows:

SELECT column_name, <aggregate_or_analytic_function>
OVER ([PARTITION BY column] [ORDER BY column] [ROWS or RANGE clause]) AS result
FROM table_name

Let’s explore the parts of the query:

aggregate_or_analytic function refers to the function we want to apply within the window, such as SUM(), AVG(), and others
PARTITION BY divides the data into partitions that allow the window function to operate within each group
ORDER BY orders the rows within each partition that are essential for functions that require a specific sequence
ROWS or RANGE clauses define the exact set of rows to be used in the window frame for the calculation

Furthermore, we can combine multiple window functions in a query to handle more complex calculations.

3. PARTITION BY

We can use the PARTITION BY clause to split an obtained data set into smaller groups according to multiple columns.

Additionally, when we combine PARTITION BY with window functions like RANK(), LAG(), SUM(), or COUNT(), these functions calculate values separately within each partition.

Let’s count the total positions that each faculty member holds in the Faculty table of our University Database:

SELECT name, position,
COUNT(*) OVER (PARTITION BY name) AS total_positions
FROM Faculty;

In this query, we group the faculty members by name with the PARTITION BY clause. Then, we COUNT the rows by each name to sum the total number of positions held by each faculty member.

3.1. ORDER BY

We can use the ORDER BY clause in SQL to sort the rows within each partition. Moreover, it’s useful for functions that depend on the sequence, such as calculating running totals.

Let’s apply the PARTITION BY and ORDER BY clauses to the Exam table and calculate the highest and lowest grades for each student:

SELECT id, semester, grade,
MAX(grade) OVER (PARTITION BY student_id ORDER BY grade) AS course_highest,
MIN(grade) OVER (PARTITION BY student_id ORDER BY grade) AS course_lowest
FROM Exam
ORDER BY course_id;

In this query, PARTITION BY student_id groups the results by each student, so the calculations are done separately for each one. The ORDER BY grade also sorts the grades within each student’s group.

By ordering the grades within each student’s group, we ensure that the MAX() function always finds the highest grade that appears last in the sorted data. Similarly, the MIN() function always finds the lowest grade, which appears first in the sorted data.

3.2. ROWS/RANGE Clauses

We can define the ROWS or RANGE clauses with the ORDER BY clause. It specifies which rows relative to the current row should be included in a calculation.

For instance, if we want to include the present row, the row before it, and the row after it, we can use this ROWS clause:

ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING

Alternatively, if we want to include all rows from the start of the partition up to the current row, we can use the UNBOUNDED PRECEDING clause.

However, if we don’t specify ROWS or RANGE clause then the default value is set to the following clause:

RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

Let’s consider an example using the Student table to demonstrate window functions with the ORDER BY and ROWS clauses:

SELECT id, enrollment_date,
AVG(gpa) OVER (ORDER BY enrollment_date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS Avg3GPAs,
AVG(gpa) OVER (ORDER BY YEAR(enrollment_date) RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS AvgGPAYearOverYear
FROM Student
ORDER BY enrollment_date, id;

Here, we use the AVG() function with the ROWS clause to calculate the average GPA of each student and the two preceding students, ordered by enrollment date. Additionally, we use the RANGE clause to determine the average GPA year over year, considering all preceding years up to the current year.

4. GROUP BY

Unlike the OVER() clause, the GROUP BY doesn’t retain individual row details in its result set. We can use GROUP BY to reduce the number of rows and create a summarize the data.

In addition, GROUP BY aggregates data by grouping rows based on specified columns, allowing us to apply aggregate functions to each group.

Let’s find the highest and lowest GPA for students based on their enrollment date:

SELECT enrollment_date,
MAX(gpa) AS highest_gpa,
MIN(gpa) AS lowest_gpa
FROM Student
GROUP BY enrollment_date;

Here, we use the GROUP BY function to summarize data into a single row per group but don’t retain row-specific details.

5. GROUP BY vs. PARTITION BY

We often use GROUP BY to combine rows with the same values in particular columns into a single group. For example, we used it for aggregation to summarize data by calculating values like sums, averages, or counts for each group.

On the other hand, we use PARTITION BY to partition an output set into separate sections based on specified columns. This allows us to perform calculations within each partition while retaining the original row details.

Furthermore, GROUP BY returns a single row per group containing aggregated values for the specified columns. However, PARTITION BY returns a row for each original row, with additional columns containing aggregated values calculated within each partition.

6. Conclusion

In this article, we explored the OVER() clause and its different parameters such as PARTITION BY and ORDER BY.

Furthermore, we can use the ROWS or RANGE clauses with ORDER BY to specify which rows should be included in a calculation relative to the current row.

Finally, we looked at GROUP BY, and how it is used for similar but different problems from the OVER() clause.

The code backing this article is available on GitHub. Once you're logged in as a Baeldung Pro Member, start learning and coding on the project.

Learn Java Collections

Learn Spring

Learn Maven

View All Courses

Full Archive

About Baeldung