Calculating Running Totals in SQL Baeldung on SQL

1. Introduction

Running totals, also known as cumulative sums, are a common requirement in data analysis and reporting in SQL. A running total is the sum of a sequence of numbers updated each time a new number is added. In SQL, calculating running totals can be done in several ways, depending on the SQL version and the dataset size.

In this article, we’ll cover several ways to calculate running total in SQL. To begin with, we’ll explore SQL window functions. After that, we’ll discuss subqueries. Lastly, we’ll see how to utilize joins to calculate running totals in SQL.

Notably, we execute the queries on MySQL and MSSQL server database management systems using tables in the Baeldung University database schema.

2. Sample Column

For demonstration purposes, let’s create a sample column called scores in the Exam table. This column will store numeric grades with a high level of precision to accurately calculate running totals:

ALTER TABLE Exam
ADD COLUMN scores DECIMAL(10, 3);

Next, we input some dummy data into the scores column to check the query results. This data represents the grades for different exams, which we’ll use to calculate the running total:

UPDATE Exam SET scores = 87.456 WHERE id = 1;
UPDATE Exam SET scores = 92.678 WHERE id = 2;
...
UPDATE Exam SET scores = 88.234 WHERE id = 4;
UPDATE Exam SET scores = 91.567 WHERE id = 5;
…
UPDATE Exam SET scores = 86.789 WHERE id = 10;
UPDATE Exam SET scores = 90.234 WHERE id = 11;
UPDATE Exam SET scores = 88.123 WHERE id = 12;

Before proceeding with the running total calculation, it’s a good practice to verify that we’ve inserted the data correctly. We can use a SELECT statement to check the data:

SELECT * FROM Exam;

This query displays all rows in the Exam table, allowing us to confirm that the scores column has been updated with the correct values.

3. Using the OVER Clause or Window Functions

The OVER clause, often used with window functions, enables calculations across a set of table rows that are related to the current row. This method is supported in SQL Server 2012 and later versions, as well as in other databases like MySQL and PostgreSQL.

To calculate the running total of scores in the Exam table, we use the window function:

SELECT id, scores, SUM(scores)
OVER (ORDER BY id) AS RunningTotal
FROM Exam;

which gives us the following:

id scores RunningTotal 
1  87.456  87.456 
2  92.678  180.134 
3  75.213  255.347

What happened here? Let’s take a look:

SELECT id, scores selects the id and scores columns from the Exam table
The SUM(scores) function calculates the cumulative sum of the scores column for each row.
OVER (ORDER BY id) clause specifies that the cumulative sum should be calculated in the order of the id column
AS RunningTotal renames this cumulative sum column as RunningTotal for clarity in the result set

Thus, the window function computes a running total for each row, starting from the first id and continuing sequentially through the table.

4. Using Subqueries

Subqueries offer a way to calculate running totals in databases that do not support window functions. While this method is versatile and works in all SQL versions, it can be less efficient for large datasets.

To elaborate, the subquery must execute multiple times, once for each row in the outer query, leading to increased computational overhead.

To understand this approach better, we calculate the running total of scores in the Exam table:

SELECT id, scores,
    (SELECT SUM(scores)
     FROM Exam e2
     WHERE e2.id <= e1.id) AS RunningTotal
FROM Exam e1
ORDER BY id;

which produces the following output:

id scores RunningTotal
1 87.456 87.456
2 92.678 180.134
3 75.213 255.347

In this query, the main SELECT statement retrieves the id and scores from the Exam table, aliasing it as e1. Meanwhile, the subquery calculates the running total for each row. This subquery works by selecting the sum of the scores from another instance of the Exam table, aliased as e2.

The crux is the WHERE clause within the subquery that ensures it only sums the scores values where the id is less than or equal to the current row’s id from the outer query. Consequently, the subquery computes the cumulative sum of scores up to the current row, effectively providing the running total.

Lastly, the ORDER BY statement then sorts the output by id and displays results in ascending order.

While correlated subqueries are powerful and versatile, they can lead to performance issues because each row’s calculation involves a full table scan up to that point in the dataset. This can significantly impact performance, especially as the size of the dataset grows.

5. Using Self-Joins

Self-joins provide another method for calculating running totals. This approach is similar to using correlated subqueries but can be slightly more efficient due to its use of set-based operations.

Self-joins provide another method for calculating running totals. This approach is similar to using correlated subqueries but can be more efficient due to its use of set-based operations.

To calculate the running total of scores in the Exam table, we use a self-join:

SELECT s1.id, s1.scores, SUM(s2.scores)
AS RunningTotal
FROM Exam s1
JOIN Exam s2 ON s1.id >= s2.id
GROUP BY s1.id, s1.scores
ORDER BY s1.id;

resulting in the output below:

id scores RunningTotal 
1  87.456  87.456 
2  92.678  180.134 
3  75.213  255.347

First, this query uses a self-join to pair each row in the Exam table s1 with all preceding rows s2. This step is crucial because it matches each row with all earlier or equivalent id rows, which is necessary for calculating the running total.

Then, the SUM function calculates the cumulative sum of the scores column from the joined rows. This running total is calculated for each row in s1 based on the values matched from s2.

After that, the GROUP BY clause groups the results by id and scores. This grouping is necessary to ensure that the command computes the sum correctly for each date and amount combination. Finally, the query sorts the output by id.

6. Conclusion

In this article, we covered several methods for calculating running totals in SQL. We began with window functions like the OVER clause. Next, we examined correlated subqueries, which are versatile and compatible with all SQL versions, though they may be less efficient for large datasets. Finally, we discussed using self-joins for the recursive summing of values.

The code backing this article is available on GitHub. Once you're logged in as a Baeldung Pro Member, start learning and coding on the project.

Full Archive

About Baeldung