Baeldung Pro – SQL – NPI EA (cat = Baeldung on SQL)
announcement - icon

Learn through the super-clean Baeldung Pro experience:

>> Membership and Baeldung Pro.

No ads, dark-mode and 6 months free of IntelliJ Idea Ultimate to start with.

1. Overview

While working with databases, we often need to calculate totals for specific groups of data. A common task is summing values in a column while grouping the results by a particular identifier. Fortunately, most SQL implementations provide aggregate functions to help us compute these sums across several rows.

In this tutorial, we’ll learn how to use the SUM function with the GROUP BY clause to find totals for each group.

The examples in this article are based on the Bealdung sample schema and work in MySQL, PostgreSQL, and SQL Server.

2. Aggregation Functions

Aggregation functions work on a set of values and give back a single result. We often use them with the GROUP BY clause to do calculations on groups of rows. Here are some of the most common aggregation functions:

  • SUM: adds up all the non-NULL values in a column
  • AVG: calculates the average (mean) of all non-NULL values in a column
  • COUNT: counts the number of rows or non-NULL values in a column
  • MIN: finds the smallest value in a column
  • MAX: finds the largest value in a column

3. Grouping Data

We use the GROUP BY clause to split the rows in a table into groups based on the values in one or more columns. Using it with an aggregation function like SUM lets us calculate subtotals for each group.

Let’s look at the basic syntax for using GROUP BY with SUM:

SELECT column1, SUM(column2) AS total
FROM Table
GROUP BY column1;

This query groups the rows in Table by the values in column1. Then, it adds up column2 for each group. Finally, the results are returned with the grouped values and their totals.

For example, we can find the total number of credits each department offers in the University database:

SELECT department_id, SUM(credits) AS dept_total_credits
FROM Course
GROUP BY department_id;
+---------------+--------------------+
| department_id | dept_total_credits |
+---------------+--------------------+
|             1 |                195 |
|             2 |                 78 |
|             3 |                 71 |
|             4 |                 83 |
|             5 |                 90 |
+---------------+--------------------+
5 rows in set (0.00 sec)

As a result, this query groups the rows in the Course table by the department_id column, sums the credits column for each group, and returns the department_id and total credits for each department.

4. Filtering Grouped Results

Sometimes, we may want to filter the groups based on the result of the aggregation. We’ll use the HAVING clause for this.

Notably, we use the HAVING clause to specify a condition that filters the groups after we’ve created them. It works like the WHERE clause but operates on the grouped results instead of individual rows.

The basic syntax for using HAVING with GROUP BY and SUM is:

SELECT column1, SUM(column2) AS total
FROM table
GROUP BY column1
HAVING condition;

The condition in the HAVING clause can refer to the result of the aggregation (the alias we define in the SELECT clause) or any of the columns in the GROUP BY clause.

Moreover, it’s important to understand the difference between the WHERE and HAVING clauses:

  • WHERE filters individual rows before the grouping happens; therefore, it can’t refer to the result of the aggregation or aliases defined in the SELECT clause
  • HAVING filters the grouped results after the aggregation has been done meaning that it can refer to the result of the aggregation and aliases defined in the SELECT clause

Let’s modify our first query to find departments that offer more than 80 credits in total with a HAVING clause:

SELECT department_id, SUM(credits) AS dept_total_credits
FROM Course
GROUP BY department_id
HAVING SUM(credits) > 80;
+---------------+--------------------+
| department_id | dept_total_credits |
+---------------+--------------------+
|             1 |                195 |
|             4 |                 83 |
|             5 |                 90 |
+---------------+--------------------+
3 rows in set (0.00 sec)

The query first groups the rows by department_id. Then, it sums the credits for each department. Finally, it filters the results to only include departments with a sum of credits over 80.

5. Joining Tables for Complex Aggregations

Sometimes, we may need to join multiple tables for complex aggregations. This lets us combine data from different tables and calculate totals based on related information.

Joining tables is necessary when the data we want to aggregate is spread across multiple tables. Doing so brings together the relevant information.

There are several types of joins available in SQL; let’s look at the most commonly used ones:

The choice of join type depends on what we need from the query and how the tables are related.

For example, suppose we want to calculate the total number of credits each student has taken. This information is not available in a single table, so we need to join the Student, Registration, and Course tables:

SELECT s.id, s.name, SUM(c.credits) AS student_credits
FROM Student s
LEFT JOIN Registration r ON s.id = r.student_id
LEFT JOIN Course c ON r.course_id = c.id
GROUP BY s.id, s.name;
+------+-----------------+-----------------+
| id   | name            | student_credits |
+------+-----------------+-----------------+
| 1001 | John Liu        |              83 |
| 1003 | Rita Ora        |              80 |
| 1007 | Philip Lose     |              80 |
| 1010 | Samantha Prabhu |              78 |
| 1011 | Vikas Jain      |            NULL |
| 1101 | Jia Grey        |            NULL |
| 1103 | Rose Rit        |              85 |
| 1107 | Phellum Luis    |              80 |
| 1110 | Albert Decosta  |            NULL |
| 1111 | Vikram Kohli    |            NULL |
| 1607 | Peter Liu       |              82 |
| 1610 | Ritu Raj        |              78 |
| 1617 | Philip Mohan    |            NULL |
| 1619 | Sam Roberts     |            NULL |
| 1621 | Vineet Jha      |            NULL |
| 1707 | Piu Liu         |              80 |
| 1710 | Roni Roto       |              87 |
| 1717 | Param Mohan     |            NULL |
| 1719 | Siren Lobo      |            NULL |
| 1721 | Vini Puh        |            NULL |
| 2001 | Reena Roy       |            NULL |
| 2006 | Agatha Christi  |            NULL |
| 2008 | Julia Roberts   |              80 |
| 2009 | Pollards Grey   |              78 |
| 2017 | Potu Singh      |            NULL |
+------+-----------------+-----------------+
25 rows in set (0.00 sec)

Now, let’s break down this query:

  1. The query starts with the Student table which contains the student information.
  2. It does a LEFT JOIN with the Registration table based on the student ID making sure we include all students in the result even if they haven’t registered for any courses.
  3. It then does another LEFT JOIN with the Course table based on the course ID from the Registration table therefore bringing in the credit information for each course.
  4. The results are grouped by the student ID and name using the GROUP BY clause.
  5. It uses the SUM function to calculate the total credits for each student.

This query gives a clear overview of the total credits taken by each student, including students who have not registered for any courses.

6. Handling NULL Values

In our previous example, we might have noticed that some students had NULL for their total credits. This occurs when a student hasn’t registered for any courses yet. While this accurately reflects the data, it might not always be the most useful way to represent the information in our reports or analyses. NULL values can affect the results of functions like SUM so we may need to handle them differently.

If we want to include NULL values in the aggregation and treat them as a specific value, say zero, we can use the COALESCE function.

The syntax for using COALESCE with SUM is:

SELECT SUM(COALESCE(column, replacement_value)) AS total
FROM table;

Here, column is the column we want to sum and replacement_value is the value we want to use instead of NULL.

Now, let’s revisit the example of summing credits for each student. However, this time, we treat NULL values as zero:

SELECT s.id, s.name, SUM(COALESCE(c.credits, 0)) AS student_credits
FROM Student s
LEFT JOIN Registration r ON s.id = r.student_id
LEFT JOIN Course c ON r.course_id = c.id
GROUP BY s.id, s.name;
+------+-----------------+-----------------+
| id   | name            | student_credits |
+------+-----------------+-----------------+
| 1001 | John Liu        |              83 |
| 1003 | Rita Ora        |              80 |
| 1007 | Philip Lose     |              80 |
| 1010 | Samantha Prabhu |              78 |
| 1011 | Vikas Jain      |               0 |
| 1101 | Jia Grey        |               0 |
| 1103 | Rose Rit        |              85 |
| 1107 | Phellum Luis    |              80 |
| 1110 | Albert Decosta  |               0 |
| 1111 | Vikram Kohli    |               0 |
| 1607 | Peter Liu       |              82 |
| 1610 | Ritu Raj        |              78 |
| 1617 | Philip Mohan    |               0 |
| 1619 | Sam Roberts     |               0 |
| 1621 | Vineet Jha      |               0 |
| 1707 | Piu Liu         |              80 |
| 1710 | Roni Roto       |              87 |
| 1717 | Param Mohan     |               0 |
| 1719 | Siren Lobo      |               0 |
| 1721 | Vini Puh        |               0 |
| 2001 | Reena Roy       |               0 |
| 2006 | Agatha Christi  |               0 |
| 2008 | Julia Roberts   |              80 |
| 2009 | Pollards Grey   |              78 |
| 2017 | Potu Singh      |               0 |
+------+-----------------+-----------------+
25 rows in set (0.00 sec)

In this query, we use the COALESCE function to replace any NULL values in the credits column with zero. Then, we apply the SUM function to calculate the total credits for each student.

The result includes all students even if they have a NULL number of credits registered. However, the NULL values were treated as zero in the calculation. This allows our query to clearly show us when a student hasn’t registered for any credits yet.

7. Conclusion

In this article, we learned about the SUM function in SQL and how it can be used with the GROUP BY clause to calculate totals for groups of rows. We also covered advanced use cases like filtering grouped results with the HAVING clause, joining tables for complex aggregations, and handling NULL values.

The code backing this article is available on GitHub. Once you're logged in as a Baeldung Pro Member, start learning and coding on the project.