Baeldung Pro – SQL – NPI EA (cat = Baeldung on SQL)
announcement - icon

Learn through the super-clean Baeldung Pro experience:

>> Membership and Baeldung Pro.

No ads, dark-mode and 6 months free of IntelliJ Idea Ultimate to start with.

1. Introduction

The GROUP BY clause in SQL is a powerful tool for aggregating data based on one or more columns. In particular, the GROUP BY clause is useful when analyzing data across multiple dimensions. For example, we might want to summarize sales data by product and region or group student enrollment by program and department.

Understanding how to use GROUP BY effectively enhances the ability to perform complex data analysis and generate insightful reports.

In this tutorial, we’ll explore the usage of GROUP BY in multiple tables, utilizing the Baeldung University database. Specifically, we’ll work with the Program and Department tables to illustrate the concept. We’ll start with the basic usage of GROUP BY. Then, we’ll move on to grouping by multiple columns, incorporating the HAVING clause, and combining GROUP BY with other SQL clauses.

2. Basic Usage of GROUP BY

It’s important to understand the basic usage of the GROUP BY clause before we dive into grouping multiple columns. Let’s start with a simple example using the Program table. Suppose we want to count the number of programs offered by each department. We can achieve this using the GROUP BY clause on the department_id column:

SELECT department_id, COUNT(*) AS program_count
FROM Program
GROUP BY department_id;
+---------------+---------------+
| department_id | program_count |
|---------------+---------------|
| 4             | 4             |
| 2             | 4             |
| 3             | 4             |
| 1             | 8             |
| 5             | 4             |
+---------------+---------------+
SELECT 5
Time: 0.014s

The query groups the rows in the Program table by department_id and counts the number of programs for each department.

3. Group by Multiple Columns

Grouping by multiple columns allows for more granular aggregation, taking into account multiple dimensions of the data. To illustrate this, let’s consider a scenario where we want to count the number of programs by both department and program type:

SELECT department_id, type, COUNT(*) AS program_count
FROM Program
GROUP BY department_id, type;
+---------------+-------+---------------+
| department_id | type  | program_count |
|---------------+-------+---------------|
| 2             | Minor | 2             |
| 4             | Minor | 2             |
| 5             | Minor | 2             |
| 2             | Major | 2             |
| 4             | Major | 2             |
...

The query groups the rows in the Program table by both department_id and type, providing a count of programs for each combination of department and program type.

4. Using HAVING with GROUP BY on Multiple Columns

We can use the HAVING clause to filter groups based on aggregate values after grouping by multiple columns. In particular, it allows for more complex conditions on the aggregate results.

Let’s extend the previous example where we grouped by both department_id and type. Suppose we want to find the department and program type combinations that offer more than one program. We achieve this using the HAVING clause in conjunction with GROUP BY on multiple columns:

SELECT department_id, type, COUNT(*) AS program_count
FROM Program
GROUP BY department_id, type
HAVING COUNT(*) > 1;
+---------------+-------+---------------+
| department_id | type  | program_count |
|---------------+-------+---------------|
| 2             | Minor | 2             |
| 4             | Minor | 2             |
| 5             | Minor | 2             |
| 2             | Major | 2             |
| 4             | Major | 2             |
...

The query groups the rows in the Program table by both department_id and type and counts the number of programs for each combination. Then, it filters the results to include only those combinations with more than one program.

5. Using Joins with GROUP BY on Multiple Columns

Combining GROUP BY and joins can be very powerful if we need to aggregate data from multiple related tables. Let’s demonstrate how to join the Program table with the Department table and group the results by multiple columns.

For example, let’s find the number of programs offered by each department, grouped by department name and program type:

SELECT d.name AS department_name, p.type, COUNT(*) AS program_count
FROM Program p
JOIN Department d ON p.department_id = d.id
GROUP BY d.name, p.type;
+--------------------------------+-------+---------------+
| department_name                | type  | program_count |
|--------------------------------+-------+---------------|
| Mechanical Engineering         | Major | 3             |
| Electronics and Communications | Major | 2             |
| Civil Engineering              | Major | 2             |
| Civil Engineering              | Minor | 2             |
...

In the output, each row represents a combination of department name and program type, with the count of programs in each combination. Here, the query joins the Program and Department tables on the department_id and id columns, respectively. Then, it groups the results by the department name and program type and counts the number of programs in each group.

6. Using ORDER BY with GROUP BY on Multiple Columns

We can use the ORDER BY clause in conjunction with GROUP BY to sort grouping results based on one or more columns.

For example, let’s extend the previous example and sort the grouped result by department name and then by program count in descending order:

SELECT d.name AS department_name, p.type, COUNT(*) AS program_count
FROM Program p
JOIN Department d ON p.department_id = d.id
GROUP BY d.name, p.type 
ORDER BY d.name DESC, program_count DESC;
+--------------------------------+-------+---------------+
| department_name                | type  | program_count |
|--------------------------------+-------+---------------|
| Mechanical Engineering         | Major | 3             |
| Mechanical Engineering         | Minor | 1             |
| Mathematics                    | Major | 2             |
| Mathematics                    | Minor | 2             |
...

Here, the query groups the data by department name and program type while counting the number of programs in each group. Then, it sorts the result, first by department name and then by program count in descending order.

7. Using Nested GROUP BY with Multiple Columns

Sometimes, we might need to perform multiple levels of grouping to get more detailed insight. This is where a nested GROUP BY clause can be useful.

For example, let’s first group the programs by department name, program type, and start date, and then within each group, count the number of programs and find the earliest start date:

SELECT department_name, type, program_count, MIN(start_date) AS earliest_start_date
FROM (
     SELECT d.name AS department_name, p.type, COUNT(*) AS program_count, p.start_date
     FROM Program p
     JOIN Department d ON p.department_id = d.id
     GROUP BY d.name, p.type, p.start_date
) AS subquery
GROUP BY department_name, type, program_count;
+--------------------------------+-------+---------------+---------------------+
| department_name                | type  | program_count | earliest_start_date |
|--------------------------------+-------+---------------+---------------------|
...
| Mechanical Engineering         | Minor | 1             | 2010-07-01          |
| Mathematics                    | Minor | 1             | 2010-07-11          |
| Mechanical Engineering         | Major | 3             | 2010-07-01          |
...

Here, we use a subquery to first group the programs by department name, program type, and start date. Then, we count the number of programs in each group. Additionally, the outer query further groups these results by department name, program type, and program count, and finds the earliest start date within each group.

8. Conclusion

In this article, we’ve explored various ways to use the GROUP BY clause with multiple columns. We looked at basic usage to more advanced techniques involving joins, ordering, and nested groupings. By mastering these techniques, we can perform more complex and insightful data analysis.