Baeldung Pro – SQL – NPI EA (cat = Baeldung on SQL)
announcement - icon

Learn through the super-clean Baeldung Pro experience:

>> Membership and Baeldung Pro.

No ads, dark-mode and 6 months free of IntelliJ Idea Ultimate to start with.

1. Overview

When working with SQL queries that involve grouped data and using aggregate functions such as SUM(), AVG(), or COUNT(), we may encounter the error: Column X is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause. This error commonly occurs in SQL Server and arises when we select a column that’s not aggregated and listed in the GROUP BY clause.

In this tutorial, we’ll explore what causes this error and the best way to resolve it.

2. Sample Data

For the examples, we use the Student table from the Baeldung University schema to illustrate what causes the error and how to fix it:

SELECT * FROM Student;
+------+-----------------+-------------+------------+-----------------+-----------------+------+
| id   | name            | national_id | birth_date | enrollment_date | graduation_date | gpa  |
+------+-----------------+-------------+------------+-----------------+-----------------+------+
| 1001 | John Liu        |   123345566 | 2001-04-05 | 2020-01-15      | 2024-06-15      | 4.00 |
| 1003 | Rita Ora        |   132345166 | 2001-01-14 | 2020-01-15      | 2024-06-15      | 4.20 |
| 1007 | Philip Lose     |   321345566 | 2001-06-15 | 2020-01-15      | 2024-06-15      | 3.80 |
| 1010 | Samantha Prabhu |  3217165566 | 2001-03-21 | 2020-01-15      | 2024-06-15      | 4.90 |
| 1011 | Vikas Jain      |   321345662 | 2001-07-18 | 2020-01-15      | NULL            | 3.30 |
| 1101 | Jia Grey        |  1345236267 | 2001-02-05 | 2020-01-15      | 2024-06-15      | 3.98 |
| 1103 | Rose Rit        |  1323612067 | 2001-05-14 | 2020-01-15      | NULL            | 3.57 |
...

The table contains information such as name, birth date, graduation date, and others for each student.

3. Understanding the Error

The error in question occurs when we try to include non-aggregated columns in the SELECT clause that aren’t part of the GROUP BY clause.

To demonstrate, let’s group students by enrollment date and get the average GPA per enrollment year:

SELECT enrollment_date, name, AVG(gpa) AS average_gpa
FROM Student 
GROUP BY enrollment_date;

However, when we run the query in SQL Server, we get an error:

Column 'Student.name' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

The error occurs because SQL Server encounters multiple name values for each enrollment_date and can’t determine which one to return unless it’s grouped or aggregated.

To explain, when we write GROUP BY enrollment_date, we’re telling SQL Server to group all rows that share the same enrollment_date, and then perform aggregations per group.

Now, within each group, there could be multiple names. Since the name is neither aggregated nor listed in the GROUP BY clause, SQL Server raises an error to prevent returning inconsistent data.

Once we use a GROUP BY, the database expects the result to be a summary per group. This means every column in SELECT must either help define the group or be an aggregated value.

Unlike SQL Server, MySQL allows such queries if ONLY_FULL_GROUP_BY is disabled.

4. How to Handle the Error

In this section, we discuss multiple approaches to manage or fix the error.

4.1. Remove the Unaggregated Column

As expected, we can just omit the offending columns.

In the example above, we can remove the name column from the SELECT clause if we don’t want to include it in the GROUP BY clause or apply an aggregate function to it:

SELECT enrollment_date, AVG(gpa) AS average_gpa
FROM Student
GROUP BY enrollment_date;

This query displays a single row for each enrollment_date, with the average GPA of all students who enrolled on that date:

+-----------------+-------------+
| enrollment_date | average_gpa |
+-----------------+-------------+
| 2020-01-15      |    3.923000 |
| 2021-01-15      |    3.717143 |
| 2022-01-15      |    4.107500 |
+-----------------+-------------+

We can use this approach when we don’t need individual values like names or IDs and only care about summary values such as averages, totals, or counts.

4.2. Use an Aggregate Function

Suppose the goal is to summarize data, such as calculating the average GPA, maximum score, or total count. In that case, we must ensure that every column in the SELECT list is either part of the GROUP BY clause or wrapped in an aggregation function.

For instance, let’s say we want to see the average GPA for each enrollment_date, along with one example student from that date. To extract this information, we apply an aggregate function on the name column:

SELECT enrollment_date, MIN(name) AS example_student, AVG(gpa) AS average_gpa
FROM Student
GROUP BY enrollment_date;

In the above query, we use MIN(name) to return the alphabetically first one in each group. AVG(gpa) then computes the average GPA of all students who enrolled on the same date:

+-----------------+-----------------+-------------+
| enrollment_date | example_student | average_gpa |
+-----------------+-----------------+-------------+
| 2020-01-15      | Albert Decosta  |    3.923000 |
| 2021-01-15      | Param Mohan     |    3.717143 |
| 2022-01-15      | Agatha Christi  |    4.107500 |
+-----------------+-----------------+-------------+

Now, every column in the SELECT list is either part of the GROUP BY clause or is aggregated.

4.3. Include the Column in the GROUP BY Clause

To fix an unaggregated column error, we can include the offending column in the GROUP BY clause.

In the current example, that’s the name column. Let’s say we want the average GPA for each enrollment year per student. We can pass the name and enrollment_date columns in the GROUP BY clause:

SELECT name, enrollment_date, AVG(gpa) AS avg_gpa
FROM Student
GROUP BY name, enrollment_date;

This query groups data by both name and enrollment_date and then calculates the average GPA for each student per enrollment period:

+-----------------+-----------------+----------+
| name            | enrollment_date | avg_gpa  |
+-----------------+-----------------+----------+
| John Liu        | 2020-01-15      | 4.000000 |
| Rita Ora        | 2020-01-15      | 4.200000 |
| Philip Lose     | 2020-01-15      | 3.800000 |
| Samantha Prabhu | 2020-01-15      | 4.900000 |
| Vikas Jain      | 2020-01-15      | 3.300000 |
...

This approach is useful when we have multiple rows per student and want to compute an aggregate across them.

4.4. Use a Subquery or Common Table Expression (CTE)

Suppose we want to display the complete details of the student with the highest GPA per enrollment date. We can’t list all columns in the SELECT clause unless we group or aggregate every column.

To solve this, we can use a subquery or a common table expression (CTE) to calculate the max GPA per enrollment date, then join it back to the original table to retrieve the full student record. They both enable us to separate the grouping logic from the row selection, making it possible to return full details of specific records within groups.

To illustrate, let’s retrieve the full student record for the student with the highest GPA for each enrollment date:

SELECT s.*
FROM Student s
JOIN (
    SELECT enrollment_date, MAX(gpa) AS top_gpa
    FROM Student
    GROUP BY enrollment_date
) AS grouped
ON s.enrollment_date = grouped.enrollment_date
AND s.gpa = grouped.top_gpa;

Here, the inner query groups students by enrollment date and finds the top GPA for each group. Then, the outer query selects all rows from the Student table where the GPA matches the top GPA for their enrollment date. This gives us the full student records for the top-performing students in each enrollment group:

+------+-----------------+-------------+------------+-----------------+-----------------+------+
| id   | name            | national_id | birth_date | enrollment_date | graduation_date | gpa  |
+------+-----------------+-------------+------------+-----------------+-----------------+------+
| 2009 | Pollards Grey   |  6503453662 | 2003-04-19 | 2022-01-15      | 2025-06-15      | 4.65 |
| 1617 | Philip Mohan    |  3103455662 | 2002-05-15 | 2021-01-15      | 2025-06-15      | 5.00 |
| 1010 | Samantha Prabhu |  3217165566 | 2001-03-21 | 2020-01-15      | 2024-06-15      | 4.90 |
+------+-----------------+-------------+------------+-----------------+-----------------+------+

This approach enables us to avoid the GROUP BY error and still get complete data. If multiple students have the same top GPA on the same enrollment date, this query should return all of them.

Alternatively, we can write the same logic using a common table expression (CTE):

WITH TopGPA AS (
    SELECT enrollment_date, MAX(gpa) AS top_gpa
    FROM Student
    GROUP BY enrollment_date
)
SELECT s.*
FROM Student s
JOIN TopGPA t
  ON s.enrollment_date = t.enrollment_date
 AND s.gpa = t.top_gpa;

This approach is useful when queries are complex, long, or when we need to reuse the grouped results multiple times.

4.5. Use Window Functions for Aggregate Context Without GROUP BY

Window functions are a powerful SQL feature that enables us to perform aggregate-like operations across a set of rows without collapsing them into a single row per group. This is useful when we want to keep every row and include summary information.

Let’s understand an example with window functions in play:

SELECT
  id,
  name,
  enrollment_date,
  gpa,
  AVG(gpa) OVER (PARTITION BY enrollment_date) AS avg_gpa_for_enroll_date
FROM Student;

This query displays every student and their full details. It also adds a column showing the average GPA of all students who enrolled on the same date:

+------+-----------------+-----------------+------+-------------------------+
| id   | name            | enrollment_date | gpa  | avg_gpa_for_enroll_date |
+------+-----------------+-----------------+------+-------------------------+
| 1003 | Rita Ora        | 2020-01-15      | 4.20 |                3.923000 |
| 1007 | Philip Lose     | 2020-01-15      | 3.80 |                3.923000 |
| 1010 | Samantha Prabhu | 2020-01-15      | 4.90 |                3.923000 |
| 1011 | Vikas Jain      | 2020-01-15      | 3.30 |                3.923000 |
...

We use this approach when we need to display each row along with summary data for its group.

5. Conclusion

In this article, we explored the error: Column X is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause. We discussed why this error occurs and multiple approaches we can use to handle or fix it, depending on the situation.

Specifically, we saw examples of several methods beyond skipping the offending column:

  • GROUP BY
  • subqueries
  • common table expressions (CTE)
  • window functions

As usual, the queries in this article are over on GitHub.