
Learn through the super-clean Baeldung Pro experience:
>> Membership and Baeldung Pro.
No ads, dark-mode and 6 months free of IntelliJ Idea Ultimate to start with.
Last updated: January 11, 2025
Common table expressions (CTE) are SQL features for enhancing query readability. CTEs allow us to define temporary result sets that can be referenced within a SQL statement. They’re particularly useful for simplifying complex queries and organizing intermediate results in a structured manner.
In this tutorial, we’ll learn about common table expressions with practical examples by querying several tables from the full Baeldung University schema.
Common Table Expressions allow us to declare a complex query upfront to create a temporary table that we can reuse later. This is useful for performing JOINs or calculations up front and then being able to reference these results again in our final query without needing to re-account for those complex operations. CTEs help us to deconstruct complex queries into simpler blocks and reuse them.
For example, we can define a query as cte_name and then use in the FROM clause of the main query:
WITH cte_name (optional_column_list) AS (
-- SQL query as CTE Query Definition
SELECT columns
FROM table
WHERE condition
)
SELECT columns
FROM cte_name
WHERE condition;
The WITH clause defines a CTE, followed by its name and the query for the result set. Subsequent subqueries can access the result of the CTE.
CTEs help create simpler, readable, and reusable queries from complex queries. Suppose we want to find the total_credits for all the active courses the students are enrolled in. We can use a CTE to simplify this process:
WITH ActiveCourse AS (
SELECT s.id, s.name AS student_name,
c.name AS course_title,
SUM(c.credits) AS course_credits
FROM Student AS s
JOIN Exam AS e
ON e.student_id = s.id
JOIN course AS c
ON c.id = e.course_id
WHERE c.is_active = 'Yes'
GROUP BY s.id
)
SELECT *
FROM ActiveCourse
WHERE course_credits > 70;
This CTE uses multiple JOINs to query the student_name, course_title, and course_credits of students with active courses. Then we query the CTE for students with more than 70 course_credits.
+------+-----------------+-----------------------------------+----------------+
| id | student_name | course_title | course_credits |
+------+-----------------+-----------------------------------+----------------+
| 1003 | Rita Ora | Introduction to Databases | 73 |
| 1007 | Philip Lose | Introduction to Databases | 80 |
| 1010 | Samantha Prabhu | Introduction to Data Structures | 73 |
| 1107 | Phellum Luis | Mechanics of Solids-I | 80 |
| 1607 | Peter Liu | Introduction to Operating Systems | 82 |
| 1610 | Ritu Raj | Introduction to Operating Systems | 78 |
| 1710 | Roni Roto | Introduction to Operating Systems | 80 |
| 2008 | Julia Roberts | Introduction to Operating Systems | 80 |
| 2009 | Pollards Grey | Statistical Machine Learning | 73 |
+------+-----------------+-----------------------------------+----------------+
9 rows in set (0.012 sec)
A significant advantage of a CTE is the ability to reference itself. A recursive query references a recursive CTE, while a recursive CTE is a query that repeatedly references itself during execution. Let’s use a recursive CTE to calculate the yearly breakdown of enrollment duration for students.
WITH RECURSIVE EnrollmentProgression AS (
SELECT
id AS student_id,
name,
YEAR(enrollment_date) AS active_year
FROM
Student
UNION ALL
-- Recursive statement to increment the year until the graduation year
SELECT
ep.student_id,
ep.name,
ep.active_year + 1 AS year
FROM
EnrollmentProgression ep
INNER JOIN
student s ON ep.student_id = s.id
WHERE
ep.active_year < YEAR(s.graduation_date)
)
-- Select all active years for each student's enrollment
SELECT
student_id,
name,
active_year
FROM
EnrollmentProgression
ORDER BY
student_id, active_year
LIMIT 20;
This gives us a year-by-year breakdown of a student’s active enrollment. The table below shows all the years a student is enrolled. Here, we LIMIT the result to the first 20 rows.
+------------+-----------------+-------------+
| student_id | name | active_year |
+------------+-----------------+-------------+
| 1001 | John Liu | 2020 |
| 1001 | John Liu | 2021 |
| 1001 | John Liu | 2022 |
| 1001 | John Liu | 2023 |
| 1001 | John Liu | 2024 |
| 1003 | Rita Ora | 2020 |
| 1003 | Rita Ora | 2021 |
| 1003 | Rita Ora | 2022 |
| 1003 | Rita Ora | 2023 |
| 1003 | Rita Ora | 2024 |
| 1007 | Philip Lose | 2020 |
| 1007 | Philip Lose | 2021 |
| 1007 | Philip Lose | 2022 |
| 1007 | Philip Lose | 2023 |
| 1007 | Philip Lose | 2024 |
| 1010 | Samantha Prabhu | 2020 |
| 1010 | Samantha Prabhu | 2021 |
| 1010 | Samantha Prabhu | 2022 |
| 1010 | Samantha Prabhu | 2023 |
| 1010 | Samantha Prabhu | 2024 |
+------------+-----------------+-------------+
20 rows in set (0.005 sec)
Using a CTE as a substitute for a view means that the query only exists for the query’s duration. The CTE is temporary and only exists in the context of a single query, unlike a view that is stored in the database. We can use a CTE to calculate the average GPA of all students by year. Instead of creating a persistent view for this, we’ll use a CTE:
WITH EnrollmentStats AS (
SELECT
YEAR(s.enrollment_date) AS enrollment_year,
AVG(s.gpa) AS average_gpa
FROM
Student AS s
WHERE
s.gpa IS NOT NULL
GROUP BY
YEAR(s.enrollment_date)
)
SELECT
enrollment_year,
ROUND(average_gpa,3) AS average_gpa
FROM
EnrollmentStats
WHERE
average_gpa > 3.0
ORDER BY
enrollment_year;
The CTE calculates the average grade points for each student based on their enrollment year using the CASE statement by attributing numbers to each grade:
+-----------------+----------------------+
| enrollment_year | average_gpa |
+-----------------+----------------------+
| 2020 | 3.923 |
| 2021 | 3.717 |
| 2022 | 4.108 |
+-----------------+----------------------+
3 rows in set (0.006 sec)
Using a Common Table Expression also makes it easier to handle a scenario from a scalar subselect. A scalar subselect is a subquery that returns a single value for each row in the main query. For example, we want to categorize students based on their age at the time of enrollment. Whereas age is the difference between enrollment_date and birth_date. After calculating this derived column, we can also group students into age categories like “18 Years”, “19 Years”, and “Above 19”:
WITH AgeCategories AS (
SELECT s.id AS student_id,
s.name AS student_name,
TIMESTAMPDIFF(YEAR, s.birth_date, s.enrollment_date) AS age_at_enrollment,
CASE
WHEN TIMESTAMPDIFF(YEAR, s.birth_date, s.enrollment_date) = 18 THEN '18 Years'
WHEN TIMESTAMPDIFF(YEAR, s.birth_date, s.enrollment_date) = 19 THEN '19 Years'
ELSE 'Above 19'
END AS age_category
FROM Student AS s
)
SELECT age_category,
COUNT(student_id) AS student_count
FROM AgeCategories
GROUP BY age_category;
The above query categorizes students based on their age at their enrollment_date. Finally, using the TIMESTAMPDIFF() function, it calculates the student’s age in years as a derived column. The result is the number of students that enrolled at each age:
+--------------+---------------+
| age_category | student_count |
+--------------+---------------+
| 18 Years | 23 |
| 19 Years | 2 |
+--------------+---------------+
2 rows in set (0.001 sec)
If we need to reference the same subquery result multiple times within a query, a Common Table Expression makes the SQL code cleaner, more readable, and more efficient by avoiding redundancy. Instead of repeating the same subquery, we can define it once in a CTE and then reference it multiple times.
In this case, let’s analyze the student’s performance by calculating the total number of exams taken, the number of exams with an “A” grade, and the average grade point for each student:
WITH ExamSummary AS (
SELECT
e.student_id,
s.name AS student_name,
COUNT(*) AS total_exams,
SUM(CASE WHEN e.grade = 'A' THEN 1 ELSE 0 END) AS a_grades_count,
AVG(
CASE
WHEN e.grade = 'A' THEN 4
WHEN e.grade = 'B' THEN 3
WHEN e.grade = 'C' THEN 2
WHEN e.grade = 'D' THEN 1
ELSE 0
END
) AS average_grade_points
FROM
Exam e
JOIN
student s ON e.student_id = s.id
GROUP BY
e.student_id, s.name
)
-- main query
SELECT
es1.student_name,
es1.total_exams,
es1.a_grades_count,
es1.average_grade_points,
CASE
WHEN es1.a_grades_count >= 3 THEN 'Top Performer'
ELSE 'Regular Performer'
END AS performance_category
FROM
ExamSummary es1
JOIN
ExamSummary es2 ON es1.student_id = es2.student_id
WHERE
es1.total_exams > 3;
In the main query, the CTE ExamSummary is referenced twice. The first time to get the student’s main performance data
and the second time to group the data from the CTE by student. Finally, the derived column performance_category classifies students with at
least three A’s as a “Top Performer”:
+-----------------+-------------+----------------+----------------------+----------------------+
| student_name | total_exams | a_grades_count | average_grade_points | performance_category |
+-----------------+-------------+----------------+----------------------+----------------------+
| John Liu | 12 | 1 | 0.5833 | Regular Performer |
| Rita Ora | 12 | 3 | 0.9999 | Top Performer |
| Philip Lose | 12 | 3 | 0.9999 | Top Performer |
| Samantha Prabhu | 12 | 2 | 0.9167 | Regular Performer |
| Rose Rit | 12 | 4 | 0.9999 | Top Performer |
| Phellum Luis | 12 | 4 | 0.9999 | Top Performer |
| Peter Liu | 12 | 2 | 0.9167 | Regular Performer |
| Ritu Raj | 12 | 2 | 0.6667 | Regular Performer |
| Piu Liu | 12 | 4 | 0.9999 | Top Performer |
| Roni Roto | 12 | 4 | 0.9999 | Top Performer |
| Julia Roberts | 12 | 1 | 0.8333 | Regular Performer |
| Pollards Grey | 11 | 1 | 0.9091 | Regular Performer |
+-----------------+-------------+----------------+----------------------+----------------------+
12 rows in set (0.017 sec)
The table above shows the performance category for students with more than three As in their exams.
In this article, we’ve explored several use cases of Common Table Expressions, including recursion, substitutes for views, using CTEs to break down complex queries, grouping scalar subselect and non-deterministic functions, and referencing a table result multiple times.
CTEs let us create a temporarily named result set for breaking down complex queries into readable components without altering the database schema.