Learn through the super-clean Baeldung Pro experience:
>> Membership and Baeldung Pro.
No ads, dark-mode and 6 months free of IntelliJ Idea Ultimate to start with.
Last updated: April 9, 2025
When working with SQL, the two most common methods for structuring queries are Common Table Expressions (CTEs) and subqueries. So, understanding when and how to use each can optimize query performance and improve code readability.
In this tutorial, we’ll explore and understand, with practical examples, the differences between CTEs and subqueries. This tutorial uses the PostgreSQL database for the practical use case examples. In addition, this applies to MySQL and SQL Server.
In addition, we’ll illustrate this process using schemas from the Baeldung University schema.
When dealing with complex queries, deciding between using a CTE or a subquery can be confusing. Each has its benefits depending on the situation, such as code clarity, performance, and reusability. For these reasons, we’ll discuss these concepts and look at examples to illustrate their differences in the subsequent sections.
First, let’s define, explore the syntax, and create a use case scenario for CTEs.
A CTE defines a temporary result set within the execution scope of a single SELECT, INSERT, UPDATE, or DELETE statement. Additionally, CTE is created using the WITH clause, and it makes complex queries easier to read and maintain.
Let’s fully understand the basic syntax of a query statement or clause before using them. Here’s the general syntax:
WITH Cte_name AS (
-- SQL query here
SELECT column1, column2, ...
FROM table_name
WHERE condition
)
SELECT column1, column2, ...
FROM Cte_name
As shown above, the WITH keyword introduces the CTE, and the name Cte_name given to the CTE clause. So, the name references the result set afterward. Moreover, within the parentheses, we’ve got a standard SQL query, which can be as simple or complex as needed.
Thus, once we define the name for the CTE as Cte_name, we can then use it as if it were a table in any subsequent main query.
In this scenario case, let’s use CTE to find all students who have registered for more than two courses using the university database. For context, we’ll be using the Student and the Registration tables in our example:
Here’s the information in these tables:
University=# SELECT * FROM student LIMIT 3;
id | name | national_id | birth_date | enrollment_date | graduation_date | gpa
------+-------------+-------------+------------+-----------------+-----------------+-----
1001 | John Liu | 123345566 | 2001-04-05 | 2020-01-15 | 2024-06-15 | 4
1003 | Rita Ora | 132345166 | 2001-01-14 | 2020-01-15 | 2024-06-15 | 4.2
1007 | Philip Lose | 321345566 | 2001-06-15 | 2020-01-15 | 2024-06-15 | 3.8
(3 rows)
University=# SELECT * FROM registration LIMIT 3;
id | semester | year | reg_datetime | course_id | student_id
----+----------+------+---------------------+-----------+------------
1 | SPRING | 2022 | 2022-01-11 12:45:56 | CS111 | 1001
2 | SPRING | 2022 | 2022-01-11 12:45:56 | CS121 | 1001
3 | SPRING | 2022 | 2022-01-11 12:45:56 | CS122 | 1001
(3 rows)
Above, we see what the information in the tables we’re using looks like. Consequently, this lets us better understand the usage of CTE in our use case.
Let’s use CTE to output a list of students who registered for more than 12 courses using the Registration and Student tables from the university schema:
-- Let's create the CTE here, named course_count
WITH Course_count AS (
SELECT student_id, COUNT(course_id) AS total_courses
FROM Registration
GROUP BY student_id
)
-- select a column from student and CTE (Course_count) table, and filter information
SELECT s.name AS student_name, cc.total_courses As courses_registered
FROM Student s
JOIN Course_count cc ON s.id = cc.student_id
WHERE cc.total_courses > 12;
student_name | courses_registered
--------------+--------------------
John Liu | 13
Rose Rit | 13
Roni Roto | 13
(3 rows)
As illustrated above, we simplify this whole process of filtering specific information from two different tables. Firstly, we created a CTE table named Course_count, which stores our desired information from the Registration table.
Then, secondly, we joined the Course_count with the Student table and picked out the final columns we needed: student_name and courses_registered.
There are two main types of Common Table Expressions (CTEs) in SQL, namely, non-recursive and recursive CTEs. Let’s proceed to better understand the CTE types.
Non-recursive CTEs are the standard type of CTE used to simplify complex queries by breaking them down into smaller parts. Notably, these CTEs don’t refer to themselves within their own definition. In addition, the query we used previously is a non-recursive CTE.
Recursive CTEs refer to themselves within their definition, allowing them to perform recursive operations like traversing hierarchical data.
Now, let’s use recursive CTE to find all prerequisites for the course Advanced Algorithms:
WITH RECURSIVE prerequisite_chain AS (
-- Anchor member: Start with the target course
SELECT c.id, c.name, p.prerequisite_id
FROM course c
JOIN prerequisite p ON c.id = p.course_id
WHERE c.name = 'Advanced Algorithms'
UNION ALL
-- Recursive member: Find all prerequisites for the courses in the current level
SELECT c.id, c.name, p.prerequisite_id
FROM course c
JOIN prerequisite_chain pc ON c.id = pc.prerequisite_id
JOIN prerequisite p ON c.id = p.course_id
)
SELECT DISTINCT name
FROM prerequisite_chain;
name
--------------------------------
Advanced Algorithms
Algorithms: Intermediate Level
(2 rows)
Let’s break down the CTE above. The initial query finds the prerequisites for Advanced Algorithms by matching the course_id in the Prerequisite table with the id in the Course table.
Furthermore, the second query of the CTE recursively finds prerequisites for the courses identified in the previous step by continuing to match prerequisite_id. Subsequently, the query continues until all direct and indirect prerequisites are found, producing a complete list of prerequisite courses.
A subquery nests within another query, enclosed in parentheses. All in all, we can use subqueries in SELECT, INSERT, UPDATE, or DELETE statements, and the database evaluates them once per execution of the parent query.
However, an important note is that subqueries can be simpler for single-use cases but may be less readable when nested deeply.
Let’s simply understand that a subquery is simply a query nested inside another SQL query. Let’s review its syntax:
SELECT column1, column2, ...
FROM TableA
WHERE column_name operator (
SELECT column1
FROM TableB
WHERE condition
);
As shown above, parentheses enclose the subquery. The subquery complements the main query by providing a result that the outer query needs to reference.
In this case, the subquery obtains information from TableB and uses the operator, such as IN, =, ANY, ALL, etc., to compare the value from TableA in the outer query with the result of the subquery.
Let’s use the same university database scenario with the student and registration tables. Here, let’s locate the names of students who are registered in the Introduction to Structural Engineering course:
SELECT name
FROM student
WHERE id IN (
SELECT student_id
FROM registration
WHERE course_id = (
SELECT id
FROM course
WHERE name = 'Introduction to Structural Engineering'
)
);
name
----------
Rose Rit
(1 row)
As illustrated in our use case, subqueries run step-wise the way they’re nested or arranged. In our use case scenario, the innermost subquery retrieves the course id for the Introduction to Structural Engineering course.
The middle subquery uses the course id obtained from the innermost subquery to find all student_ids registered for this course from the registration table.
The outer query selects the student name from the Student table, where the student_id matches one of those returned by the middle subquery.
To add to our understanding of subquery, let’s explore its types. There are four types of subqueries, namely, single-row, multi-row, scalar, and correlated subqueries.
A single-row subquery returns only one row and typically works with a single value comparison operator such as =, <, >, etc. Our previous use case query is a typical example of a single-row subquery.
In multi-row subquery, the subquery returns multiple rows and is usually used with IN, ANY, or ALL operators:
SELECT name
FROM student
WHERE id IN (
SELECT student_id
FROM registration
WHERE course_id IN (
SELECT course_id
FROM teaching
WHERE faculty_id = (
SELECT id
FROM faculty
WHERE name = 'Rory Ross'
)
)
);
name
--------------
Rose Rit
Phellum Luis
Roni Roto
Piu Liu
(4 rows)
As shown above, the query returns multiple rows, and its implementation resembles the single-row query, where execution proceeds from the innermost to the outer query. In essence, this query lists all the students who are currently taking courses taught by Rory Ross.
In scalar subquery, it returns a single value and is used in expressions or as part of a column selection. Let’s explore a typical example:
SELECT name, gpa,
(SELECT AVG(gpa) FROM student) AS avg_gpa
FROM student
WHERE gpa > (SELECT AVG(gpa) FROM student);
name | gpa | avg_gpa
-----------------+------+-------------------
John Liu | 4 | 3.889523835409255
Rita Ora | 4.2 | 3.889523835409255
...
Albert Decosta | 4 | 3.889523835409255
Roni Roto | 4.44 | 3.889523835409255
(12 rows)
Here, the subquery calculates the average GPA of all students, and the main query selects the names and GPAs of students whose GPA is higher than the average.
A correlated subquery is a subquery that references columns from the outer query, making it dependent on the outer query’s result.
SELECT name
FROM student s
WHERE (
SELECT COUNT(*)
FROM registration r
WHERE r.student_id = s.id
) = (
SELECT MAX(course_count)
FROM (
SELECT COUNT(*) AS course_count
FROM registration
GROUP BY student_id
) AS subquery
);
name
-----------
John Liu
Rose Rit
Roni Roto
(3 rows)
The correlated subquery counts the number of courses for each student with r.student_id = s.id, which links the inner and outer queries. The main query selects the names of students who have registered for the maximum number of courses.
Let’s look at the key differences between CTE and Subquery:
| Feature | CTE | Subquery |
|---|---|---|
| Definition | Defined using the WITH clause | Nested within the SELECT, INSERT, UPDATE, or DELETE clause |
| Readability | Generally more readable and easier to manage for complex queries | Can be less readable when deeply nested |
| Reusability | Can be referenced multiple times within the same query | Used only once per execution of the parent query |
| Performance | May be optimized differently by the SQL engine | Can sometimes be less efficient due to multiple evaluations |
In this article, we’ve explored and delved into the differences between CTE and subqueries by examining their basic syntax, use cases, and understanding their differences. CTEs enhance readability and reusability for complex queries with multiple result set references. Subqueries suit simpler, one-off queries.