Baeldung Pro – SQL – NPI EA (cat = Baeldung on SQL)
announcement - icon

Learn through the super-clean Baeldung Pro experience:

>> Membership and Baeldung Pro.

No ads, dark-mode and 6 months free of IntelliJ Idea Ultimate to start with.

1. Introduction

When working with SQL, the two most common methods for structuring queries are Common Table Expressions (CTEs) and subqueries. So, understanding when and how to use each can optimize query performance and improve code readability.

In this tutorial, we’ll explore and understand, with practical examples, the differences between CTEs and subqueries. This tutorial uses the PostgreSQL database for the practical use case examples. In addition, this applies to MySQL and SQL Server.

In addition, we’ll illustrate this process using schemas from the Baeldung University schema.

2. Why Understand CTEs and Subqueries?

When dealing with complex queries, deciding between using a CTE or a subquery can be confusing. Each has its benefits depending on the situation, such as code clarity, performance, and reusability. For these reasons, we’ll discuss these concepts and look at examples to illustrate their differences in the subsequent sections.

3. Common Table Expressions (CTEs)

First, let’s define, explore the syntax, and create a use case scenario for CTEs.

A CTE defines a temporary result set within the execution scope of a single SELECT, INSERT, UPDATE, or DELETE statement. Additionally, CTE is created using the WITH clause, and it makes complex queries easier to read and maintain.

3.1. Basic Syntax of a CTE

Let’s fully understand the basic syntax of a query statement or clause before using them. Here’s the general syntax:

WITH Cte_name AS (
    -- SQL query here
    SELECT column1, column2, ...
    FROM table_name
    WHERE condition
)
SELECT column1, column2, ...
FROM Cte_name

As shown above, the WITH keyword introduces the CTE, and the name Cte_name given to the CTE clause. So, the name references the result set afterward. Moreover, within the parentheses, we’ve got a standard SQL query, which can be as simple or complex as needed.

Thus, once we define the name for the CTE as Cte_name, we can then use it as if it were a table in any subsequent main query.

3.2. Use Case

In this scenario case, let’s use CTE to find all students who have registered for more than two courses using the university database. For context, we’ll be using the Student and the Registration tables in our example:

  • Student: contains information about students.
  • Registration: records each student’s course registrations.

Here’s the information in these tables:

University=# SELECT * FROM student LIMIT 3;
  id  |    name     | national_id | birth_date | enrollment_date | graduation_date | gpa 
------+-------------+-------------+------------+-----------------+-----------------+-----
 1001 | John Liu    |   123345566 | 2001-04-05 | 2020-01-15      | 2024-06-15      |   4
 1003 | Rita Ora    |   132345166 | 2001-01-14 | 2020-01-15      | 2024-06-15      | 4.2
 1007 | Philip Lose |   321345566 | 2001-06-15 | 2020-01-15      | 2024-06-15      | 3.8
(3 rows)

University=# SELECT * FROM registration LIMIT 3;
 id | semester | year |    reg_datetime     | course_id | student_id 
----+----------+------+---------------------+-----------+------------
  1 | SPRING   | 2022 | 2022-01-11 12:45:56 | CS111     |       1001
  2 | SPRING   | 2022 | 2022-01-11 12:45:56 | CS121     |       1001
  3 | SPRING   | 2022 | 2022-01-11 12:45:56 | CS122     |       1001
(3 rows)

Above, we see what the information in the tables we’re using looks like. Consequently, this lets us better understand the usage of CTE in our use case.

Let’s use CTE to output a list of students who registered for more than 12 courses using the Registration and Student tables from the university schema:

-- Let's create the CTE here, named course_count
WITH Course_count AS (
     SELECT student_id, COUNT(course_id) AS total_courses
     FROM Registration
     GROUP BY student_id
)
-- select a column from student and CTE (Course_count) table, and filter information
SELECT s.name AS student_name, cc.total_courses As courses_registered
FROM Student s
JOIN Course_count cc ON s.id = cc.student_id
WHERE cc.total_courses > 12;
 student_name | courses_registered 
--------------+--------------------
 John Liu     |                 13
 Rose Rit     |                 13
 Roni Roto    |                 13
(3 rows)

As illustrated above, we simplify this whole process of filtering specific information from two different tables. Firstly, we created a CTE table named Course_count, which stores our desired information from the Registration table.

Then, secondly, we joined the Course_count with the Student table and picked out the final columns we needed: student_name and courses_registered.

3.3. Types

There are two main types of Common Table Expressions (CTEs) in SQL, namely, non-recursive and recursive CTEs. Let’s proceed to better understand the CTE types.

Non-recursive CTEs are the standard type of CTE used to simplify complex queries by breaking them down into smaller parts. Notably, these CTEs don’t refer to themselves within their own definition. In addition, the query we used previously is a non-recursive CTE.

Recursive CTEs refer to themselves within their definition, allowing them to perform recursive operations like traversing hierarchical data.

Now, let’s use recursive CTE to find all prerequisites for the course Advanced Algorithms:

WITH RECURSIVE prerequisite_chain AS (
    -- Anchor member: Start with the target course
    SELECT c.id, c.name, p.prerequisite_id
    FROM course c
    JOIN prerequisite p ON c.id = p.course_id
    WHERE c.name = 'Advanced Algorithms'
    
    UNION ALL
    
    -- Recursive member: Find all prerequisites for the courses in the current level
    SELECT c.id, c.name, p.prerequisite_id
    FROM course c
    JOIN prerequisite_chain pc ON c.id = pc.prerequisite_id
    JOIN prerequisite p ON c.id = p.course_id
)
SELECT DISTINCT name
FROM prerequisite_chain;
              name              
--------------------------------
 Advanced Algorithms
 Algorithms: Intermediate Level
(2 rows)

Let’s break down the CTE above. The initial query finds the prerequisites for Advanced Algorithms by matching the course_id in the Prerequisite table with the id in the Course table.

Furthermore, the second query of the CTE recursively finds prerequisites for the courses identified in the previous step by continuing to match prerequisite_id. Subsequently, the query continues until all direct and indirect prerequisites are found, producing a complete list of prerequisite courses.

4. Subqueries

A subquery nests within another query, enclosed in parentheses. All in all, we can use subqueries in SELECT, INSERT, UPDATE, or DELETE statements, and the database evaluates them once per execution of the parent query.

However, an important note is that subqueries can be simpler for single-use cases but may be less readable when nested deeply.

4.1. Basic Syntax

Let’s simply understand that a subquery is simply a query nested inside another SQL query. Let’s review its syntax:

SELECT column1, column2, ...
FROM TableA
WHERE column_name operator ( 
    SELECT column1 
    FROM TableB
    WHERE condition 
);

As shown above, parentheses enclose the subquery. The subquery complements the main query by providing a result that the outer query needs to reference.

In this case, the subquery obtains information from TableB and uses the operator, such as IN, =, ANY, ALL, etc., to compare the value from TableA in the outer query with the result of the subquery.

4.2. Use Case

Let’s use the same university database scenario with the student and registration tables. Here, let’s locate the names of students who are registered in the Introduction to Structural Engineering course:

SELECT name
FROM student
WHERE id IN (
    SELECT student_id
    FROM registration
    WHERE course_id = (
        SELECT id
        FROM course
        WHERE name = 'Introduction to Structural Engineering'
    )
);
   name   
----------
 Rose Rit
(1 row)

As illustrated in our use case, subqueries run step-wise the way they’re nested or arranged. In our use case scenario, the innermost subquery retrieves the course id for the Introduction to Structural Engineering course.

The middle subquery uses the course id obtained from the innermost subquery to find all student_ids registered for this course from the registration table.

The outer query selects the student name from the Student table, where the student_id matches one of those returned by the middle subquery.

4.3. Types

To add to our understanding of subquery, let’s explore its types. There are four types of subqueries, namely, single-row, multi-row, scalar, and correlated subqueries.

A single-row subquery returns only one row and typically works with a single value comparison operator such as =, <, >, etc. Our previous use case query is a typical example of a single-row subquery.

In multi-row subquery, the subquery returns multiple rows and is usually used with IN, ANY, or ALL operators:

SELECT name
FROM student
WHERE id IN (
    SELECT student_id
    FROM registration
    WHERE course_id IN (
        SELECT course_id
        FROM teaching
        WHERE faculty_id = (
            SELECT id
            FROM faculty
            WHERE name = 'Rory Ross'
        )
    )
);
     name     
--------------
 Rose Rit
 Phellum Luis
 Roni Roto
 Piu Liu
(4 rows)

As shown above, the query returns multiple rows, and its implementation resembles the single-row query, where execution proceeds from the innermost to the outer query. In essence, this query lists all the students who are currently taking courses taught by Rory Ross.

In scalar subquery, it returns a single value and is used in expressions or as part of a column selection. Let’s explore a typical example:

SELECT name, gpa,
       (SELECT AVG(gpa) FROM student) AS avg_gpa
FROM student
WHERE gpa > (SELECT AVG(gpa) FROM student);
      name       | gpa  |      avg_gpa      
-----------------+------+-------------------
 John Liu        |    4 | 3.889523835409255
 Rita Ora        |  4.2 | 3.889523835409255
...
 Albert Decosta  |    4 | 3.889523835409255
 Roni Roto       | 4.44 | 3.889523835409255
(12 rows)

Here, the subquery calculates the average GPA of all students, and the main query selects the names and GPAs of students whose GPA is higher than the average.

A correlated subquery is a subquery that references columns from the outer query, making it dependent on the outer query’s result.

SELECT name
FROM student s
WHERE (
    SELECT COUNT(*)
    FROM registration r
    WHERE r.student_id = s.id
) = (
    SELECT MAX(course_count)
    FROM (
        SELECT COUNT(*) AS course_count
        FROM registration
        GROUP BY student_id
    ) AS subquery
);
   name    
-----------
 John Liu
 Rose Rit
 Roni Roto
(3 rows)

The correlated subquery counts the number of courses for each student with r.student_id = s.id, which links the inner and outer queries. The main query selects the names of students who have registered for the maximum number of courses.

5. Differences

Let’s look at the key differences between CTE and Subquery:

Feature CTE Subquery
Definition  Defined using the WITH clause Nested within the SELECT, INSERT, UPDATE, or DELETE clause
Readability Generally more readable and easier to manage for complex queries Can be less readable when deeply nested
Reusability Can be referenced multiple times within the same query Used only once per execution of the parent query
Performance May be optimized differently by the SQL engine Can sometimes be less efficient due to multiple evaluations

6. Conclusion

In this article, we’ve explored and delved into the differences between CTE and subqueries by examining their basic syntax, use cases, and understanding their differences. CTEs enhance readability and reusability for complex queries with multiple result set references. Subqueries suit simpler, one-off queries.

The code backing this article is available on GitHub. Once you're logged in as a Baeldung Pro Member, start learning and coding on the project.