Baeldung Pro – SQL – NPI EA (cat = Baeldung on SQL)
announcement - icon

Learn through the super-clean Baeldung Pro experience:

>> Membership and Baeldung Pro.

No ads, dark-mode and 6 months free of IntelliJ Idea Ultimate to start with.

1. Overview

Subqueries are useful SQL tools that allow us to nest one query inside another. However, combining multiple subqueries can make the overall query inefficient, long, and hard to understand and maintain.

In this tutorial, we’ll explore using subqueries in SELECT, WHERE, and FROM clauses and show how to handle them when they become repetitive (and redundant).

All our examples are based on our University database schema.

2. Understanding Subqueries in SQL

A subquery, also known as a nested query, is a query inside another SQL query. We can use it in various parts of an SQL statement, such as the SELECT, FROM, or WHERE clauses.

For example:

SELECT column1, column2, ...
FROM table_name
WHERE column_name operator (SELECT column_name 
                            FROM table_name 
                            WHERE condition
                           );

We can understand subqueries as if the inner query is executed first, after which its results are used when executing the outer query. However, it’s important to note that, in practice, database query engines optimize the execution order.

3. Using Multiple Subqueries Referencing the Same Table

Let’s explore how to use multiple subqueries with the same table.

3.1. In the SELECT Clause

First, we’ll take a look at an example with the Student and Exam tables, in which we’ll retrieve each student’s name and the number of exams they’ve taken:

SELECT s.name,
       (SELECT COUNT(*)
        FROM Exam e
        WHERE e.student_id = s.id) AS exam_count
FROM Student s;
+-----------------+------------+
| name            | exam_count |
+-----------------+------------+
| John Liu        |         12 |
| Rita Ora        |         12 |
| Samantha Prabhu |         12 |
| Vikas Jain      |          0 |
| Julia Roberts   |         12 |
...
| Pollards Grey   |         11 |
+-----------------+------------+

The subquery in the SELECT clause counts the number of exams for each student.

3.2. In the WHERE Clause

Subqueries in the WHERE clause filter results based on complex conditions. Let’s find students who have taken more than five exams:

SELECT s.name
FROM Student s
WHERE 5 < (
    SELECT COUNT(*)
    FROM Exam e
    WHERE e.student_id = s.id
);
+-----------------+
| name            |
+-----------------+
| John Liu        |
| Rita Ora        |
| Philip Lose     |
| Samantha Prabhu |
| Rose Rit        |
| Phellum Luis    |
| Peter Liu       |
| Ritu Raj        |
| Piu Liu         |
| Roni Roto       |
| Julia Roberts   |
| Pollards Grey   |
+-----------------+

We use a subquery in the WHERE clause to count the exams.

3.3. In the FROM Clause

Subqueries can also be used in the FROM clause. Let’s find the course that each student has attempted to pass the most times (i.e., taken the most exams for):

SELECT s.name, exam_counts.course_name, exam_counts.exam_count
FROM Student s
JOIN (
    SELECT e.student_id, c.name AS course_name, COUNT(*) AS exam_count
    FROM Exam e
    JOIN Course c ON e.course_id = c.id
    GROUP BY e.student_id, c.name
    HAVING COUNT(*) = (
        SELECT MAX(exam_count)
        FROM (
            SELECT student_id, course_id, COUNT(*) AS exam_count
            FROM Exam
            GROUP BY student_id, course_id
        ) max_counts
        WHERE max_counts.student_id = e.student_id
    )
) exam_counts ON s.id = exam_counts.student_id;
+-----------------+-----------------------------------+------------+
| name            | course_name                       | exam_count |
+-----------------+-----------------------------------+------------+
| Rita Ora        | Introduction to Algorithms        |          2 |
| Samantha Prabhu | Linear Algebra-II                 |          2 |
| Peter Liu       | Operating Systems: Intermediate   |          2 |
| Julia Roberts   | Advanced Operations Management    |          2 |
| Rose Rit        | Introduction to Calculus          |          2 |
...
| Piu Liu         | Operating Systems: Intermediate   |          2 |
+-----------------+-----------------------------------+------------+

The query above identifies, for each student, the course for which they have taken the most exams.

4. Addressing Redundancy in Subqueries

While subqueries are helpful, they can lead to inefficient or repetitive queries, reducing query performance, especially when dealing with a large dataset.

4.1. Multiple Subqueries Make Queries Complex

Let’s retrieve the student information, details about their most recent exam, and the total number of exams taken, using multiple subqueries:

SELECT s.name,
       (SELECT e.exam_date
        FROM Exam e
        WHERE e.student_id = s.id 
        AND e.exam_date >= ALL (SELECT exam_date FROM Exam WHERE student_id = s.id)) AS last_exam_date,
       (SELECT c.name
        FROM Exam e
        JOIN Course c ON e.course_id = c.id
        WHERE e.student_id = s.id 
        AND e.exam_date >= ALL (SELECT exam_date FROM Exam WHERE student_id = s.id)) AS last_exam_course,
       (SELECT COUNT(*)
        FROM Exam e
        WHERE e.student_id = s.id) AS total_exams
FROM Student s;
+-----------------+----------------+-----------------------------------+-------------+
| name            | last_exam_date | last_exam_course                  | total_exams |
+-----------------+----------------+-----------------------------------+-------------+
| John Liu        | 2023-12-14     | Linear Algebra-III                |          12 |
| Rita Ora        | 2023-12-19     | Introduction to Algorithms        |          12 |
| Philip Lose     | 2023-12-17     | Signal and Systems                |          12 |
| Samantha Prabhu | 2023-12-20     | Linear Algebra-II                 |          12 |
| Rose Rit        | 2023-12-21     | Introduction to Calculus          |          12 |
| Pollards Grey   | 2023-12-23     | Advanced Operations Management    |          11 |
...
| Potu Singh      | NULL           | NULL                              |           0 |
+-----------------+----------------+-----------------------------------+-------------+

The query above is complex and challenging to read. Let’s see how we can get the same data using a more efficient and readable query.

4.2. Using Common Table Expressions

We can use a common table expression (CTE) along with a window function to create a solution that works across MySQL, SQL Server, and PostgreSQL:

WITH RankedExams AS (
    SELECT 
        e.student_id,
        c.name AS course_name,
        e.exam_date,
        ROW_NUMBER() OVER (PARTITION BY e.student_id ORDER BY e.exam_date DESC) AS rn,
        COUNT(*) OVER (PARTITION BY e.student_id) AS total_exams
    FROM Exam e
    JOIN Course c ON e.course_id = c.id
)
SELECT 
    s.name,
    re.exam_date AS last_exam_date,
    re.course_name AS last_exam_course,
    re.total_exams
FROM Student s
LEFT JOIN RankedExams re ON s.id = re.student_id AND re.rn = 1;
+-----------------+----------------+-----------------------------------+-------------+
| name            | last_exam_date | last_exam_course                  | total_exams |
+-----------------+----------------+-----------------------------------+-------------+
| John Liu        | 2023-12-14     | Linear Algebra-III                |          12 |
| Jia Grey        | NULL           | NULL                              |        NULL |
| Rose Rit        | 2023-12-21     | Introduction to Calculus          |          12 |
| Phellum Luis    | 2023-12-21     | Introduction to Calculus          |          12 |
| Julia Roberts   | 2023-12-23     | Advanced Operations Management    |          12 |
...
| Pollards Grey   | 2023-12-23     | Advanced Operations Management    |          11 |
+-----------------+----------------+-----------------------------------+-------------+

This approach uses a CTE to rank exams for each student and calculate the total exam count. It’s more efficient because it processes the Exam table only once. The main query joins this CTE with the Student table to get the final result.

4.3. Using JOIN

Another cross-compatible solution is to use JOIN with a correlated subquery:

SELECT s.name,
       e.exam_date AS last_exam_date,
       c.name AS last_exam_course,
       (SELECT COUNT(*) FROM Exam WHERE student_id = s.id) AS total_exams
FROM Student s
LEFT JOIN Exam e ON s.id = e.student_id
LEFT JOIN Course c ON e.course_id = c.id
WHERE e.exam_date = (
    SELECT exam_date
    FROM Exam
    WHERE student_id = s.id
    AND exam_date >= ALL (SELECT exam_date FROM Exam WHERE student_id = s.id)
);
+-----------------+----------------+-----------------------------------+-------------+
| name            | last_exam_date | last_exam_course                  | total_exams |
+-----------------+----------------+-----------------------------------+-------------+
| Julia Roberts   | 2023-12-23     | Advanced Operations Management    |          12 |
| Pollards Grey   | 2023-12-23     | Advanced Operations Management    |          11 |
| Rose Rit        | 2023-12-21     | Introduction to Calculus          |          12 |
| Phellum Luis    | 2023-12-21     | Introduction to Calculus          |          12 |
| Roni Roto       | 2023-12-22     | Introduction to Operating Systems |          12 |
...
| Piu Liu         | 2023-12-22     | Operating Systems: Intermediate   |          12 |
+-----------------+----------------+-----------------------------------+-------------+

In the query above, we use the subquery in the WHERE clause to find each student’s most recent exam date, then join this with the Exam and Course tables to get the full exam details. The total exam count is calculated using a separate subquery in the SELECT clause.

This query is more readable than the initial one because it reduces redundancy by using a subquery to find the most recent exam date only once. It also allows for better optimization by the database engine since it uses JOINS instead of multiple correlated subqueries.

5. Conclusion

In this article, we explored the use of multiple subqueries in SQL and addressed the common problem of redundant subqueries. We can use common table expressions and JOINs as they allow us to write more efficient and readable queries.

The code backing this article is available on GitHub. Once you're logged in as a Baeldung Pro Member, start learning and coding on the project.