Baeldung Pro – SQL – NPI EA (cat = Baeldung on SQL)
announcement - icon

Learn through the super-clean Baeldung Pro experience:

>> Membership and Baeldung Pro.

No ads, dark-mode and 6 months free of IntelliJ Idea Ultimate to start with.

1. Overview

In SQL, a subselect, also known as a subquery, is a query written within another SQL statement. Subselects are commonly used in the WHERE, FROM, or SELECT clauses to filter or compute values dynamically. They enable the retrieval of data based on calculated conditions such as averages, maximums, or grouped records.

In this tutorial, we’ll explore how to use columns from a subselect in the WHERE clause in SQL, focusing on the MySQL, PostgreSQL, and SQL Server databases.

2. Sample Data

For the examples, we use the Student table from the Baeldung database schema to demonstrate different types of subselects within the WHERE clause:

SELECT * FROM Student;
+------+-----------------+-------------+------------+-----------------+-----------------+------+
| id   | name            | national_id | birth_date | enrollment_date | graduation_date | gpa  |
+------+-----------------+-------------+------------+-----------------+-----------------+------+
| 1001 | John Liu        |   123345566 | 2001-04-05 | 2020-01-15      | 2024-06-15      | 4.00 |
| 1003 | Rita Ora        |   132345166 | 2001-01-14 | 2020-01-15      | 2024-06-15      | 4.20 |
| 1007 | Philip Lose     |   321345566 | 2001-06-15 | 2020-01-15      | 2024-06-15      | 3.80 |
| 1010 | Samantha Prabhu |  3217165566 | 2001-03-21 | 2020-01-15      | 2024-06-15      | 4.90 |
| 1011 | Vikas Jain      |   321345662 | 2001-07-18 | 2020-01-15      | NULL            | 3.30 |
| 1101 | Jia Grey        |  1345236267 | 2001-02-05 | 2020-01-15      | 2024-06-15      | 3.98 |
| 1103 | Rose Rit        |  1323612067 | 2001-05-14 | 2020-01-15      | NULL            | 3.57 |
...

This table stores information about each student, such as name, ID, birth date, and others.

3. Subqueries and Subquery Classifications

Subqueries enable us to filter rows in the outer query based on results from an inner query. This way, we can perform complex filtering.

There are three types of subqueries we can use in the WHERE clause:

  • scalar subquery: returns a single value
  • IN subquery: returns a set of values
  • EXISTS subquery: checks whether a subquery returns any rows

Furthermore, they can be classified into either correlated or non-correlated:

  • correlated subquerydepends on the outer query for its execution because it references columns from that outer query
  • non-correlated subquery: runs independently of the outer query because it’s executed once and returns a result that the outer query uses

Now, let’s discuss these subqueries in the upcoming sections.

4. Scalar Subquery

A scalar subquery returns a single value, exactly one row, and one column. We use it with comparison operators to filter records based on an aggregated or derived value from the same or another table.

Scalar subqueries follow a basic structure:

SELECT column1, column2
FROM table
WHERE column3 operator (
    SELECT single_column
    FROM another_table
    WHERE condition
);

Here, the operator can be any comparison operator, such as =, <, >, <=, >=, or <>. It returns only one row and one column. Critically, if it attempts to return more than one row, the query raises a runtime error. If it returns no rows, the condition evaluates to NULL, and the row is excluded from the result.

4.1. Find Students With the Highest GPA

To demonstrate, let’s retrieve the names of students with the highest GPA from the Student table:

SELECT name, gpa
FROM Student
WHERE gpa = (
    SELECT MAX(gpa)
    FROM Student
);

In the above query, the inner query calculates the highest GPA in the Student table, while the outer query retrieves all students whose GPA matches the maximum value:

+--------------+------+
| name         | gpa  |
+--------------+------+
| Philip Mohan | 5.00 |
+--------------+------+

Thus, the output represents the student with the highest GPA. This query works in MySQL, PostgreSQL, and SQL Server without any changes.

4.2. Find Students Who Scored Above the Average GPA

Next, let’s retrieve students whose GPA is greater than the average GPA of all students:

SELECT name, gpa
FROM Student
WHERE gpa > (
    SELECT AVG(gpa)
    FROM Student
    WHERE gpa IS NOT NULL
);

Here, the subquery SELECT AVG(gpa) FROM Student WHERE gpa IS NOT NULL calculates the average GPA using the AVG() function, excluding NULL values.

The outer query then selects all students whose GPA is greater than the average GPA:

+-----------------+------+
| name            | gpa  |
+-----------------+------+
| John Liu        | 4.00 |
| Rita Ora        | 4.20 |
| Samantha Prabhu | 4.90 |
...

Thus, we get the expected result.

4.3. Behavior and Compatibility

The scalar subqueries in these examples are non-correlated because the inner query can be executed independently without referencing the outer query’s columns. The inner query runs once, and the outer query filters rows using the single value we returned.

This query works in MySQL, PostgreSQL, and SQL Server.

5. IN Subquery

An IN subquery enables us to check if a column value exists within a set of values returned by a subquery.

It follows a general syntax:

SELECT column1, column2, ...
FROM table_name
WHERE column_name IN (
    SELECT column_name
    FROM another_table
    WHERE condition
);

Let’s see some specific examples.

5.1. Filter Students With Matching GPA of Non-Graduated Students

For example, let’s find all students whose GPA matches any student who hasn’t graduated:

SELECT name, gpa
FROM Student
WHERE gpa IN (
    SELECT gpa
    FROM Student
    WHERE graduation_date IS NULL
    AND gpa IS NOT NULL
);

The subquery SELECT gpa FROM Student WHERE graduation_date IS NULL AND gpa IS NOT NULL returns a list of GPA values for students who haven’t graduated.

The outer query then filters and returns students whose GPA matches any values in the list:

+----------------+------+
| name           | gpa  |
+----------------+------+
| Vikas Jain     | 3.30 |
| Rose Rit       | 3.57 |
| Vineet Jha     | 2.90 |
...

Now, we can see the respective results.

5.2. Behavior and Compatibility

This query works in MySQL, PostgreSQL, and SQL Server.

The query is a non-correlated subquery since the subquery executes independently of the outer query and produces a complete set of GPA values. The outer query then uses this set in the IN clause for filtering.

6. EXISTS Subquery

The EXISTS subquery checks whether the subquery returns any rows. Unlike scalar and IN subqueries that evaluate to a value or set of values, EXISTS returns a Boolean value:

  • TRUE if the subquery returns at least one row
  • FALSE if the subquery returns none

EXISTS follows a general syntax:

SELECT column1, column2
FROM table1
WHERE EXISTS (
    SELECT 1
    FROM table2
    WHERE condition
);

Here, SELECT 1 returns a constant value because the actual data from the subquery is irrelevant; only the existence of matching rows matters. Using SELECT 1 is a common practice because the EXISTS clause only checks for the existence of rows and doesn’t depend on the values returned.

6.1. Find Students With Duplicate GPAs

To illustrate, let’s find students who share their GPA with at least one other student:

SELECT s1.name, s1.gpa
FROM Student s1
WHERE EXISTS (
    SELECT 1
    FROM Student s2
    WHERE s2.gpa = s1.gpa
      AND s2.id <> s1.id
);

In the above query, the inner query looks for another student who has the same GPA but a different ID than the current student:

+----------------+------+
| name           | gpa  |
+----------------+------+
| John Liu       | 4.00 |
| Albert Decosta | 4.00 |
+----------------+------+

In this case, we have two students who share the GPA.

6.2. Behavior and Compatibility

This approach works in MySQL, PostgreSQL, and SQL Server.

EXISTS uses a correlated subquery because the inner query references columns from the outer query, s1.gpa and s1.id, meaning it executes once per row in the outer query.

7. Filtering With Multiple Columns From a Subquery

So far, we’ve discussed subqueries that return a single column. However, subqueries in the WHERE clause can also return multiple columns, which is useful when we want to filter rows based on a combination of values rather than a single field.

For example, let’s find students who have the same GPA and graduation date as at least one other student:

SELECT name, gpa, graduation_date
FROM Student
WHERE (gpa, graduation_date) IN (
    SELECT gpa, graduation_date
    FROM Student
    WHERE graduation_date IS NOT NULL
    GROUP BY gpa, graduation_date
    HAVING COUNT(*) > 1
);

Here, the inner subquery retrieves all gpa and graduation_date pairs that appear more than once in the Student table.

The outer query returns students whose GPA and graduation date match any of these repeated pairs:

+----------------+------+-----------------+
| name           | gpa  | graduation_date |
+----------------+------+-----------------+
| John Liu       | 4.00 | 2024-06-15      |
| Albert Decosta | 4.00 | 2024-06-15      |
+----------------+------+-----------------+

This query works in both MySQL and PostgreSQL.

7.1. SQL Server Alternative With EXISTS

Since SQL Server doesn’t support multi-column IN clauses, we can rewrite the query using EXISTS:

SELECT s1.name, s1.gpa, s1.graduation_date
FROM Student s1
WHERE EXISTS (
    SELECT 1
    FROM (
        SELECT gpa, graduation_date
        FROM Student
        WHERE graduation_date IS NOT NULL
        GROUP BY gpa, graduation_date
        HAVING COUNT(*) > 1
    ) AS repeated
    WHERE s1.gpa = repeated.gpa
      AND s1.graduation_date = repeated.graduation_date
);

This query checks each student row one by one. For every student in the outer query, it runs the EXISTS check to see if the student’s GPA and graduation date match any of the duplicate pairs from the inner query.

7.2. Behavior

In the first query using IN, the inner subquery is non-correlated because it executes once and returns a set of values used by the outer query.

The second query using EXISTS is a correlated subquery because the inner query depends on the current row of the outer query; it runs for each student to check if they match any pair in the duplicate list.

8. Conclusion

In this article, we discussed how to use columns from a subselect in the WHERE clause in SQL. We looked at scalar subqueries, IN subqueries, and EXISTS subqueries. Furthermore, we explained the difference between correlated and non-correlated subqueries, which affect performance and logic.

Understanding how subselects interact with the WHERE clause enables us to write more flexible, concise, and powerful SQL queries, especially when filtering data based on dynamic or computed conditions.

The queries in this article can be found over on GitHub.