Recursive Queries in MySQL Baeldung on SQL

1. Introduction

In this tutorial, we’ll learn how to write recursive queries in MySQL.

We’ll base our queries on Baeldung’s University schema.

2. Recursion and Queries

Each query outputs a relation, i.e., a table. It’s the result of table operations, such as filtering, joining, and sorting, which are performed on the tables referenced in the query.

However, some result tables denote recursive relations. For example, if users on a social network can reply to each other’s comments, a comment thread will have a recursive structure:

thread = the original comment
          + replies to the original comment
          + replies to the replies to the original comment
          + ...

This is usually the case with hierarchies, where for any row in a table, we can follow a path from a so-called base row to it. For instance, the social circle of a social network user is a hierarchy because, by following friendships between users, we can get the user’s friends, their friends, the friends of their friends, and so on.

3. Recursive Common Table Expressions

In MySQL, we use recursive common table expressions (CTEs) to write recursive queries:

WITH RECURSIVE cte
SELECT columns
FROM ...

We define a recursive CTE using the WITH RECURSIVE construct. It contains a recursive and nonrecursive SELECT statement:

WITH RECURSIVE cte(columns) AS (
    nonrecursive SELECT

    UNION ALL | UNION DISTINCT

    recursive SELECT
)

The nonrecursive SELECT returns the base case of the recursion: the rows that belong to the CTE table because of their inherent properties, not because of their relationships with other CTE rows. This statement doesn’t reference the CTE’s name.

The recursive SELECT references it and finds the rows that belong to the CTE because of their relationships with the CTE rows that have already been included.

CTE recursion proceeds in iterations. First, we find the rows with the nonrecursive SELECT. Then, we include their counterparts, then the counterparts of those counterparts, and so on, until there are no more rows to include.

If we use UNION DISTINCT, duplicate rows will be removed. In contrast, UNION ALL keeps them.

3.1. Example

Let’s find the IDs of all the courses we must pass before registering for CS411. So, the query should return all this course’s direct and indirect prerequisites.

The direct prerequisites are already in the Prerequisite table for CS411. They form the base of our recursion, so we’ll return them in the nonrecursive SELECT:

SELECT prerequisite_id
FROM Prerequisites
WHERE course_id = 'CS411'

How do we find indirect prerequisites? Well, if a course with ID CSxyz is already a prerequisite for CS411, then every course that is a prerequisite for CSxyz is also a prerequisite for CS411:

WITH RECURSIVE required(id) AS (
    SELECT prerequisite_id
    FROM Prerequisite
    WHERE course_id = 'CS411'
    UNION DISTINCT
    SELECT prerequisite_id
    FROM Prerequisite JOIN required
    ON (Prerequisite.course_id = required.id)
)
SELECT * 
FROM required;

Here are the results:

# id
CS211
CS111
CS112

Here, we name the CTE required and define it to have one column: id. In the nonrecursive SELECT statement, we join required with Prerequisite and recursively include the found prerequisites in required.

3.2. Computation

The inclusion of the rows in the recursive SELECT is iterative:

cte = an empty table
rows_to_add = execute the nonrecursive SELECT statement

while rows_to_add is not empty:
    cte = cte + rows_to_add
    rows_to_add = execute the recursive SELECT using rows_to_add

We keep including rows until there are no more rows to add.

To confirm this is the case, let’s print the paths corresponding to the prerequisites:

WITH RECURSIVE required(id, path) AS
(
  SELECT prerequisite_id,
         CONCAT(prerequisite_id, '->', course_id))
  FROM Prerequisite
  WHERE course_id = 'CS411'
  UNION DISTINCT
  SELECT prerequisite_id,
         CONCAT(prerequisite_id, '->', path)
  FROM Prerequisite JOIN required
  ON (Prerequisite.course_id = required.id)
)
SELECT path
FROM required;

Found paths:

# path
CS211->CS411
CS111->CS211->CS411
CS112->CS211->CS411

In this query, we concatenate the prerequisite IDs into a path that shows each prerequisite course’s position in the prerequisite hierarchy for CS411. So, CS211 is a direct prerequisite, as passing it allows us to register for CS411. However, before CS211, we must pass CS111 and CS112.

3.3. Column Types

The resulting CTE inherits the column types from the nonrecursive SELECT. If the type can’t store longer values originating in the recursive part, we’ll get an error, or there will be data loss, depending on whether we use a strict SQL mode.

Let’s consider what would happen if course_id and prerequisite_id were CHAR(5) instead of VARCHAR(10). To change the column types, we first drop the foreign key constraints:

ALTER TABLE Prerequisite
DROP FOREIGN KEY prerequisite_course_id_fkey,
DROP FOREIGN KEY prerequisite_prerequisite_id_fkey;

ALTER TABLE Prerequisite
MODIFY COLUMN prerequisite_id CHAR(5),
MODIFY COLUMN course_id CHAR(5);

Now, the paths in the nonrecursive SELECT will have the type CHAR(12), and for CS411, there’s only one such path: CS211->CS411. However, the paths produced in the recursive part are longer:

CS111->CS211->CS411
CS112->CS211->CS411

In a nonstrict SQL mode, they’ll be truncated to fit the type determined in the nonrecursive part (CHAR(12)):

# path
CS111->CS211
CS112->CS211
CS211->CS411

In a strict SQL mode, we’ll get:

Error Code: 1406. Data too long for column 'path' at row 1

We can fix this by casting the corresponding column in the nonrecursive SELECT to a large enough type:

WITH RECURSIVE required(id, path) AS
(
  SELECT prerequisite_id,
  CAST(CONCAT(prerequisite_id, '->', course_id) AS CHAR(50))
  FROM Prerequisite
  WHERE course_id = 'CS411'
  UNION DISTINCT
  SELECT prerequisite_id, CONCAT(prerequisite_id, '->', path)
  FROM Prerequisite JOIN required
  ON (Prerequisite.course_id = required.id)
)
SELECT path
FROM required;

# path
CS211->CS411
CS111->CS211->CS411
CS112->CS211->CS411

3.4. Column Names

We can set the column names of a recursive CTE inline, right after the CTE name following the keyphrase WITH RECURSIVE:

To confirm this is the case, let’s print the paths (to CS411) defined by each prerequisite:

WITH RECURSIVE required(id, path) AS
...

Another option is to set the names in the nonrecursive SELECT:

WITH RECURSIVE required AS
(
  SELECT prerequisite_id AS id,
         CONCAT(prerequisite_id, '->', course_id)) AS path
...

3.5. Constraints

The recursive SELECT statement has several limitations.

Aggregating the results is forbidden. So, we can’t use SUM(), COUNT(), and similar functions. This also applies to GROUP BY.

We can use no window functions.

We mustn’t sort the results, so no ORDER BY.

Pruning duplicates isn’t allowed. Therefore, we can’t use DISTINCT (as in SELECT DISTINCT) in the recursive SELECT.

Finally, any referenced CTE must be defined before the referencing CTE. This means that we can use a previously defined CTE but can’t have mutually recursive CTEs.

4. Set Recursion Limits

We can limit the recursion depth by setting the cte_max_recursion_depth variable to the maximal acceptable limit. The engine aborts the queries exceeding the threshold.

Similarly, we can use the LIMIT clause in the recursive SELECT statement. It applies to the entire CTE (not just one iteration) and will stop the generation of the CTE when we hit the limit:

WITH RECURSIVE required(id, path) AS
(
  SELECT prerequisite_id,
  CAST(CONCAT(prerequisite_id, '->', course_id) AS CHAR(50))
  FROM Prerequisite
  WHERE course_id = 'CS411'
  UNION DISTINCT
  SELECT prerequisite_id, CONCAT(prerequisite_id, '->', path)
  FROM Prerequisite JOIN required
  ON (Prerequisite.course_id = required.id)
  LIMIT 2
)
SELECT path
FROM required;

The results are as expected:

# path
CS211->CS411
CS111->CS211->CS411

Alternatively, we can set the LIMIT in the query that selects from the CTE which we define in the WITH clause. However, using LIMIT within the CTE definition (in the nonrecursive part) is more efficient since we can stop the CTE generation when the engine produces LIMIT rows.

5. Conclusion

In this article, we showed how to write a recursive query in MySQL.

Recursive queries are defined using recursive common table expressions. They reference themselves in their definitions. We compute them iteratively, starting with the base case, until we can add no more rows.

The code backing this article is available on GitHub. Once you're logged in as a Baeldung Pro Member, start learning and coding on the project.

Learn Java Collections

Learn Spring

Learn Maven

View All Courses

Full Archive

About Baeldung