Learn through the super-clean Baeldung Pro experience:
>> Membership and Baeldung Pro.
No ads, dark-mode and 6 months free of IntelliJ Idea Ultimate to start with.
Last updated: September 7, 2024
In SQL, joining the same table is synonymous with joining the table to itself, a technique known as a self-join. Essentially, this technique empowers comparisons between rows within the same table.
In this tutorial, we’ll explore how to perform self-joins, understand their use cases, and provide practical examples.
To begin with, a self-join is a regular JOIN, but the table is joined with itself. In addition, this concept relies heavily on using table aliases to differentiate the instances of the table.
Now, let’s review a common syntax that involves joining the table using an alias and then applying the join condition:
SELECT x.column1, y.column2
FROM Table_name x
JOIN Table_name y
ON x.common_field = y.common_field;
So, this SQL query selects column1 from one instance of Table_name (aliased as x) and column2 from another instance of Table_name (aliased as y), where the values in common_field are the same in both instances of the table.
Here, this simple syntax of self-join retrieves related information from a single table or from two instances of the same table based on some common criteria.
Before moving to specific use cases, let’s use the syntax we mentioned in the previous section on the table named faculty from the Baeldung schema:
SELECT x.name, x.position, x.national_id, y.name, y.position, y.national_id
FROM Faculty x
JOIN Faculty y ON x.id = y.id;
Here’s the result:
name | position | national_id | name | position | national_id
-----------------+---------------------+-------------+-----------------+---------------------+-------------
Anubha Gupta | Professor | 1018901231 | Anubha Gupta | Professor | 1018901231
Anubha Gupta | Associate Professor | 1018901231 | Anubha Gupta | Associate Professor | 1018901231
Anubha Gupta | Assistant Professor | 1018901231 | Anubha Gupta | Assistant Professor | 1018901231
Peter Pan | Professor | 2130989011 | Peter Pan | Professor | 2130989011
Peter Pan | Associate Professor | 2130989011 | Peter Pan | Associate Professor | 2130989011
As shown above, the query used two instances of the same Faculty table to self-join. Now, the table had duplicate columns, each picked from different instances.
As such, the query above increased the columns on the same table by joining them twice.
Let’s consider the common use cases for self-joins.
In hierarchical data, each row is related to another row in the same table; for example, organizational and extended family datasets. Now, let’s explore the use case on the Faculty table to track the career progression of faculty members:
SELECT
f1.name AS faculty_name,
f1.position AS current_position,
f1.start_date AS current_start_date,
f1.end_date AS current_end_date,
f2.position AS previous_position,
f2.start_date AS previous_start_date,
f2.end_date AS previous_end_date
FROM
Faculty f1
LEFT JOIN
Faculty f2
ON
f1.national_id = f2.national_id
AND f1.start_date > f2.end_date
ORDER BY
f1.name, f1.start_date;
Let’s take a look at the result:
faculty_name | current_position | current_start_date | current_end_date | previous_position | previous_start_date | previous_end_date
-----------------+---------------------+--------------------+------------------+---------------------+---------------------+-------------------
Ajit Singh | Teaching Assistant | 2009-05-11 | 2012-05-10 | | |
Ajit Singh | Assistant Professor | 2012-05-11 | 2024-06-08 | Teaching Assistant | 2009-05-11 | 2012-05-10
Anubha Gupta | Assistant Professor | 2004-05-11 | 2007-01-08 | | |
Anubha Gupta | Associate Professor | 2007-01-09 | 2010-01-10 | Assistant Professor | 2004-05-11 | 2007-01-08
Anubha Gupta | Professor | 2010-01-11 | 2027-03-11 | Assistant Professor | 2004-05-11 | 2007-01-08
Anubha Gupta | Professor | 2010-01-11 | 2027-03-11 | Associate Professor | 2007-01-09 | 2010-01-10
AV Subramanium | Assistant Professor | 2011-05-11 | | | |
Badrinath Ho | Professor | 2016-03-11 | | | |
Ballu Singh | Teaching Assistant | 2022-12-11 | 2024-02-02 | | |
Now, let’s proceed to explain the query and review the result. So, we queried the Faculty table by creating two instances using aliases represented by f1 and f2. Furthermore, we used the SELECT clause to pick the columns we needed from each instance of the table.
Subsequently, the LEFT JOIN clause is applied to the second instance of the Faculty table with ON clause to specify conditions that the table be joined on national_id columns. The condition is that f1.start_date (current_start_date) be greater than f2.end_date (previous_end_date).
As shown in the result, we’re able to join a single table twice and apply conditions to perform the task of obtaining the career progression of the faculty member based on their current and previous jobs.
Next, let’s use self-joins to find duplicate values within our table by joining the table on the column that needs to be checked for duplicates.
Let’s proceed to join the table by national_id and exclude rows with the same id from joining with themselves:
SELECT
f1.id AS id1,
f1.name AS name1,
f1.national_id AS national_id1,
f1.position AS position1,
f2.id AS id2,
f2.name AS name2,
f2.national_id AS national_id2,
f2.position AS position2
FROM
Faculty f1
JOIN
Faculty f2
ON
f1.national_id = f2.national_id
AND f1.id <> f2.id
ORDER BY
f1.national_id, f1.start_date;
Here’s the result of the query:
id1 | name1 | national_id1 | position1 | id2 | name2 | national_id2 | position2
-----+-----------------+--------------+---------------------+-----+-----------------+--------------+---------------------
134 | Wlliam Liu | 231456789 | Teaching Assistant | 131 | Wlliam Liu | 231456789 | Professor
134 | Wlliam Liu | 231456789 | Teaching Assistant | 133 | Wlliam Liu | 231456789 | Assistant Professor
134 | Wlliam Liu | 231456789 | Teaching Assistant | 132 | Wlliam Liu | 231456789 | Associate Professor
133 | Wlliam Liu | 231456789 | Assistant Professor | 134 | Wlliam Liu | 231456789 | Teaching Assistant
133 | Wlliam Liu | 231456789 | Assistant Professor | 132 | Wlliam Liu | 231456789 | Associate Professor
133 | Wlliam Liu | 231456789 | Assistant Professor | 131 | Wlliam Liu | 231456789 | Professor
132 | Wlliam Liu | 231456789 | Associate Professor | 131 | Wlliam Liu | 231456789 | Professor
132 | Wlliam Liu | 231456789 | Associate Professor | 134 | Wlliam Liu | 231456789 | Teaching Assistant
132 | Wlliam Liu | 231456789 | Associate Professor | 133 | Wlliam Liu | 231456789 | Assistant Professor
131 | Wlliam Liu | 231456789 | Professor | 133 | Wlliam Liu | 231456789 | Assistant Professor
131 | Wlliam Liu | 231456789 | Professor | 134 | Wlliam Liu | 231456789 | Teaching Assistant
131 | Wlliam Liu | 231456789 | Professor | 132 | Wlliam Liu | 231456789 | Associate Professor
The table shows the pairwise combinations of positions, confirming that the same national_id is associated with different id and position values.
This demonstrates how the same individual has moved through various roles within the institution, providing a clear example of how to detect duplicate values in SQL by joining the same table.
In this article, we’ve explored joining the same table twice in SQL, which is powerful for comparing rows within the same table, querying hierarchical data, and detecting duplicates.
So, by mastering this, we can effectively manage and analyze complex data relationships within a single table.