Baeldung Pro – SQL – NPI EA (cat = Baeldung on SQL)
announcement - icon

Learn through the super-clean Baeldung Pro experience:

>> Membership and Baeldung Pro.

No ads, dark-mode and 6 months free of IntelliJ Idea Ultimate to start with.

1. Overview

When using SQL joins, it’s common to come across cases where joining two or more tables returns multiple rows for each match.

In this tutorial, we’ll explore how to return a single row from a join onto a table with a one-to-many relationship with the table we’re querying.

2. Example Database

To demonstrate, we’ll be using an example database in PostgreSQL containing the table countries:

+---------------------------+
| country_id | country_name |
+------------+--------------+
| 1          | "USA"        |
| 2          | "UK"         |
| 3          | "France"     |
| 4          | "Thailand"   |
+---------------------------+

The database also contains a second table named cities, which will have a many-to-one relationship with the countries table:

+--------------------------------------------------+
| city_id | country_id | city_name   | population  |
+---------+------------+-------------+-------------+
| 1       | 1          | "Washington"| 689545      |
| 2       | 1          | "New York"  | 8804190     |
| 3       | 2          | "London"    | 12208100    |
| 4       | 2          | "Manchester"| 2732854     |
| 5       | 3          | "Paris"     | 2048472     |
| 6       | 3          | "Nice"      | 353701      |
| 7       | 4          | "Bangkok"   | 5588222     |
| 8       | 4          | "Phuket"    | 77778       |
+--------------------------------------------------+

The cities table has country_id as a foreign key.

3. Joining Tables

We can join the countries table to the cities table to return all cities joined with the country they’re in:

SELECT countries.country_id, country_name, city_id, city_name, population
FROM countries
JOIN cities ON countries.country_id = cities.country_id;

This join returns:

+-----------------------------------------------------------------+
| country_id | country_name | city_id | city_name   | population  |
+------------+--------------+---------+-------------+-------------+
| 1          | "USA"        | 1       | "Washington"| 689545      |
| 1          | "USA"        | 2       | "New York"  | 8804190     |
| 2          | "UK"         | 3       | "London"    | 12208100    |
| 2          | "UK"         | 4       | "Manchester"| 2732854     |
| 3          | "France"     | 5       | "Paris"     | 2048472     |
| 3          | "France"     | 6       | "Nice"      | 353701      |
| 4          | "Thailand"   | 7       | "Bangkok"   | 5588222     |
| 4          | "Thailand"   | 8       | "Phuket"    | 77778       |
+-----------------------------------------------------------------+

We can see that the query has successfully returned the country name for each city we have in the cities table. Depending on our use case, this may suffice. However, let’s say we want to return the most populated city for each country. So, how do we limit the join to return a single city record?

4. Limiting Returned Rows Using DISTINCT ON

The DISTINCT ON clause can be used to limit a query to return unique values. Let’s use DISTINCT ON to limit the number of cities returned in our previous query:

SELECT DISTINCT ON(countries.country_id) 
    countries.country_id, country_name, city_id, city_name, population 
FROM countries
JOIN cities ON countries.country_id = cities.country_id;

This query returns the following result:

+-----------------------------------------------------------------+
| country_id | country_name | city_id | city_name   | population  |
+------------+--------------+---------+-------------+-------------+
| 1          | "USA"        | 1       | "Washington"| 689545      |
| 2          | "UK"         | 3       | "London"    | 12208100    |
| 3          | "France"     | 5       | "Paris"     | 2048472     |
| 4          | "Thailand"   | 7       | "Bangkok"   | 5588222     |
+-----------------------------------------------------------------+

This result is much better for our use case since we’ve limited country_id to be unique using DISTINCT ON. However, the result set contains Washington for the USA, whereas we need to display New York since the population is higher.

5. Using DISTINCT ON With ORDER BY

Now we have a query returning a single result per country record. We need to use ordering to return the city with the highest population. So, let’s modify our query:

SELECT DISTINCT ON(countries.country_id) 
    countries.country_id, country_name, city_id, city_name, population
FROM countries
JOIN cities ON countries.country_id = cities.country_id
ORDER BY countries.country_id, population DESC;

It’s worth noting that when ordering a query that includes a DISTINCT ON column, that column needs to be included in the ORDER BY statement. Now that we’ve ordered by population in descending order, we get the following result:

+-----------------------------------------------------------------+
| country_id | country_name | city_id | city_name   | population  |
+------------+--------------+---------+-------------+-------------+
| 1          | "USA"        | 2       | "New York"  | 8804190     |
| 2          | "UK"         | 3       | "London"    | 12208100    |
| 3          | "France"     | 5       | "Paris"     | 2048472     |
| 4          | "Thailand"   | 7       | "Bangkok"   | 5588222     |
+-----------------------------------------------------------------+

This resolves the issue from the previous section, as we now have New York rather than Washington as the city with the higher population in the USA.

6. Conclusion

In this article, we’ve explored how to adapt a join to return a unique value for a chosen column using DISTINCT ON. Expanding on that, we used ORDER BY to determine which row from the table should be included in the result set. We made sure to include the DISTINCT ON column in the ORDER BY statement.

As always, the sample code used in this article is available over on GitHub.