How To Count Distinct Over Multiple Columns Baeldung on SQL

1. Overview

Counting with the DISTINCT clause is one of the most fundamental statements in SQL for aggregating data. It helps us quickly get a summary from hundreds or even millions of rows of tabular data.

While it’s fairly common to use COUNT and DISTINCT on a single column, there are occasions when we need to apply DISTINCT to multiple columns and count the number of combinations that occur in the data.

In this tutorial, we’ll look at different ways to use DISTINCT over multiple columns.

2. The Direct Approach

Let’s work with the University schema and get a count of the unique combinations of the national_id and position columns in the Faculty table. Before that, for context, let’s get the distinct counts for national_id and position separately.

2.1. Count/Distinct on Single Columns

We can count the number of distinct national_id entries with this query:

SELECT COUNT(DISTINCT national_id)
FROM Faculty; 

count
------- 
    72 
(1 row)

We can also count the number of distinct position entries with this similar query:

SELECT COUNT(DISTINCT position)
FROM Faculty; 

count
------- 
     4 
(1 row)

Next, we’ll try to count the number of unique combinations of these two columns.

2.2. Using a Sub-Query for Multiple Columns

Most SQL dialects don’t let us apply COUNT and DISTINCT together on multiple columns. Therefore, we must use a subquery:

SELECT COUNT(*)
FROM (
  SELECT DISTINCT national_id, position 
  FROM Faculty
) AS subQuery; 

count
-------
    72
(1 row)

We created a subquery using the DISTINCT keyword on two columns and then used COUNT on the result.

2.3. Using Count and Distinct With Two Columns Directly

Only MySQL gives us the option of specifying two column names instead of one with COUNT/DISTINCT. This makes the query very straightforward:

SELECT COUNT(DISTINCT national_id, position)
FROM Faculty;

COUNT(DISTINCT national_id, position)
72

The query we ran was short and elegant, but the downside is that it works only with MySQL. The syntax is invalid for PostgreSQL and SQL Server. For these dialects, we’ll need to condense multiple columns into one to apply COUNT/DISTINCT.

3. Using Concat

To avoid the subquery and still have our query work across dialects, we can only have one column to run DISTINCT on. But we specifically need the count of distinct combinations for two columns here. Another way to reduce our dataset to one column is to use a computed column from the two columns.

CONCAT works perfectly for joining the two columns as a single string while maintaining the combined distinctness we’re looking for:

SELECT COUNT(DISTINCT CONCAT(national_id, position))
FROM Faculty;

count
-------
    72
(1 row)

We used the CONCAT function to combine the national_id and position columns into one column before using DISTINCT on the computed column. The result is still 72, as expected.

We didn’t add a separator between the two columns in this particular case, but we could add one if required. Separators help when we need to differentiate where one column ends and the other starts. This could happen when both columns are numeric (‘1′ + ’11’ = ’11’ + ‘1’) or for other reasons. Therefore, to overcome losing unique values, we can add a column separator in the CONCAT function: CONCAT(national_id, ‘:’, position).

4. Conclusion

In this tutorial, we looked at different methods to use COUNT and DISTINCT together on multiple columns combined. Only MySQL supports doing this directly on two columns. For wider support, we had to use either a subquery or a function to make a single column from multiple columns. We used the CONCAT function as it works best across various SQL implementations.

The code backing this article is available on GitHub. Once you're logged in as a Baeldung Pro Member, start learning and coding on the project.

Full Archive

About Baeldung