Learn through the super-clean Baeldung Pro experience:
>> Membership and Baeldung Pro.
No ads, dark-mode and 6 months free of IntelliJ Idea Ultimate to start with.
Last updated: July 18, 2024
Counting distinct values in a SQL column is essential for database administrators and data analysts to find out how many different categories are in a column. Further, it helps us maintain data integrity efficiently and ensures accurate reporting and analysis.
In this tutorial, we’ll discuss various methods to efficiently count distinct values, with the appropriate examples.
Counting unique values in a SQL column is straightforward with the DISTINCT keyword. Here, let’s see how to effectively count distinct entries and apply filters for more specific data insights.
Let’s take the sample data that lists eight employees, detailing their IDs, names, and departments. The departments include HR, Finance, IT and Marketing, with multiple employees in all departments:
> SELECT * FROM Employees;
id | name | department
----+---------+------------
1 | Alice | HR
2 | Bob | Finance
3 | Charlie | HR
4 | David | IT
5 | Eve | Finance
6 | Frank | IT
7 | Grace | HR
8 | Heidi | Marketing
(8 rows)
Here, the SELECT keyword retrieves all rows and columns from the Employees tables, displaying complete records of every employee in the database without filtering or aggregation. It’s a straightforward query commonly used to view the entire dataset in a specific table for analysis.
Here, the SQL query counts the number of distinct departments in the Employees table. The DISTINCT keyword ensures unique department names are considered:
> SELECT COUNT(DISTINCT department) AS uniq_dept FROM Employees;
uniq_dept
-----------
4
(1 row)
In this example, the count of distinct departments is returned as 4, providing insight into the diversity of departments within the dataset.
Even though the department name appears multiple times in the table, each name is only counted once. Subsequently, the result is labeled as uniq_dept for our quick reference.
Lastly, the FROM keyword indicates that the data is from the Employees table.
Additionally, we can use GROUP BY with COUNT for more detailed analysis to provide the count of unique values within specific groups.
Let’s see how to organize our data into different groups and count unique entries within those groups. It’s like sorting our toys into boxes and counting how many different types we’ve in each box:
> SELECT department, COUNT(*) FROM Employees GROUP BY department;
department | count
------------+-------
Marketing | 1
Finance | 2
IT | 2
HR | 3
(4 rows)
Consequently, this command retrieves each unique department from the Employees table and counts the number of occurrences for each. Further, the GROUP BY organizes the data by department, providing a breakdown of how many employees belong to each department.
Now, let’s sort the output based on the count in decreasing order:
> SELECT department, COUNT(*) FROM Employees GROUP BY department ORDER BY count DESC;
department | count
------------+-------
HR | 3
Finance | 2
IT | 2
Marketing | 1
(4 rows)
Here, we’ll retrieve each unique department from the Employees table, and then count the occurrences for each group using COUNT(*). Next, the GROUP BY organizes the results by department. Finally, this command sorts the departments by count in descending order with ORDER BY count DESC.
In summary, understanding how to count distinct values in SQL is crucial for thorough data analysis. The DISTINCT keyword helps us to identify unique entries.
Further advanced techniques like combining GROUP BY with COUNT allow for more in-depth analysis. These strategies empower users to manage and analyze data effectively, revealing valuable insights for making decisions in today’s data-driven world.