SQL SELECT DISTINCT

The SQL Server DISTINCT keyword removes duplicates from the result of a SELECT statement, so that each returned row is different from the others. When a column contains repeated values, use the SELECT statement with the DISTINCT keyword to retrieve each value only once.

For instance, a product table may contain thousands of products, each with its own details, while the Color or Size column holds only around 10 different values that repeat across all the records. Selecting the Color column on its own displays all those repetitions. In such a case, use the SQL SELECT DISTINCT statement to return only the 10 unique values in the Color or Size column.

The following are some key things to remember.

It helps to remove duplicates from the result set.
Use it when uniqueness is important.
We can use DISTINCT on a single column or multiple columns.
We can use SELECT DISTINCT on a single table or multiple tables.

SQL SELECT DISTINCT Syntax

The syntax of the SELECT DISTINCT statement to return the unique records in a column(s) is as shown below.

SELECT DISTINCT Column1, Column2,…, ColumnN
FROM Source
WHERE Conditions -- This is Optional

As you can see from the above syntax, all you have to do is add the SQL DISTINCT keyword to the regular SELECT statement.

Column1,…, ColumnN: The SELECT DISTINCT statement allows us to pick the required number of columns from the tables. It may be one or more.
Source: It is the table from which you want to pick the unique values. It may involve one or more tables in the Database, and JOINS are used to connect multiple tables in SQL Server.
Conditions: It is optional to use the WHERE clause. It helps to filter the result set.

NOTE: Apart from the above basic syntax, we can add the TOP clause, GROUP BY, ORDER BY, etc.
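As a minimal sketch of the note above, the query below combines DISTINCT with TOP and ORDER BY. The inline VALUES rows and the Color column name are made up for illustration, so the query runs without any sample database.

```sql
-- Hypothetical inline data: five rows, three different colors.
SELECT DISTINCT TOP (2) Color
FROM (VALUES ('Red'), ('Blue'), ('Red'), ('Black'), ('Blue')) AS p(Color)
ORDER BY Color;
-- Duplicates are removed first (Black, Blue, Red),
-- then the first two rows in sorted order are kept: Black, Blue.
```

Note that DISTINCT is written before TOP in the SELECT list.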

SQL SELECT DISTINCT Statement Examples

The DISTINCT keyword can be used on a single column or multiple columns to display the unique values in those columns. If you use it on a single column, it shows the distinct or unique values from that column. However, if you use it on multiple columns, it returns the unique combinations of the values in those columns.

To demonstrate the SELECT DISTINCT statement, we use the tables from the AdventureWorks2022 database. We start this section with a simple SQL SELECT statement without any DISTINCT keyword and then explain the importance of unique records. Then, we will cover the advanced topics, including GROUPING, WHERE, etc.

NOTE: DISTINCT treats NULL as a valid, distinct value. So, to exclude NULLs from the result, filter them out with an IS NOT NULL condition in the WHERE clause.
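The sketch below illustrates the note with a few made-up inline rows (the Title column name and its values are assumptions for illustration only). The WHERE clause removes the NULLs before DISTINCT removes the duplicates.

```sql
-- Hypothetical inline data: two NULLs and a repeated 'Mr.'.
SELECT DISTINCT Title
FROM (VALUES ('Mr.'), (NULL), ('Ms.'), (NULL), ('Mr.')) AS p(Title)
WHERE Title IS NOT NULL;
-- Returns two rows: Mr. and Ms.
-- Without the WHERE clause, a third row holding NULL would be returned.
```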

SQL SELECT DISTINCT on a Single Column

In general, a SELECT statement without DISTINCT returns one value for every row in the table, including the repeated values. For example, the query below returns 10 records under the Group column because there are 10 rows in the SalesTerritory table.

SELECT [Group] FROM [Sales].[SalesTerritory]

Group
---------
North America
North America
North America
North America
North America
North America
Europe
Europe
Pacific
Europe

There are situations where we need the unique values within a column. For example, Departments in an organization, Categories, Countries, etc. In such a case, use the SQL SELECT statement with the DISTINCT keyword to remove duplicates and return unique values.

For instance, if we add the DISTINCT keyword to the above query, it will return only three rows because the Group column contains only three unique values: Europe, North America, and Pacific. These three values repeat across the 10 rows.

SELECT DISTINCT [Group]
FROM [Sales].[SalesTerritory]

Group
-------
Europe
North America
Pacific

SQL SELECT DISTINCT on Multiple Columns

As we mentioned in the previous example, using the DISTINCT keyword on a single column returns unique values in that column. However, the real confusion comes when we apply it to multiple columns.

When we use SELECT DISTINCT on multiple columns, the query returns the unique combinations of the values in those columns. To explain this, we use the Employee table. The following general SELECT statement returns 290 records.

SELECT [Gender], [MaritalStatus]
FROM [HumanResources].[Employee]

(290 rows affected)

If we add the SQL DISTINCT keyword to the above query, the SELECT statement will return four rows. There are only two Genders in the AdventureWorks database Employee table, so why does it display four records as the output?

Male (M) and Female (F) are the unique records from the Gender column.
Married (M) and Single (S) are the unique records from the MaritalStatus column.
The unique combination of the Gender and MaritalStatus columns is:
- Female – Married
- Female – Single
- Male – Married
- Male – Single

SELECT DISTINCT [Gender], [MaritalStatus]
FROM [HumanResources].[Employee]

Gender	MaritalStatus
F	M
F	S
M	M
M	S

If you add one more column to the above, the SQL SELECT DISTINCT Statement will display the unique combinations of the three columns. Let me add SalariedFlag to the above query. There are two unique values (1 and 0) in the SalariedFlag column. With two values in each of the three columns, there can be at most 2 * 2 * 2 = 8 combinations. All eight occur in this table, so the query returns eight rows.

Female – Married – 0
Female – Married – 1
Female – Single – 0…..

SELECT DISTINCT [Gender], [MaritalStatus], [SalariedFlag]
FROM [HumanResources].[Employee]

Gender	MaritalStatus	SalariedFlag
F	M	0
F	M	1
F	S	0
F	S	1
M	M	0
M	M	1
M	S	0
M	S	1
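The product of the per-column counts is only an upper bound, because DISTINCT returns just the combinations that actually occur in the data. A small sketch with made-up inline rows (not the Employee table) shows a case where one combination is missing.

```sql
-- Hypothetical inline data: Gender has 2 values and MaritalStatus has 2 values,
-- but no row combines Gender 'M' with MaritalStatus 'M'.
SELECT DISTINCT Gender, MaritalStatus
FROM (VALUES ('F', 'M'), ('F', 'S'), ('M', 'S'), ('M', 'S'), ('F', 'M'))
     AS e(Gender, MaritalStatus);
-- Returns 3 rows, not 2 * 2 = 4: (F, M), (F, S), (M, S).
```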

Using the ORDER BY clause

The SQL Server SELECT DISTINCT statement allows you to use the ORDER BY clause to sort the records in ascending or descending order. Without ORDER BY, the order of the rows in the result set is not guaranteed. If you add the ORDER BY clause as the last line, the query sorts the rows by marital status in ascending order.

SELECT DISTINCT [Gender], [MaritalStatus]
FROM [HumanResources].[Employee]
ORDER BY MaritalStatus

Gender	MaritalStatus
F	M
M	M
F	S
M	S
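One restriction is worth keeping in mind: when DISTINCT is specified, SQL Server requires every ORDER BY item to appear in the select list. The sketch below uses made-up inline rows, and the VacationHours values are assumptions for illustration.

```sql
-- Hypothetical inline data with a third column that is not selected.
SELECT DISTINCT Gender, MaritalStatus
FROM (VALUES ('F', 'M', 10), ('M', 'S', 20), ('F', 'S', 30), ('M', 'M', 40), ('F', 'M', 50))
     AS e(Gender, MaritalStatus, VacationHours)
ORDER BY MaritalStatus DESC, Gender;
-- Returns (F, S), (M, S), (F, M), (M, M).
-- Replacing the last line with ORDER BY VacationHours fails with error 145:
-- "ORDER BY items must appear in the select list if SELECT DISTINCT is specified."
```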

SQL SELECT DISTINCT with WHERE clause

In this example, we will use the SELECT DISTINCT statement with the WHERE Clause to filter the unique values in a column.

This example and the next one use the DimGeography and DimCustomer tables, which belong to the AdventureWorksDW data warehouse database. The DimGeography table has a total of 655 records, and within those records, 65 rows belong to Germany. Because we used the DISTINCT keyword, those 65 rows reduce to only six distinct state province names that belong to Germany. Logically, the FROM clause first brings all records in the DimGeography table. Next, the WHERE clause removes all rows except those where the country name is Germany. Finally, the DISTINCT keyword picks the unique State and Country combinations from the filtered data.

SELECT DISTINCT [StateProvinceName], EnglishCountryRegionName
FROM [DimGeography]
WHERE EnglishCountryRegionName = 'Germany'

StateProvinceName	EnglishCountryRegionName
Bayern	Germany
Brandenburg	Germany
Hamburg	Germany
Hessen	Germany
Nordrhein-Westfalen	Germany
Saarland	Germany

Similarly, the following statement will return the unique combinations of Education and yearly Income from the DimCustomer table.

The DimCustomer table has 18484 rows. The WHERE clause keeps only the customers whose yearly income is greater than 160000, and DISTINCT then returns the unique combinations among them.

SELECT DISTINCT [EnglishEducation]
      ,[YearlyIncome]
FROM [DimCustomer]
WHERE YearlyIncome > 160000

EnglishEducation	YearlyIncome
Bachelors	170000.00
Graduate Degree	170000.00
High School	170000.00
Partial College	170000.00
Partial High School	170000.00

If you want to use the AdventureWorks database, use the query below to display the distinct job titles of the employees whose OrganizationLevel equals 1.

SELECT DISTINCT JobTitle 
FROM [HumanResources].[Employee]
WHERE OrganizationLevel = 1

JobTitle
-------------
Chief Financial Officer
Information Services Manager
Marketing Manager
Vice President of Engineering
Vice President of Production
Vice President of Sales

SQL SELECT DISTINCT with GROUP BY Clause

The SELECT DISTINCT statement also allows us to use the GROUP BY Clause to combine the common items in a column and apply aggregate functions to each group. Placing DISTINCT inside an aggregate function performs the calculation on the unique values only. It is pointless with MAX or MIN because duplicates do not change the largest or smallest value, so the result is the same. With SUM, AVG, and COUNT, however, DISTINCT can change the result.

The following query shows how we can use DISTINCT with the GROUP BY clause. If you observe the query, we have written the DISTINCT keyword inside the aggregate function instead of writing it right after the SELECT keyword.

The query below groups the SalesOrderHeader table by TerritoryID and uses the aggregate SUM and AVG functions to calculate the sum and average of the order totals in each territory. To show the difference in the output, we used the regular SUM() function and the SUM() function with the DISTINCT keyword.

SUM(TotalDue): It calculates the sum of TotalDue for all the orders in each Territory.
SUM(DISTINCT TotalDue): It adds each distinct TotalDue value in a Territory only once, so orders that share the same amount are counted a single time. It does not remove duplicates by customer.

SELECT TerritoryID,
    SUM(TotalDue) AS CustomerSales,
    SUM(DISTINCT TotalDue) AS UniqueCustomerSales,
    AVG(DISTINCT TotalDue) AS UniqueAverageSales
FROM Sales.SalesOrderHeader
GROUP BY TerritoryID
ORDER BY TerritoryID
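To make the difference concrete, here is a minimal sketch on four made-up orders (the inline rows and the AllOrders and DistinctAmounts aliases are assumptions for illustration). Two different customers in territory 1 placed orders of the same amount, and SUM(DISTINCT TotalDue) counts that amount only once.

```sql
-- Hypothetical inline data: (TerritoryID, CustomerID, TotalDue).
SELECT TerritoryID,
    SUM(TotalDue) AS AllOrders,
    SUM(DISTINCT TotalDue) AS DistinctAmounts
FROM (VALUES (1, 101, 100.00), (1, 102, 100.00), (1, 103, 250.00), (2, 104, 80.00))
     AS o(TerritoryID, CustomerID, TotalDue)
GROUP BY TerritoryID
ORDER BY TerritoryID;
-- Territory 1: AllOrders = 450.00, DistinctAmounts = 350.00
-- Territory 2: AllOrders = 80.00,  DistinctAmounts = 80.00
```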

SQL SELECT DISTINCT with HAVING Clause

There is an optional HAVING clause that we can use in combination with the GROUP BY clause and DISTINCT aggregates. The HAVING clause can filter the aggregations performed on unique values in each group. To demonstrate the DISTINCT keyword with the HAVING clause, we use the same query as the GROUP BY example. Next, add the HAVING SUM(DISTINCT TotalDue) > 10000000 line to it. This line checks whether the sum of the distinct TotalDue values in a territory is greater than 10000000. If TRUE, the row for that territory is added to the result set.

SELECT TerritoryID,
    SUM(TotalDue) AS CustomerSales,
    SUM(DISTINCT TotalDue) AS UniqueCustomerSales,
    AVG(DISTINCT TotalDue) AS UniqueAverageSales
FROM Sales.SalesOrderHeader
GROUP BY TerritoryID
HAVING SUM(DISTINCT TotalDue) > 10000000
ORDER BY TerritoryID

Similarly, the following query calculates the sum of the distinct order totals for each product. Here, we combined two tables using the INNER JOIN and grouped the joined rows by ProductID. The join repeats an order's TotalDue on every detail line of that order, and SUM(DISTINCT TotalDue) adds each distinct amount only once per product. Note that two different orders with the same TotalDue are also counted only once.

SELECT ProductID,
       SUM(DISTINCT TotalDue) AS UniqueOrdersSales
FROM Sales.SalesOrderHeader h
JOIN Sales.SalesOrderDetail d ON h.SalesOrderID = d.SalesOrderID
GROUP BY ProductID
HAVING SUM(DISTINCT TotalDue) > 40000000
ORDER BY UniqueOrdersSales
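If the intent is to count each order exactly once per product, one possible alternative is to remove the duplicate (ProductID, SalesOrderID) rows in a derived table and then use a plain SUM. The sketch below is not the query used above. It runs on made-up inline rows, and the CTE contents and the OrderTotalsCountedOnce alias are assumptions for illustration.

```sql
-- Hypothetical data: product 10 appears on two lines of order 1,
-- and orders 1 and 2 happen to have the same TotalDue.
WITH h AS (SELECT * FROM (VALUES (1, 100.00), (2, 100.00), (3, 50.00))
                         AS v(SalesOrderID, TotalDue)),
     d AS (SELECT * FROM (VALUES (1, 10), (1, 10), (2, 10), (3, 10))
                         AS v(SalesOrderID, ProductID))
SELECT x.ProductID,
    SUM(x.TotalDue) AS OrderTotalsCountedOnce
FROM (SELECT DISTINCT d.ProductID, h.SalesOrderID, h.TotalDue
      FROM h
      JOIN d ON h.SalesOrderID = d.SalesOrderID) AS x
GROUP BY x.ProductID;
-- Product 10: 250.00 (100 + 100 + 50, each order once).
-- SUM(DISTINCT TotalDue) on the joined rows would give 150.00,
-- and a plain SUM(TotalDue) on the joined rows would give 350.00.
```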

SQL SELECT DISTINCT With COUNT function

Combining COUNT with DISTINCT is one of the most commonly asked questions. This section helps you understand COUNT DISTINCT with multiple examples.

In general, COUNT(*) counts the total number of rows, and COUNT(column) counts the non-NULL values in that column. However, if we add the DISTINCT keyword inside the COUNT function, it counts only the unique non-NULL values in that column.

The following query will return 266 as the output because there are only 266 distinct values in the ProductID column. However, if we remove the DISTINCT keyword, the query will return 121317 because the SalesOrderDetail table has 121317 rows.

SELECT COUNT(DISTINCT ProductID) AS UniqueProductsSale
FROM Sales.SalesOrderDetail
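The three forms of COUNT can be compared side by side on a handful of made-up rows. The inline values and the column aliases below are assumptions for illustration.

```sql
-- Hypothetical inline data: five rows, two of them NULL.
SELECT COUNT(*) AS AllRows,
    COUNT(SalesPersonID) AS NonNullValues,
    COUNT(DISTINCT SalesPersonID) AS UniqueValues
FROM (VALUES (274), (274), (275), (NULL), (NULL)) AS o(SalesPersonID);
-- AllRows = 5, NonNullValues = 3, UniqueValues = 2
```

Unlike SELECT DISTINCT, which returns one NULL row, COUNT(DISTINCT column) ignores NULLs altogether.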

Apart from the above, we can combine the COUNT DISTINCT with the GROUP BY clause to find the total number of unique items in each group. For example, the following query calculates the total number of unique customers and sales persons who have done business in each year (2011, 2012, 2013, and 2014).

SELECT YEAR([OrderDate]) AS OrderYear
    ,COUNT(DISTINCT [CustomerID]) AS UniqueCustomers
    ,COUNT(DISTINCT [SalesPersonID]) AS UniqueSalesPersons
FROM [Sales].[SalesOrderHeader]
GROUP BY YEAR([OrderDate])

Using SQL SELECT DISTINCT Statement with JOIN clause

We can use the JOIN clause to combine data from multiple tables, and the DISTINCT keyword will then keep only the unique values or combinations from the joined rows.

The following query uses the INNER JOIN to combine the Sales Order Details, Header, and the Territory tables. Next, the GROUP BY clause groups the joined rows by Territory Name. The COUNT() and SUM() functions with the DISTINCT keyword calculate the number of unique products, the sum of the distinct line totals, and the sum of the distinct order totals in each Territory.

SELECT s.Name AS TerritoryName,
    COUNT(DISTINCT sd.ProductID) AS ProductCount,
    SUM(DISTINCT sd.LineTotal) AS LineTotal,
    SUM(DISTINCT oh.TotalDue) AS UniqueTotal
FROM Sales.SalesOrderDetail sd 
JOIN Sales.SalesOrderHeader oh
    ON sd.SalesOrderID = oh.SalesOrderID
JOIN Sales.SalesTerritory s
    ON oh.TerritoryID = s.TerritoryID
GROUP BY s.Name
ORDER BY s.Name;

TerritoryName	ProductCount	LineTotal	UniqueTotal
Australia	187	317288.490105	2733140.0922
Canada	262	2038863.222795	16676700.0255
Central	253	1451600.793971	8884063.97
France	247	1293266.298952	5763830.3609
Germany	219	565688.661230	2885239.32
Northeast	246	1203133.389904	7808881.7864
Northwest	263	1900579.875384	14667269.468
Southeast	253	1378632.826406	8855220.0138
Southwest	262	2197930.423594	21672580.2512
United Kingdom	254	1150423.105047	5477108.6564

How does the DISTINCT keyword treat NULL values?

The SQL SELECT DISTINCT statement treats all NULLs as one value. If we use DISTINCT on a column that contains NULLs, it displays all the unique values in that column plus a single NULL row that represents every NULL in the column.

In the Person table, there are almost 18997 NULL values in the Title column. The query below prints seven records as the output. In those seven, there is a NULL row, which represents the 18997 NULL values.

SELECT DISTINCT [Title]
FROM [Person].[Person]

Title
-----
Sr.
Mrs.
NULL
Sra.
Ms.
Ms
Mr.

SQL SELECT DISTINCT VS UNIQUE

Going by the words alone, DISTINCT and UNIQUE seem to mean the same thing. However, they serve different purposes. The DISTINCT keyword is used in combination with the SQL SELECT statement, whereas the UNIQUE constraint is declared in the TABLE definition.

DISTINCT	UNIQUE
DISTINCT is a keyword used right after the SELECT keyword.	UNIQUE is a constraint (key) used while creating a table or on an existing table.
It returns the result set with distinct values.	It will not return any value.
It has nothing to do with the data insertion.	It restricts the user from inserting duplicate values into the UNIQUE constraint column.
The DISTINCT keyword has no role while updating the existing data.	The UNIQUE key checks whether the updated value is distinct from the other values in that column.
We must use the SQL DISTINCT keyword in the SELECT statement to return unique records in a column.	We can’t use the UNIQUE key in the SELECT statement.
The DISTINCT keyword is query-specific. If you want distinct records, use it in the SELECT statement. Otherwise, don’t use it.	Once we add the UNIQUE key, it will ensure the uniqueness of the records stored. It will not be removed from a table until we manually DROP the CONSTRAINT.

TIP: When you select only a column that has a UNIQUE constraint from a single table, there is no need to use the DISTINCT keyword, because the constraint itself won’t allow duplicate values to be inserted.
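The contrast can be sketched with a table variable. The @Customer table, its Email column, and the sample addresses are made up for illustration. The UNIQUE constraint rejects the duplicate at insert time, so a later SELECT on that column needs no DISTINCT.

```sql
-- Hypothetical table variable with a UNIQUE constraint on Email.
DECLARE @Customer TABLE (Email varchar(100) UNIQUE);

INSERT INTO @Customer (Email) VALUES ('a@example.com'), ('b@example.com');

BEGIN TRY
    -- This insert violates the UNIQUE constraint and is rejected.
    INSERT INTO @Customer (Email) VALUES ('a@example.com');
END TRY
BEGIN CATCH
    SELECT ERROR_MESSAGE() AS InsertError;
END CATCH;

-- Only the two original rows are stored, so DISTINCT is unnecessary here.
SELECT Email FROM @Customer;
```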

SQL SELECT DISTINCT Best practices

When working with the SELECT DISTINCT statement, remember the following list of things for optimal performance.

Include only the necessary columns in the SELECT DISTINCT statement. As the number of columns increases, the number of unique combinations increases.
Use the WHERE clause to filter the table data before DISTINCT removes the duplicates. In more complex calculations, use subqueries along with DISTINCT.
If possible, use indexed columns in the SELECT DISTINCT to improve the performance.
Don’t confuse DISTINCT with the GROUP BY clause. DISTINCT only removes duplicate rows, whereas GROUP BY forms groups so that aggregate functions can be applied to each one.
Don’t use the DISTINCT keyword with MIN and MAX to find the minimum and maximum values. Duplicates do not change those values, so DISTINCT is not needed.
The SQL SELECT DISTINCT statement treats all NULLs as one value and returns a single NULL row that represents all the NULL values in that column.
There are alternatives, including GROUP BY, EXISTS, the UNION operator, and so on. Choose among them based on the requirement.
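As a minimal sketch of the GROUP BY alternative mentioned above, the two queries below return the same three values from the same made-up inline rows. The TerritoryGroup column name is an assumption for illustration.

```sql
-- Hypothetical inline data with repeated group names.
SELECT DISTINCT TerritoryGroup
FROM (VALUES ('Europe'), ('Pacific'), ('Europe'), ('North America'), ('North America'))
     AS t(TerritoryGroup);

-- GROUP BY without any aggregate function removes the duplicates as well.
SELECT TerritoryGroup
FROM (VALUES ('Europe'), ('Pacific'), ('Europe'), ('North America'), ('North America'))
     AS t(TerritoryGroup)
GROUP BY TerritoryGroup;
-- Both queries return Europe, North America, and Pacific (row order is not guaranteed).
```

DISTINCT states the intent more directly when no aggregation is needed, while GROUP BY is the natural choice once an aggregate such as COUNT or SUM is added.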