Baeldung Pro – SQL – NPI EA (cat = Baeldung on SQL)
announcement - icon

Learn through the super-clean Baeldung Pro experience:

>> Membership and Baeldung Pro.

No ads, dark-mode and 6 months free of IntelliJ Idea Ultimate to start with.

1. Introduction

Understanding the differences of COUNT(*), COUNT(1), COUNT(column), and COUNT(DISTINCT) in SQL is crucial for optimizing queries effectively, as each serves distinct purposes in data aggregation and analysis. COUNT(*) and COUNT(1) are fundamental for counting all rows efficiently, while COUNT(column) focuses on non-NULL values in specific columns. COUNT(DISTINCT) is essential for identifying unique values and influencing query performance on varying dataset sizes.

In this article, let’s explore these functions in detail. This includes examples and performance considerations that assist database administrators in querying diverse analytical tasks and datasets.

2. Understanding COUNT(*) vs COUNT(1) vs COUNT(column) vs COUNT(DISTINCT column)

Now, let’s delve into the different uses of COUNT(*), COUNT(1), COUNT(column), and COUNT(DISTINCT) in SQL statements. Each serves a unique purpose, impacting query results and performance. Understanding these distinctions optimizes database queries for efficiency and accuracy in data aggregation tasks.

2.1. Sample Data

We’ll use the Department table from the University database for demonstration purposes. It displays eight rows of data, showcasing departments with IDs and names, such as Computer Science and Mechanical Engineering, along with entries containing empty or duplicate names and codes like CS and MA:

sql> select * FROM Department;
 id |              name              | code
----+--------------------------------+------
  1 | Computer Science               | CS
  2 | Electronics and Communications | EC
  3 | Mechanical Engineering         | ME
  4 | Civil Engineering              | CE
  5 | Mathematics                    | MA
  6 |                                |
  7 | Mathematics                    | MA2
  8 |                                |
(8 rows)

In this case, rows 6 and 8 contain NULL values, while rows 5 and 7 have duplicate names Mathematics.

2.2. Using COUNT(*) in SQL Queries

Here, the COUNT(*) counts all rows in a table, including those with NULL values.

sql> select COUNT(*) FROM Department;
 count
-------
     8
(1 row)

It is optimized in most databases because it does not require explicit NULL checks, ensuring efficient performance by straightforwardly tallying all rows.

2.3. Using COUNT(1) in SQL Queries

COUNT(1) counts the total number of rows in a table using the literal 1 instead of *. It yields the same result as COUNT(*).

sql> select COUNT(1) FROM Department;
 count
-------
     8
(1 row)

Performance-wise, it efficiently counts all rows but is less optimal compared to COUNT(*).

2.4. Using COUNT(column) in SQL Queries

COUNT(column) tallies non-NULL values in the specified column. Here, the id column has 8 non-NULL values.

sql> select COUNT(id) FROM Department;
 count
-------
     8
(1 row)

However, the name column has two NULL values out of 8, resulting in 6 as the count.

sql> select COUNT(name) FROM Department;
 count
-------
     6
(1 row)

It evaluates each row to discern if the column is NULL, potentially causing slightly slower performance compared to COUNT(*) or COUNT(1), due to the additional evaluation step per row.

2.5. Using COUNT(DISTINCT column) in SQL Queries

COUNT(DISTINCT column) calculates the count of unique, non-NULL values in the specified column. Similarly, the id column doesn’t have any duplicates or NULL values, resulting in a count of 8.

sql> select COUNT(DISTINCT id) FROM Department;
 count
-------
     8
(1 row)

On the other hand, the name column has two NULL values and one duplicate value, resulting in a count of 5.

sql> select COUNT(DISTINCT name) FROM Department;
 count
-------
     5
(1 row)

Performance-wise, it identifies distinct values in the column and tallies them, which can be more resource-intensive compared to other COUNT methods, particularly on large datasets.

3. Conclusion

In summary, understanding the distinctions between COUNT(*), COUNT(1), COUNT(column), and COUNT(DISTINCT) is essential for proficient SQL query optimization. COUNT(*) and COUNT(1) efficiently count all rows, while COUNT(column) and COUNT(DISTINCT column) provide nuanced capabilities for specific data aggregation needs.

Considerations such as performance optimization, handling NULL values, and managing duplicate data impact their suitability in different scenarios. This understanding empowers developers and analysts to make informed choices when manipulating data and optimizing SQL queries for diverse applications and datasets.