Incorrect results when using grouping sets with data containing nulls #12570

@eejbyfeldt

Describe the bug

When a query uses grouping sets on a column that contains NULL values, DataFusion produces an incorrect result.

To Reproduce

Run the following queries in datafusion-cli:

> CREATE TABLE integers_with_nulls (value INT) as VALUES (1), (NULL);

0 row(s) fetched. 
Elapsed 0.016 seconds.

> SELECT value, min(value) FROM integers_with_nulls GROUP BY CUBE(value);

+-------+--------------------------------+
| value | min(integers_with_nulls.value) |
+-------+--------------------------------+
| 1     | 1                              |
|       | 1                              |
+-------+--------------------------------+

Expected behavior

The expected behavior is that the NULL in the data forms a group distinct from the NULLs introduced by the grouping set. The expected results are:

> SELECT value, min(value) FROM integers_with_nulls GROUP BY CUBE(value);
+-------+--------------------------------+
| value | min(integers_with_nulls.value) |
+-------+--------------------------------+
| 1     | 1                              |
|       | 1                              |
|       |                                |
+-------+--------------------------------+

This is the behavior in PostgreSQL, Spark and DuckDB (and most likely other query engines).
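The difference between the two outputs can be illustrated with a small pure-Python sketch of `GROUP BY CUBE(value)` for a single column. It is only a model of the expected semantics, not a description of DataFusion's internals: the function `cube_min` and its `distinguish_grouping_sets` flag are hypothetical names introduced for this illustration. The assumption is that `CUBE(value)` expands to the grouping sets `(value)` and `()`, and that a group key must record which grouping set it came from.

```python
def cube_min(values, distinguish_grouping_sets=True):
    """Model SELECT value, min(value) ... GROUP BY CUBE(value) on one column.

    Hypothetical helper for illustration only; Python None stands in for SQL NULL.
    """
    groups = {}
    # CUBE(value) expands to two grouping sets:
    #   grouping_id 0 -> (value): group by the actual column value
    #   grouping_id 1 -> ():      value is rolled up and reported as NULL
    for grouping_id in (0, 1):
        for v in values:
            group_value = v if grouping_id == 0 else None
            if distinguish_grouping_sets:
                key = (grouping_id, group_value)
            else:
                # Key on the value alone: a NULL from the data and a NULL
                # from the rolled-up grouping set become indistinguishable.
                key = (group_value,)
            groups.setdefault(key, []).append(v)

    rows = []
    for key, members in groups.items():
        # min() ignores NULLs and is NULL when every input is NULL.
        non_null = [m for m in members if m is not None]
        rows.append((key[-1], min(non_null) if non_null else None))
    return rows


print(cube_min([1, None]))
# [(1, 1), (None, None), (None, 1)]
print(cube_min([1, None], distinguish_grouping_sets=False))
# [(1, 1), (None, 1)]
```

With the grouping set included in the key, the sketch yields the three expected rows. Keying on the value alone reproduces the two-row output reported above, which is consistent with the data NULL and the grouping-set NULL being merged into one group.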

Additional context

No response

Labels: bug
