Skip to content

Regression: group by count distinct doesn't work for timestamps with time zone #10042

@joroKr21

Description

@joroKr21

Describe the bug

I didn't check in which version it regressed, but it worked in v34 and is broken in v37.
It looks like the regression was caused by the specialization of the DistinctCount implementation of AggregateExpr.

To Reproduce

create table t_source
as values
 ('2018-11-13T17:11:10.011375885995', 'Row 0', 'X'),
 ('2011-12-13T11:13:10.12345',        'Row 1', 'X'),
 (null,                               'Row 2', 'Y'),
 ('2021-01-01T05:11:10.432',          'Row 3', 'Y');

create table t as
select
  arrow_cast(column1, 'Timestamp(Second, Some("UTC"))') as secs_utc,
  column2 as names,
  column3 as tag
from t_source;

SELECT tag, count(DISTINCT secs_utc) FROM t GROUP BY tag ORDER BY tag;

Expected behavior

It should work. The error we get is:

External error: query failed: DataFusion error: External error: External error: Arrow error: Invalid argument error: column types must match schema types, expected List(Field { name: "item", data_type: Timestamp(Second, Some("UTC")), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }) but found List(Field { name: "item", data_type: Timestamp(Second, None), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }) at column index 1

Additional context

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't workingregressionSomething that used to work no longer does

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions