[VL] Native Scan operator incurred inaccuracy in some filter cases. #8764

@zhixingheyi-tian

Description

Backend

VL (Velox)

Bug description

CREATE TABLE default.testtable (
  a bigint,
  b bigint,
  c int
)
USING parquet
LOCATION 'file:///data/dataset/testtable';

dataset:

dataset.txt

query:

  select
    max(a) as id
  from
    testtable
  where
    c = 2  
  group by
    b;

[Expected behavior]

spark-sql (default)>   select
                   >     max(a) as id
                   >   from
                   >     testtable
                   >   where
                   >     c = 2  
                   >   group by
                   >     b;
299
275
233
237
323
321
253
102
226
219
297
319
327
249
96
Time taken: 4.221 seconds, Fetched 15 row(s)

[Actual behavior]

spark-sql (default)>   select
                   >     max(a) as id
                   >   from
                   >     testtable
                   >   where
                   >     c = 2  
                   >   group by
                   >     b;
139938755249256
139938755249256
139936948064087
139945378802512
139945378802512
139938755303160
139938755303160
139945378802512
139945378802512
139938755249256
139945378802512
139945378802512
139945378802512
139938755249256
139938755303160
Time taken: 4.244 seconds, Fetched 15 row(s)

This error reproduces consistently.
However, when Velox is compiled in DEBUG mode, the results are correct.
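
As a quick sanity check, the query below is a minimal sketch that bounds the values any max(a) can take. It assumes the unfiltered scan is not affected by this bug, which has not been confirmed here. Every group's max(a) must fall between min_a and max_a, so values such as 139938755249256 would be outside that range if they do not exist in the data.

```sql
-- Sketch only: global bounds of column a, with no filter on c.
-- Assumption: the scan without a filter returns correct values.
select
  min(a) as min_a,
  max(a) as max_a,
  count(*) as total_rows
from
  testtable;
```

If max_a is far below the values listed under [Actual behavior], those values cannot have come from column a.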

cc @weiting-chen @zhouyuan, thanks.

Metadata

Labels: bug, triage

Status: Done