Closed
Labels
bug: Something isn't working
Description
Describe the bug
DataFusion returns different answers for the same query depending on whether Parquet filter pushdown is enabled.
NOTE: pushdown filtering is not enabled by default (we are still working on it), so this issue is unlikely to affect users.
To Reproduce
- Download data from repro.zip
- Run datafusion CLI
The query run is:
select count(*) from foo where container = 'backend_container_0' OR pod = 'aqcathnxqsphdhgjtgvxsfyiwbmhlmg';
Expected behavior
The same answer should be produced with and without filter pushdown enabled. However, the answers differ.
Without filter pushdown, the query counts 39982 rows:
$ DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS=false datafusion-cli -f script.sql
...
+-----------------+
| COUNT(UInt8(1)) |
+-----------------+
| 39982           |
+-----------------+
With it enabled, the count is 0:
$ DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS=true datafusion-cli -f script.sql
+-----------------+
| COUNT(UInt8(1)) |
+-----------------+
| 0               |
+-----------------+
1 row in set. Query took 0.004 seconds.
Additional context
Found by the test in #3976
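The manual comparison in this report could be automated with a small script. The following is only a sketch, not part of the DataFusion test suite: it assumes `datafusion-cli` is on the `PATH` and that `script.sql` from repro.zip is in the working directory. The helper names (`parse_count`, `run_count`, `check_consistent`) are hypothetical; only the environment variable and the CLI invocation are taken from the commands above.

```python
import os
import re
import subprocess

# Environment variable copied from the commands in this report.
PUSHDOWN_VAR = "DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS"


def parse_count(cli_output: str) -> int:
    """Return the first integer-only table row (e.g. '| 39982 |') in the output."""
    for line in cli_output.splitlines():
        match = re.fullmatch(r"\|\s*(\d+)\s*\|", line.strip())
        if match:
            return int(match.group(1))
    raise ValueError("no count row found in CLI output")


def run_count(script: str, pushdown: bool) -> int:
    """Run datafusion-cli on `script` with filter pushdown switched on or off."""
    env = dict(os.environ)
    env[PUSHDOWN_VAR] = "true" if pushdown else "false"
    result = subprocess.run(
        ["datafusion-cli", "-f", script],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_count(result.stdout)


def check_consistent(script: str = "script.sql") -> None:
    """Raise AssertionError if the two settings disagree on the count."""
    without = run_count(script, pushdown=False)
    with_pushdown = run_count(script, pushdown=True)
    if without != with_pushdown:
        raise AssertionError(
            f"filter pushdown changed the result: {without} != {with_pushdown}"
        )
```

Calling `check_consistent()` from the directory that holds `script.sql` runs the query under both settings and raises if the counts differ.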