[CORE] Fix non-deterministic filter executed twice when push down to scan #6296
zhztheplayer merged 3 commits into apache:main
Conversation
Thanks for opening a pull request! Could you open an issue for this pull request on GitHub Issues? https://github.com/apache/incubator-gluten/issues
Then could you also rename the commit message and pull request title in the following format? See also:
Run Gluten Clickhouse CI
test("fix non-deterministic filter executed twice when push down to scan") {
  val df = sql("select * from lineitem where rand() <= 0.5")
  val plan = df.queryExecution.executedPlan
  val scans = plan.collect { case scan: FileSourceScanExecTransformer => scan }
  val filters = plan.collect { case filter: FilterExecTransformer => filter }
  assert(scans.size == 1)
  assert(filters.size == 1)
  assert(scans(0).dataFilters.size == 1)
  val remainingFilters = FilterHandler.getRemainingFilters(
    scans(0).dataFilters,
    splitConjunctivePredicates(filters(0).condition))
  assert(remainingFilters.size == 0)
}
It looks like the test only exercises the method FilterHandler.getRemainingFilters. Do you think we should bring back the result length check from #6191 here?
The result length check in #6191 is not completely stable, so I tested the method directly instead.
It's stable, as the probability of a false positive is nearly zero:
https://www.wolframalpha.com/input?i=P%5BX+%3E+25000%5D+for+X%7EB%2860000%2C0.25%29
You can use 25000 < x < 35000, which is more than enough.
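To make the stability argument concrete, here is a minimal sketch that measures how far the suggested bounds sit from the expected row count, in standard deviations of a binomial distribution. The row count of 60000 and the pass probabilities are taken from the linked query and from the `rand() <= 0.5` filter; the object and method names are hypothetical and not part of Gluten.

```scala
object ResultLengthBounds {
  // Mean and standard deviation of a Binomial(n, p) row count.
  def mean(n: Long, p: Double): Double = n * p
  def stdDev(n: Long, p: Double): Double = math.sqrt(n * p * (1 - p))

  // Distance of a bound from the mean, measured in standard deviations.
  def sigmas(bound: Double, n: Long, p: Double): Double =
    math.abs(bound - mean(n, p)) / stdDev(n, p)

  def main(args: Array[String]): Unit = {
    val n = 60000L // assumed row count, as in the linked query

    // Filter applied once: each row passes with probability 0.5.
    println(f"once:  lower bound is ${sigmas(25000, n, 0.5)}%.1f sigma below the mean")
    println(f"once:  upper bound is ${sigmas(35000, n, 0.5)}%.1f sigma above the mean")

    // Filter applied twice (the bug): each row passes with probability 0.25.
    println(f"twice: lower bound is ${sigmas(25000, n, 0.25)}%.1f sigma above the mean")
  }
}
```

Under these assumptions the bounds are roughly 40 standard deviations away from the mean when the filter runs once, and the lower bound is more than 90 standard deviations above the mean when it runs twice, so a check of 25000 < x < 35000 would separate the two cases with negligible risk of a flaky failure.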
The error does not seem to be related to this PR. cc @zhztheplayer
What changes were proposed in this pull request?
PredicateHelper.getRemainingFilters has a bug: ExpressionSet can only remove deterministic expressions, so a non-deterministic filter that was already pushed down to the scan is not removed and gets executed twice.
How was this patch tested?
UT.
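
The bug described above can be illustrated with a toy model. This is only a sketch: `Expr` and `ToyExpressionSet` are hypothetical stand-ins that mimic the behaviour stated in the description (set subtraction removes only deterministic expressions); they are not Spark or Gluten classes, and `remainingFixed` is just one possible way to avoid the problem, not necessarily what this PR implements.

```scala
// Hypothetical stand-in for a Catalyst expression.
final case class Expr(sql: String, deterministic: Boolean)

// Hypothetical set whose subtraction only removes deterministic expressions,
// mirroring the limitation described in the PR.
final case class ToyExpressionSet(items: Seq[Expr]) {
  def --(others: Seq[Expr]): ToyExpressionSet =
    ToyExpressionSet(items.filterNot(e => e.deterministic && others.contains(e)))
}

object RemainingFiltersSketch {
  // Buggy variant: a non-deterministic filter already handled by the scan
  // survives the subtraction and would be evaluated again.
  def remainingBuggy(scanFilters: Seq[Expr], filters: Seq[Expr]): Seq[Expr] =
    (ToyExpressionSet(filters) -- scanFilters).items

  // One possible fix: compare the filters directly instead of relying on the set.
  def remainingFixed(scanFilters: Seq[Expr], filters: Seq[Expr]): Seq[Expr] =
    filters.filterNot(f => scanFilters.contains(f))

  def main(args: Array[String]): Unit = {
    val randFilter = Expr("rand() <= 0.5", deterministic = false)
    println(remainingBuggy(Seq(randFilter), Seq(randFilter)).size) // prints 1
    println(remainingFixed(Seq(randFilter), Seq(randFilter)).size) // prints 0
  }
}
```

In the toy model the buggy variant leaves one remaining filter, which corresponds to the filter being applied both in the scan and in the filter operator, while the fixed variant leaves none, matching the `remainingFilters.size == 0` assertion in the unit test.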