Skip to content

[CORE] Fix non-deterministic filter executed twice when push down to scan#6296

Merged
zhztheplayer merged 3 commits intoapache:mainfrom
zml1206:fix_getRemainingFilters
Jul 2, 2024
Merged

[CORE] Fix non-deterministic filter executed twice when push down to scan#6296
zhztheplayer merged 3 commits intoapache:mainfrom
zml1206:fix_getRemainingFilters

Conversation

@zml1206
Copy link
Copy Markdown
Contributor

@zml1206 zml1206 commented Jul 1, 2024

What changes were proposed in this pull request?

PredicateHelper.getRemainingFilters has a bug, ExpressionSet can only remove deterministic expression, it will cause non-deterministic filter to be executed twice.

How was this patch tested?

UT.

@github-actions
Copy link
Copy Markdown

github-actions bot commented Jul 1, 2024

Thanks for opening a pull request!

Could you open an issue for this pull request on Github Issues?

https://github.com/apache/incubator-gluten/issues

Then could you also rename commit message and pull request title in the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}

See also:

@github-actions
Copy link
Copy Markdown

github-actions bot commented Jul 1, 2024

Run Gluten Clickhouse CI

Comment on lines +1896 to +1908
test("fix non-deterministic filter executed twice when push down to scan") {
val df = sql("select * from lineitem where rand() <= 0.5")
val plan = df.queryExecution.executedPlan
val scans = plan.collect { case scan: FileSourceScanExecTransformer => scan }
val filters = plan.collect { case filter: FilterExecTransformer => filter }
assert(scans.size == 1)
assert(filters.size == 1)
assert(scans(0).dataFilters.size == 1)
val remainingFilters = FilterHandler.getRemainingFilters(
scans(0).dataFilters,
splitConjunctivePredicates(filters(0).condition))
assert(remainingFilters.size == 0)
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's like the test only tests against method FilterHandler.getRemainingFilters. Do you think we should bring the result length check in #6191 back here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Result length check in #6191 is not absolutely stable, so I used the method test.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's stable as the probability of false positive is nearly zero

https://www.wolframalpha.com/input?i=P%5BX+%3E+25000%5D+for+X%7EB%2860000%2C0.25%29

You can use 25000 < x < 35000 which is very much enough

@github-actions
Copy link
Copy Markdown

github-actions bot commented Jul 1, 2024

Run Gluten Clickhouse CI

1 similar comment
@zml1206
Copy link
Copy Markdown
Contributor Author

zml1206 commented Jul 2, 2024

Run Gluten Clickhouse CI

@zml1206
Copy link
Copy Markdown
Contributor Author

zml1206 commented Jul 2, 2024

The error does not seem to be related to this PR. cc @zhztheplayer

@github-actions
Copy link
Copy Markdown

github-actions bot commented Jul 2, 2024

Run Gluten Clickhouse CI

@zhztheplayer zhztheplayer merged commit 264ff2e into apache:main Jul 2, 2024
@zml1206 zml1206 deleted the fix_getRemainingFilters branch December 9, 2025 08:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants