Fix logical error on intermediate prewhere expression reuse#90635
Fix logical error on intermediate prewhere expression reuse#90635alexey-milovidov merged 8 commits intomasterfrom
Conversation
|
Workflow [PR], commit [1771c1f] Summary: ❌
|
82ddf39 to
6ae24ab
Compare
There was a problem hiding this comment.
Pull request overview
This PR fixes an assertion failure in prepareReadingFromFormat.cpp that occurred when intermediate prewhere expressions were preserved for reuse in other query parts (e.g., ORDER BY). The fix relaxes the overly restrictive assertion to correctly handle both prewhere final columns and intermediate expression columns based on the remove_prewhere_column flag.
Key changes:
- Updated assertion logic in
prepareReadingFromFormat.cppto handle two valid scenarios: prewhere final column preservation (remove_prewhere_column=false) and intermediate column preservation (remove_prewhere_column=true) - Added explanatory comments documenting the two cases
- Removed xfail markers from existing test, indicating the bug is now fixed
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/Storages/prepareReadingFromFormat.cpp | Replaced two overly restrictive assertions with a single logical assertion that correctly handles both prewhere final columns and intermediate expression columns |
| ci/jobs/queries/00001_prewhere_intermediate_columns.sql | Removed FIXME and xfail markers from test that now passes with the fix |
|
Failure example: fuzzer report |
|
75bcc49 to
61c622e
Compare
There was a problem hiding this comment.
This is not enough, need to make parquet reader recognize that columns are outputted by prewhere expression (at least in Reader::prefilterAndInitRowGroups, maybe other places), pass them through in Reader::applyPrewhere, update comment in ReadFromFormatInfo to say that requested_columns can contain prewhere expression outputs. And check that nothing else breaks because of it; I guess I'd start by searching for prewhere_column_name and see which of those code sites should use all prewhere expression outputs instead.
The test breaks with input_format_parquet_allow_missing_columns=0, and most likely isn't sorting correctly (uses 0 instead of x > 60, because parquet reader thinks "x > 60" is a column name, tries to read it from the file, and produces default values when the file doesn't have such column, because of input_format_parquet_allow_missing_columns, which is questionably enabled by default). (EDIT: Yep, this sorts incorrectly: SELECT DISTINCT x, _row_number FROM file('t.parquet') WHERE (x < 60) OR (x < 100) ORDER BY (x < 60), x)
I'll fix it.
|
Weird that this only happens in some complicated conditions ( |
|
I don't understand what |
|
@al13n321 ok, let's ignore it. But please fix the conflict and check the failed tests for the userspace page cache. |
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Fixes an assertion when reading from Parquet file, and part of a prewhere expression is used elsewhere in the query.
Closes #90265
Closes #94289