Skip to content

Fix logical error on intermediate prewhere expression reuse#90635

Merged
alexey-milovidov merged 8 commits intomasterfrom
intermediate_prewhere_exp_fix
Jan 17, 2026
Merged

Fix logical error on intermediate prewhere expression reuse#90635
alexey-milovidov merged 8 commits intomasterfrom
intermediate_prewhere_exp_fix

Conversation

@maxknv
Copy link
Copy Markdown
Member

@maxknv maxknv commented Nov 21, 2025

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Fixes an assertion when reading from Parquet file, and part of a prewhere expression is used elsewhere in the query.


Closes #90265
Closes #94289

@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented Nov 21, 2025

Workflow [PR], commit [1771c1f]

Summary:

job_name test_name status info comment
Bugfix validation (functional tests) failure
03727_prewhere_intermediate_columns FAIL cidb
Check errors failure
BuzzHouse (amd_debug) failure
Logical error: 'Inconsistent AST formatting: the query: (STID: 1941-1bfa) FAIL cidb, issue

@clickhouse-gh clickhouse-gh bot added the pr-not-for-changelog This PR should not be mentioned in the changelog label Nov 21, 2025
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes an assertion failure in prepareReadingFromFormat.cpp that occurred when intermediate prewhere expressions were preserved for reuse in other query parts (e.g., ORDER BY). The fix relaxes the overly restrictive assertion to correctly handle both prewhere final columns and intermediate expression columns based on the remove_prewhere_column flag.

Key changes:

  • Updated assertion logic in prepareReadingFromFormat.cpp to handle two valid scenarios: prewhere final column preservation (remove_prewhere_column=false) and intermediate column preservation (remove_prewhere_column=true)
  • Added explanatory comments documenting the two cases
  • Removed xfail markers from existing test, indicating the bug is now fixed

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
src/Storages/prepareReadingFromFormat.cpp Replaced two overly restrictive assertions with a single logical assertion that correctly handles both prewhere final columns and intermediate expression columns
ci/jobs/queries/00001_prewhere_intermediate_columns.sql Removed FIXME and xfail markers from test that now passes with the fix

@maxknv maxknv added pr-bugfix Pull request with bugfix, not backported by default pr-not-for-changelog This PR should not be mentioned in the changelog and removed pr-not-for-changelog This PR should not be mentioned in the changelog pr-bugfix Pull request with bugfix, not backported by default labels Nov 22, 2025
@alexey-milovidov alexey-milovidov self-assigned this Nov 22, 2025
@maxknv maxknv changed the title Intermediate prewhere expression fix Fix logical error on intermediate prewhere expression reuse Nov 23, 2025
@maxknv
Copy link
Copy Markdown
Member Author

maxknv commented Nov 23, 2025

Failure example: fuzzer report

@alexey-milovidov
Copy link
Copy Markdown
Member

  1. This should be included in the changelog.
  2. Please add a test.

@alexey-milovidov alexey-milovidov marked this pull request as draft November 23, 2025 23:46
@maxknv maxknv marked this pull request as ready for review November 24, 2025 03:08
@clickhouse-gh clickhouse-gh bot added pr-bugfix Pull request with bugfix, not backported by default and removed pr-not-for-changelog This PR should not be mentioned in the changelog labels Nov 24, 2025
@maxknv maxknv marked this pull request as draft November 24, 2025 03:27
@maxknv maxknv force-pushed the intermediate_prewhere_exp_fix branch from 75bcc49 to 61c622e Compare November 24, 2025 21:44
Copy link
Copy Markdown
Member

@al13n321 al13n321 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not enough, need to make parquet reader recognize that columns are outputted by prewhere expression (at least in Reader::prefilterAndInitRowGroups, maybe other places), pass them through in Reader::applyPrewhere, update comment in ReadFromFormatInfo to say that requested_columns can contain prewhere expression outputs. And check that nothing else breaks because of it; I guess I'd start by searching for prewhere_column_name and see which of those code sites should use all prewhere expression outputs instead.

The test breaks with input_format_parquet_allow_missing_columns=0, and most likely isn't sorting correctly (uses 0 instead of x > 60, because parquet reader thinks "x > 60" is a column name, tries to read it from the file, and produces default values when the file doesn't have such column, because of input_format_parquet_allow_missing_columns, which is questionably enabled by default). (EDIT: Yep, this sorts incorrectly: SELECT DISTINCT x, _row_number FROM file('t.parquet') WHERE (x < 60) OR (x < 100) ORDER BY (x < 60), x)

I'll fix it.

@al13n321
Copy link
Copy Markdown
Member

Weird that this only happens in some complicated conditions (_row_number; optimize_move_to_prewhere instead of directly using PREWHERE) rather than all the time. Why doesn't query analysis always reuse subexperessions between PREWHERE and the rest of the query?

@al13n321 al13n321 self-requested a review November 26, 2025 05:30
@al13n321 al13n321 marked this pull request as ready for review November 26, 2025 05:30
@al13n321 al13n321 requested review from KochetovNicolai and removed request for al13n321 November 26, 2025 06:17
@al13n321
Copy link
Copy Markdown
Member

al13n321 commented Jan 1, 2026

I don't understand what Bugfix validation (functional tests) is trying to say. Locally, the new test passes with this PR and fails without it, as expected.

@alexey-milovidov
Copy link
Copy Markdown
Member

@al13n321 ok, let's ignore it. But please fix the conflict and check the failed tests for the userspace page cache.

@alexey-milovidov alexey-milovidov merged commit 076a49c into master Jan 17, 2026
128 of 133 checks passed
@alexey-milovidov alexey-milovidov deleted the intermediate_prewhere_exp_fix branch January 17, 2026 13:32
@robot-clickhouse-ci-2 robot-clickhouse-ci-2 added the pr-synced-to-cloud The PR is synced to the cloud repo label Jan 17, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-bugfix Pull request with bugfix, not backported by default pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

5 participants