Fix LOGICAL_ERROR for subquery projections with tuple subcolumns#101366
Fix LOGICAL_ERROR for subquery projections with tuple subcolumns#101366
LOGICAL_ERROR for subquery projections with tuple subcolumns#101366Conversation
|
Workflow [PR], commit [dec5114] Summary: ❌
AI ReviewSummaryThis PR fixes a real Findings
ClickHouse Rules
Performance & SafetyThe fix is correctness-positive, but the new unconditional first-column selection can increase I/O and CPU for optimized aggregate plans over subqueries/unions when projection order puts a wide column first. Final Verdict
|
|
|
||
| /// pick the first projection column | ||
| /// ExpressionActions::getSmallestColumn is not used because it skips subcolumns (`tup.s` from Tuple) | ||
| additional_column_to_read = projection_columns.front(); |
There was a problem hiding this comment.
This change fixes the LOGICAL_ERROR for tuple subcolumns, but it also drops the previous smallest-column selection for all query/union table expressions by always taking projection_columns.front().
That can regress SELECT count()-like plans when the first projected column is wide/expensive (e.g. large String) while a later projected column is much cheaper.
Could we keep the previous optimization by choosing the smallest projection column while still handling the all-subcolumns case? For example: try ExpressionActions::getSmallestColumn first, and if it throws No available columns, fall back to projection_columns.front().
There was a problem hiding this comment.
IMHO the cost difference between passing up, for example, a UInt8 vs a String for a single discarded column is negligible.
LLVM Coverage Report
Changed lines: 88.89% (8/9) · Uncovered code |
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Fixes a rare bug in count distinct optimization. Closes #101324
Documentation entry for user-facing changes