Skip to content

Fix LOGICAL_ERROR for subquery projections with tuple subcolumns#101366

Open
yariks5s wants to merge 1 commit intomasterfrom
yarik/fix-logical-error-no-available-columns
Open

Fix LOGICAL_ERROR for subquery projections with tuple subcolumns#101366
yariks5s wants to merge 1 commit intomasterfrom
yarik/fix-logical-error-no-available-columns

Conversation

@yariks5s
Copy link
Copy Markdown
Member

@yariks5s yariks5s commented Mar 31, 2026

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Fixes a rare bug in count distinct optimization. Closes #101324

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

@yariks5s yariks5s added the 🍃 green ci 🌿 Fixing flaky tests in CI label Mar 31, 2026
@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented Mar 31, 2026

Workflow [PR], commit [dec5114]

Summary:

job_name test_name status info comment
Build (arm_fuzzers) failure
Build ClickHouse failure cidb
Stateless tests (amd_msan, WasmEdge, parallel, 1/2) failure
03668_shard_join_in_reverse_order FAIL cidb, issue
Stress test (arm_msan) failure
MemorySanitizer: use-of-uninitialized-value (STID: 2410-543a) FAIL cidb

AI Review

Summary

This PR fixes a real LOGICAL_ERROR when count_distinct_optimization is used on tuple subcolumns in subqueries by avoiding ExpressionActions::getSmallestColumn, which skips subcolumns. However, the current fix changes the column-selection strategy for all query/union table expressions to always use the first projection column, which can regress count()-style plans by reading a much wider column than necessary. Verdict: request a small follow-up adjustment before merge.

Findings
  • ⚠️ Majors
    • [src/Planner/PlannerJoinTree.cpp:466] Replacing smallest-column selection with unconditional projection_columns.front() removes an existing optimization path for subquery/union table expressions. If the first projected column is wide (e.g. large String), trivial-count style plans may become more expensive than before.
    • Suggested fix: keep smallest-column selection when possible (ExpressionActions::getSmallestColumn) and only fall back to projection_columns.front() for the all-subcolumns case that currently triggers No available columns.
ClickHouse Rules
Item Status Notes
Deletion logging
Serialization versioning
Core-area scrutiny
No test removal
Experimental gate
No magic constants
Backward compatibility
SettingsChangesHistory.cpp
PR metadata quality
Safe rollout
Compilation time
Performance & Safety

The fix is correctness-positive, but the new unconditional first-column selection can increase I/O and CPU for optimized aggregate plans over subqueries/unions when projection order puts a wide column first.

Final Verdict
  • Status: ⚠️ Request changes
  • Minimum required actions:
    • Restore smallest-column selection for query/union projections and use first-column fallback only for subcolumn-only projections.

@clickhouse-gh clickhouse-gh bot added the pr-bugfix Pull request with bugfix, not backported by default label Mar 31, 2026

/// pick the first projection column
/// ExpressionActions::getSmallestColumn is not used because it skips subcolumns (`tup.s` from Tuple)
additional_column_to_read = projection_columns.front();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change fixes the LOGICAL_ERROR for tuple subcolumns, but it also drops the previous smallest-column selection for all query/union table expressions by always taking projection_columns.front().

That can regress SELECT count()-like plans when the first projected column is wide/expensive (e.g. large String) while a later projected column is much cheaper.

Could we keep the previous optimization by choosing the smallest projection column while still handling the all-subcolumns case? For example: try ExpressionActions::getSmallestColumn first, and if it throws No available columns, fall back to projection_columns.front().

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO the cost difference between passing up, for example, a UInt8 vs a String for a single discarded column is negligible.

@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented Mar 31, 2026

LLVM Coverage Report

Metric Baseline Current Δ
Lines 84.10% 84.20% +0.10%
Functions 90.90% 90.90% +0.00%
Branches 76.70% 76.80% +0.10%

Changed lines: 88.89% (8/9) · Uncovered code

Full report · Diff report

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

🍃 green ci 🌿 Fixing flaky tests in CI pr-bugfix Pull request with bugfix, not backported by default

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Logical error: No available columns (STID: 3938-33a6)

1 participant