Fix `LOGICAL_ERROR` for subquery projections with tuple subcolumns by yariks5s · Pull Request #101366 · ClickHouse/ClickHouse

yariks5s · 2026-03-31T11:09:27Z

Changelog category (leave one):

Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Fixes a rare bug in count distinct optimization. Closes #101324

Documentation entry for user-facing changes

Documentation is written (mandatory for new features)

clickhouse-gh · 2026-03-31T11:10:09Z

Workflow [PR], commit [dec5114]

Summary: ❌

job_name	test_name	status	info
Build (arm_fuzzers)		failure
	Build ClickHouse	failure	cidb
Stateless tests (amd_msan, WasmEdge, parallel, 1/2)		failure
	03668_shard_join_in_reverse_order	FAIL	cidb, issue
Stress test (arm_msan)		failure
	MemorySanitizer: use-of-uninitialized-value (STID: 2410-543a)	FAIL	cidb

AI Review

Summary

This PR fixes a real LOGICAL_ERROR when count_distinct_optimization is used on tuple subcolumns in subqueries by avoiding ExpressionActions::getSmallestColumn, which skips subcolumns. However, the current fix changes the column-selection strategy for all query/union table expressions to always use the first projection column, which can regress count()-style plans by reading a much wider column than necessary. Verdict: request a small follow-up adjustment before merge.

Findings

⚠️ Majors
- [src/Planner/PlannerJoinTree.cpp:466] Replacing smallest-column selection with unconditional projection_columns.front() removes an existing optimization path for subquery/union table expressions. If the first projected column is wide (e.g. large String), trivial-count style plans may become more expensive than before.
- Suggested fix: keep smallest-column selection when possible (ExpressionActions::getSmallestColumn) and only fall back to projection_columns.front() for the all-subcolumns case that currently triggers No available columns.

ClickHouse Rules

Item	Status	Notes
Deletion logging	➖
Serialization versioning	➖
Core-area scrutiny	✅
No test removal	✅
Experimental gate	➖
No magic constants	✅
Backward compatibility	✅
`SettingsChangesHistory.cpp`	➖
PR metadata quality	✅
Safe rollout	✅
Compilation time	✅

Performance & Safety

The fix is correctness-positive, but the new unconditional first-column selection can increase I/O and CPU for optimized aggregate plans over subqueries/unions when projection order puts a wide column first.

Final Verdict

Status: ⚠️ Request changes
Minimum required actions:
- Restore smallest-column selection for query/union projections and use first-column fallback only for subcolumn-only projections.

clickhouse-gh · 2026-03-31T11:13:01Z

src/Planner/PlannerJoinTree.cpp

+
+            /// pick the first projection column
+            /// ExpressionActions::getSmallestColumn is not used because it skips subcolumns (`tup.s` from Tuple)
+            additional_column_to_read = projection_columns.front();


This change fixes the LOGICAL_ERROR for tuple subcolumns, but it also drops the previous smallest-column selection for all query/union table expressions by always taking projection_columns.front().

That can regress SELECT count()-like plans when the first projected column is wide/expensive (e.g. large String) while a later projected column is much cheaper.

Could we keep the previous optimization by choosing the smallest projection column while still handling the all-subcolumns case? For example: try ExpressionActions::getSmallestColumn first, and if it throws No available columns, fall back to projection_columns.front().

IMHO the cost difference between passing up, for example, a UInt8 vs a String for a single discarded column is negligible.

clickhouse-gh · 2026-03-31T21:00:09Z

LLVM Coverage Report

Metric	Baseline	Current	Δ
Lines	84.10%	84.20%	+0.10%
Functions	90.90%	90.90%	+0.00%
Branches	76.70%	76.80%	+0.10%

Changed lines: 88.89% (8/9) · Uncovered code

Full report · Diff report

Fix LOGICAL_ERROR for subquery projections with tuple subcolumns

dec5114

yariks5s added the 🍃 green ci 🌿 Fixing flaky tests in CI label Mar 31, 2026

clickhouse-gh bot added the pr-bugfix Pull request with bugfix, not backported by default label Mar 31, 2026

clickhouse-gh bot reviewed Mar 31, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix `LOGICAL_ERROR` for subquery projections with tuple subcolumns#101366

Fix `LOGICAL_ERROR` for subquery projections with tuple subcolumns#101366
yariks5s wants to merge 1 commit intomasterfrom
yarik/fix-logical-error-no-available-columns

yariks5s commented Mar 31, 2026 •

edited

Loading

Uh oh!

clickhouse-gh bot commented Mar 31, 2026 •

edited

Loading

Uh oh!

clickhouse-gh bot Mar 31, 2026

Uh oh!

yariks5s Mar 31, 2026

Uh oh!

clickhouse-gh bot commented Mar 31, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

yariks5s commented Mar 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Changelog category (leave one):

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Documentation entry for user-facing changes

Uh oh!

clickhouse-gh bot commented Mar 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

AI Review

Summary

Findings

ClickHouse Rules

Performance & Safety

Final Verdict

Uh oh!

clickhouse-gh bot Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

yariks5s Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

clickhouse-gh bot commented Mar 31, 2026

LLVM Coverage Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

yariks5s commented Mar 31, 2026 •

edited

Loading

clickhouse-gh bot commented Mar 31, 2026 •

edited

Loading