Fix incorrect ORDER BY elision when read-in-order-through-join meets shard join by tuanpach · Pull Request #101456 · ClickHouse/ClickHouse

tuanpach · 2026-04-01T06:26:35Z

With query_plan_read_in_order_through_join and query_plan_join_shard_by_pk_ranges
both enabled, buildInputOrderInfo traverses through the JOIN, finds the MergeTree
primary key, and demotes the outer ORDER BY from FullSortingStep to
FinishSortingStep, trusting the storage to emit rows in PK order end-to-end.
optimizeJoinByShards then splits the read into independent PK-range layers. Each
layer emits rows in storage order independently and outputs are concatenated without a
cross-layer sort, producing wrong results (e.g. 4,3,2,1,0,9,8,7,6,5 instead of
0..9 on a DESC-key table with max_threads = 2).

Three changes fix this:

Move optimizeJoinByShards to run before optimizeReadInOrder in
optimizeTreeSecondPass. Previously it ran near applyOrder (line ~499), after
optimizeReadInOrder had already executed inside traverseQueryPlan (line ~287),
so the sharding state was never visible when buildInputOrderInfo made its decision.
Add a guard in all three buildInputOrderInfo overloads (SortingStep,
AggregatingStep, DistinctStep): if any JoinStep on the path to the scan has
isJoinByLayersEnabled(), return early to preserve the outer ORDER BY as
FullSortingStep.
Gate enableJoinByLayers on all_split.layers.size() > 1. With a single layer
there is no stream interleaving and global PK ordering is preserved, so suppressing
read-in-order is unnecessary and forces a full sort.

Changelog category (leave one):

Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Fixed wrong query results when query_plan_read_in_order_through_join and
query_plan_join_shard_by_pk_ranges are both enabled on a reverse-key (DESC)
MergeTree table.

Closes #100870

Documentation entry for user-facing changes

Documentation is written (mandatory for new features)

clickhouse-gh · 2026-04-01T06:27:17Z

Workflow [PR], commit [26c8d0a]

AI Review

Summary

This PR fixes a real correctness issue where read-in-order-through-join could incorrectly elide the final sort after join-by-pk-ranges sharding, yielding misordered results on reverse-key MergeTree. The fix is coherent: it reorders optimization passes, guards all relevant buildInputOrderInfo paths (SortingStep, AggregatingStep, DistinctStep), and only marks joins as layered when multiple layers actually exist. The regression test covers the failing path and the additional guarded paths. High-level verdict: ✅ good and ready.

ClickHouse Rules

Item	Status	Notes
Deletion logging	➖
Serialization versioning	➖
Core-area scrutiny	✅
No test removal	✅
Experimental gate	➖
No magic constants	✅
Backward compatibility	✅
`SettingsChangesHistory.cpp`	➖
PR metadata quality	✅
Safe rollout	✅
Compilation time	✅

Final Verdict

Status: ✅ Approve

src/Processors/QueryPlan/Optimizations/optimizeReadInOrder.cpp

clickhouse-gh · 2026-04-01T10:58:52Z

LLVM Coverage Report

Metric	Baseline	Current	Δ
Lines	84.00%	84.00%	+0.00%
Functions	90.90%	90.90%	+0.00%
Branches	76.50%	76.50%	+0.00%

Changed lines: 100.00% (20/20) · Uncovered code

Full report · Diff report

src/Processors/QueryPlan/Optimizations/optimizeReadInOrder.cpp

…ough a sharded JOIN With `query_plan_read_in_order_through_join` and `query_plan_join_shard_by_pk_ranges` both enabled, `buildInputOrderInfo` traverses through the JOIN, finds the MergeTree primary key, and demotes the outer ORDER BY from `FullSortingStep` to `FinishSortingStep`, trusting the storage to emit rows in PK order end-to-end. `optimizeJoinByShards` then splits the read into independent PK-range layers. Each layer emits rows in storage order independently and outputs are concatenated without a cross-layer sort, producing wrong results (e.g. `4,3,2,1,0,9,8,7,6,5` instead of `0..9` on a `DESC`-key table with `max_threads = 2`). Three changes fix this: 1. Move `optimizeJoinByShards` to run before `optimizeReadInOrder` in `optimizeTreeSecondPass`. Previously it ran near `applyOrder` (line ~499), after `optimizeReadInOrder` had already executed inside `traverseQueryPlan` (line ~287), so the sharding state was never visible when `buildInputOrderInfo` made its decision. 2. Add a guard in all three `buildInputOrderInfo` overloads (`SortingStep`, `AggregatingStep`, `DistinctStep`): loop over `find_reading_ctx.joins_to_keep_in_order` and return early if any `JoinStep` on the path has `isJoinByLayersEnabled()`, preserving the outer ORDER BY as `FullSortingStep`. 3. Gate `enableJoinByLayers` — and the companion `convertToPartitionedFinishSorting` and `keepLeftPipelineInOrder` calls — on `all_split.layers.size() > 1`. With a single layer there is no stream interleaving and global PK ordering is preserved, so suppressing read-in-order is unnecessary and forces a full sort. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

clickhouse-gh bot added the pr-ci label Apr 1, 2026

clickhouse-gh bot reviewed Apr 1, 2026

View reviewed changes

src/Processors/QueryPlan/Optimizations/optimizeReadInOrder.cpp Outdated Show resolved Hide resolved

clickhouse-gh bot reviewed Apr 1, 2026

View reviewed changes

src/Processors/QueryPlan/Optimizations/optimizeReadInOrder.cpp Outdated Show resolved Hide resolved

antaljanosbenjamin mentioned this pull request Apr 1, 2026

Auto spilling join #97813

Open

1 task

clickhouse-gh bot reviewed Apr 1, 2026

View reviewed changes

src/Processors/QueryPlan/Optimizations/optimizeReadInOrder.cpp Show resolved Hide resolved

tuanpach force-pushed the fix-join-shard-read-in-order-reverse-key branch from de252c9 to 26c8d0a Compare April 2, 2026 00:38

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix incorrect ORDER BY elision when read-in-order-through-join meets shard join#101456

Fix incorrect ORDER BY elision when read-in-order-through-join meets shard join#101456
tuanpach wants to merge 1 commit intoClickHouse:masterfrom
tuanpach:fix-join-shard-read-in-order-reverse-key

tuanpach commented Apr 1, 2026 •

edited

Loading

Uh oh!

clickhouse-gh bot commented Apr 1, 2026 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

clickhouse-gh bot commented Apr 1, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

tuanpach commented Apr 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Changelog category (leave one):

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Documentation entry for user-facing changes

Uh oh!

clickhouse-gh bot commented Apr 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

AI Review

Summary

ClickHouse Rules

Final Verdict

Uh oh!

Uh oh!

Uh oh!

clickhouse-gh bot commented Apr 1, 2026

LLVM Coverage Report

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

tuanpach commented Apr 1, 2026 •

edited

Loading

clickhouse-gh bot commented Apr 1, 2026 •

edited

Loading