Support for multiple columns and index granularity > 1 in top-K threshold filtering optimization by shankar-iyer · Pull Request #93545 · ClickHouse/ClickHouse

shankar-iyer · 2026-01-07T07:16:11Z

See #91645

Details

The support for index granularity > 1 : If the min-max index granule is [50, 88] and a single index granule covers 8 table granules, then use [50, 88] as the min-max value for all the 8 table granules. When shortlisting table granules for LIMIT <n>, we need to shortlist the top 8 x n table granules and also do handle_ties processing (next point)
To support ORDER BY a,b - We will use the value of a for threshold filtering. And the key change is how we compare to threshold - we need to send all rows with a value >= the threshold (for ORDER BY a DESC, b). Note the '=', that ensures that optimization is correct.

Changelog category (leave one):

Improvement

Changelog entry (a [user-readable short description](

The top-K threshold filtering optimization introduced in 25.12 now supports 1) Multiple columns inORDER BY clause. the first column's value is used to determine the threshold. 2) minmax index with index granularity > 1

Note

Medium Risk
Changes core MergeTree top-K pruning logic and threshold comparisons, which can affect query correctness (ties) and performance (more granules selected) across ORDER BY/LIMIT queries.

Overview
Top-K ORDER BY … LIMIT optimization is broadened and made tie-safe. The optimizer now allows multiple ORDER BY columns (using the first column for pruning) and passes num_sort_columns through TopKFilterInfo so the executor can decide when it must handle ties.

Minmax-based top-K selection now works with skip-index granularity > 1 by expanding index granules to underlying table marks and, when required, selecting k * granularity marks plus all marks tied at the threshold. Threshold filtering semantics were tightened to be inclusive (<=/>=) via __topKFilter and strictness changes in TopKThresholdTracker, and the optimization is disabled for LIMIT 0.

Adds stateless tests covering multi-column ordering, coarser index granularity, and negative cases where the optimization must not apply.

^{Written by Cursor Bugbot for commit 1fcb0d9. This will update automatically on new commits. Configure here.}

clickhouse-gh · 2026-01-07T07:16:40Z

Workflow [PR], commit [1fcb0d9]

Summary: ❌

job_name	test_name	status	info	comment
BuzzHouse (amd_msan)		failure
	Segmentation fault (STID: 6336-79b2)	FAIL	cidb, issue	ISSUE CREATED

shankar-iyer · 2026-01-12T04:36:22Z

Clean CI run at 0f3b4a5.

Copilot

Pull request overview

This PR extends the top-K threshold filtering optimization to support two new scenarios: 1) multiple columns in the ORDER BY clause (using the first column's value for threshold determination), and 2) min-max indexes with granularity > 1 (by replicating index values across corresponding table granules and handling ties appropriately).

Changes:

Extended TopKFilterInfo to track the number of sort columns
Modified index granule handling to support index granularity > 1 by replicating min-max values across table granules
Added tie-handling logic to ensure correct results when using the first column of multi-column sorts or granularity > 1
Updated threshold comparison operators from >=/<= to >/< to properly handle tie scenarios

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
`src/Processors/TopKThresholdTracker.h`	Changed threshold comparison operators to strict inequalities for tie handling
`src/Processors/QueryPlan/ReadFromMergeTree.h`	Added `num_sort_columns` field to `TopKFilterInfo`
`src/Processors/QueryPlan/ReadFromMergeTree.cpp`	Removed granularity==1 requirement and enhanced logging
`src/Processors/QueryPlan/Optimizations/optimizeTopK.cpp`	Removed single-column restriction, added column validation, and passed `num_sort_columns`
`src/Storages/MergeTree/MergeTreeIndexMinMax.h`	Added parameters for index granularity and tie handling
`src/Storages/MergeTree/MergeTreeIndexMinMax.cpp`	Implemented granule replication and tie-handling logic with template specialization
`src/Storages/MergeTree/MergeTreeDataSelectExecutor.cpp`	Integrated new parameters and tie-handling flag
`tests/queries/0_stateless/03711_top_k_by_skip_index_negative.sql`	Test cases verifying optimization doesn't apply to invalid scenarios
`tests/queries/0_stateless/03711_top_k_by_skip_index_negative.reference`	Expected output for negative test cases
`tests/queries/0_stateless/03711_top_k_by_skip_index_handle_ties.sql`	Test cases for multi-column ORDER BY and granularity > 1
`tests/queries/0_stateless/03711_top_k_by_skip_index_handle_ties.reference`	Expected output for tie-handling test cases

Copilot · 2026-01-16T05:58:17Z

src/Storages/MergeTree/MergeTreeIndexMinMax.cpp

-    if (store_map)
-        granules_map.emplace(granule_num, granules.size() - 1);
+    /// If index granularity is not 1, we insert the same value as the min
+    /// or max for all the corresponding granules. For our top-K purpose,this


Missing space after comma in 'purpose,this'.

Suggested change

/// or max for all the corresponding granules. For our top-K purpose,this

/// or max for all the corresponding granules. For our top-K purpose, this

Copilot · 2026-01-16T05:58:17Z

src/Processors/QueryPlan/Optimizations/optimizeTopK.cpp

        else
        {
-            LOG_TRACE(getLogger("optimizeTopK"), "Could not resolve column alias {}", sort_column_name);
+            LOG_DEBUG(getLogger("optimizeTopK"), "Could not resolve column alias {} {}", sort_column_name, column_node->type);


The error message includes column_node->type which may not be a string. This could produce unclear output if the type doesn't have a good string representation. Consider using a more descriptive format or converting the type to a readable string.

Suggested change

LOG_DEBUG(getLogger("optimizeTopK"), "Could not resolve column alias {} {}", sort_column_name, column_node->type);

LOG_DEBUG(getLogger("optimizeTopK"), "Could not resolve column alias {} {}", sort_column_name, static_cast<int>(column_node->type));

Poco logger handles enum

Copilot · 2026-01-16T05:58:17Z

src/Storages/MergeTree/MergeTreeIndexMinMax.cpp

+        auto min_granules = n * index_granularity;
+        auto threshold = queue.top();
+        for (size_t i = 0; i < min_granules && !queue.empty(); ++i)


The variable name min_granules is ambiguous. It represents the minimum number of granules to select before tie-handling, not a minimum granule identifier. Consider renaming to min_granules_to_select or target_granule_count for clarity.

Suggested change

auto min_granules = n * index_granularity;

auto threshold = queue.top();

for (size_t i = 0; i < min_granules && !queue.empty(); ++i)

auto target_granule_count = n * index_granularity;

auto threshold = queue.top();

for (size_t i = 0; i < target_granule_count && !queue.empty(); ++i)

good point! done.

Copilot · 2026-01-16T05:58:18Z

src/Storages/MergeTree/MergeTreeIndexMinMax.cpp

+        auto min_granules = n * index_granularity;
+        auto threshold = queue.top();
+        for (size_t i = 0; i < min_granules && !queue.empty(); ++i)


The variable name min_granules is ambiguous. It represents the minimum number of granules to select before tie-handling, not a minimum granule identifier. Consider renaming to min_granules_to_select or target_granule_count for clarity.

Suggested change

auto min_granules = n * index_granularity;

auto threshold = queue.top();

for (size_t i = 0; i < min_granules && !queue.empty(); ++i)

auto min_granules_to_select = n * index_granularity;

auto threshold = queue.top();

for (size_t i = 0; i < min_granules_to_select && !queue.empty(); ++i)

shankar-iyer · 2026-01-21T12:53:51Z

Full green ci at c000dbb, except for failure of 03711_top_k_by_skip_index_negative.sql due to tag mistake.

src/Storages/MergeTree/MergeTreeDataSelectExecutor.cpp

src/Storages/MergeTree/MergeTreeIndexMinMax.cpp

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

src/Processors/QueryPlan/Optimizations/optimizeTopK.cpp

clickhouse-gh · 2026-03-09T06:08:50Z

LLVM Coverage Report

Metric	Baseline	Current	Δ
Lines	83.80%	83.80%	+0.00%
Functions	23.90%	23.90%	+0.00%
Branches	76.40%	76.30%	-0.10%

PR changed lines: PR changed-lines coverage: 72.22% (156/216)
Diff coverage report
Uncovered code

shankar-iyer · 2026-03-09T07:51:30Z

Failure in BuzzHouse (amd_msan) unrelated to PR, I will check and fix.

…i-column ORDER BY with dynamic filtering in top-K tests PR ClickHouse#93545 added support for multiple columns and index granularity > 1 in top-K threshold filtering, but the existing tests had two gaps: 1. `03711_top_k_by_skip_index_handle_ties`: only tested multi-column `ORDER BY` with `GRANULARITY 4`. Added single-column `ORDER BY v1` queries using the same table to cover the single-column + `GRANULARITY > 1` path. 2. `03711_top_k_by_dynamic_filter`: only tested single-column `ORDER BY` with `use_top_k_dynamic_filtering=1`. Added a second table `tab2` with tied `v1` values (10 distinct values) to cover multi-column `ORDER BY` with the dynamic filtering path. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Remove `skip_index->index.granularity != 1` guard: skip indexes with granularity > 1 are now supported (ref ClickHouse#93545). - Trim test tags to just `no-fasttest`; 10000 rows is fine for sanitizers. Made-with: Cursor

shankar-iyer added 7 commits December 17, 2025 15:35

Support for index granularity > 1 and multi-column

5e2c907

stash

1f9d9ba

stash

90fa503

Merge branch 'master' into top_k_fixes

4ac4856

Working good

bb6aed1

Merge branch 'master' into top_k_fixes

2e94c95

Tests added

ff69fa1

shankar-iyer marked this pull request as draft January 7, 2026 07:16

clickhouse-gh bot added the pr-improvement Pull request with some product improvements label Jan 7, 2026

shankar-iyer added 2 commits January 11, 2026 15:04

Merge branch 'master' into top_k_fixes

5cd7f0c

style

0feb4a5

shankar-iyer marked this pull request as ready for review January 12, 2026 04:36

fix crash issue ClickHouse#93893

7a81f6f

shankar-iyer requested a review from Copilot January 16, 2026 05:57

Copilot AI reviewed Jan 16, 2026

View reviewed changes

shankar-iyer added 3 commits January 21, 2026 08:17

Merge branch 'master' into top_k_fixes

4069b3e

copilot review

c000dbb

correct the tag

f060367

rschu1ze self-assigned this Jan 21, 2026

rschu1ze approved these changes Mar 8, 2026

View reviewed changes

Merge branch 'master' into top_k_fixes

3671b41

cursor bot reviewed Mar 9, 2026

View reviewed changes

src/Storages/MergeTree/MergeTreeDataSelectExecutor.cpp Outdated Show resolved Hide resolved

Fix after merge

53a813d

cursor bot reviewed Mar 9, 2026

View reviewed changes

src/Storages/MergeTree/MergeTreeIndexMinMax.cpp Show resolved Hide resolved

src/Storages/MergeTree/MergeTreeIndexMinMax.cpp Show resolved Hide resolved

Cursor review

e2bacb3

cursor bot reviewed Mar 9, 2026

View reviewed changes

src/Processors/QueryPlan/Optimizations/optimizeTopK.cpp Show resolved Hide resolved

Another Cursor comment

1fcb0d9

shankar-iyer added this pull request to the merge queue Mar 9, 2026

Merged via the queue into ClickHouse:master with commit 728da43 Mar 9, 2026
161 of 163 checks passed

shankar-iyer deleted the top_k_fixes branch March 9, 2026 08:33

robot-clickhouse-ci-2 added the pr-synced-to-cloud The PR is synced to the cloud repo label Mar 9, 2026

zlareb1 mentioned this pull request Mar 16, 2026

Add coverage for single-column ORDER BY with GRANULARITY > 1 and multi-column ORDER BY with dynamic filtering in top-K tests #99571

Merged

1 task

shankar-iyer mentioned this pull request Mar 18, 2026

Support non-numeric & nullable types for FunctionTopKFilter #99033

Merged

1 task

	/// or max for all the corresponding granules. For our top-K purpose,this
	/// or max for all the corresponding granules. For our top-K purpose, this

	LOG_DEBUG(getLogger("optimizeTopK"), "Could not resolve column alias {} {}", sort_column_name, column_node->type);
	LOG_DEBUG(getLogger("optimizeTopK"), "Could not resolve column alias {} {}", sort_column_name, static_cast<int>(column_node->type));

Conversation

shankar-iyer commented Jan 7, 2026 • edited by cursor bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Changelog category (leave one):

Changelog entry (a [user-readable short description](

Uh oh!

clickhouse-gh bot commented Jan 7, 2026 • edited by shankar-iyer Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

shankar-iyer commented Jan 12, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Jan 16, 2026

Choose a reason for hiding this comment

Uh oh!

shankar-iyer Jan 21, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 16, 2026

Choose a reason for hiding this comment

Uh oh!

shankar-iyer Jan 21, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 16, 2026

Choose a reason for hiding this comment

Uh oh!

shankar-iyer Jan 21, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 16, 2026

Choose a reason for hiding this comment

Uh oh!

shankar-iyer Jan 21, 2026

Choose a reason for hiding this comment

Uh oh!

shankar-iyer commented Jan 21, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

cursor bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

clickhouse-gh bot commented Mar 9, 2026

LLVM Coverage Report

Uh oh!

shankar-iyer commented Mar 9, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

shankar-iyer commented Jan 7, 2026 •

edited by cursor bot

Loading

clickhouse-gh bot commented Jan 7, 2026 •

edited by shankar-iyer

Loading