Optimize `ORDER BY ... LIMIT N` queries by skip index and dynamic threshold filter #89835
Conversation
Workflow [PR], commit [5f997d2] Summary: ❌
This sounds awesome, thank you! I wonder if there's a slightly slower, but more general, version of (2) that works with predicates, by roughly ranking the ranges to read by their sort-column values. For example, most tables are partitioned by time, and within a partition, recent parts are more likely to have recent data, so a query ordering by time descending could read the most recent parts first. Maybe it will cause more random I/O, but I guess you're overall saving I/O by skipping low-sorted granules, and with a LIMIT, you're able to force the dynamic top-N threshold to fill with the top values first, so you can skip more granules overall. Would this be possible? How does it sound?
---
An even easier alternative to my proposal might be to do some internal "request chunking", like what was recently introduced in ClickStack, where the query is broken up into ranges over the sort key.
---
@EmeraldShift Thank you for the inputs and sorry for the late reply!
This sounds cool, and if we can find a good threshold after reading/sorting just 5%-10% of the data, that would really let the query skip reading a vast number of granules. Let us try to firm up some logic for this in the PR, like you mentioned in your next comment. Maybe we add some specific logic for enqueueing read ranges when the sort column correlates with part order (e.g. a time column).
Let me check this.
---
Some interesting runs with the optimizations:
- Select top 10 with URL LIKE '%google%' - good impact
- Select top 1000 with URL LIKE '%google%' - not so good
- Select top 1000 with URL LIKE '%yandex%' - this is where the optimizations make an impact!
- Select top 10000 with URL LIKE '%yandex%' - good impact even with LIMIT 10000
- Just to complete: select top 10 with URL LIKE '%yandex%'

I am experimenting a bit more.
---
Great results! 😁 Quick question: does this optimization also cover the case in #75098? If so, this probably closes it!
(and #65990)
Pull Request Overview
This PR implements optimizations for ORDER BY ... LIMIT N queries (top-N queries) by leveraging skip indexes and dynamic threshold filtering to reduce the number of rows processed. The implementation introduces three main optimization strategies: (1) using minmax skip indexes to pre-select qualifying granules when no predicates exist, (2) dynamic threshold filtering during query execution using minmax indexes, and (3) PREWHERE-based filtering using dynamically computed thresholds.
Key changes include:
- New `TopNThresholdTracker` component for thread-safe threshold value sharing across query execution components
- Extension of minmax index functionality to support bulk granule operations and top-N filtering
- Query plan optimization to detect and configure top-N scenarios
- New internal function `__topNFilter` for PREWHERE-based filtering
- Two new settings: `use_skip_indexes_for_top_n` and `use_top_n_dynamic_filtering`
Reviewed Changes
Copilot reviewed 28 out of 28 changed files in this pull request and generated 18 comments.
| File | Description |
|---|---|
| src/Processors/TopNThresholdTracker.h | New thread-safe component for tracking and sharing top-N threshold values across optimization components |
| src/Functions/FunctionTopNFilter.cpp | New internal function for dynamic PREWHERE filtering based on evolving top-N thresholds |
| src/Storages/MergeTree/MergeTreeIndexMinMax.{h,cpp} | Extended to support bulk granule reading and top-N mark selection |
| src/Storages/MergeTree/MergeTreeIndexReadResultPool.{h,cpp} | Modified to support minmax index for top-N filtering and threshold tracking |
| src/Storages/MergeTree/MergeTreeReaderIndex.cpp | Enhanced mark skipping logic to use dynamic threshold filtering |
| src/Storages/MergeTree/MergeTreeDataSelectExecutor.{h,cpp} | Added top-N filtering using minmax indexes during part selection |
| src/Processors/Transforms/PartialSortingTransform.{h,cpp} | Integrated threshold tracker to publish top-N values during sorting |
| src/Processors/Transforms/MergeSortingTransform.{h,cpp} | Added threshold tracking and adaptive remerge logic for top-N optimization |
| src/Processors/QueryPlan/SortingStep.{h,cpp} | Plumbed threshold tracker through sorting pipeline |
| src/Processors/QueryPlan/ReadFromMergeTree.{h,cpp} | Added top-N filter info and skip index availability checking |
| src/Processors/QueryPlan/Optimizations/optimizeTopN.cpp | New optimization pass to detect and configure top-N query patterns |
| src/Processors/QueryPlan/Optimizations/QueryPlanOptimizationSettings.{h,cpp} | Added settings for controlling top-N optimizations |
| src/Processors/QueryPlan/Optimizations/Optimizations.h | Registered new top-N optimization in the optimization list |
| src/Processors/QueryPlan/Optimizations/optimizeTree.cpp | Propagated new settings through optimization passes |
| src/Core/Settings.cpp | Added documentation for new top-N optimization settings |
| src/Core/SettingsChangesHistory.cpp | Recorded new settings in version history |
| src/Functions/CMakeLists.txt | Added FunctionTopNFilter to build |
| tests/queries/0_stateless/03711_top_n_by_skip_index.{sql,reference} | Test cases for top-N optimization with skip indexes |
| tests/performance/top_n_optimization_hits.xml | Performance benchmark for top-N optimization |
```diff
   && remerge_is_useful
   && max_bytes_before_remerge
-  && sum_bytes_in_blocks > max_bytes_before_remerge)
+  && sum_bytes_in_blocks > max_bytes_before_remerge) || (threshold_tracker && (sum_rows_in_blocks > limit * 1.5)))
```
Magic number: The hardcoded value 1.5 should be extracted as a named constant (e.g., threshold_tracker_remerge_factor) to improve code readability and maintainability.
I need to check the implementation of this.
---
I'm curious about what makes the difference between these cases.
---
Another random thought for the multi-threaded case: for example, with three threads each solving a partial top-N, each thread knows its local threshold and the number of rows it holds at or above it. Given this data, we know there are at least 300 rows with N >= 123, and 250 + 300 = 550 rows with N >= 120, so if the query had e.g. LIMIT 550 or less, we could already derive a safe global threshold. In general, I suppose we can scan the threads in threshold order (in this case, descending), accumulating row counts until the limit is covered.
---
Some more results from further testing:
---
@EmeraldShift By any chance, were you testing without commit 9cd57bf? Is the skip index defined on the sorting column?
---
The testing is on that commit. Here's the table schema:
---
I'll re-test my query on the `granularity = 1` commit later today.
---
I can no longer reproduce the segmentation fault on the latest commit 😁
Thanks!
---
Would it be possible to also leverage the primary ORDER BY key if the data is [partially] sorted by the query's ORDER BY? Some random thoughts:
This feature looks great as-is, and is already a huge improvement! I'm mostly asking if this sounds like reasonable follow-on work to continue optimizing this shape of query, which is extremely common in my logging use case! 😁
01e3689
```diff
      UInt64 limit_,
-     bool skip_partial_sort = false);
+     bool skip_partial_sort = false,
+     TopKThresholdTrackerPtr threshold_tracker = nullptr);
```
@shankar-iyer Passing TopKThresholdTrackerPtr breaks the ability to serialize/deserialize SortingStep and makes this feature incompatible with distributed queries. A plan step must be a "description" of what is done that can easily be passed over the network to a different node; only the initializePipeline/updatePipeline methods should convert this description into the runtime structures that implement the actual processing logic.
Maybe you can use an approach similar to what is done for join runtime filters: they are "registered" in the query context under unique names and looked up by that name at runtime: https://github.com/ClickHouse/ClickHouse/blob/master/src/Interpreters/Context.h#L283
@davenger Thanks for the review and spotting this! Let me check and I will update accordingly.
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Optimize `ORDER BY ... LIMIT N` queries by using a skip index and a dynamic threshold filter to significantly reduce the number of rows processed.

Details

Implements optimizations for top-N query processing. All have a common theme:

1. If a `minmax` index exists on the ORDER BY column, use the `minmax` index to shortlist the top qualifying granules, e.g. only 400K rows processed out of 100M for `SELECT * FROM logs WHERE hasToken(message, 'OOM') ORDER BY timestamp LIMIT 10`. This item needs some work and discussions.
2. Dynamic top-N threshold filtering using the `minmax` skip index. As the query executes and the top-N values keep evolving, the current Nth value (the threshold) can be used to look up the `minmax` index and skip entirely any granules that cannot improve the top-N result set.
3. If a `minmax` skip index is not present, still use the top-N threshold to execute a `PREWHERE` on the sorting column. This optimization will be useful if there are other costly predicates in the query, since evaluating the dynamic PREWHERE and rejecting rows first could turn out cheaper. This was initially proposed in WIP some perf optimizations #81944.

I still need to fix some issues; high-level and low-level feedback welcome!