ClickHouse Performance Optimizations by Tencent #412
rschu1ze merged 3 commits into ClickHouse:main from
Conversation
This is excellent, thanks! This PR against the ClickBench repository is similar in spirit to @kitaisreal's Ursa (i.e. a research fork of ClickHouse). If all PRs are being integrated into the main codebase anyway, perhaps we don't need this PR (or can keep it open and continuously update it for the time being)?
Thanks for the feedback! I'd actually prefer to have this PR merged into the ClickBench repository for a few reasons:
Interesting — the string layout modification mentioned there is also implemented in ByConity (as BigString). I’ve encountered a similar need when working on the projection index feature (row-level index), where faster row seeking on string columns is critical. I’ll look into whether we can achieve this in a backward-compatible way.

As long as the results are reproducible, let's merge.
Yes, it is entirely possible.
Thanks, I wanted it for a long time!

Hmm, I was actually thinking of a different strategy: keep using the same type, but recognize the underlying streams — and if there's a separate size stream, apply new serde logic accordingly. This behavior would only apply to MergeTree's wide format, which I believe should be sufficient.
Maybe we can try, although having to look up an additional file seems hacky.

Sure, a different serde in

I've just merged an additional optimization from my team that addresses the Q23 issue. With this fix, the results should now be fully reproducible without any manual post-processing. I've updated. @rschu1ze @nickitat Could you help re-run the benchmarks and update the results on both? Thanks a lot!
I've implemented the separate size stream string format. Given this, it's not suitable to enable by default, so I’ll leave it disabled for now.

P.S. The results have been updated. @alexey-milovidov — if everything looks good, would you mind helping to merge this PR? Much appreciated!
@rschu1ze, thanks a lot! Just wondering — the entry appears on the leaderboard, but it only shows the name without rendering the actual benchmark results. Do you know what might be causing this?

Is there a plan to introduce a PR for "Push TopN threshold to MergeTreeSource"? There are some issues (ClickHouse/ClickHouse#65990, ClickHouse/ClickHouse#85081, and ClickHouse/ClickHouse#75098, which I reported) that I imagine would benefit significantly from that optimization. Is that correct?

There is definitely a plan, but since I already have several PRs pending, it will take some time to prepare and land it.

@amosbird, the build file used in this entry has disappeared. Please update.
From PR message:
@amosbird Simply rebase ClickHouse/ClickHouse#81944?
Sure. I'll rebase it today. |

This submission builds on top of the latest ClickHouse with a series of performance optimizations, developed with support from Tencent. Each optimization has been carefully validated and is intended to be contributed upstream incrementally through individual PRs—some of which have already been merged.
Benchmark results were generated using artifacts built by the official CI pipeline of #81944, with great help from @nickitat — thank you!
The following optimizations are included:
1. Push TopN threshold to `MergeTreeSource`

Pushes the TopN threshold into `MergeTreeSource` to enable early filtering during the read phase. By passing the (N–1)th threshold value from the TopN state, rows below the threshold can be skipped earlier, reducing IO and improving performance.

2. Precompute hashes and prefetch for prealloc variants (previous prealloc optimization)
For `ColumnsHashing` implementations that support the prealloc strategy:

Also introduced the `optimize_trivial_group_by_limit_query` setting, which applies `max_rows_to_group_by` for trivial `GROUP BY LIMIT` queries to avoid unnecessary aggregation work.

3. Extend string hash map with inlined hash
The string hash map is optimized by combining string length and hash into a single 8-byte value. Since most string lengths and CRC32 hashes fit within 4 bytes, combining them:
4. Optimize index analysis with earlier QCC filtering (#82380)
Refactored the integration of Query Condition Cache (QCC) with index analysis:
This notably accelerates short queries when index analysis is the dominant cost.
5. Optimize single `COUNT()` aggregation on `NOT NULL` columns (#82104)

When an aggregation query only includes a single `COUNT()` on a `NOT NULL` column:

This reduces memory usage and CPU overhead, significantly speeding up the aggregation.
6. Rewrite regular expression functions into simplified forms (#81992)

Primarily targets Q28. Introduced the `optimize_rewrite_regexp_functions` setting (enabled by default), allowing the optimizer to rewrite certain calls to `replaceRegexpAll`, `replaceRegexpOne`, and `extract` into simpler and faster forms when specific patterns are detected.

Additionally:

- Enabled `count_distinct_optimization` by default, with several related edge cases fixed.

All these optimizations have been tested and validated via the ClickHouse CI pipeline. Although benchmarked on ClickBench, they were made possible thanks to the extensive support and real-world production environment provided by Tencent (TCHouse-C). I'm continuously working on additional improvements, and will persist in contributing until ClickHouse achieves top-tier performance on ClickBench once more :)