Skip to content

Enable use_top_k_dynamic_filtering and use_skip_indexes_for_top_k by default#99537

Open
alexey-milovidov wants to merge 15 commits intomasterfrom
enable-top-k-optimizations-by-default
Open

Enable use_top_k_dynamic_filtering and use_skip_indexes_for_top_k by default#99537
alexey-milovidov wants to merge 15 commits intomasterfrom
enable-top-k-optimizations-by-default

Conversation

@alexey-milovidov
Copy link
Copy Markdown
Member

Summary

  • Enable use_top_k_dynamic_filtering and use_skip_indexes_for_top_k settings by default
  • Add corresponding entries in settings changes history for version 26.3

Changelog category (leave one):

  • Performance Improvement

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Enable use_top_k_dynamic_filtering and use_skip_indexes_for_top_k settings by default to improve performance of ORDER BY ... LIMIT N queries.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

No documentation changes needed — the settings and their documentation already exist; only the default values changed.

… by default

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented Mar 16, 2026

Workflow [PR], commit [3c7c91d]

Summary:

job_name test_name status info comment
Integration tests (amd_llvm_coverage, 2/5) failure
test_overcommit_tracker/test.py::test_user_overcommit FAIL cidb
Stress test (arm_msan) failure
MemorySanitizer: use-of-uninitialized-value (STID: 1003-358c) FAIL cidb, issue
Integration tests (amd_asan_ubsan, db disk, old analyzer, 1/6) error
Integration tests (amd_asan_ubsan, db disk, old analyzer, 3/6) error
Integration tests (amd_asan_ubsan, db disk, old analyzer, 5/6) error
Integration tests (amd_asan_ubsan, db disk, old analyzer, 6/6) error
Integration tests (amd_binary, 3/5) error
Integration tests (amd_tsan, 1/6) error
Integration tests (amd_tsan, 3/6) error
Integration tests (amd_tsan, 5/6) error

AI Review

Summary

This PR flips defaults for use_top_k_dynamic_filtering and use_skip_indexes_for_top_k to true, records the default change in SettingsChangesHistory under 26.4, and stabilizes an existing stateless test by explicitly disabling both settings. I did not find new correctness, safety, compatibility, or performance-risk issues in the current PR head. PR metadata is consistent with the code change and the changelog entry is user-readable and specific.

ClickHouse Rules
Item Status Notes
Deletion logging
Serialization versioning
Core-area scrutiny
No test removal
Experimental gate
No magic constants
Backward compatibility
SettingsChangesHistory.cpp
PR metadata quality
Safe rollout
Compilation time
Final Verdict
  • Status: ✅ Approve

@clickhouse-gh clickhouse-gh bot added the pr-performance Pull request with some performance improvements label Mar 16, 2026
@shankar-iyer shankar-iyer self-assigned this Mar 16, 2026
@shankar-iyer
Copy link
Copy Markdown
Member

I will check the failures.

@alexey-milovidov
Copy link
Copy Markdown
Member Author

@shankar-iyer FYI, it is #91782

@shankar-iyer
Copy link
Copy Markdown
Member

@alexey-milovidov Let's enable only use_skip_indexes_for_top_k now? That aligns with our overall plan of skip indexes by default. use_top_k_dynamic_filtering is good when skip index is not present on the ORDER BY column, and provides only marginal improvement if skip index was present and used by use_skip_indexes_for_top_k.

@alexey-milovidov
Copy link
Copy Markdown
Member Author

use_top_k_dynamic_filtering is good when skip index is not present

I see it as a solid use-case. And indices by default will not always be enabled; there will be plenty of use cases without them. Why to afraid?

@shankar-iyer
Copy link
Copy Markdown
Member

shankar-iyer commented Mar 18, 2026

Why to afraid?

@alexey-milovidov

Best case -

SELECT URL, EventTime FROM hits WHERE URL LIKE '%google%' ORDER BY EventTime DESC LIMIT 10
  SETTINGS use_top_k_dynamic_filtering=0

10 rows in set. Elapsed: 3.374 sec. Processed 100.00 million rows, 10.02 GB (29.64 million rows/s., 2.97 GB/s.)


SELECT URL, EventTime FROM hits WHERE URL LIKE '%google%' ORDER BY EventTime DESC LIMIT 10
  SETTINGS use_top_k_dynamic_filtering=1

10 rows in set. Elapsed: 0.526 sec. Processed 100.00 million rows, 1.20 GB (189.97 million rows/s., 2.28 GB/s.)

But this one -


SELECT URL, EventTime FROM hits WHERE URL LIKE '%shankar%' ORDER BY EventTime DESC LIMIT 10
  SETTINGS use_top_k_dynamic_filtering=0

0 rows in set. Elapsed: 3.154 sec. Processed 100.00 million rows, 9.90 GB (31.70 million rows/s., 3.14 GB/s.)


SELECT URL, EventTime FROM hits WHERE URL LIKE '%shankar%' ORDER BY EventTime DESC LIMIT 10
  SETTINGS use_top_k_dynamic_filtering=0

0 rows in set. Elapsed: 3.507 sec. Processed 100.00 million rows, 10.30 GB (28.52 million rows/s., 2.94 GB/s.)

If there is not many rows to sort after filters, we have additional overhead of reading a 4-byte/8-byte column (Numeric/Date). What do you say - enable it and real world workloads should balance out?

All the test failures are not due to any issue in the optimization, but due to change in rows_read profile event, which I can address.

@alexey-milovidov
Copy link
Copy Markdown
Member Author

What do you say - enable it and real world workloads should balance out?

Yes, I think it will be beneficial on average!

@alexey-milovidov
Copy link
Copy Markdown
Member Author

Working on that in #91782

Disable `use_top_k_dynamic_filtering` and `use_skip_indexes_for_top_k` in the
test that checks read_rows for ORDER BY LIMIT, since these optimizations change
the reading strategy and are not what this test validates.

Restore the accidentally removed empty 26.4 version placeholders in
SettingsChangesHistory.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@CLAassistant
Copy link
Copy Markdown

CLAassistant commented Mar 26, 2026

CLA assistant check
All committers have signed the CLA.

@alexey-milovidov alexey-milovidov force-pushed the enable-top-k-optimizations-by-default branch from 1c18b6e to 7b20276 Compare March 26, 2026 23:47
@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented Mar 31, 2026

LLVM Coverage Report

Metric Baseline Current Δ
Lines 84.10% 84.10% +0.00%
Functions 90.90% 90.90% +0.00%
Branches 76.70% 76.70% +0.00%

Changed lines: 100.00% (21/21) · Uncovered code

Full report · Diff report

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-performance Pull request with some performance improvements

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants