Fix skip indexes on data read with patch parts#99543
Fix skip indexes on data read with patch parts#99543alexey-milovidov wants to merge 8 commits intomasterfrom
Conversation
Continuation of #88702 - rebased on current master and fixed tests. The changes: - Extract `canUseIndex`/`canUseMergedIndex` as static methods on `MergeTreeDataSelectExecutor` so they can be reused in multiple places. - Add `canUseIndex` checks in `MergeTreeSkipIndexReader::read` to skip indexes that depend on columns being updated on the fly by lightweight updates. - Add `canUseIndex` check in `collectTextIndexReadInfos` to prevent direct reading from text index when the indexed column is updated. - Fix `MergeTreeReadersChain` to only read and apply patches at the step that actually has columns needing patches, not always at step 0. - Fix `addPatchPartsColumns` to find the correct first step with patches instead of assuming it is always the first prewhere step. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Workflow [PR], commit [2e72237] Summary: ❌
AI ReviewSummaryThis PR fixes skip-index usage with lightweight updates and patch parts by adding Findings
Tests
ClickHouse Rules
Final Verdict
|
…tion The previous commit tried to defer `readPatches` to the step that has patched columns. This caused "Cannot clone block with columns because block has N columns, but M columns given" when the patches were at step i > 0: `readPatches` was called with the full `getSampleBlock` header (all accumulated columns) but `read_result.columns` only had columns from earlier steps. Revert `MergeTreeReadersChain`, `MergeTreeBlockReadUtils`, and `MergeTreeReadersChain.h` to their pre-patch state. The skip index filtering logic (canUseIndex, canUseMergedIndex, text index checks) is preserved — those changes are independent and correct. https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=99543&sha=00993298544fc0a4d873f9398b88b15f8a2e293b&name_0=PR&name_1=Fast%20test Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The `03100_lwu_36_json_skip_indexes` test cannot reproduce the bug because `supportsSkipIndexesOnDataRead` returns false when `hasPatchParts()` is true, and removing this guard exposes a `_part_offset` column error. Remove this test for now since the `MergeTreeSkipIndexReader::read` path is unreachable with patch parts. The `03100_lwu_48_text_index` test is redesigned to actually trigger the bug in `optimizeDirectReadFromTextIndex`. The previous test only removed matches via UPDATE (making the stale text index conservative but still correct). The new test adds matches via UPDATE so that the stale text index misses rows that should be returned: `hasToken` with `splitByNonAlpha` tokenizer uses `Exact` direct read mode, which replaces the original WHERE with a virtual column. On master without the `canUseIndex` check, the stale index returns 1 instead of 11 for count(). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ip-indexes-patches-v2
e652ab4 to
48396fa
Compare
…s-v2 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
| #pragma once | ||
|
|
||
| #include <expected> | ||
| #include <Common/LoggingFormatStringHelpers.h> |
There was a problem hiding this comment.
Common/LoggingFormatStringHelpers.h into MergeTreeDataSelectExecutor.h, which is a high-fan-out header used across MergeTree read paths. LoggingFormatStringHelpers.h is template-heavy (fmt, compile-time checks), so pulling it into this header multiplies rebuild cost across many translation units.
Please avoid the transitive include here: forward-declare PreformattedMessage (or add a lightweight fwd header) in the .h, and keep the full LoggingFormatStringHelpers.h include in .cpp where canUseIndex is defined.
|
|
||
| SELECT count() FROM t_lwu_text_index WHERE hasToken(str, '777'); | ||
|
|
||
| -- Update str to ADD '777' token to rows in a different text index mark. |
There was a problem hiding this comment.
03100_lwu_36_json_skip_indexes.* and adds only a text-index regression test. But this change also modifies the generic skip-index filtering path in MergeTreeSkipIndexReader::read (canUseIndex gating), not only text-index direct read.
Please keep/add an equivalent non-text skip-index coverage case (e.g. bloom_filter/JSON path with patch parts and use_skip_indexes_on_data_read) so we don't lose regression protection for ordinary skip indexes.
LLVM Coverage Report
Changed lines: 94.12% (80/85) · Uncovered code |
Continuation of #88702 - rebased on current master and fixed tests.
The changes:
canUseIndex/canUseMergedIndexas static methods onMergeTreeDataSelectExecutorso they can be reused in multiple places.canUseIndexchecks inMergeTreeSkipIndexReader::readto skip indexes that depend on columns being updated on the fly by lightweight updates.canUseIndexcheck incollectTextIndexReadInfosto prevent direct reading from text index when the indexed column is updated.MergeTreeReadersChainto only read and apply patches at the step that actually has columns needing patches, not always at step 0.addPatchPartsColumnsto find the correct first step with patches instead of assuming it is always the first prewhere step.Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Fixed usage of skip indexes on data read (with enabled setting
use_skip_indexes_on_data_read) and existing patch parts created by lightweight updates.