Skip to content

Fixes row-count inconsistency in MergeTreeReaderIndex when the part’s only or final granule contains fewer rows than index_granularity.#89555

Closed
fastio wants to merge 4 commits intoClickHouse:masterfrom
fastio:bugfix-MergeTreeReaderIndexForLastGranule
Closed

Fixes row-count inconsistency in MergeTreeReaderIndex when the part’s only or final granule contains fewer rows than index_granularity.#89555
fastio wants to merge 4 commits intoClickHouse:masterfrom
fastio:bugfix-MergeTreeReaderIndexForLastGranule

Conversation

@fastio
Copy link
Copy Markdown
Contributor

@fastio fastio commented Nov 5, 2025

Changelog category (leave one):

  • Critical Bug Fix (crash, data loss, RBAC) or LOGICAL_ERROR

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Fixes a row-count mismatch in MergeTreeReaderIndex when the part's only or last granule has fewer rows than index_granularity. Resolves #89691

Details

This PR addresses two distinct corner cases in the read path, both related to row count inconsistencies when MergeTreeReaderIndex is the first reader in the MergeTreeReadersChain.

Case 1 — Single-granule part with fewer rows than index_granularity

When a part consists of only one granule, and its total number of rows is less than the configured index_granularity, the current implementation of MergeTreeReaderIndex unconditionally returns index_granularity rows.
The subsequent reader in the chain (e.g., MergeTreeReader) then reads the actual number of rows from the part.
This discrepancy leads to a mismatch during the continueReadingChain consistency check and triggers a LOGICAL_ERROR.

How to reproduce it?

https://fiddle.clickhouse.com/73b7c1a9-ec7f-462e-9e79-b507e1b2b948

Case 2 — The last granule with fewer rows than index_granularity

A similar issue occurs when the final granule of a part has fewer rows than index_granularity.
In this case as well, MergeTreeReaderIndex reports index_granularity rows, while the following reader produces the true row count, again causing an internal row-count mismatch and resulting in a LOGICAL_ERROR.

How to reproduce it?

https://fiddle.clickhouse.com/5f030c29-feaa-4f47-89c3-6c3beafbbfc7

Resolves #89691

@nikitamikhaylov nikitamikhaylov added the can be tested Allows running workflows for external contributors label Nov 5, 2025
@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented Nov 5, 2025

Workflow [PR], commit [a1cf4a8]

Summary:

job_name test_name status info comment
Fast test failure
02233_set_enable_with_statement_cte_perf FAIL cidb
03275_matview_with_union FAIL cidb
03166_skip_indexes_vertical_merge_1 FAIL cidb
01710_projections_partial_optimize_aggregation_in_order FAIL cidb
03408_limit_by_rows_before_limit FAIL cidb
02131_used_row_policies_in_query_log FAIL cidb
03100_lwu_27_update_after_on_fly_mutations FAIL cidb
02751_parallel_replicas_bug_chunkinfo_not_set FAIL cidb
00804_test_custom_compression_codecs FAIL cidb
03008_optimize_equal_ranges FAIL cidb
490 more test cases not shown
Build (amd_debug) dropped
Build (amd_asan) dropped
Build (amd_tsan) dropped
Build (amd_msan) dropped
Build (amd_ubsan) dropped
Build (amd_binary) dropped
Build (arm_asan) dropped
Build (arm_binary) dropped
Build (arm_tsan) dropped

@clickhouse-gh clickhouse-gh bot added pr-critical-bugfix pr-must-backport Pull request should be backported intentionally. Use this label with great care! labels Nov 5, 2025
@fastio fastio changed the title Fix crash when use_skip_indexes_on_data_read=1 and last granule has fewer rows Fix MergeTreeReaderIndex row count mismatch for last granule Nov 6, 2025
@Ergus Ergus self-assigned this Nov 6, 2025
Copy link
Copy Markdown
Member

@Ergus Ergus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add more details about the issue and steps on how to reproduce it.

I would recommend to open a new issue describing the problem and reference it as Fixes in this PR description.

Also if you open the issue you can rename the Test as: 03534_skip_index_bug####.sql to have all the information if something similar happens again in the future or if the tests starts failing with other changes. Please check other similar tests.

-- { echo ON }

SET use_skip_indexes_on_data_read = 1;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove these spaces between sets

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, got it.


SET merge_tree_read_split_ranges_into_intersecting_and_non_intersecting_injection_probability=0;


Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The style checker will complain if there are two empty lines

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I'll fix it.

CREATE TABLE tab
(
`id` Int64,
`trace_id` FixedString(16) CODEC(ZSTD(1)),
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the compression really needed to reproduce the issue?

Copy link
Copy Markdown
Contributor Author

@fastio fastio Nov 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You’re right — compression isn’t required to reproduce the issue. I’ve simplified the test accordingly.


insert into tab select 100, unhex('31E5C3CAFA300A8DE5A84B740E2F2DB0'), 'aaa';

SELECT id,text FROM tab WHERE trace_id = unhex('31E5C3CAFA300A8DE5A84B740E2F2DB0');
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering that this is intended to address two different issues, could you add a test for each? And a comment is very welcome.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense — I’ll add individual tests for each case and include a clarifying comment in the code.

}

void MergeTreeRangeReader::ReadResult::adjustLastGranule()
void MergeTreeRangeReader::ReadResult::adjustLastGranule(std::optional<size_t> actual_num_read_rows)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a Suggestion: If you use a ssize_t with -1 as default instead of an std::optional it will be simpler and lighter. In simple use cases like this, using an optional is a bit overkill IMO ;)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, thanks for your suggestions.

@fastio fastio changed the title Fix MergeTreeReaderIndex row count mismatch for last granule Fixes row-count inconsistency in MergeTreeReaderIndex when the part’s only or final granule contains fewer rows than index_granularity. Nov 7, 2025
@fastio
Copy link
Copy Markdown
Contributor Author

fastio commented Nov 7, 2025

Please add more details about the issue and steps on how to reproduce it.

I would recommend to open a new issue describing the problem and reference it as Fixes in this PR description.

Also if you open the issue you can rename the Test as: 03534_skip_index_bug####.sql to have all the information if something similar happens again in the future or if the tests starts failing with other changes. Please check other similar tests.

@Ergus Thanks a lot for the detailed review and suggestions!

I’ll open a new issue describing the problem in detail, including clear reproduction steps and context about the affected settings.
I’ll push the updates shortly.

@fastio fastio requested a review from Ergus November 17, 2025 11:49
@fastio fastio marked this pull request as draft November 17, 2025 12:14
@fastio fastio marked this pull request as ready for review November 17, 2025 13:09
Copy link
Copy Markdown
Member

@Ergus Ergus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @fastio

If you created the issue related with this; please, explain it briefly in the test header and rename your new test like: 02346_text_index_bugNNNN.sql similar to other tests. So we keep all the information in the future if this test fails ;)

@@ -0,0 +1,31 @@
-- Tags: no-parallel-replicas
-- no-parallel-replicas: use_skip_indexes_on_data_read is not supported with parallel replicas.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to check this because it has changed. IIRC use_skip_indexes_on_data_read now works with parallel replicas


SET use_skip_indexes_on_data_read = 1;
SET use_skip_indexes = 1;
SET use_query_condition_cache=0;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very Minor: use_query_condition_cache = 0

SET use_skip_indexes_on_data_read = 1;
SET use_skip_indexes = 1;
SET use_query_condition_cache=0;
SET merge_tree_read_split_ranges_into_intersecting_and_non_intersecting_injection_probability=0;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same than above

@Ergus
Copy link
Copy Markdown
Member

Ergus commented Nov 17, 2025

Hi @fastio

Many fast tests seem to be failing. It is not obvious for me how are them related with your changes. Could you give it a look?

@fastio
Copy link
Copy Markdown
Contributor Author

fastio commented Nov 18, 2025

Hi @fastio

Many fast tests seem to be failing. It is not obvious for me how are them related with your changes. Could you give it a look?

Thanks. This PR currently fixes two separate bugs. I think it would be better to split them.
I’ll close this PR and submit separate PRs for each issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

can be tested Allows running workflows for external contributors pr-critical-bugfix pr-must-backport Pull request should be backported intentionally. Use this label with great care!

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Logical error when reading part with granule smaller than index_granularity

3 participants