Skip to content

add assertion that checksums.txt is consistent#76625

Merged
CheSema merged 37 commits intomasterfrom
chesema-fix-sparse
Jul 15, 2025
Merged

add assertion that checksums.txt is consistent#76625
CheSema merged 37 commits intomasterfrom
chesema-fix-sparse

Conversation

@CheSema
Copy link
Copy Markdown
Member

@CheSema CheSema commented Feb 22, 2025

It was possible to build wrong checksums.txt file during a sophisticated mutation or alter command.
After that the server might not be able to load it after restart or unable to remove that part when needed.

Here I add a check which runs for each committed part. It ensures that checksums.txt file matched the actual files on the disk. Otherwise transaction is aborted and part is not added to active parts set.

Along that I fixed all cases which have been triggered by that new check in our CI.

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Verify the part has consistent checksum.txt file right before committing it.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented Feb 22, 2025

Workflow [PR], commit [29d4585]

@clickhouse-gh clickhouse-gh bot added the pr-not-for-changelog This PR should not be mentioned in the changelog label Feb 22, 2025
@CheSema
Copy link
Copy Markdown
Member Author

CheSema commented Feb 24, 2025

Fast tests showed that checksums file contains excess records about dropped projections.
Tried to fix that.

@CheSema CheSema force-pushed the chesema-fix-sparse branch 10 times, most recently from 420835f to 1bf010c Compare March 3, 2025 13:24
@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented May 6, 2025

Dear @CheSema, this PR hasn't been updated for a while. Will you continue working on it? If not, please close it. Otherwise, ignore this message.

@CheSema CheSema force-pushed the chesema-fix-sparse branch from 1bf010c to 29d4585 Compare May 22, 2025 09:57
@CheSema CheSema force-pushed the chesema-fix-sparse branch from 29d4585 to 5620f7c Compare July 1, 2025 10:18
@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented Jul 1, 2025

Workflow [PR], commit [37e000e]

Summary:

job_name test_name status info comment
Integration tests (aarch64, distributed plan, 3/4) failure
test_storage_rabbitmq/test.py::test_rabbitmq_select[1] FAIL

@CheSema
Copy link
Copy Markdown
Member Author

CheSema commented Jul 2, 2025

@CheSema CheSema force-pushed the chesema-fix-sparse branch 2 times, most recently from 3e1e422 to 739286a Compare July 4, 2025 12:26
@clickhouse-gh clickhouse-gh bot removed the pr-not-for-changelog This PR should not be mentioned in the changelog label Jul 15, 2025
@CheSema
Copy link
Copy Markdown
Member Author

CheSema commented Jul 15, 2025

https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=76625&sha=494242a986061b072eeadc7ddd1934e0be47c8fa&name_0=PR&name_1=Stateless%20tests%20%28amd_debug%29

2025.07.15 04:50:30.751218 [ 2219 ] {62da41d4-c004-499c-a825-b808941ac574::all_1_1_0_2} <Fatal> : Logical error: 'Column `c` has rows count 0 according to size in memory and size of single value, but data part all_1_1_0_2 has 1 rows'.
2025.07.15 04:50:30.770719 [ 2219 ] {62da41d4-c004-499c-a825-b808941ac574::all_1_1_0_2} <Fatal> : Stack trace (when copying this message, always include the lines below):

0. /home/ubuntu/actions-runner/_work/ClickHouse/ClickHouse/contrib/llvm-project/libcxx/include/__exception/exception.h:113: Poco::Exception::Exception(String const&, int) @ 0x0000000025bdc3d2
1. /home/ubuntu/actions-runner/_work/ClickHouse/ClickHouse/src/Common/Exception.cpp:115: DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x0000000013d47f66
2. /home/ubuntu/actions-runner/_work/ClickHouse/ClickHouse/src/Common/Exception.h:119: DB::Exception::Exception(PreformattedMessage&&, int) @ 0x000000000cd8f4cc
3. /home/ubuntu/actions-runner/_work/ClickHouse/ClickHouse/src/Common/Exception.h:137: DB::Exception::Exception<String, unsigned long&, String const&, unsigned long const&>(int, FormatStringHelperImpl<std::type_identity<String>::type, std::type_identity<unsigned long&>::type, std::type_identity<String const&>::type, std::type_identity<unsigned long const&>::type>, String&&, unsigned long&, String const&, unsigned long const&) @ 0x000000001d85c0b6
4. /home/ubuntu/actions-runner/_work/ClickHouse/ClickHouse/src/Storages/MergeTree/MergeTreeDataPartWide.cpp:446: DB::MergeTreeDataPartWide::calculateEachColumnSizes(std::unordered_map<String, DB::ColumnSize, std::hash<String>, std::equal_to<String>, std::allocator<std::pair<String const, DB::ColumnSize>>>&, DB::ColumnSize&, std::optional<DB::Block>) const @ 0x000000001d859faf
5. /home/ubuntu/actions-runner/_work/ClickHouse/ClickHouse/src/Storages/MergeTree/IMergeTreeDataPart.cpp:2361: DB::IMergeTreeDataPart::calculateColumnsSizesOnDisk(std::optional<DB::Block>) const @ 0x000000001d699bc3
6. /home/ubuntu/actions-runner/_work/ClickHouse/ClickHouse/src/Storages/MergeTree/IMergeTreeDataPart.cpp:2351: DB::IMergeTreeDataPart::calculateColumnsAndSecondaryIndicesSizesOnDisk(std::optional<DB::Block>) const @ 0x000000001d68a54f
7. /home/ubuntu/actions-runner/_work/ClickHouse/ClickHouse/src/Storages/MergeTree/IMergeTreeDataPart.cpp:2403: DB::IMergeTreeDataPart::getColumnSize(String const&) const @ 0x000000001d685fd0
8. /home/ubuntu/actions-runner/_work/ClickHouse/ClickHouse/src/Storages/MergeTree/MergeTreeData.cpp:5869: DB::MergeTreeData::addPartContributionToColumnAndSecondaryIndexSizesUnlocked(std::shared_ptr<DB::IMergeTreeDataPart const> const&) const @ 0x000000001d7d1ad2
9. /home/ubuntu/actions-runner/_work/ClickHouse/ClickHouse/src/Storages/MergeTree/MergeTreeData.cpp:5861: DB::MergeTreeData::Transaction::commit(DB::DataPartsLock*) @ 0x000000001d795e1b
10. /home/ubuntu/actions-runner/_work/ClickHouse/ClickHouse/src/Storages/MergeTree/MutatePlainMergeTreeTask.cpp:112: DB::MutatePlainMergeTreeTask::executeStep() @ 0x000000001d9e58bc
11. /home/ubuntu/actions-runner/_work/ClickHouse/ClickHouse/src/Storages/MergeTree/MergeTreeBackgroundExecutor.cpp:318: DB::MergeTreeBackgroundExecutor<DB::DynamicRuntimeQueue>::routine(std::shared_ptr<DB::TaskRuntimeData>) @ 0x000000001d732a1b
12. /home/ubuntu/actions-runner/_work/ClickHouse/ClickHouse/src/Storages/MergeTree/MergeTreeBackgroundExecutor.cpp:363: DB::MergeTreeBackgroundExecutor<DB::DynamicRuntimeQueue>::threadFunction() @ 0x000000001d733397
13. /home/ubuntu/actions-runner/_work/ClickHouse/ClickHouse/contrib/llvm-project/libcxx/include/__functional/function.h:716: ? @ 0x0000000013e81373
14. /home/ubuntu/actions-runner/_work/ClickHouse/ClickHouse/contrib/llvm-project/libcxx/include/__type_traits/invoke.h:117: ThreadFromGlobalPoolImpl<false, true>::ThreadFromGlobalPoolImpl<void (ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool::*)(), ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool*>(void (ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool::*&&)(), ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool*&&)::'lambda'()::operator()() @ 0x0000000013e874a6
15. /home/ubuntu/actions-runner/_work/ClickHouse/ClickHouse/contrib/llvm-project/libcxx/include/__functional/function.h:716: ? @ 0x0000000013e7e986
16. /home/ubuntu/actions-runner/_work/ClickHouse/ClickHouse/contrib/llvm-project/libcxx/include/__type_traits/invoke.h:117: void* std::__thread_proxy[abi:se190107]<std::tuple<std::unique_ptr<std::__thread_struct, std::default_delete<std::__thread_struct>>, void (ThreadPoolImpl<std::thread>::ThreadFromThreadPool::*)(), ThreadPoolImpl<std::thread>::ThreadFromThreadPool*>>(void*) @ 0x0000000013e84f60
17. ? @ 0x0000000000094ac3
18. ? @ 0x0000000000126850

That diаа was useful
f54f818

Copy link
Copy Markdown
Member

@Michicosun Michicosun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed in private

@CheSema CheSema added this pull request to the merge queue Jul 15, 2025
Merged via the queue into master with commit 1c01fee Jul 15, 2025
119 of 122 checks passed
@CheSema CheSema deleted the chesema-fix-sparse branch July 15, 2025 13:13
@CheSema CheSema added the pr-must-backport Pull request should be backported intentionally. Use this label with great care! label Jul 15, 2025
@robot-clickhouse-ci-2 robot-clickhouse-ci-2 added pr-backports-created-cloud deprecated label, NOOP pr-must-backport-synced The `*-must-backport` labels are synced into the cloud Sync PR labels Jul 15, 2025
@robot-clickhouse-ci-1 robot-clickhouse-ci-1 added the pr-synced-to-cloud The PR is synced to the cloud repo label Jul 15, 2025
@robot-ch-test-poll robot-ch-test-poll added the pr-backports-created Backport PRs are successfully created, it won't be processed by CI script anymore label Jul 15, 2025
@CheSema CheSema removed the pr-backports-created-cloud deprecated label, NOOP label Jul 16, 2025
@CheSema
Copy link
Copy Markdown
Member Author

CheSema commented Jul 16, 2025

Follow up #83830

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-backports-created Backport PRs are successfully created, it won't be processed by CI script anymore pr-improvement Pull request with some product improvements pr-must-backport Pull request should be backported intentionally. Use this label with great care! pr-must-backport-synced The `*-must-backport` labels are synced into the cloud Sync PR pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants