Skip to content

LOGICAL_ERROR: Part {} remains in ZooKeeper after DROP_RANGE {} #56037

@Algunenano

Description

@Algunenano

Report: https://s3.amazonaws.com/clickhouse-test-reports/56009/62088772d2948801fb489928a622da48ca33d3fc/upgrade_check__tsan_.html

Logical error:

2023.10.25 21:56:53.451524 [ 1387 ] {} <Error> test_bcyivuz0.alter_table1 (a8336085-c547-4dac-8e13-cbea2e4e0094): auto DB::StorageReplicatedMergeTree::processQueueEntry(ReplicatedMergeTreeQueue::SelectedEntryPtr)::(anonymous class)::operator()(LogEntryPtr &) const: Code: 49. DB::Exception: Part all_15_15_0 remains in ZooKeeper after DROP_RANGE all_0_15_999999999_999999999. (LOGICAL_ERROR), Stack trace (when copying this message, always include the lines below):

0. ./build_docker/./src/Common/Exception.cpp:98: DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x000000000c741d97 in /usr/lib/debug/usr/bin/clickhouse.debug
1. DB::Exception::Exception<String const&, String const&>(int, FormatStringHelperImpl<std::type_identity<String const&>::type, std::type_identity<String const&>::type>, String const&, String const&) @ 0x000000000765b5c7 in /usr/lib/debug/usr/bin/clickhouse.debug
2. ./build_docker/./src/Storages/StorageReplicatedMergeTree.cpp:2058: DB::paranoidCheckForCoveredPartsInZooKeeper(std::shared_ptr<zkutil::ZooKeeper> const&, String const&, StrongTypedef<unsigned int, DB::MergeTreeDataFormatVersionTag>, String const&) @ 0x000000001297afcd in /usr/lib/debug/usr/bin/clickhouse.debug
3. ./build_docker/./contrib/llvm-project/libcxx/include/__memory/shared_ptr.h:701: DB::StorageReplicatedMergeTree::executeDropRange(DB::ReplicatedMergeTreeLogEntry const&) @ 0x0000000012952a72 in /usr/lib/debug/usr/bin/clickhouse.debug
4. ./build_docker/./src/Storages/StorageReplicatedMergeTree.cpp:0: DB::StorageReplicatedMergeTree::executeLogEntry(DB::ReplicatedMergeTreeLogEntry&) @ 0x0000000012949fa6 in /usr/lib/debug/usr/bin/clickhouse.debug
5. ./build_docker/./contrib/llvm-project/libcxx/include/__functional/function.h:716: bool std::__function::__policy_invoker<bool (std::shared_ptr<DB::ReplicatedMergeTreeLogEntry>&)>::__call_impl<std::__function::__default_alloc_func<DB::StorageReplicatedMergeTree::processQueueEntry(std::shared_ptr<DB::ReplicatedMergeTreeQueue::SelectedEntry>)::$_1, bool (std::shared_ptr<DB::ReplicatedMergeTreeLogEntry>&)>>(std::__function::__policy_storage const*, std::shared_ptr<DB::ReplicatedMergeTreeLogEntry>&) @ 0x0000000012a5285f in /usr/lib/debug/usr/bin/clickhouse.debug
6. ./build_docker/./contrib/llvm-project/libcxx/include/__functional/function.h:0: DB::ReplicatedMergeTreeQueue::processEntry(std::function<std::shared_ptr<zkutil::ZooKeeper> ()>, std::shared_ptr<DB::ReplicatedMergeTreeLogEntry>&, std::function<bool (std::shared_ptr<DB::ReplicatedMergeTreeLogEntry>&)>) @ 0x00000000130b6e46 in /usr/lib/debug/usr/bin/clickhouse.debug
7. ./build_docker/./src/Storages/StorageReplicatedMergeTree.cpp:3275: DB::StorageReplicatedMergeTree::processQueueEntry(std::shared_ptr<DB::ReplicatedMergeTreeQueue::SelectedEntry>) @ 0x0000000012993efc in /usr/lib/debug/usr/bin/clickhouse.debug
8. ./build_docker/./src/Storages/MergeTree/IExecutableTask.h:61: DB::ExecutableLambdaAdapter::executeStep() @ 0x0000000012a533b2 in /usr/lib/debug/usr/bin/clickhouse.debug
9. ./build_docker/./src/Storages/MergeTree/MergeTreeBackgroundExecutor.cpp:284: DB::MergeTreeBackgroundExecutor<DB::RoundRobinRuntimeQueue>::threadFunction() @ 0x0000000012da88d7 in /usr/lib/debug/usr/bin/clickhouse.debug
10. ./build_docker/./base/base/../base/wide_integer_impl.h:809: ThreadPoolImpl<ThreadFromGlobalPoolImpl<false>>::worker(std::__list_iterator<ThreadFromGlobalPoolImpl<false>, void*>) @ 0x000000000c828e7f in /usr/lib/debug/usr/bin/clickhouse.debug
11. ./build_docker/./src/Common/ThreadPool.cpp:0: void std::__function::__policy_invoker<void ()>::__call_impl<std::__function::__default_alloc_func<ThreadFromGlobalPoolImpl<false>::ThreadFromGlobalPoolImpl<void ThreadPoolImpl<ThreadFromGlobalPoolImpl<false>>::scheduleImpl<void>(std::function<void ()>, Priority, std::optional<unsigned long>, bool)::'lambda0'()>(void&&)::'lambda'(), void ()>>(std::__function::__policy_storage const*) @ 0x000000000c82c99c in /usr/lib/debug/usr/bin/clickhouse.debug
12. ./build_docker/./base/base/../base/wide_integer_impl.h:809: void* std::__thread_proxy[abi:v15000]<std::tuple<std::unique_ptr<std::__thread_struct, std::default_delete<std::__thread_struct>>, void ThreadPoolImpl<std::thread>::scheduleImpl<void>(std::function<void ()>, Priority, std::optional<unsigned long>, bool)::'lambda0'()>>(void*) @ 0x000000000c82b1c7 in /usr/lib/debug/usr/bin/clickhouse.debug
13. ? @ 0x00007f0806d14ac3 in ?
14. ? @ 0x00007f0806da6a40 in ?
 (version 23.9.2.56 (official build))

It seems the parts were just attached:

2023.10.25 21:56:52.366189 [ 3022 ] {7e1a0781-182b-4473-912e-a30f11ce6584} <Debug> executeQuery: (from [::1]:32968) (comment: 02443_detach_attach_partition.sh) ALTER TABLE alter_table1 ATTACH PARTITION ID 'all'; (stage: Complete)
2023.10.25 21:56:52.366267 [ 3022 ] {7e1a0781-182b-4473-912e-a30f11ce6584} <Trace> ContextAccess (default): Access granted: INSERT ON test_bcyivuz0.alter_table1
2023.10.25 21:56:52.366295 [ 3022 ] {7e1a0781-182b-4473-912e-a30f11ce6584} <Trace> test_bcyivuz0.alter_table1 (a8336085-c547-4dac-8e13-cbea2e4e0094): Will wait for outdated data parts to be loaded
2023.10.25 21:56:52.366303 [ 3022 ] {7e1a0781-182b-4473-912e-a30f11ce6584} <Trace> test_bcyivuz0.alter_table1 (a8336085-c547-4dac-8e13-cbea2e4e0094): Finished waiting for outdated data parts to be loaded
2023.10.25 21:56:52.366310 [ 3022 ] {7e1a0781-182b-4473-912e-a30f11ce6584} <Debug> test_bcyivuz0.alter_table1 (a8336085-c547-4dac-8e13-cbea2e4e0094): Looking for parts for partition all in detached/
2023.10.25 21:56:52.366364 [ 3022 ] {7e1a0781-182b-4473-912e-a30f11ce6584} <Debug> test_bcyivuz0.alter_table1 (a8336085-c547-4dac-8e13-cbea2e4e0094): Found part all_4_4_0
2023.10.25 21:56:52.366370 [ 3022 ] {7e1a0781-182b-4473-912e-a30f11ce6584} <Debug> test_bcyivuz0.alter_table1 (a8336085-c547-4dac-8e13-cbea2e4e0094): Found part all_5_5_0
2023.10.25 21:56:52.366376 [ 3022 ] {7e1a0781-182b-4473-912e-a30f11ce6584} <Debug> test_bcyivuz0.alter_table1 (a8336085-c547-4dac-8e13-cbea2e4e0094): 2 of them are active
2023.10.25 21:56:52.366381 [ 3022 ] {7e1a0781-182b-4473-912e-a30f11ce6584} <Debug> test_bcyivuz0.alter_table1 (a8336085-c547-4dac-8e13-cbea2e4e0094): Found containing part all_4_4_0 for part all_4_4_0
2023.10.25 21:56:52.366387 [ 3022 ] {7e1a0781-182b-4473-912e-a30f11ce6584} <Debug> test_bcyivuz0.alter_table1 (a8336085-c547-4dac-8e13-cbea2e4e0094): Found containing part all_5_5_0 for part all_5_5_0
2023.10.25 21:56:52.366460 [ 3022 ] {7e1a0781-182b-4473-912e-a30f11ce6584} <Debug> test_bcyivuz0.alter_table1 (a8336085-c547-4dac-8e13-cbea2e4e0094): Checking 2 parts
2023.10.25 21:56:52.366465 [ 3022 ] {7e1a0781-182b-4473-912e-a30f11ce6584} <Debug> test_bcyivuz0.alter_table1 (a8336085-c547-4dac-8e13-cbea2e4e0094): Checking part attaching_all_4_4_0
2023.10.25 21:56:52.477685 [ 3022 ] {7e1a0781-182b-4473-912e-a30f11ce6584} <Debug> test_bcyivuz0.alter_table1 (a8336085-c547-4dac-8e13-cbea2e4e0094): Checking part attaching_all_5_5_0
2023.10.25 21:56:52.596769 [ 3022 ] {7e1a0781-182b-4473-912e-a30f11ce6584} <Trace> test_bcyivuz0.alter_table1 (a8336085-c547-4dac-8e13-cbea2e4e0094): Renaming temporary part detached/attaching_all_4_4_0 to all_13_13_0 with tid (1, 1, 00000000-0000-0000-0000-000000000000).
2023.10.25 21:56:52.596902 [ 3022 ] {7e1a0781-182b-4473-912e-a30f11ce6584} <Trace> test_bcyivuz0.alter_table1 (a8336085-c547-4dac-8e13-cbea2e4e0094): Trying to create zookeeper persistent lock /clickhouse/zero_copy/zero_copy_s3/afa3f230-c69d-4f75-945b-f6cbf07decc0/all_13_13_0/test_ofe_wlmhgofcxrmgjuiohqolobwqxheqn/r2 with hardlinks []
2023.10.25 21:56:52.616335 [ 3022 ] {7e1a0781-182b-4473-912e-a30f11ce6584} <Trace> test_bcyivuz0.alter_table1 (a8336085-c547-4dac-8e13-cbea2e4e0094): Zookeeper persistent lock /clickhouse/zero_copy/zero_copy_s3/afa3f230-c69d-4f75-945b-f6cbf07decc0/all_13_13_0/test_ofe_wlmhgofcxrmgjuiohqolobwqxheqn/r2 created
2023.10.25 21:56:52.656722 [ 3022 ] {7e1a0781-182b-4473-912e-a30f11ce6584} <Information> test_bcyivuz0.alter_table1 (a8336085-c547-4dac-8e13-cbea2e4e0094) (Replicated OutputStream): Block with ID all_B74190DCFE9D27A082DDACADAF0D7F1E already exists (it was just appeared). Renaming part all_13_13_0 back to detached/attaching_all_4_4_0. Will retry write.
2023.10.25 21:56:52.656743 [ 3022 ] {7e1a0781-182b-4473-912e-a30f11ce6584} <Debug> test_bcyivuz0.alter_table1 (a8336085-c547-4dac-8e13-cbea2e4e0094): Undoing transaction. Rollbacking parts state to temporary and removing from working set: all_13_13_0.
2023.10.25 21:56:52.693374 [ 3022 ] {7e1a0781-182b-4473-912e-a30f11ce6584} <Debug> createEphemeralLockInZooKeeper: Deduplication path already exists: deduplication_path=/clickhouse/tables/02443_detach_attach_partition_test_bcyivuz0/alter_table/blocks/all_B74190DCFE9D27A082DDACADAF0D7F1E
2023.10.25 21:56:52.696336 [ 3022 ] {7e1a0781-182b-4473-912e-a30f11ce6584} <Information> test_bcyivuz0.alter_table1 (a8336085-c547-4dac-8e13-cbea2e4e0094) (Replicated OutputStream): Block with ID all_B74190DCFE9D27A082DDACADAF0D7F1E already exists on other replicas as part all_12_12_0; will write it locally with that name.
2023.10.25 21:56:52.709639 [ 3022 ] {7e1a0781-182b-4473-912e-a30f11ce6584} <Debug> test_bcyivuz0.alter_table1 (a8336085-c547-4dac-8e13-cbea2e4e0094) (Replicated OutputStream): Will check part all_12_12_0 checksums
2023.10.25 21:56:52.745089 [ 3022 ] {7e1a0781-182b-4473-912e-a30f11ce6584} <Trace> test_bcyivuz0.alter_table1 (a8336085-c547-4dac-8e13-cbea2e4e0094): Renaming temporary part attaching_all_4_4_0 to all_12_12_0 with tid (1, 1, 00000000-0000-0000-0000-000000000000).
2023.10.25 21:56:52.756281 [ 3022 ] {7e1a0781-182b-4473-912e-a30f11ce6584} <Trace> test_bcyivuz0.alter_table1 (a8336085-c547-4dac-8e13-cbea2e4e0094): Trying to create zookeeper persistent lock /clickhouse/zero_copy/zero_copy_s3/afa3f230-c69d-4f75-945b-f6cbf07decc0/all_12_12_0/test_ofe_wlmhgofcxrmgjuiohqolobwqxheqn/r2 with hardlinks []
2023.10.25 21:56:52.773628 [ 3022 ] {7e1a0781-182b-4473-912e-a30f11ce6584} <Trace> test_bcyivuz0.alter_table1 (a8336085-c547-4dac-8e13-cbea2e4e0094): Zookeeper persistent lock /clickhouse/zero_copy/zero_copy_s3/afa3f230-c69d-4f75-945b-f6cbf07decc0/all_12_12_0/test_ofe_wlmhgofcxrmgjuiohqolobwqxheqn/r2 created
2023.10.25 21:56:52.809090 [ 3022 ] {7e1a0781-182b-4473-912e-a30f11ce6584} <Debug> test_bcyivuz0.alter_table1 (a8336085-c547-4dac-8e13-cbea2e4e0094): Attached part all_4_4_0 as all_12_12_0
2023.10.25 21:56:52.812004 [ 3022 ] {7e1a0781-182b-4473-912e-a30f11ce6584} <Debug> createEphemeralLockInZooKeeper: Deduplication path already exists: deduplication_path=/clickhouse/tables/02443_detach_attach_partition_test_bcyivuz0/alter_table/blocks/all_CFE4743C4144A7CD2129FEF615DD9959
2023.10.25 21:56:52.815775 [ 3022 ] {7e1a0781-182b-4473-912e-a30f11ce6584} <Information> test_bcyivuz0.alter_table1 (a8336085-c547-4dac-8e13-cbea2e4e0094) (Replicated OutputStream): Block with ID all_CFE4743C4144A7CD2129FEF615DD9959 already exists on other replicas as part all_15_15_0; will write it locally with that name.
2023.10.25 21:56:52.819235 [ 3022 ] {7e1a0781-182b-4473-912e-a30f11ce6584} <Debug> test_bcyivuz0.alter_table1 (a8336085-c547-4dac-8e13-cbea2e4e0094) (Replicated OutputStream): Will check part all_15_15_0 checksums
2023.10.25 21:56:52.824870 [ 3022 ] {7e1a0781-182b-4473-912e-a30f11ce6584} <Trace> test_bcyivuz0.alter_table1 (a8336085-c547-4dac-8e13-cbea2e4e0094): Renaming temporary part detached/attaching_all_5_5_0 to all_15_15_0 with tid (1, 1, 00000000-0000-0000-0000-000000000000).
2023.10.25 21:56:52.861429 [ 3022 ] {7e1a0781-182b-4473-912e-a30f11ce6584} <Trace> test_bcyivuz0.alter_table1 (a8336085-c547-4dac-8e13-cbea2e4e0094): Trying to create zookeeper persistent lock /clickhouse/zero_copy/zero_copy_s3/afa3f230-c69d-4f75-945b-f6cbf07decc0/all_15_15_0/test_ame_thjtvjnjrsnkwljaeeoryakwmahfe/r2 with hardlinks []
2023.10.25 21:56:52.880415 [ 3022 ] {7e1a0781-182b-4473-912e-a30f11ce6584} <Trace> test_bcyivuz0.alter_table1 (a8336085-c547-4dac-8e13-cbea2e4e0094): Zookeeper persistent lock /clickhouse/zero_copy/zero_copy_s3/afa3f230-c69d-4f75-945b-f6cbf07decc0/all_15_15_0/test_ame_thjtvjnjrsnkwljaeeoryakwmahfe/r2 created
2023.10.25 21:56:52.907594 [ 3022 ] {7e1a0781-182b-4473-912e-a30f11ce6584} <Debug> test_bcyivuz0.alter_table1 (a8336085-c547-4dac-8e13-cbea2e4e0094): Attached part all_5_5_0 as all_15_15_0
2023.10.25 21:56:52.922436 [ 3022 ] {7e1a0781-182b-4473-912e-a30f11ce6584} <Debug> MemoryTracker: Peak memory usage (for query): 3.06 MiB.
2023.10.25 21:56:52.922449 [ 3022 ] {7e1a0781-182b-4473-912e-a30f11ce6584} <Debug> TCPHandler: Processed in 0.571604992 sec.

It seems like a "race" condition between the background thread doing the drop (and doing the paranoid checks) and a query reattaching partitions.

Metadata

Metadata

Assignees

Labels

fuzzProblem found by one of the fuzzers

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions