Remove outdated partition when a MergeTree database has been shutdown#8602

Merged
alexey-milovidov merged 6 commits into ClickHouse:master from PerformanceVision:remove_partition
Jan 13, 2020

Conversation

@YiuRULE (Contributor) commented Jan 10, 2020

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

Changelog category (leave one):

  • Bug Fix

Changelog entry (up to few sentences, required except for Non-significant/Documentation categories):

Remove outdated partitions when we shut down CH or DETACH/ATTACH a table.

Detailed description (optional):

The current workflow for removing a partition from a table is roughly the following:

  • The user asks to remove a specific partition (with a DETACH or DROP operation).
  • CH sets the status Outdated on all the parts of that partition.
  • After 8 minutes (or the value of the old_parts_lifetime setting), CH actually removes the parts from the filesystem and writes to the log that each part has been removed (see the query sketched right after this list).
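
As an illustration of the lifecycle above, a minimal sketch (the table name `t` is hypothetical) that inspects part states via `system.parts`:

```
-- Sketch: list parts of a hypothetical table `t`; active = 0 means the part is
-- Outdated and stays on disk until old_parts_lifetime (480 seconds by default) expires.
SELECT name, partition, active, remove_time
FROM system.parts
WHERE database = currentDatabase() AND table = 't';
```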

However, if we shut down CH or DETACH/ATTACH the table before that delay expires, the actual data has not been removed yet and could reappear when we re-attach the table.

The proposed bug fix is to actually delete the data when the MergeTree storage is shut down.

(It is possible that we still have a problem if CH has not been shut down properly.)
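
A minimal sketch of the scenario this fixes, assuming a hypothetical table `t` partitioned by month:

```
-- Hypothetical reproduction of the behaviour described above (names are illustrative).
CREATE TABLE t (d Date, v UInt64)
ENGINE = MergeTree
PARTITION BY toYYYYMM(d)
ORDER BY v;

INSERT INTO t VALUES ('2020-01-01', 1);

ALTER TABLE t DROP PARTITION 202001;  -- parts become Outdated but stay on disk for old_parts_lifetime

DETACH TABLE t;  -- before this fix, the Outdated parts were left on the filesystem here
ATTACH TABLE t;  -- ...so the dropped data could reappear

SELECT count() FROM t;  -- could return 1 again prior to this fix
```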

Related to:
#8584
#7415

@alexey-milovidov self-requested a review on January 13, 2020 at 14:34
@alexey-milovidov (Member) left a comment

Ok.

PS. Also parts won't be dropped if they are used in SELECTs (have refcount > 1). But shutdown should wait for SELECTs, so everything looks all right.
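
As a hedged illustration of that note (the table name `t` is hypothetical), part reference counts can be observed via `system.parts`:

```
-- Sketch: parts still referenced by running queries or merges show a higher refcount
-- and are not dropped until they are released.
SELECT name, active, refcount
FROM system.parts
WHERE database = currentDatabase() AND table = 't';
```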

@alexey-milovidov alexey-milovidov merged commit bde6062 into ClickHouse:master Jan 13, 2020
@alexey-milovidov (Member) commented

> It is possible that we still have a problem if CH has not been shut down properly.

It's not that bad :)

@alesapin added the pr-bugfix (Pull request with bugfix, not backported by default) and v20.1 labels on Jan 20, 2020
alesapin pushed a commit that referenced this pull request Jan 20, 2020
Remove outdated partition when a MergeTree database has been shutdown
alesapin pushed a commit that referenced this pull request Jan 20, 2020
Remove outdated partition when a MergeTree database has been shutdown
tavplubix pushed a commit that referenced this pull request Feb 7, 2020
Remove outdated partition when a MergeTree database has been shutdown
nvartolomei added a commit to nvartolomei/ClickHouse that referenced this pull request Jul 22, 2021
This was introduced in ClickHouse#8602.
The idea was to avoid data re-appearing in ClickHouse after DROP/DETACH
PARTITION. This problem was only present in MergeTree engine and I don't
understand why we need to do the same in ReplicatedMergeTree.

For ReplicatedMergeTree the source of truth is stored in ZK; deleting things from the filesystem just introduces inconsistencies and is the main source of errors like "No active replica has part X or covering part".

The resulting problem is fixed by ClickHouse#25820, but in my opinion we are better off avoiding the ZK/FS inconsistency in the first place.

When does this inconsistency appear? Often the sequence is like this:

0. Write 2 parts to ZK [all_0_0_0, all_1_1_0]
1. A merge gets scheduled
2. New part replaces old parts [new: all_0_1_1, old: all_0_0_0, all_1_1_0]
3. Replica gets shut down and old parts are removed from the filesystem
4. Replica comes back online; metadata about all parts is still stored in ZK for this replica.
5. The other replica, after its cleanup thread runs, will have only [all_0_1_1] in ZK
6. User triggers a DROP_RANGE after a while (the drop range is for all_0_1_9999*)
7. Each replica deletes from ZK only [all_0_1_1]. The replica that got
   restarted uses its in-memory state to choose nodes to delete from ZK.
8. Restart the replica again. It will now think that there are 2 parts
   that it lost and needs to fetch them [all_0_0_0, all_1_1_0].

`clearOldPartsAndRemoveFromZK`, which is triggered from the cleanup thread,
runs the cleanup sequence correctly: it first removes things from ZK and
then from the filesystem. I don't see much benefit in triggering it on
shutdown and would rather have it called only from a single place.

---

This is a very, very edge-case situation, but it proves that the current
"fix" (ClickHouse#25820) isn't complete.

```
create table test(
    v UInt64
)
engine=ReplicatedMergeTree('/clickhouse/test', 'one')
order by v
settings old_parts_lifetime = 30;

create table test2(
    v UInt64
)
engine=ReplicatedMergeTree('/clickhouse/test', 'two')
order by v
settings old_parts_lifetime = 30;

create table test3(
    v UInt64
)
engine=ReplicatedMergeTree('/clickhouse/test', 'three')
order by v
settings old_parts_lifetime = 30;

insert into table test values (1), (2), (3);
insert into table test values (4);

optimize table test final;

detach table test;
detach table test2;

alter table test3 drop partition tuple();

attach table test;
attach table test2;
```

```
(CONNECTED [localhost:9181]) /> ls /clickhouse/test/replicas/one/parts
all_0_0_0
all_1_1_0
(CONNECTED [localhost:9181]) /> ls /clickhouse/test/replicas/two/parts
all_0_0_0
all_1_1_0
(CONNECTED [localhost:9181]) /> ls /clickhouse/test/replicas/three/parts
```

```
detach table test;
attach table test;
```

`test` will now figure out that parts exist only in ZK and will issue `GET_PART`
after first removing parts from ZK.

`test2` will receive fetches for unknown parts and will trigger part checks itself.
Because `test` no longer has the parts in ZK, `test2` will mark them as LostForever.
It will also not insert empty parts, because the partition is empty.

`test` is left with `GET_PART` in the queue and stuck.

```
SELECT
    table,
    type,
    replica_name,
    new_part_name,
    last_exception
FROM system.replication_queue

Query id: 74c5aa00-048d-4bc1-a2ea-6f69501c11a0

Row 1:
──────
table:          test
type:           GET_PART
replica_name:   one
new_part_name:  all_0_0_0
last_exception: Code: 234. DB::Exception: No active replica has part all_0_0_0 or covering part. (NO_REPLICA_HAS_PART) (version 21.9.1.1)

Row 2:
──────
table:          test
type:           GET_PART
replica_name:   one
new_part_name:  all_1_1_0
last_exception: Code: 234. DB::Exception: No active replica has part all_1_1_0 or covering part. (NO_REPLICA_HAS_PART) (version 21.9.1.1)
```