Skip to content

ReplicatedMergeTree - cleanup data directory after Zookeeper exceptions#14563

Merged
tavplubix merged 9 commits intoClickHouse:masterfrom
bharatnc:ncb/zk-conn-fail-do-cleanup
Sep 10, 2020
Merged

ReplicatedMergeTree - cleanup data directory after Zookeeper exceptions#14563
tavplubix merged 9 commits intoClickHouse:masterfrom
bharatnc:ncb/zk-conn-fail-do-cleanup

Conversation

@bharatnc
Copy link
Copy Markdown
Contributor

@bharatnc bharatnc commented Sep 8, 2020

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

Changelog category (leave one):

  • Bug Fix

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Cleanup data directory after Zookeeper exceptions during CreateQuery for StorageReplicatedMergeTree Engine.

Detailed description / Documentation draft:

It's possible for getZooKeeper() to timeout if zookeeper host(s) can't
be reached. In such cases Poco::Exception is thrown after a connection
timeout - refer to src/Common/ZooKeeper/ZooKeeperImpl.cpp:866 for more info.

Side effect of this is that the CreateQuery gets interrupted and it exits.
But the data Directories for the tables being created aren't cleaned up.
This unclean state will hinder table creation on any retries and will
complain that the Directory for table already exists.

To achieve a clean state on failed table creations, catch this error if
the exception is of type Poco::Exception and call dropIfEmpty() method,
then proceed throwing the exception. Without this, the Directory for the
tables need to be manually deleted before retrying the CreateQuery.

This addresses: #14517

It's possible for `getZooKeeper()` to timeout if  zookeeper host(s) can't
be reached. In such cases `Poco::Exception` is thrown after a connection
timeout - refer to `src/Common/ZooKeeper/ZooKeeperImpl.cpp:866` for more info.

Side effect of this is that the CreateQuery gets interrupted and it exits.
But the data Directories for the tables being created aren't cleaned up.
This unclean state will hinder table creation on any retries and will
complain that the Directory for table already exists.

To achieve a clean state on failed table creations, catch this error if
the exception is of type Poco::Exception and call `dropIfEmpty()` method,
then proceed throwing the exception. Without this, the Directory for the
tables need to be manually deleted before retrying the CreateQuery.
@robot-clickhouse robot-clickhouse added the pr-improvement Pull request with some product improvements label Sep 8, 2020
@bharatnc bharatnc changed the title StorageReplicatedMergeTree - cleanup data directory after Zookeeper exceptions ReplicatedMergeTree - cleanup data directory after Zookeeper exceptions Sep 8, 2020
This adds a integration test that tests if table directory is cleaned
up after a ZooKeeper connection failure for ReplicatedMergeTree tables.
@tavplubix tavplubix self-assigned this Sep 8, 2020
@bharatnc
Copy link
Copy Markdown
Contributor Author

bharatnc commented Sep 9, 2020

Also addresses #14115

@bharatnc
Copy link
Copy Markdown
Contributor Author

bharatnc commented Sep 9, 2020

@tavplubix thank you for adding those tests. Should this PR be considered a Bug. I had initially created it under Improvement - but makes more sense to be a bug. I did update the PR description, but label needs to be changed. Do let me know if this should rather be anImprovement.

@tavplubix
Copy link
Copy Markdown
Member

@tavplubix thank you for adding those tests. Should this PR be considered a Bug. I had initially created it under Improvement - but makes more sense to be a bug. I did update the PR description, but label needs to be changed. Do let me know if this should rather be anImprovement.

Ok, let's change the changelog category to "Bug Fix", so it will be backported

@robot-clickhouse robot-clickhouse added pr-bugfix Pull request with bugfix, not backported by default and removed pr-improvement Pull request with some product improvements labels Sep 10, 2020
robot-clickhouse pushed a commit that referenced this pull request Sep 10, 2020
robot-clickhouse pushed a commit that referenced this pull request Sep 10, 2020
tavplubix added a commit that referenced this pull request Sep 11, 2020
Backport #14563 to 20.6: ReplicatedMergeTree - cleanup data directory after Zookeeper exceptions
tavplubix added a commit that referenced this pull request Sep 11, 2020
Backport #14563 to 20.8: ReplicatedMergeTree - cleanup data directory after Zookeeper exceptions
tavplubix added a commit that referenced this pull request Sep 11, 2020
Backport #14563 to 20.9: ReplicatedMergeTree - cleanup data directory after Zookeeper exceptions
tavplubix added a commit that referenced this pull request Sep 11, 2020
Backport #14563 to 20.7: ReplicatedMergeTree - cleanup data directory after Zookeeper exceptions
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

no-docs-needed pr-bugfix Pull request with bugfix, not backported by default

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants