Recover the Replicated Database forcefully after restoring database metadata in Keeper #85960

Merged: tuanpach merged 1 commit into ClickHouse:master from tuanpach:fix-issue-85664 on Aug 22, 2025
Conversation

@tuanpach (Member) commented Aug 21, 2025

Changelog category (leave one):

  • Bug Fix

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Recover the Replicated Database forcefully after restoring the database metadata in Keeper.

The issue in #85664 is that restoring metadata sets the digest of the replica to "0" in DatabaseReplicated::createReplicaNodesInZooKeeper.
If another node restores the table metadata, it just reinitializes the DDL Worker.
On restart, the database might not have any tables locally, so the local digest is 0. That matches the digest in Keeper, so the database does not pick up the restored metadata.

This PR sets the replica digest to 42 after restoring the database metadata in Keeper and before reinitializing the DDL Worker. The resulting mismatch forces the database to recover and apply the restored metadata.
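
The digest comparison described above can be illustrated with a minimal Python sketch. This is a simplified model, not ClickHouse code: the `Replica` class, `needs_recovery`, `restore_metadata`, and the CRC32-based digest are all hypothetical. The only facts taken from this PR are that a replica with no local tables has digest 0, that the restore used to write "0" as the replica digest in Keeper, and that the fix writes 42 instead.

```python
import zlib
from dataclasses import dataclass, field


def local_digest(tables: dict) -> int:
    # Toy digest (assumption): sum of CRC32 over "name=definition".
    # With no local tables the sum is 0, mirroring the case in the PR.
    return sum(zlib.crc32(f"{name}={ddl}".encode()) for name, ddl in tables.items())


@dataclass
class Replica:
    tables: dict = field(default_factory=dict)  # local table metadata
    keeper_digest: int = 0                      # digest stored for this replica in Keeper

    def needs_recovery(self) -> bool:
        # Recovery is triggered only when the local and Keeper digests differ.
        return local_digest(self.tables) != self.keeper_digest


def restore_metadata(replica: Replica, digest_to_write: int) -> None:
    # Hypothetical stand-in for the restore step: it rewrites the replica
    # digest in Keeper before the DDL Worker is reinitialized.
    replica.keeper_digest = digest_to_write


# Before the fix: the restore writes 0, which equals the digest of an empty replica.
before_fix = Replica()
restore_metadata(before_fix, 0)

# After the fix: the restore writes 42, so the digests cannot match for an empty replica.
after_fix = Replica()
restore_metadata(after_fix, 42)

print(before_fix.needs_recovery())  # False: restored metadata is never applied
print(after_fix.needs_recovery())   # True: recovery runs and applies it
```

In this model the value 42 carries no meaning of its own; any value that differs from the digest of an empty replica would produce the same mismatch.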

Closes #85664

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

@tuanpach tuanpach added the can be tested Allows running workflows for external contributors label Aug 21, 2025
@clickhouse-gh bot (Contributor) commented Aug 21, 2025

Workflow [PR], commit [22dba5d]

Summary:

| job_name | test_name | status | info | comment |
|---|---|---|---|---|
| Stateless tests (amd_binary, old analyzer, s3 storage, DatabaseReplicated, parallel) | | failure | | |
| | 02443_detach_attach_partition | FAIL | | |
| | 03595_alter_drop_column_comment_if_exists | FAIL | | |
| | Lost s3 keys | FAIL | | |
| | S3_ERROR No such key thrown (in clickhouse-server.log or clickhouse-server.err.log) | FAIL | | |
| Integration tests (amd_tsan, 5/6) | | failure | | |
| | test_threadpool_readers/test.py::test_local_fs_threadpool_reader | FAIL | | |

@clickhouse-gh clickhouse-gh bot added the pr-ci label Aug 21, 2025
@evillique evillique self-assigned this Aug 21, 2025
@evillique (Member) commented:

The changelog category should probably be bugfix. If I understand it correctly, CI fix is for fixes in the CI infrastructure.

@tuanpach (Member Author) commented:

> The changelog category should probably be bugfix. If I understand it correctly, CI fix is for fixes in the CI infrastructure.

I thought it also fixes the CI failed tests: https://github.com/ClickHouse/ClickHouse/issues?q=state%3Aclosed%20label%3Apr-ci

@tuanpach tuanpach added this pull request to the merge queue Aug 22, 2025
Merged via the queue into ClickHouse:master with commit e21a13b Aug 22, 2025
119 of 122 checks passed
@tuanpach tuanpach deleted the fix-issue-85664 branch August 22, 2025 08:17
@robot-clickhouse-ci-1 robot-clickhouse-ci-1 added the pr-synced-to-cloud The PR is synced to the cloud repo label Aug 22, 2025
@evillique (Member) commented:

> I thought it also fixes the CI failed tests: https://github.com/ClickHouse/ClickHouse/issues?q=state%3Aclosed%20label%3Apr-ci

Well, in this list I see changes to CI itself and test fixes that fix the test itself; I don't see any bugfixes that change our main code. And if I understand correctly, this PR fixes a real bug found by the test, but not the test itself.

@tuanpach tuanpach added pr-bugfix Pull request with bugfix, not backported by default and removed pr-ci labels Aug 25, 2025
@tuanpach (Member Author) commented:

> > I thought it also fixes the CI failed tests: https://github.com/ClickHouse/ClickHouse/issues?q=state%3Aclosed%20label%3Apr-ci
>
> Well, in this list I see changes to CI itself and test fixes that fix the test itself; I don't see any bugfixes that change our main code. And if I understand correctly, this PR fixes a real bug found by the test, but not the test itself.

I updated the category.

Labels

  • can be tested: Allows running workflows for external contributors
  • pr-bugfix: Pull request with bugfix, not backported by default
  • pr-synced-to-cloud: The PR is synced to the cloud repo

Development

Successfully merging this pull request may close these issues.

test_restore_db_replica/test.py::test_query_after_restore_db_replica is flaky