Fix DatabaseReplicated metadata digest mismatch during SYSTEM RESTART REPLICA#96718
Merged
alexey-milovidov merged 3 commits intomasterfrom Feb 15, 2026
Merged
Conversation
…ART REPLICA` `SYSTEM RESTART REPLICA` calls `detachTable` and then `attachTable` on a `DatabaseReplicated`, but neither method was overridden to update `tables_metadata_digest`. This caused the in-memory digest to become inconsistent with the `tables` map: during the window between detach and re-attach, if a concurrent DDL worker operation called `checkDigestValid`, the local digest (computed from iterating `tables`) would not match `tables_metadata_digest` (which still included the hash of the detached table), resulting in a "Digest does not match" LOGICAL_ERROR exception. The fix overrides `detachTable` and `attachTable` in `DatabaseReplicated` to update `tables_metadata_digest` under `metadata_mutex`. A `digest_initialized` flag prevents modifying the digest during startup, when tables are attached before the digest is computed from scratch. The `metadata_mutex` acquisition also eliminates the race window: while a DDL worker operation holds `metadata_mutex`, `SYSTEM RESTART REPLICA` cannot proceed with `detachTable`/`attachTable`. Closes #77233 CI report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?REF=master&sha=cfb29a38cf387ff4253f8f74d54a353676aefeaa&name_0=MasterCI&name_1=Stress%20test%20%28amd_ubsan%29 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Contributor
|
Workflow [PR], commit [6e5d03a] Summary: ❌
|
tuanpach
reviewed
Feb 12, 2026
src/Databases/DatabaseReplicated.cpp
Outdated
|
|
||
| StoragePtr DatabaseReplicated::detachTable(ContextPtr local_context, const String & table_name) | ||
| { | ||
| /// SYSTEM RESTART REPLICA calls detachTable and then attachTable without updating the digest. |
Member
There was a problem hiding this comment.
In detachTablePermanently, it waits for the DB to completely start with waitDatabaseStarted.
Should we also wait here? So we don't need digest_initialized to synchronize tables_metadata_digest.
`DatabaseReplicated::detachTablePermanently` holds `metadata_mutex` and calls `DatabaseAtomic::detachTablePermanently` (inherited from `DatabaseOnDisk`), which calls virtual `this->detachTable()`. The new `detachTable` override also tries to lock `metadata_mutex`, causing a deadlock on the non-recursive mutex. Fix by calling `DatabaseAtomic::detachTable` directly (bypassing virtual dispatch) and inlining the permanently-detached flag logic from `DatabaseOnDisk::detachTablePermanently`. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The previous approach of overriding detachTable/attachTable in DatabaseReplicated to update tables_metadata_digest had a double-counting bug: when ATTACH TABLE (short syntax) is used, removeDetachedPermanentlyFlag(attach=true) adds the hash AND the attachTable override adds it again. Additionally, detachTablePermanently needed an inlined copy of DatabaseOnDisk::detachTablePermanently to avoid a deadlock, which was fragile. Instead, use a simpler approach: add a tables_being_restarted atomic counter and a RAII RestartReplicaGuard that suppresses checkDigestValid during the SYSTEM RESTART REPLICA window. This is analogous to the existing is_recovering flag that skips digest checks during replica recovery. The key insight is that tables_metadata_digest stays correct throughout the restart (it reflects the full set of tables) — only the in-memory tables map is temporarily inconsistent. Since the restarted table is locked exclusively, no DDL operation can target it concurrently. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Member
Author
|
The last implementation is quite good. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
SYSTEM RESTART REPLICAtemporarily detaches a table from the in-memorytablesmap and re-attaches it, without updatingtables_metadata_digest. This creates a window where a concurrent DDL operation can callcheckDigestValid(in debug/sanitizer builds) and observe thetablesmap missing a table whiletables_metadata_digeststill includes its hash, causing a false "Digest does not match"LOGICAL_ERRORexception.The fix adds an atomic counter
tables_being_restartedand a RAIIRestartReplicaGuardinInterpreterSystemQuery::doRestartReplica. While any table restart is in progress,checkDigestValidreturnstruewithout checking, similar to the existingis_recoveringskip. Thetables_metadata_digestvalue stays correct throughout (it reflects the full set of tables) — only the in-memorytablesmap is temporarily inconsistent.Closes #77233
CI report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?REF=master&sha=cfb29a38cf387ff4253f8f74d54a353676aefeaa&name_0=MasterCI&name_1=Stress%20test%20%28amd_ubsan%29
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
...
🤖 Generated with Claude Code