Skip to content

Fix DatabaseReplicated metadata digest mismatch during SYSTEM RESTART REPLICA#96718

Merged
alexey-milovidov merged 3 commits intomasterfrom
fix-replicated-database-digest-mismatch
Feb 15, 2026
Merged

Fix DatabaseReplicated metadata digest mismatch during SYSTEM RESTART REPLICA#96718
alexey-milovidov merged 3 commits intomasterfrom
fix-replicated-database-digest-mismatch

Conversation

@alexey-milovidov
Copy link
Copy Markdown
Member

@alexey-milovidov alexey-milovidov commented Feb 12, 2026

Summary

SYSTEM RESTART REPLICA temporarily detaches a table from the in-memory tables map and re-attaches it, without updating tables_metadata_digest. This creates a window where a concurrent DDL operation can call checkDigestValid (in debug/sanitizer builds) and observe the tables map missing a table while tables_metadata_digest still includes its hash, causing a false "Digest does not match" LOGICAL_ERROR exception.

The fix adds an atomic counter tables_being_restarted and a RAII RestartReplicaGuard in InterpreterSystemQuery::doRestartReplica. While any table restart is in progress, checkDigestValid returns true without checking, similar to the existing is_recovering skip. The tables_metadata_digest value stays correct throughout (it reflects the full set of tables) — only the in-memory tables map is temporarily inconsistent.

Closes #77233

CI report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?REF=master&sha=cfb29a38cf387ff4253f8f74d54a353676aefeaa&name_0=MasterCI&name_1=Stress%20test%20%28amd_ubsan%29

Changelog category (leave one):

  • CI Fix or Improvement (changelog entry is not required)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

...

🤖 Generated with Claude Code

…ART REPLICA`

`SYSTEM RESTART REPLICA` calls `detachTable` and then `attachTable` on a
`DatabaseReplicated`, but neither method was overridden to update
`tables_metadata_digest`. This caused the in-memory digest to become
inconsistent with the `tables` map: during the window between detach and
re-attach, if a concurrent DDL worker operation called `checkDigestValid`,
the local digest (computed from iterating `tables`) would not match
`tables_metadata_digest` (which still included the hash of the detached
table), resulting in a "Digest does not match" LOGICAL_ERROR exception.

The fix overrides `detachTable` and `attachTable` in `DatabaseReplicated`
to update `tables_metadata_digest` under `metadata_mutex`. A
`digest_initialized` flag prevents modifying the digest during startup,
when tables are attached before the digest is computed from scratch.

The `metadata_mutex` acquisition also eliminates the race window: while
a DDL worker operation holds `metadata_mutex`, `SYSTEM RESTART REPLICA`
cannot proceed with `detachTable`/`attachTable`.

Closes #77233

CI report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?REF=master&sha=cfb29a38cf387ff4253f8f74d54a353676aefeaa&name_0=MasterCI&name_1=Stress%20test%20%28amd_ubsan%29

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented Feb 12, 2026

Workflow [PR], commit [6e5d03a]

Summary:

job_name test_name status info comment
Integration tests (amd_binary, 2/5) failure
test_zero_copy_expand_macros/test.py::test_drop_after_fetch FAIL cidb

@clickhouse-gh clickhouse-gh bot added the pr-ci label Feb 12, 2026

StoragePtr DatabaseReplicated::detachTable(ContextPtr local_context, const String & table_name)
{
/// SYSTEM RESTART REPLICA calls detachTable and then attachTable without updating the digest.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In detachTablePermanently, it waits for the DB to completely start with waitDatabaseStarted.

Should we also wait here? So we don't need digest_initialized to synchronize tables_metadata_digest.

alexey-milovidov and others added 2 commits February 12, 2026 11:38
`DatabaseReplicated::detachTablePermanently` holds `metadata_mutex`
and calls `DatabaseAtomic::detachTablePermanently` (inherited from
`DatabaseOnDisk`), which calls virtual `this->detachTable()`.
The new `detachTable` override also tries to lock `metadata_mutex`,
causing a deadlock on the non-recursive mutex.

Fix by calling `DatabaseAtomic::detachTable` directly (bypassing
virtual dispatch) and inlining the permanently-detached flag logic
from `DatabaseOnDisk::detachTablePermanently`.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The previous approach of overriding detachTable/attachTable in
DatabaseReplicated to update tables_metadata_digest had a
double-counting bug: when ATTACH TABLE (short syntax) is used,
removeDetachedPermanentlyFlag(attach=true) adds the hash AND the
attachTable override adds it again. Additionally, detachTablePermanently
needed an inlined copy of DatabaseOnDisk::detachTablePermanently to avoid
a deadlock, which was fragile.

Instead, use a simpler approach: add a tables_being_restarted atomic
counter and a RAII RestartReplicaGuard that suppresses checkDigestValid
during the SYSTEM RESTART REPLICA window. This is analogous to the existing
is_recovering flag that skips digest checks during replica recovery.

The key insight is that tables_metadata_digest stays correct throughout
the restart (it reflects the full set of tables) — only the in-memory
tables map is temporarily inconsistent. Since the restarted table is
locked exclusively, no DDL operation can target it concurrently.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@alexey-milovidov
Copy link
Copy Markdown
Member Author

The last implementation is quite good.

@alexey-milovidov alexey-milovidov self-assigned this Feb 15, 2026
@alexey-milovidov alexey-milovidov merged commit d2ed4ec into master Feb 15, 2026
131 of 135 checks passed
@alexey-milovidov alexey-milovidov deleted the fix-replicated-database-digest-mismatch branch February 15, 2026 00:39
@robot-ch-test-poll4 robot-ch-test-poll4 added the pr-synced-to-cloud The PR is synced to the cloud repo label Feb 15, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-ci pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Logical error: 'Digest does not match'.

3 participants