
Fix broken check_callback in test_restore_db_replica#96968

Merged
alexey-milovidov merged 3 commits into master from fix-test-restore-db-replica-callback on Feb 15, 2026
Conversation

@alexey-milovidov (Member)

Summary

  • Fix the `check_callback` in `test_query_after_restore_db_replica`: it compared the raw query result (with a trailing `\n`) against the expected table name (without one), so the callback always returned False and the retries were useless
  • Increase retry_count from 10 to 30 to give more time for the Replicated database to initialize after node restart, especially in ASan builds

Closes #85664

Test plan

  • Existing integration tests pass: test_restore_db_replica/test.py::test_query_after_restore_db_replica
  • Verify with ASan build that the flaky test no longer fails

Changelog category (leave one):

  • CI Fix or Improvement (changelog entry is not required)

🤖 Generated with Claude Code

alexey-milovidov and others added 3 commits February 15, 2026 03:10
…em.session_log`

The `DELETE FROM system.session_log` cleanup at the start of test
`02835_drop_user_during_session` creates a synchronous lightweight delete
mutation. In ASan stress tests, `system.session_log` receives continuous
writes from all concurrent test sessions, causing the mutation to take
over 12 minutes. Even though the query gets cancelled (`is_cancelled: 1`),
it remains stuck in `StorageMergeTree::waitForMutation`.

Fix by making the cleanup DELETE non-blocking with
`lightweight_deletes_sync = 0`. This is safe because the test user name
includes the PID (`$$`), making collisions from PID reuse extremely
unlikely during a single stress test run.
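The change can be sketched as the following SQL (illustrative only; the WHERE clause and exact statement in `02835_drop_user_during_session` may differ):

```sql
-- Before: blocks in StorageMergeTree::waitForMutation until the
-- lightweight delete mutation finishes, which can take >12 minutes
-- under ASan stress load with continuous writes to session_log.
DELETE FROM system.session_log WHERE user LIKE 'test_user_%';

-- After: fire-and-forget; the mutation completes in the background.
DELETE FROM system.session_log WHERE user LIKE 'test_user_%'
SETTINGS lightweight_deletes_sync = 0;
```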

https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=96926&sha=724c57e90613f66e2ef897f80a644ba21e98b470&name_0=PR&name_1=Stress%20test%20%28arm_asan%2C%20s3%29

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a TOCTOU race in the worker thread loop: after `tryPop`
timed out (confirming the queue was empty), a context switch could
allow the main thread to push the final block and call `finish()`
before the worker checked `isFinished()`. The worker would then see
the queue as finished and exit, leaving the last block unprocessed.

This caused occasional loss of ~2000-3000 rows (one shard's portion
of the last source block) in sharded dictionaries loaded in parallel.

The fix replaces `isFinished()` with `isFinishedAndEmpty()`, which
atomically verifies both that the queue is marked finished AND that
no items remain. If items were pushed between the `tryPop` timeout
and this check, the worker correctly retries and processes them.
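The race and the atomic check can be sketched in Python (a hypothetical `FinishableQueue`, not ClickHouse's actual C++ queue; shown only to illustrate why "finished" and "empty" must be checked under one lock):

```python
import threading
from collections import deque

class FinishableQueue:
    """Hypothetical stand-in for the concurrent queue, illustrating the
    TOCTOU race between tryPop timing out and finish() being called."""

    def __init__(self):
        self._lock = threading.Lock()
        self._not_empty = threading.Condition(self._lock)
        self._items = deque()
        self._finished = False

    def push(self, item):
        with self._not_empty:
            self._items.append(item)
            self._not_empty.notify()

    def finish(self):
        with self._not_empty:
            self._finished = True
            self._not_empty.notify_all()

    def try_pop(self, timeout):
        # Returns an item, or None on timeout (the "queue looks empty" case).
        with self._not_empty:
            if not self._items:
                self._not_empty.wait(timeout)
            return self._items.popleft() if self._items else None

    def is_finished(self):
        # Racy exit condition: push() + finish() can land between the
        # try_pop timeout and this call, orphaning the last item.
        with self._lock:
            return self._finished

    def is_finished_and_empty(self):
        # Atomic check: under one lock, no item can slip past the worker.
        with self._lock:
            return self._finished and not self._items

def worker(queue, processed):
    while True:
        item = queue.try_pop(timeout=0.01)
        if item is not None:
            processed.append(item)
        elif queue.is_finished_and_empty():
            return  # finished AND drained: safe to exit
        # otherwise: a late push may have raced with the timeout; retry
```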

Closes #84468

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The `check_callback` compared the raw query result (with trailing
newline) against the expected table name (without newline), so the
callback always returned False. This made `query_with_retry` exhaust
all retries uselessly instead of returning early when the table
appeared.
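The comparison bug can be reproduced in a few lines (function names are made up for illustration; the real code is a callback passed to `query_with_retry`):

```python
def broken_callback(result, expected="t1"):
    # node.query(...) returns the result with a trailing newline, e.g.
    # "t1\n", and "t1\n" == "t1" is always False -- so query_with_retry
    # exhausts every retry even when the table is already there.
    return result == expected

def fixed_callback(result, expected="t1"):
    # Strip the trailing newline before comparing.
    return result.strip() == expected
```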

Also increase `retry_count` from 10 to 30 to give more time for the
Replicated database to initialize after node restart, especially in
ASan builds.

#85664

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@clickhouse-gh (Contributor)

clickhouse-gh bot commented Feb 15, 2026

Workflow [PR], commit [6272f5c]

Summary:

@clickhouse-gh clickhouse-gh bot added the pr-ci label Feb 15, 2026
@alexey-milovidov alexey-milovidov self-assigned this Feb 15, 2026
@alexey-milovidov alexey-milovidov merged commit 804ae8a into master Feb 15, 2026
133 of 134 checks passed
@alexey-milovidov alexey-milovidov deleted the fix-test-restore-db-replica-callback branch February 15, 2026 14:34
@robot-ch-test-poll4 robot-ch-test-poll4 added the pr-synced-to-cloud The PR is synced to the cloud repo label Feb 15, 2026


Development

Successfully merging this pull request may close these issues.

test_restore_db_replica/test.py::test_query_after_restore_db_replica is flaky
