fix(#4458): eliminate TS compaction/append deadlock causing WAL version gap on Raft followers#4460
Conversation
…on gap on Raft followers On Raft HA leaders, appendSamples() held compactionLock.readLock() through db.commit(). db.commit() calls waitForActiveRecordingSession(), which waits for the compaction recording session to end. The session ends only after Phase 4c (which needs the write lock), but Phase 4c cannot acquire the write lock while the read lock is held — a deadlock broken only by the HA_QUORUM_TIMEOUT. When that timeout fired, TX_ENTRY shipped before the buffered SCHEMA_ENTRY, causing WALVersionGapException on the follower and forcing a snapshot resync. Fix: on replicated databases, release compactionLock.readLock() before db.commit() so Phase 4c can always acquire the write lock without deadlock. If Phase 4c commits a page-0 clear between the lock release and commit, the append gets a ConcurrentModificationException and retries transparently on the freshly-cleared page. Phase 0 also gains a CME retry loop to handle in-flight append commits. Standalone databases keep the original locking (readLock through commit) since they have no recording-session deadlock. Tests: - Issue4458AppendCompactionRaceTest: embedded CME-retry correctness - Issue4458TsWalVersionGapIT: 2-node HA deterministic race via TEST_PRE_PHASE4C_HOOK, verifies zero WAL version gaps on the follower
Up to standards ✅🟢 Issues
|
| Metric | Results |
|---|---|
| Complexity | 17 |
🟢 Coverage 67.61% diff coverage · -7.53% coverage variation
Metric Results Coverage variation ✅ -7.53% coverage variation Diff coverage ✅ 67.61% diff coverage Coverage variation details
Coverable lines Covered lines Coverage Common ancestor commit (8bd740d) 124323 92571 74.46% Head commit (a31020d) 160094 (+35771) 107157 (+14586) 66.93% (-7.53%) Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch:
<coverage of head commit> - <coverage of common ancestor commit>Diff coverage details
Coverable lines Covered lines Diff coverage Pull request (#4460) 71 48 67.61% Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified:
<covered lines added or modified>/<coverable lines added or modified> * 100%
NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.
There was a problem hiding this comment.
Code Review
This pull request addresses issue #4458 by resolving a deadlock between concurrent appends and compaction Phase 4c in Raft HA mode. The fix releases the compaction read lock before committing on Raft HA leaders and introduces transparent retries for ConcurrentModificationException when compaction races with appends. Additionally, tests and test-only hooks are added to verify the fix. The reviewer noted that the test-only hook TEST_PRE_REPLICATE_SCHEMA_HOOK is defined and cleaned up but never actually utilized in the tests, suggesting its removal if it is not required.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| /** | ||
| * Test-only hook. Fires in {@link #runWithCompactionReplication} just before {@code replicateSchema()} | ||
| * is called. Used to inject delays and verify that concurrent appends correctly wait for the | ||
| * recording session to end rather than timing out and shipping TX_ENTRY before the SCHEMA_ENTRY. | ||
| */ | ||
| static volatile Consumer<String> TEST_PRE_REPLICATE_SCHEMA_HOOK = null; |
There was a problem hiding this comment.
The test-only hook TEST_PRE_REPLICATE_SCHEMA_HOOK is defined and cleaned up in the tests, but it is never actually set or utilized within Issue4458TsWalVersionGapIT or any other test in this pull request. If it is not required for deterministic testing, consider removing it to keep the codebase clean.
Code Review - Fix #4458: TS compaction/append deadlock causing WAL version gap on Raft followersThe root cause analysis is clear and the fix is well-reasoned. A few observations below. CorrectnessThe deadlock analysis and fix are sound:
Import ordering
// Current (mixed)
import java.util.logging.Level;
import java.util.ArrayList;
import java.util.Arrays;
import com.arcadedb.exception.ConcurrentModificationException; // <-- out of order
import java.util.HashSet;Recursive retry vs. loop
private void appendSamplesAttempt(final long[] timestamps, final Object[]... columnValues)
throws IOException {
for (int attempt = 3; attempt > 0; attempt--) {
compactionLock.readLock().lock();
boolean readLockHeld = true;
try {
// ...same body...
} catch (final ConcurrentModificationException e) {
if (db.isTransactionActive()) db.rollback();
if (attempt == 1)
throw new IOException("Failed to append timeseries samples after compaction-race retries", e);
// else continue loop
} finally {
if (readLockHeld) compactionLock.readLock().unlock();
}
}
}Embedded test does not exercise the CME-retry path
The test is still a valid regression test for concurrent appends + compaction under the embedded engine, but the Javadoc says it "Verifies CME-retry preserves all data when Phase 4c races concurrent appends" - that claim only holds in HA mode. The HA integration test ( Minor:
|
…rify embedded test scope - Remove unused TEST_PRE_REPLICATE_SCHEMA_HOOK from RaftReplicatedDatabase (gemini) - Move com.arcadedb.exception.ConcurrentModificationException import out of java.util block (claude) - Convert recursive appendSamplesAttempt to an iterative retry loop (claude) - Clarify Issue4458AppendCompactionRaceTest javadoc: standalone path does not exercise the HA-only CME-retry; HA path is covered by Issue4458TsWalVersionGapIT (claude) - Hoist appendsBusy.countDown() out of the inner insert loop (claude)
Code Review - PR #4460: TS compaction/append deadlock fixOverviewThis PR fixes a deadlock triangle in Raft HA mode that caused WAL page-version gaps on followers, ultimately forcing costly snapshot resyncs. The root cause is well-diagnosed: Issues and SuggestionsMissing file from diff The PR description lists Lock ordering change The previous ordering was Phase 0 retry: fragile intermediate variable int phase0Retrieved = -1;
for (int attempt = 0; attempt < 3; attempt++) {
...
phase0Retrieved = pageCount;
break;
...
}
snapshotDataPageCount = phase0Retrieved;Control flow guarantees Inconsistent retry conventions Phase 0 counts up ( No logging on CME retry When LogManager.instance().log(this, Level.FINE,
"CME on TS append (attempt %d/3) - retrying after Phase 4c race", 4 - attempt);Test hooks in production code
Standalone test does not exercise the fix
Docs file belongs in commit history, not source
Positive Notes
SummaryThe core fix is correct and the HA integration test covers the critical scenario well. The main actionable items are: (1) clarify or remove the |
…reset docs - Add FINE-level log on each CME retry in appendSamples for production diagnosability - Standardize Phase 0 retry loop to count-down convention, matching appendSamples - Add assert guard after Phase 0 loop to make the phase0Retrieved invariant explicit - Document on both TEST_* hook fields that tests must reset them in @AfterEach
Code Review - PR #4458 Fix: TS Compaction/Append DeadlockThe root-cause analysis and fix are sound. The deadlock chain (append holds readLock → waitForActiveRecordingSession spins → Phase 4c needs writeLock → blocked by readLock) is clearly explained and the solution is well-structured. Potential Issues1. Documentation mentions
No such change appears in the diff, and the HA test ( 2. assert phase0Retrieved >= 0 : "Phase 0 exited the retry loop without a valid page count";
snapshotDataPageCount = phase0Retrieved;Java assertions are disabled by default (no if (phase0Retrieved < 0)
throw new IllegalStateException("Phase 0 exited retry loop without a valid page count");3. Unused The Minor Points4. 5. Double-space alignment in the HA test: final List<Throwable> errors = new ArrayList<>();
final ExecutorService exec = Executors.newFixedThreadPool(appendThreads + 1);The existing codebase doesn't use this alignment style (single space after the type). Minor but inconsistent with surrounding code. 6. What's Done Well
|
- Replace Phase 0 assert with an explicit IllegalStateException guard (assertions are disabled by default in production, so the assert would not protect the invariant) - Rename phase0Retrieved -> capturedPageCount for clarity - Document that waitForActiveRecordingSession's HA_QUORUM_TIMEOUT is an intentional safety net retained after the #4458 source fix, not dead code - Correct docs/4458 to drop the stale RaftReplicatedDatabase/TEST_PRE_REPLICATE_SCHEMA_HOOK ref - Remove unused 'committed' counter and AtomicInteger import from the embedded test - Normalize double-space alignment to single-space in the HA test
Code Review: fix(#4458) - TS compaction/append deadlock causing WAL version gap on Raft followersThis is a well-reasoned fix for a real production bug. The root-cause analysis is correct and the solution is sound. Notes below, roughly in order of importance. Correctness / Thread Safety
Phase 4c hook read into a local - good. final Runnable prePhase4cHook = TEST_PRE_PHASE4C_HOOK;
if (prePhase4cHook != null)
prePhase4cHook.run();Reading the volatile into a local before the null-check eliminates the TOCTOU race where a test could null the field between the check and the call. Correct pattern.
Potential Issues
(In standalone mode the CME-retry path is not taken anyway, so this does not affect correctness of the test - but it does affect whether the race is reliably triggered.)
Minor / StyleComment verbosity vs. project conventions. Copyright header inconsistency.
Log level for CME retries. Test CoverageBoth scenarios are covered:
The SummaryThe fix is correct and the approach (release read lock early on replicated databases, retry on CME) is the right solution given the constraint that |
…n embedded test Count the appendsInFlight latch down from INSIDE the first append transaction (after the write, before commit) instead of right after the thread starts. This guarantees the append threads have an open transaction in flight when Phase 4c proceeds, making the append/compaction overlap reliable regardless of CI runner scheduling latency.
Code Review - fix(#4458): TS compaction/append deadlock causing WAL version gap on Raft followersThe root cause analysis is precise and the fix is surgically targeted. Overall this is solid work. A few observations below. CorrectnessLock ordering reversal (important to verify) Before this fix, the acquisition order in
After the fix it is:
Lock-ordering inversions between two threads are the classic deadlock recipe. Since // Lock ordering: appendLock MUST be acquired before compactionLock on any code path
// that takes both. Violating this order will cause a deadlock.A quick grep for any other site that acquires Retry-attempt log message arithmetic LogManager.instance().log(this, Level.FINE,
"CME on TimeSeries append for shard %d (attempt %d/3) - retrying after compaction Phase 4c race",
shardIndex, 4 - attempt);When Phase 0 Using a if (capturedPageCount < 0)
throw new IllegalStateException("Phase 0 exited the retry loop without a valid page count");...works, but it is slightly fragile - any future refactor that adds a Testing
The latch counts down from inside the transaction lambda (before the implicit commit). In standalone mode the read lock IS held at that point, which is exactly the intended setup - the hook can confirm all readers hold their locks before sleeping. This is clever and correctly documented. One edge case: if a thread fails on its very first insert (e.g., schema error), HA test data-integrity check
assertThat(followerCount)
.as("follower converges to leader count").isEqualTo(leaderCount);
assertThat(leaderCount)
.as("no samples lost on leader")
.isGreaterThanOrEqualTo(250 + appendThreads * appendsPerThread);Minor style notes
SummaryThe fix is correct and the tests are well-designed to exercise the race deterministically. The main item worth addressing before merge is adding explicit lock-ordering documentation to prevent a future regression; the others are minor polish suggestions. |
…a-integrity assert - Document the canonical lock-ordering invariant (appendLock before compactionLock) at the field declarations to prevent a future lock-inversion regression - Name the ascending retry counter in the CME log message instead of inline 4 - attempt - HA test: assert the leader retains the full expected sample count (no leader-side loss), not merely that the follower agrees with the leader
Code Review - fix(#4458): TS compaction/append deadlock causing WAL version gap on Raft followersSummary: Well-documented fix for a real concurrency bug. The root cause analysis is accurate, the fix is targeted, and the test coverage is solid. A few observations below. Correctness - Looks goodThe deadlock chain is correctly identified and the fix breaks it at the right point:
Potential issues1. The embedded test does not exercise the new code path
This means the embedded test is a regression guard for the old standalone path, not the new fix. That's worth a comment in the test name or a method-level note so future maintainers don't assume it covers the HA path. The HA IT is the real regression guard here. 2. compactionLock.readLock().lock();
boolean readLockHeld = true;
try {
...
if (db.isReplicated()) {
compactionLock.readLock().unlock();
readLockHeld = false;
}
try {
db.commit();
return;
} catch (final ConcurrentModificationException e) { ... }
} finally {
if (readLockHeld)
compactionLock.readLock().unlock();
}If someone adds code between the 3. Magic retry count Both private static final int COMPACTION_RACE_RETRY_LIMIT = 3;4. Test hooks in production code
Minor observations
Test coverage assessment
The Phase 0 CME-retry path (the new loop in OverallSolid fix for a subtle HA concurrency bug. The lock ordering change and the replicated-vs-standalone branching are correct. The main actionable suggestions are: (1) clarify in the embedded test that it does not cover the new HA code path, (2) extract the retry limit as a named constant. No blocking issues. |
|
Code Review - TimeSeriesShard deadlock fix The fix correctly targets the deadlock. Root cause is accurate: read lock held across commit() which calls waitForActiveRecordingSession(), creating a cycle with Phase 4c write lock. CORRECTNESS - no blocking issues:
OBSERVATIONS:
MINOR NITS:
TEST COVERAGE: SUMMARY: No blocking issues. The core fix is correct. Main suggestions: (1) latch-based synchronization in the HA test to avoid vacuous pass on slow CI, and (2) a comment on the Phase 0 retry path being defensive and untested-by-design. |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #4460 +/- ##
===========================================
Coverage 65.27% 65.28%
- Complexity 0 594 +594
===========================================
Files 1612 1649 +37
Lines 124323 128388 +4065
Branches 26912 27540 +628
===========================================
+ Hits 81150 83812 +2662
- Misses 31780 32743 +963
- Partials 11393 11833 +440 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Closes #4458
Summary
On Raft HA leaders,
appendSamples()heldcompactionLock.readLock()throughdb.commit().db.commit()callswaitForActiveRecordingSession(), which spins until the compaction recording session ends. The session ends only after Phase 4c (which needscompactionLock.writeLock()), but Phase 4c cannot acquire the write lock while any thread holds the read lock — a deadlock broken only byHA_QUORUM_TIMEOUT. When that timeout fired,TX_ENTRYshipped before the bufferedSCHEMA_ENTRY, causingWALVersionGapExceptionon the follower and forcing a snapshot resync.Fix: on replicated databases, release
compactionLock.readLock()beforedb.commit()so Phase 4c can always acquire the write lock without deadlock. If Phase 4c commits a page-0 clear between the lock release and the commit, the append gets aConcurrentModificationExceptionand retries transparently on the freshly-cleared page (at most 3 retries). Phase 0 also gains a CME retry loop to handle in-flight append commits. Standalone databases keep the original locking (read lock through commit) since they have no recording-session deadlock.Files changed:
engine/.../TimeSeriesShard.java: serialize viaappendLockfirst; on replicated DBs releasecompactionLock.readLock()before commit with a CME retry loop; add a Phase 0 CME retry loop; addTEST_PRE_PHASE4C_HOOKfor deterministic testingha-raft/.../ArcadeStateMachine.java: addTEST_WAL_GAP_COUNTERfor deterministic testingTest plan
Issue4458AppendCompactionRaceTest(engine, embedded) - regression guard for concurrent appends + compaction on the standalone path (Javadoc notes the HA-only CME-retry path is exercised by the HA test below)Issue4458TsWalVersionGapIT(ha-raft) - 2-node Raft HA, usesTEST_PRE_PHASE4C_HOOKto race appends against Phase 4c deterministically, verifiesTEST_WAL_GAP_COUNTER == 0on the followerTimeSeries*engine suite (173 tests) - no regressionsTimeSeriesConcurrentInsertTest(48-thread concurrent insert + auto-compaction) - passesTimeSeriesGrpcHaConcurrentInsertIT(existing 3-thread gRPC HA smoke test) - passes