Background
PR #21088 added a changesetMu mutex on SharedDomains (and FlushPendingUpdatesLocked / ComputeCommitmentLocked variants) as a band-aid to serialize the parallel commitment calculator's swap-and-record window against the apply goroutine's DomainPut/DomainDel. That closed the off-by-one wrong-trie-root cluster, TestRecreateAndRewind, and all ~227 race-detector hits across the EXEC3_PARALLEL=true race-test matrix groups — but at the cost of serializing apply-side writes during compute.
The problem
The "current changeset accumulator" is unwind-side machinery: a sidecar that records per-block prev-value diffs so a later unwind can reconstruct the pre-block state. Execution should be forward-only and not be concerned with it. Today the parallel calculator swaps a global accumulator pointer to route block N's branch writes into block N's saved CS, and the apply loop writes through that same pointer — hence the need for the band-aid lock.
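To make the contention concrete, here is a minimal model of a shared accumulator pointer guarded by one mutex. It is a sketch under assumptions, not the code from #21088: the `accumulator` and `domains` types and the `put` and `withBlockAccumulator` methods are hypothetical stand-ins for the changeset, SharedDomains, DomainPut, and the calculator's swap-and-record window.

```go
package main

import (
	"fmt"
	"sync"
)

// accumulator is a hypothetical stand-in for one block's changeset.
type accumulator struct {
	block int
	keys  []string
}

// domains models a shared "current accumulator" pointer guarded by one mutex.
type domains struct {
	mu      sync.Mutex
	current *accumulator
}

// put records a write in whichever accumulator is current (apply side).
func (d *domains) put(key string) {
	d.mu.Lock()
	defer d.mu.Unlock()
	d.current.keys = append(d.current.keys, key)
}

// withBlockAccumulator swaps in acc, runs record against it, and restores the
// previous pointer. The lock is held for the whole window, so every put waits.
func (d *domains) withBlockAccumulator(acc *accumulator, record func(a *accumulator)) {
	d.mu.Lock()
	defer d.mu.Unlock()
	prev := d.current
	d.current = acc
	record(d.current)
	d.current = prev
}

func main() {
	blockN := &accumulator{block: 1}
	next := &accumulator{block: 2}
	d := &domains{current: next}

	var wg sync.WaitGroup
	wg.Add(2)
	go func() { // apply goroutine: writes for the next block
		defer wg.Done()
		for i := 0; i < 100; i++ {
			d.put(fmt.Sprintf("acct-%d", i))
		}
	}()
	go func() { // calculator: branch writes that belong to block N
		defer wg.Done()
		d.withBlockAccumulator(blockN, func(a *accumulator) {
			for i := 0; i < 10; i++ {
				a.keys = append(a.keys, fmt.Sprintf("branch-%d", i))
			}
		})
	}()
	wg.Wait()
	fmt.Println(len(blockN.keys), len(next.keys))
}
```

In this model the lock keeps the two sets of writes apart (10 keys land in block N's accumulator, 100 in the next block's), but only because every apply-side `put` blocks for the duration of the swap window.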
Proposed direction
Derive per-block changesets post-hoc from sd entries (now tx-granular) at sd.Flush time, instead of maintaining the accumulator during execution. Then:
- delete changesetMu and the Lock/UnlockChangesetAccumulator + *Locked API surface
- delete the swap dance in committer.go computeWithBlockAccumulator
- delete the SetChangesetAccumulator / GetChangesetAccumulator / SavePastChangesetAccumulator API
- delete the domain == kv.CommitmentDomain exemptions in SharedDomains.DomainPut / DomainDel
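One way the post-hoc derivation could look, as a rough sketch rather than the actual sd data layout: the `entry` type, the `firstTx` block-boundary slice, and the `blockOf` and `deriveChangesets` helpers are all hypothetical. The idea is that if each entry carries its tx number and the value it replaced, the per-block changeset is the prev value of the earliest write to each key within that block.

```go
package main

import (
	"fmt"
	"sort"
)

// entry is a hypothetical stand-in for one tx-granular write held in sd:
// the key written, the tx number that wrote it, and the value it replaced.
type entry struct {
	txNum uint64
	key   string
	prev  []byte
}

// blockOf maps a tx number to its block index, given the first txNum of each
// block in ascending order. It assumes txNum >= firstTx[0].
func blockOf(firstTx []uint64, txNum uint64) int {
	// sort.Search finds the first block that starts after txNum.
	return sort.Search(len(firstTx), func(i int) bool { return firstTx[i] > txNum }) - 1
}

// deriveChangesets groups entries by block and keeps, per key, the prev value
// of the earliest write in that block: the pre-block value an unwind needs.
func deriveChangesets(entries []entry, firstTx []uint64) map[int]map[string][]byte {
	sorted := append([]entry(nil), entries...) // do not reorder the caller's slice
	sort.SliceStable(sorted, func(i, j int) bool { return sorted[i].txNum < sorted[j].txNum })

	out := make(map[int]map[string][]byte)
	for _, e := range sorted {
		b := blockOf(firstTx, e.txNum)
		if out[b] == nil {
			out[b] = make(map[string][]byte)
		}
		if _, seen := out[b][e.key]; !seen {
			out[b][e.key] = e.prev // later writes in the same block are ignored
		}
	}
	return out
}

func main() {
	firstTx := []uint64{0, 3, 6} // blocks 0, 1, 2 start at tx 0, 3, 6
	entries := []entry{
		{txNum: 4, key: "a", prev: []byte("a2")},
		{txNum: 1, key: "a", prev: []byte("a0")},
		{txNum: 2, key: "a", prev: []byte("a1")},
		{txNum: 5, key: "b", prev: nil}, // key did not exist before block 1
	}
	cs := deriveChangesets(entries, firstTx)
	fmt.Printf("block 0: a=%q\n", cs[0]["a"])
	fmt.Printf("block 1: a=%q b=%q\n", cs[1]["a"], cs[1]["b"])
}
```

Because this runs once at flush time on data that execution already holds, nothing on the apply path needs to know which block's changeset is "current".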
Smallest first step (Option A0)
SharedDomainsCommitmentContext.deferCommitmentUpdates already exists and is enabled for parallel-applying-blocks (exec3.go:217). Branches accumulate in pendingUpdate and FlushPendingUpdates replays them. The remaining inline-write paths to chase are encodeAndStoreCommitmentState's [state] marker write and the concurrentTrieContextFactory ETL drain. Routing those through the deferred mechanism as well collapses the lock window to one single-threaded flush.
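The defer-then-replay pattern described above can be sketched as follows. This is an illustration under assumptions, not the SharedDomainsCommitmentContext code: the `deferredWriter` type, its fields, and the `put` and `flush` methods are hypothetical, and a plain map stands in for the domain store.

```go
package main

import "fmt"

// update is one buffered write.
type update struct{ key, value string }

// deferredWriter is a hypothetical model of a writer that can either write
// inline or buffer writes for a later single-threaded replay.
type deferredWriter struct {
	deferUpdates bool
	pending      []update
	store        map[string]string // stands in for the domain store
}

// put writes inline unless deferral is enabled, in which case it only buffers.
func (w *deferredWriter) put(key, value string) {
	if w.deferUpdates {
		w.pending = append(w.pending, update{key, value})
		return
	}
	w.store[key] = value
}

// flush replays buffered writes in order and reports how many were applied.
// It is meant to be called from one goroutine after the calculator finishes.
func (w *deferredWriter) flush() int {
	n := len(w.pending)
	for _, u := range w.pending {
		w.store[u.key] = u.value
	}
	w.pending = w.pending[:0]
	return n
}

func main() {
	w := &deferredWriter{deferUpdates: true, store: map[string]string{}}
	w.put("branch/01", "node-a")
	w.put("state", "marker") // the kind of write that would also go through the buffer
	fmt.Println("before flush:", len(w.store))
	fmt.Println("flushed:", w.flush())
	fmt.Println("after flush:", len(w.store))
}
```

Any write path that bypasses the buffer still touches the store during compute, which is why the remaining inline paths matter.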
Review note
@AskAlexSharov suggested in the #21088 review folding changesetMu into the existing latestStateLock. It was kept separate to avoid widening the high-traffic latestStateLock (held on every Put/Del/GetLatest/GetAsOf) to cover the calculator's whole ComputeCommitment window. The lock-free refactor here makes that question moot.
Acceptance
- changesetMu and the associated swap API removed
- all four parallel race-test matrix groups still report 0 race-detector hits
- all Bucket C tests (TestBlockchainHeaderchainReorgConsistency, TestLongerForkHeaders/Blocks, TestCallTraceUnwind, TestTxLookupUnwind, TestLowDiffLongChain, TestRecreateAndRewind) still pass under EXEC3_PARALLEL=true
- benchmark: apply-side throughput during compute no longer serialized
Related: #21088, #21017