Lock-free parallel execution: derive per-block changesets post-hoc, remove changeset accumulator from the exec path #21106

@mh0lt

Background

PR #21088 added a changesetMu mutex on SharedDomains (plus FlushPendingUpdatesLocked / ComputeCommitmentLocked variants) as a band-aid. The mutex serializes the parallel commitment calculator's swap-and-record window against the apply goroutine's DomainPut/DomainDel. That closed the off-by-one wrong-trie-root cluster and fixed TestRecreateAndRewind. It also eliminated all ~227 race-detector hits across the EXEC3_PARALLEL=true race-test matrix groups. The cost is that apply-side writes are serialized while commitment is being computed.

The problem

The "current changeset accumulator" is unwind-side machinery: a sidecar that records per-block prev-value diffs so that a later unwind can reconstruct the pre-block state. Execution should be forward-only and should not need to know about it. Today, however, the parallel calculator swaps a global accumulator pointer to route block N's branch writes into block N's saved changeset (CS), while the apply loop writes through that same pointer. That shared pointer is why the band-aid lock is needed.
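The toy Go model below makes the problem concrete. It is a sketch, not Erigon code. The `sharedDomains` type, its fields, and the lower-case method names are hypothetical stand-ins for SharedDomains, changesetMu, DomainPut and computeWithBlockAccumulator. The model also assumes a changeset is simply a map from key to prev-value. Both paths read and write the same `acc` pointer, so both must hold `changesetMu`. As a result, an apply-side put blocks for the entire time the calculator has block N's changeset swapped in.

```go
package main

import (
	"fmt"
	"sync"
)

// changeset is a hypothetical per-block record: key -> value before the block
// (nil means the key did not exist).
type changeset map[string][]byte

// sharedDomains is a toy model, NOT Erigon's SharedDomains.
type sharedDomains struct {
	changesetMu sync.Mutex // models the band-aid lock from #21088; also guards latest here
	latest      map[string][]byte
	acc         *changeset // the global "current accumulator" pointer
}

// record stores the prev-value of key in the current accumulator, once per key.
// Caller must hold changesetMu.
func (sd *sharedDomains) record(key string) {
	if sd.acc == nil {
		return
	}
	if _, seen := (*sd.acc)[key]; !seen {
		(*sd.acc)[key] = sd.latest[key]
	}
}

// domainPut is the apply-side write. It must take changesetMu so it never
// records into a block changeset the calculator has temporarily swapped in.
func (sd *sharedDomains) domainPut(key string, val []byte) {
	sd.changesetMu.Lock()
	defer sd.changesetMu.Unlock()
	sd.record(key)
	sd.latest[key] = val
}

// computeWithBlockAccumulator routes block N's branch writes into blockCS by
// swapping the shared pointer, then restores it. Apply-side puts stall meanwhile.
func (sd *sharedDomains) computeWithBlockAccumulator(blockCS *changeset, branches map[string][]byte) {
	sd.changesetMu.Lock()
	defer sd.changesetMu.Unlock()
	saved := sd.acc
	sd.acc = blockCS
	for k, v := range branches {
		sd.record(k)
		sd.latest[k] = v
	}
	sd.acc = saved
}

func main() {
	current := changeset{}
	sd := &sharedDomains{latest: map[string][]byte{}, acc: &current}
	block7 := changeset{}

	var wg sync.WaitGroup
	wg.Add(2)
	go func() { // apply goroutine
		defer wg.Done()
		sd.domainPut("account/alice", []byte{1})
	}()
	go func() { // parallel commitment calculator for block 7
		defer wg.Done()
		sd.computeWithBlockAccumulator(&block7, map[string][]byte{"branch/0a": {0xaa}})
	}()
	wg.Wait()
	fmt.Println(len(current), len(block7)) // 1 1
}
```

Dropping the lock while keeping the pointer swap would let `domainPut` record into `block7`, or let the calculator record into `current`. The direction proposed in this issue removes the shared pointer itself rather than the lock alone.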

Proposed direction

Derive per-block changesets post-hoc from the SharedDomains (sd) entries, which are now tx-granular, at sd.Flush time instead of maintaining the accumulator during execution. Then:

  • delete changesetMu and the Lock/UnlockChangesetAccumulator + *Locked API surface
  • delete the swap dance in committer.go computeWithBlockAccumulator
  • delete the SetChangesetAccumulator / GetChangesetAccumulator / SavePastChangesetAccumulator API
  • delete the domain == kv.CommitmentDomain exemptions in SharedDomains.DomainPut / DomainDel
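The post-hoc derivation can be sketched as a single pass over tx-granular writes. This is a minimal sketch under stated assumptions, not the planned implementation. The `entry` record is hypothetical, as are the `base` map (standing in for reads of pre-existing state) and the ascending `maxTxNums` slice that maps txNums to blocks. For each block, the first write to a key captures the value that key held when the block started, which is exactly what an unwind needs. Later writes in the same block only advance the running view.

```go
package main

import (
	"fmt"
	"sort"
)

// entry is a hypothetical tx-granular write: at TxNum, Key was set to Val
// (nil Val means the key was deleted).
type entry struct {
	TxNum uint64
	Key   string
	Val   []byte
}

// blockOf returns the index of the block containing txNum, assuming
// maxTxNums[i] is the last txNum of block i in ascending order.
// It returns len(maxTxNums) if txNum is past the last block.
func blockOf(maxTxNums []uint64, txNum uint64) int {
	return sort.Search(len(maxTxNums), func(i int) bool { return maxTxNums[i] >= txNum })
}

// deriveChangesets builds one changeset per block: key -> value at the start
// of that block (nil if the key was absent). base stands in for state that
// existed before the first entry.
func deriveChangesets(base map[string][]byte, entries []entry, maxTxNums []uint64) []map[string][]byte {
	sorted := append([]entry(nil), entries...)
	sort.SliceStable(sorted, func(i, j int) bool { return sorted[i].TxNum < sorted[j].TxNum })

	// running view of state: base overlaid with every write applied so far
	cur := make(map[string][]byte, len(base))
	for k, v := range base {
		cur[k] = v
	}

	out := make([]map[string][]byte, len(maxTxNums))
	for i := range out {
		out[i] = map[string][]byte{}
	}
	for _, e := range sorted {
		b := blockOf(maxTxNums, e.TxNum)
		if b == len(maxTxNums) {
			break // sorted, so every remaining entry is past the last block too
		}
		if _, seen := out[b][e.Key]; !seen {
			out[b][e.Key] = cur[e.Key] // first write in block b: capture pre-block value
		}
		if e.Val == nil {
			delete(cur, e.Key)
		} else {
			cur[e.Key] = e.Val
		}
	}
	return out
}

func main() {
	base := map[string][]byte{"a": {1}}
	entries := []entry{
		{TxNum: 12, Key: "a", Val: []byte{4}},
		{TxNum: 3, Key: "a", Val: []byte{2}},
		{TxNum: 5, Key: "a", Val: []byte{3}},
		{TxNum: 14, Key: "b", Val: []byte{7}},
	}
	cs := deriveChangesets(base, entries, []uint64{9, 19}) // block 0: txs 0..9, block 1: txs 10..19
	fmt.Println(cs[0]["a"], cs[1]["a"], cs[1]["b"] == nil) // [1] [3] true
}
```

The derivation runs once at flush time, using data that execution already produces. Nothing on the exec path needs to know which block's changeset is "current", and that is what lets the swap API and changesetMu go away.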

Smallest first step (Option A0)

SharedDomainsCommitmentContext.deferCommitmentUpdates already exists and is enabled when applying blocks in parallel (exec3.go:217): branches accumulate in pendingUpdate, and FlushPendingUpdates replays them. Two inline-write paths remain to chase: the [state] marker write in encodeAndStoreCommitmentState, and the ETL drain in concurrentTrieContextFactory. Route those through the deferred mechanism too, and the lock window collapses to one single-threaded flush.
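The following is a toy Go model of the deferred path described above. It assumes nothing about the real SharedDomainsCommitmentContext beyond what this issue states. `commitmentCtx`, `putBranch`, `flushPending` and `store` are hypothetical names, and the `"state"` key merely stands in for the [state] marker. The point is the locking shape. With deferral on, every branch write (including the marker) is buffered locally. The shared store's lock is then taken once for the whole replay instead of once per write.

```go
package main

import (
	"fmt"
	"sync"
)

// branchWrite is a hypothetical buffered commitment-domain write.
type branchWrite struct {
	Key string
	Val []byte
}

// store is a toy stand-in for the shared domain that apply-side writers also use.
type store struct {
	mu   sync.Mutex
	data map[string][]byte
}

// put is the inline path: one lock acquisition per write.
func (s *store) put(k string, v []byte) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.data[k] = v
}

// commitmentCtx loosely mirrors deferCommitmentUpdates/pendingUpdate; it is not Erigon's type.
type commitmentCtx struct {
	deferUpdates bool
	pending      []branchWrite
	out          *store
}

// putBranch either buffers the write (deferred) or applies it immediately.
func (c *commitmentCtx) putBranch(k string, v []byte) {
	if c.deferUpdates {
		// copy v so later reuse of the caller's buffer cannot alter the pending write
		c.pending = append(c.pending, branchWrite{Key: k, Val: append([]byte(nil), v...)})
		return
	}
	c.out.put(k, v)
}

// flushPending replays buffered writes in order under a single lock
// acquisition and returns how many were applied. Call it from one goroutine.
func (c *commitmentCtx) flushPending() int {
	c.out.mu.Lock()
	defer c.out.mu.Unlock()
	for _, w := range c.pending {
		c.out.data[w.Key] = w.Val
	}
	n := len(c.pending)
	c.pending = nil
	return n
}

func main() {
	s := &store{data: map[string][]byte{}}
	c := &commitmentCtx{deferUpdates: true, out: s}
	c.putBranch("branch/0a", []byte{0xaa})
	c.putBranch("branch/0b", []byte{0xbb})
	c.putBranch("state", []byte{0x01}) // hypothetical key for the [state] marker
	fmt.Println(len(s.data))           // 0: nothing reached the store yet
	n := c.flushPending()
	fmt.Println(n, len(s.data)) // 3 3
}
```

Once every inline path buffers like this, the calculator touches shared state only during the flush. That is the property Option A0 relies on.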

Review note

In the #21088 review, @AskAlexSharov suggested folding changesetMu into the existing latestStateLock. It was kept separate to avoid widening the high-traffic latestStateLock (held on every Put/Del/GetLatest/GetAsOf) to cover the calculator's entire ComputeCommitment window. The lock-free refactor proposed here makes that question moot.

Acceptance

  • changesetMu and the associated swap API removed
  • all four parallel race-test matrix groups still report 0 race-detector hits
  • all Bucket C tests (TestBlockchainHeaderchainReorgConsistency, TestLongerForkHeaders/Blocks, TestCallTraceUnwind, TestTxLookupUnwind, TestLowDiffLongChain, TestRecreateAndRewind) still pass under EXEC3_PARALLEL=true
  • benchmark: apply-side throughput during compute no longer serialized

Related: #21088, #21017