Summary
Found while auditing for "mutate-then-check" patterns after #851 landed. The rollback path in moveForwardBlock (and the analogous path in reorgBlocks) restores four in-memory fields on error but does not restore the queue (batches drained by processRemainderTransactionsAndDequeue) or external UTXO-store writes. Same bug class as #851 at a larger scale.
Status: based on code review of services/blockassembly/subtreeprocessor/SubtreeProcessor.go against current main. Not yet reproduced by a failing test.
Locations
Two rollback sites with identical shape:
SubtreeProcessor.go:704-722 - main event loop, moveForwardBlock chan handler.
SubtreeProcessor.go:2624-2649 - reorgBlocks, moveForward-only fast path.
Both snapshot:
originalChainedSubtrees := stp.chainedSubtrees
originalCurrentSubtree := stp.currentSubtree.Load()
originalCurrentTxMap := stp.currentTxMap
currentBlockHeader := stp.currentBlockHeader.Load()
And on error from moveForwardBlock(...):
stp.chainedSubtrees = originalChainedSubtrees
stp.currentSubtree.Store(originalCurrentSubtree)
stp.currentTxMap = originalCurrentTxMap
stp.currentBlockHeader.Store(currentBlockHeader)
stp.setTxCountFromSubtrees()
What moveForwardBlock mutates beyond those four fields
Walking moveForwardBlock (SubtreeProcessor.go:3564-3667):
- Read-only setup.
processConflictingTransactions (line 3614) - may modify UTXO store via ProcessConflicting.
resetSubtreeState (line 3623) - modifies the snapshotted fields.
processRemainderTransactionsAndDequeue (line 3628) - drains the queue via dequeueDuringBlockMovement. Drained batches are added to the new subtree state.
processCoinbaseUtxos (line 3644) - writes coinbase UTXOs to the UTXO store.
If steps 2, 4, or 5 error after partial work, the rollback misses those changes.
Failure modes
- Step 2 fails mid-conflict-marking: UTXO store left with partial Conflicting marks. Probably idempotent so not catastrophic.
- Step 4 fails mid-drain: queue partially drained. Batches dequeued into the new subtree state are discarded by rollback. Real tx loss.
- Step 5 fails after coinbase insert: orphan coinbase in UTXO store, plus all of step 4's drained txs lost.
Reproduction sketch
Deterministic via the clock seam (#841) and a utxoStore wrapper:
// 1. Build SubtreeProcessor with small InitialMerkleItemsPerSubtree.
// 2. Pin queue and SubtreeProcessor clocks via fixedClock.
// 3. Enqueue N batches so the queue has work to drain.
// 4. Construct a block whose CoinbaseTx triggers an error inside a test
// utxoStore wrapper (returns sentinel error on Create).
// 5. queueLenBefore := stp.queue.length()
// 6. err := stp.MoveForwardBlock(block)
// 7. Assert err != nil
// 8. Assert stp.queue.length() < queueLenBefore (drains happened)
// 9. Assert chainedSubtrees / currentSubtree / currentTxMap match pre-call snapshot
// 10. The drained batches are nowhere to be found.
Suggested remediation options (in increasing scope)
- Reorder side effects: defer the drain until after
processCoinbaseUtxos. Smallest change but needs review of downstream ordering assumptions.
- Queue snapshot/restore primitive: capture head/tail/length before the drain, restore on error. Complicated by concurrent enqueues.
- Two-phase dequeue: read batches without removing them, commit on success. Largest restructure.
Option 1 is the most tractable as a single PR.
Related
Summary
Found while auditing for "mutate-then-check" patterns after #851 landed. The rollback path in
moveForwardBlock(and the analogous path inreorgBlocks) restores four in-memory fields on error but does not restore the queue (batches drained byprocessRemainderTransactionsAndDequeue) or external UTXO-store writes. Same bug class as #851 at a larger scale.Status: based on code review of
services/blockassembly/subtreeprocessor/SubtreeProcessor.goagainst currentmain. Not yet reproduced by a failing test.Locations
Two rollback sites with identical shape:
SubtreeProcessor.go:704-722- main event loop,moveForwardBlockchan handler.SubtreeProcessor.go:2624-2649-reorgBlocks, moveForward-only fast path.Both snapshot:
And on error from
moveForwardBlock(...):What moveForwardBlock mutates beyond those four fields
Walking
moveForwardBlock(SubtreeProcessor.go:3564-3667):processConflictingTransactions(line 3614) - may modify UTXO store viaProcessConflicting.resetSubtreeState(line 3623) - modifies the snapshotted fields.processRemainderTransactionsAndDequeue(line 3628) - drains the queue viadequeueDuringBlockMovement. Drained batches are added to the new subtree state.processCoinbaseUtxos(line 3644) - writes coinbase UTXOs to the UTXO store.If steps 2, 4, or 5 error after partial work, the rollback misses those changes.
Failure modes
Reproduction sketch
Deterministic via the clock seam (#841) and a
utxoStorewrapper:Suggested remediation options (in increasing scope)
processCoinbaseUtxos. Smallest change but needs review of downstream ordering assumptions.Option 1 is the most tractable as a single PR.
Related
validFromMilliszero-guard fix.validFromMillisadmit logic.