fix(utxo): roll back ProcessConflicting on step-3+ failure#765
Conversation
|
🤖 Claude Code Review Status: Complete Current Issues: The test mocks do not match the actual behavior of MarkConflictingRecursively, which builds allMarkedHashes by accumulating both the initial losing tx hashes AND their descendants from the BFS traversal. Most tests return empty slices for the second return value of SetConflicting, causing allMarkedHashes to be empty. This prevents proper verification that the rollback correctly clears conflicting flags on all recursively-marked transactions. See inline comments on:
Note: The cascade descendants test (TestProcessConflictingRollback_CascadeDescendants) correctly models the BFS behavior and properly verifies rollback with descendants. Implementation: The rollback logic itself appears sound. The deferred compensating rollback correctly reverses committed phases, the step-5 retry strategy is justified, and partial step-3 spend tracking enables proper cleanup. |
| // step 1: MarkConflictingRecursively → SetConflicting(losing, true) | ||
| affectedSpends := []*Spend{{TxID: &losingTxHash, Vout: 0}} | ||
| mockStore.On("SetConflicting", mock.Anything, []chainhash.Hash{losingTxHash}, true). | ||
| Return(affectedSpends, []chainhash.Hash{}, nil).Once() |
There was a problem hiding this comment.
[Critical] Bug: Test mocks don't match actual MarkConflictingRecursively behavior
The test mocks return empty slices for allMarkedHashes (second return value) from SetConflicting, but in the real implementation, MarkConflictingRecursively builds allMarkedHashes by appending to markedOrder in BFS traversal order.
Impact: The rollback logic at line 299-303 attempts to clear the conflicting flag on allMarkedHashes, but in these tests allMarkedHashes will be empty. This means:
- The tests don't verify that rollback step 4 actually clears all recursively-marked hashes
- If a losing tx has children that were marked conflicting by
MarkConflictingRecursively, the rollback won't clear their flags (in the test scenario)
Fix: Update test mocks to return the same hash in the second return value:
mockStore.On("SetConflicting", mock.Anything, []chainhash.Hash{losingTxHash}, true).
Return(affectedSpends, []chainhash.Hash{losingTxHash}, nil).Once()This ensures allMarkedHashes contains at least losingTxHash, matching real-world behavior where the input hashes are always included in markedOrder.
There was a problem hiding this comment.
The bot has misread MarkConflictingRecursively. The 2nd return of SetConflicting is the children to enqueue for the next BFS iteration, not the cumulative markedOrder. markedOrder is initialized from the input hashes at process_conflicting.go:386-392 independently of the children return, then appended to per BFS level inside MarkConflictingRecursively.
Returning []chainhash.Hash{} (empty children) correctly terminates the BFS after one iteration with markedOrder = [losingTxHash] — exactly what these tests need. They're exercising the linear case (cascade depth 1), where step-2/3/5 failure paths are the subject and cascade depth is incidental.
The bot's suggested fix — returning []chainhash.Hash{losingTxHash} as the second value — would feed the current iteration's hash back as a next-iteration child, which the BFS guards against with the visited set but at minimum misrepresents the contract (and on a less-guarded variant could loop).
The dedicated BFS-path test is TestProcessConflictingRollback_CascadeDescendants in the same file — that one does return a non-empty second value from SetConflicting to simulate a real cascade. The linear-case tests here intentionally don't, and they're correct as-is.
| Return([]chainhash.Hash{losingTxHash}, nil).Once() | ||
|
|
||
| affectedSpends := []*Spend{{TxID: &losingTxHash, Vout: 0}} | ||
| mockStore.On("SetConflicting", mock.Anything, []chainhash.Hash{losingTxHash}, true). |
There was a problem hiding this comment.
[Critical] Same issue: Mock returns empty allMarkedHashes
Same test mock issue as the previous test — SetConflicting returns an empty slice for allMarkedHashes instead of []chainhash.Hash{losingTxHash}. This prevents proper verification of rollback step 4.
There was a problem hiding this comment.
Same misreading as the bot's first comment at process_conflicting_rollback_test.go:45 — the empty []chainhash.Hash{} is intentional and correctly terminates BFS for the linear-case test. See the reply on comment #3154997257 for the full data-flow explanation. The dedicated TestProcessConflictingRollback_CascadeDescendants is the BFS-fidelity test. No change needed here.
| Return([]chainhash.Hash{losingTxHash}, nil).Once() | ||
|
|
||
| affectedSpends := []*Spend{{TxID: &losingTxHash, Vout: 0}} | ||
| mockStore.On("SetConflicting", mock.Anything, []chainhash.Hash{losingTxHash}, true). |
There was a problem hiding this comment.
[Critical] Same issue in retry test
This test also has the empty allMarkedHashes mock problem. Should return []chainhash.Hash{losingTxHash} as the second value.
There was a problem hiding this comment.
Same misreading as the bot's first comment at process_conflicting_rollback_test.go:45 — the empty []chainhash.Hash{} is intentional and correctly terminates BFS for the linear-case test. See the reply on comment #3154997257 for the full data-flow explanation. The dedicated TestProcessConflictingRollback_CascadeDescendants is the BFS-fidelity test. No change needed here.
| Return([]chainhash.Hash{losingTxHash}, nil).Once() | ||
|
|
||
| affectedSpends := []*Spend{{TxID: &losingTxHash, Vout: 0}} | ||
| mockStore.On("SetConflicting", mock.Anything, []chainhash.Hash{losingTxHash}, true). |
There was a problem hiding this comment.
[Critical] Same issue in exhausted retry test
Same mock issue here — should return []chainhash.Hash{losingTxHash} as the second value from SetConflicting.
There was a problem hiding this comment.
Same misreading as the bot's first comment at process_conflicting_rollback_test.go:45 — the empty []chainhash.Hash{} is intentional and correctly terminates BFS for the linear-case test. See the reply on comment #3154997257 for the full data-flow explanation. The dedicated TestProcessConflictingRollback_CascadeDescendants is the BFS-fidelity test. No change needed here.
| Return(errors.NewProcessingError("unspend failed")) | ||
|
|
||
| // step 2 failed → rollback only undoes step 1 (clear conflicting flag). | ||
| mockStore.On("SetConflicting", mock.Anything, []chainhash.Hash{losingTxHash}, false). |
There was a problem hiding this comment.
[Critical] Mock issue in existing test
The SetConflicting mock at line ~176 should return []chainhash.Hash{losingTxHash} as the second value (allMarkedHashes), not an empty slice. This ensures the rollback step 4 (clearing conflicting flags) is properly tested.
There was a problem hiding this comment.
Same misreading as the bot's first comment at process_conflicting_rollback_test.go:45 — the empty []chainhash.Hash{} is intentional and correctly terminates BFS for the linear-case test. See the reply on comment #3154997257 for the full data-flow explanation. The dedicated TestProcessConflictingRollback_CascadeDescendants is the BFS-fidelity test. No change needed here.
| }, nil).Once() | ||
| mockStore.On("Spend", mock.Anything, losingTx, mock.Anything, mock.Anything). | ||
| Return([]*Spend{}, nil).Once() | ||
| mockStore.On("SetConflicting", mock.Anything, []chainhash.Hash{losingTxHash}, false). |
There was a problem hiding this comment.
[Critical] Mock issue in SpendError test
The SetConflicting mock at line ~219 should return []chainhash.Hash{losingTxHash} as the second value (allMarkedHashes), not an empty slice.
There was a problem hiding this comment.
Same misreading as the bot's first comment at process_conflicting_rollback_test.go:45 — the empty []chainhash.Hash{} is intentional and correctly terminates BFS for the linear-case test. See the reply on comment #3154997257 for the full data-flow explanation. The dedicated TestProcessConflictingRollback_CascadeDescendants is the BFS-fidelity test. No change needed here.
Benchmark Comparison ReportBaseline: Current: Summary
All benchmark results (sec/op)
Threshold: >10% with p < 0.05 | Generated: 2026-05-12 14:26 UTC |
54d603b to
8a13ebd
Compare
|
oskarszoon
left a comment
There was a problem hiding this comment.
Approve. Deferred compensating-rollback is correctly shaped and ordered (rollback step 1 re-marks winners conflicting first to avoid an observable inconsistent intermediate state). Step-5 retry-without-rollback decision is sound — rolling back from "winner accepted, parents still locked" to "everything-conflicting" would be strictly worse than retrying SetLocked. Per-spend partial-failure tracking is race-safe + nil-safe. Sonar PASSED 89.9% new-code coverage, benchmark clean.
One ask before prod rollout: please wire alerting on MANUAL INTERVENTION REQUIRED — Prometheus counter + Coralogix/Loki log-pattern rule. Today the tag is grep-able (only 2 hits in process_conflicting.go) but no alert fires, so a real rollback-also-failed event would be silent in prod.
Two small residual exposures worth a follow-up but not gating: (1) rollback step 2 silently errors.Join+continues if a descendant's tx body is unfetchable, leaving a parent UTXO unspent + unlocked + unconflicting — re-locking on rollback failure would be safer; (2) SetConflicting(allMarkedHashes, false) at rollback step 4 could TOCTOU-un-mark a tx a concurrent reorg path independently flagged Conflicting — worth documenting the exclusive-access invariant.
Replied separately to the claude[bot] inline comments — their "[Critical] mocks don't match MarkConflictingRecursively" critique is incorrect; the bot misread the BFS data flow. The second return of SetConflicting is the next-iteration children, not the cumulative markedOrder, so the empty children correctly terminate BFS for the linear-case tests. TestProcessConflictingRollback_CascadeDescendants is the dedicated BFS-fidelity test.



Closes #4561.
Summary
ProcessConflictingruns a 5-phase commit to resolve double-spend conflicts during a reorg. Steps 1-2 commit (mark losing txs conflicting + unspend parents + lock them) before step 3 (Spendof winning tx). If step 3 or later fails, no rollback runs — parents stay locked permanently, losing txs stay conflicting permanently, and future reorgs cannot re-apply the transactions. The audit (#4561) classified this as CRITICAL because the failure mode is stuck reorg / permanent UTXO lockout.Fix
Adds a deferred compensating rollback that fires on any step-2+ failure when at least step 1 committed:
Unspendthe spends whose.Err == nil).Spend(losingTx, IgnoreConflicting, IgnoreLocked); the losing tx body is fetched viaGetin the rollback path).SetConflicting(allMarkedHashes, false)to clear the recursive conflict markers from step 1.SetLocked(parents, false)to undo step 2's lock.Step 5 (
SetLocked(false)) failures are retried with a bounded back-off (3 attempts: 0ms / 50ms / 200ms) before returning. By the time we reach step 5, steps 1-4 are correct; rolling back would re-introduce the very inconsistencies just fixed and create a strictly worse state. The retry path explicitly skips rollback even on exhausted retry — only parents are still locked, which is recoverable manually.If rollback itself fails, the function returns a wrapped error tagged
MANUAL INTERVENTION REQUIREDcontaining both the original failure and the rollback failure.Test plan
TestProcessConflicting_UnspendErrorandTestProcessConflicting_SpendErrorupdated to expect the new rollback calls.go test -race ./stores/utxo/...and-tags aerospikevariants pass.