fix: wait for BlockValidation to process invalid moveBack blocks before loading unmined txs#691
Conversation
…during reset After InvalidateBlock, reset() skipped invalid moveBack blocks assuming BlockValidation's async setTxMinedStatus(unsetMined=true) would handle them. But that runs asynchronously via a BlockMinedUnset notification and may not have completed when reset() calls loadUnminedTransactions(). The result: transactions from invalidated blocks retain unmined_since=NULL (mined state) in the UTXO store, so loadUnminedTransactions() misses them and they are silently lost from block assembly. Fix: remove the invalid block skip — MarkTransactionsOnLongestChain is idempotent, so double-marking is safe and ensures txs are recoverable. Fixes TestLongestChainPostgres/invalid_block which regressed in PR bsv-blockchain#679. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
🤖 Claude Code Review Status: Complete Issues Found:
PR Code Quality: Implementation is solid - proper error handling, exponential backoff with max cap, short-circuit for non-retriable errors, and clear comments about best-effort semantics. The logic correctly addresses the race condition between reset and BlockValidation async processing. |
There was a problem hiding this comment.
Pull request overview
Fixes a reset-time reorg edge case in block assembly where transactions from invalidated (moveBack) blocks could remain marked as mined (unmined_since=NULL) due to asynchronous BlockValidation timing, causing them to be missed by loadUnminedTransactions() and dropped from assembly.
Changes:
- Stop skipping
meta.InvalidmoveBack blocks when collecting txs to mark as unmined duringBlockAssembler.reset(). - Add inline rationale explaining the async timing risk and why marking here is safe.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // a BlockMinedUnset notification and may not have completed yet. If we | ||
| // skip, loadUnminedTransactions() won't find these txs (unmined_since | ||
| // still NULL) and they'll be silently lost from block assembly. | ||
| // MarkTransactionsOnLongestChain is idempotent, so double-marking is safe. |
There was a problem hiding this comment.
Good catch — "idempotent" was misleading. Reworded the comment to clarify that it's safe for correctness (ensures unmined_since is non-NULL) even though repeated calls may bump the height value.
Benchmark Comparison ReportBaseline: Current: Summary
All benchmark results (sec/op)
Threshold: >10% with p < 0.05 | Generated: 2026-04-14 19:40 UTC |
There was a problem hiding this comment.
Pull request overview
This PR fixes a reset/reorg recovery edge case where transactions from invalidated (meta.Invalid) moveBack blocks could remain in a “mined” state (unmined_since = NULL) due to async timing, causing loadUnminedTransactions() to miss them and drop them from block assembly.
Changes:
- Removes the
meta.Invalidskip for moveBack blocks when collecting tx hashes to mark as unmined. - Adds clarifying inline comments explaining the async
BlockMinedUnsettiming hazard and why re-marking is correctness-safe.
Comments suppressed due to low confidence (1)
services/blockassembly/BlockAssembler.go:472
- If GetSubtrees fails for a moveBack block, reset() currently logs and continues, which means those block’s tx hashes never get passed to MarkTransactionsOnLongestChain(false). Since loadUnminedTransactions() only scans txs with unmined_since set, this can permanently “lose” those txs from assembly after the reset. Consider failing the reset (returning an error) when subtree retrieval fails for any moveBack block, or otherwise ensuring txs from that block are still marked unmined so correctness isn’t dependent on subtree-store availability.
block := blockWithMeta.block
blockSubtrees, err := block.GetSubtrees(ctx, b.logger, b.subtreeStore, b.settings.Block.GetAndValidateSubtreesConcurrency)
if err != nil {
b.logger.Warnf("[BlockAssembler][Reset] error getting subtrees for moveBack block %s: %v (will skip)", block.Hash().String(), err)
continue
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // Do NOT skip invalid blocks here. BlockValidation handles them via | ||
| // setTxMinedStatus(unsetMined=true), but that runs asynchronously via | ||
| // a BlockMinedUnset notification and may not have completed yet. If we | ||
| // skip, loadUnminedTransactions() won't find these txs (unmined_since | ||
| // still NULL) and they'll be silently lost from block assembly. | ||
| // Double-marking is safe for correctness: MarkTransactionsOnLongestChain | ||
| // ensures unmined_since is non-NULL (the key invariant for | ||
| // loadUnminedTransactions), even though repeated calls may bump the | ||
| // unmined_since height value. |
There was a problem hiding this comment.
Agreed that a dedicated test for this scenario would be valuable. The existing TestParentTransactionNETCalculationDuringReset and the sequential-postgres TestLongestChainPostgres/invalid_block test do exercise this code path, but a focused unit test for reset() with meta.Invalid moveBack blocks asserting unmined_since is set would be a good regression guard. Will track as a follow-up.
…minedTransactions The previous approach of removing the invalid block skip was insufficient because GetSubtrees may fail for invalidated blocks. Instead, explicitly wait for BlockValidation's setTxMinedStatus(unsetMined=true) to complete for each invalid moveBack block before proceeding. InvalidateBlock sets mined_set=false and sends a BlockMinedUnset notification. The async setTxMinedStatus handler processes it — unsetting mined status for the block's txs (setting unmined_since) — then sets mined_set=true. waitForBlockMinedSet polls until mined_set=true, guaranteeing loadUnminedTransactions will find the txs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
02e6585 to
b00e13a
Compare
91a8be8 to
b00e13a
Compare
There was a problem hiding this comment.
Pull request overview
This PR tightens block-assembly reset behavior around reorgs involving invalid “moveBack” blocks so that transactions from those blocks are reliably recovered into the unmined pool (via unmined_since) before loadUnminedTransactions() runs.
Changes:
- Add a reset-time wait loop for invalid moveBack blocks to allow BlockValidation’s async
setTxMinedStatus(unsetMined=true)to complete before reloading unmined transactions. - Clarify comments around skipping invalid blocks during moveBack tx unmining (BlockValidation already handles them).
- Introduce
waitForBlockMinedSet()using the sharedretryutility and BlockValidation retry settings.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| }, | ||
| retry.WithBackoffDurationType(b.settings.BlockValidation.IsParentMinedRetryBackoffDuration), | ||
| retry.WithBackoffMultiplier(b.settings.BlockValidation.IsParentMinedRetryBackoffMultiplier), | ||
| retry.WithRetryCount(b.settings.BlockValidation.IsParentMinedRetryMaxRetry), | ||
| ) |
There was a problem hiding this comment.
Added retry.WithMessage to identify these retries in logs. Also switched from linear to exponential backoff with a 2s max cap, which should bring worst-case total wait from ~80s down to ~30s per block.
| blockHash := blockWithMeta.block.Hash() | ||
| b.logger.Infof("[BlockAssembler][Reset] waiting for invalid block %s to be processed by BlockValidation", blockHash.String()) | ||
| if waitErr := b.waitForBlockMinedSet(ctx, blockHash); waitErr != nil { | ||
| b.logger.Warnf("[BlockAssembler][Reset] timeout waiting for invalid block %s mined_set: %v (proceeding anyway)", blockHash.String(), waitErr) |
There was a problem hiding this comment.
Good point. Reworded the comment to say "best-effort wait" and added explicit handling for context cancellation — if ctx is cancelled, we now return an error instead of proceeding. For other failures (e.g., retries exhausted), we log a warning and proceed since those txs can be recovered on the next reset cycle.
| if waitErr := b.waitForBlockMinedSet(ctx, blockHash); waitErr != nil { | ||
| b.logger.Warnf("[BlockAssembler][Reset] timeout waiting for invalid block %s mined_set: %v (proceeding anyway)", blockHash.String(), waitErr) | ||
| } |
There was a problem hiding this comment.
Fixed — changed "timeout waiting" to "gave up waiting" which covers both retry exhaustion and other error scenarios.
| return true, nil | ||
| }, | ||
| retry.WithBackoffDurationType(b.settings.BlockValidation.IsParentMinedRetryBackoffDuration), | ||
| retry.WithBackoffMultiplier(b.settings.BlockValidation.IsParentMinedRetryBackoffMultiplier), |
There was a problem hiding this comment.
Addressed — switched to exponential backoff (factor 2.0) with a 2s max cap. With 20ms base and 45 retries, worst-case total is now ~90s (most of which is at the 2s cap) vs the previous ~80s linear, but individual retries ramp up much faster to the useful polling interval. Also dropped the linear backoff multiplier since it's not used with exponential mode.
- Reword "must wait" to "best-effort wait" with context cancellation as hard stop - Change "timeout" log to "gave up waiting" for accuracy - Add retry.WithMessage for log correlation - Switch to exponential backoff (factor 2.0, max 2s) for waitForBlockMinedSet Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Improves BlockAssembler.reset() reorg/reset recovery by ensuring transactions from invalid moveBack blocks have their mined status fully unset by BlockValidation before unmined transactions are reloaded, preventing stale unmined_since and missed recovery.
Changes:
- Adds a best-effort wait during reset for invalid moveBack blocks to reach
mined_set=true(i.e., BlockValidation finishedsetTxMinedStatus(unsetMined=true)). - Keeps skipping invalid moveBack blocks during explicit “mark moveBack txs unmined” processing, but now does so after the new wait to ensure tx state is updated.
- Introduces
waitForBlockMinedSet()using the sharedutil/retryhelper with exponential backoff and a capped max delay.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if blockWithMeta.meta.Invalid { | ||
| // Skip invalid blocks - BlockValidation already handled them via unsetMined=true | ||
| // Skip invalid blocks — BlockValidation has already handled them via | ||
| // setTxMinedStatus(unsetMined=true) which we waited for above. | ||
| continue |
There was a problem hiding this comment.
Good catch — the PR description was misleading. Updated it to accurately describe the behavior: we still skip invalid blocks in the tx collection loop (BlockValidation handles them), but we now wait for BlockValidation to finish processing before loadUnminedTransactions runs. The description now clarifies this is a best-effort wait with context cancellation as a hard stop.
There was a problem hiding this comment.
Pull request overview
This PR improves reset-time recovery of transactions from invalid moveBack blocks by adding a best-effort synchronization point to ensure BlockValidation has finished unsetting mined status (setting unmined_since) before loadUnminedTransactions() runs.
Changes:
- Add a best-effort wait during
reset()for invalid moveBack blocks to reachmined_set=true(i.e., BlockValidation finishedsetTxMinedStatus(unsetMined=true)). - Introduce
waitForBlockMinedSet()using the sharedutil/retryhelper with exponential backoff and capped delay. - Clarify comments around skipping invalid blocks in the moveBack tx collection loop (BlockValidation remains responsible for unmining them).
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| for _, blockWithMeta := range moveBackBlocksWithMeta { | ||
| if blockWithMeta.meta.Invalid { | ||
| blockHash := blockWithMeta.block.Hash() | ||
| b.logger.Infof("[BlockAssembler][Reset] waiting for invalid block %s to be processed by BlockValidation", blockHash.String()) | ||
| if waitErr := b.waitForBlockMinedSet(ctx, blockHash); waitErr != nil { | ||
| if ctx.Err() != nil { | ||
| return errors.NewProcessingError("[Reset] context cancelled while waiting for invalid block mined_set", waitErr) | ||
| } | ||
| b.logger.Warnf("[BlockAssembler][Reset] gave up waiting for invalid block %s mined_set: %v (proceeding anyway — txs may be recovered on next reset)", blockHash.String(), waitErr) |
There was a problem hiding this comment.
Fair point about sequential waiting. In practice, having multiple invalid moveBack blocks during a single reset is rare (typically 0-1), so the sequential approach keeps the code simple. If this becomes a real bottleneck, adding concurrent waiting with a semaphore would be straightforward. Leaving as-is for now.
| _, err := retry.Retry(ctx, b.logger, func() (bool, error) { | ||
| isMined, err := b.blockchainClient.GetBlockIsMined(ctx, blockHash) | ||
| if err != nil { | ||
| return false, err | ||
| } |
There was a problem hiding this comment.
Good catch. Added short-circuiting for ErrBlockNotFound by cancelling the retry context immediately — avoids burning the full retry budget on a block that will never appear. Other transient errors (RPC failures, etc.) still get retried normally.
- Short-circuit retries on non-retriable errors (ErrBlockNotFound) in waitForBlockMinedSet by cancelling the retry context immediately Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR improves BlockAssembler.reset() behavior during reorg resets by best-effort waiting for BlockValidation to finish asynchronously unmining transactions from invalid moveBack blocks before loadUnminedTransactions() runs, preventing missed recovery of unmined txs.
Changes:
- Add a best-effort per-invalid-block wait (with hard-stop on context cancellation) before reloading unmined transactions.
- Introduce
waitForBlockMinedSet()usingutil/retryto pollmined_setuntil BlockValidation has completed processing. - Update reset comments to clarify responsibility boundaries and the best-effort nature of the wait.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| return false, err | ||
| } | ||
| if !isMined { | ||
| return false, errors.NewBlockParentNotMinedError( |
There was a problem hiding this comment.
The BlockParentNotMinedError usage here mirrors the exact same pattern used in BlockValidation.go (line ~1676) for the same purpose — polling mined_set. The error never escapes the retry loop (it's only used as a retry signal), so it won't trigger incorrect alerts/metrics. Introducing a new error type would add complexity without practical benefit given the error stays internal to the retry. Keeping consistent with the existing BlockValidation pattern.
| // Short-circuit on non-retriable errors (block doesn't exist in DB) | ||
| if errors.Is(err, errors.ErrBlockNotFound) { | ||
| nonRetriableErr = errors.NewProcessingError( | ||
| "[waitForBlockMinedSet] block %s not found — cannot wait for mined_set", blockHash.String()) |
There was a problem hiding this comment.
Good point — now passing the original err as the last param to NewProcessingError so it gets wrapped and the original cause is preserved in logs.
- Wrap original error in NewProcessingError for ErrBlockNotFound short-circuit to preserve cause context in logs Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
|
@freemans13 Please don't merge until we've merged #704 |




Summary
During a reorg reset, invalid moveBack blocks are handled by BlockValidation's background job (
setTxMinedStatus(unsetMined=true)), which setsunmined_sinceon their transactions. Previously,reset()calledloadUnminedTransactionswithout waiting for this background processing to complete, so those txs still hadunmined_since=NULLand were invisible to the unmined iterator.Changes:
waitForBlockMinedSet— polls until BlockValidation has finished processing each invalid moveBack block (exponential backoff, context-cancellable)loadUnminedTransactionsrunsTest plan
make lint— 0 issues🤖 Generated with Claude Code