perf(netsync): broaden quickValidationMode to cover CATCHINGBLOCKS + behind-tip#723
perf(netsync): broaden quickValidationMode to cover CATCHINGBLOCKS + behind-tip#723freemans13 wants to merge 19 commits into
Conversation
|
🤖 Claude Code Review Status: Complete Current Review: No blocking issues found. The PR successfully addresses the performance optimization goal and includes proper defensive checks in the new code paths. Key Changes Verified:
Notes:
History:
|
Benchmark Comparison ReportBaseline: Current: Summary
All benchmark results (sec/op)
Threshold: >10% with p < 0.05 | Generated: 2026-05-14 08:54 UTC |
…ationInCatchingBlocks
prepareSubtrees previously gated quickValidationMode on FSM == LEGACYSYNCING only. In production setups that restart from a pre-existing chain, the FSM begins in RUNNING and only transitions to CATCHINGBLOCKS (not LEGACYSYNCING) during peer-driven catch-up, so the quick path ran on <1% of catch-up blocks in live profiling (captured 2026-04-17). Use the new shouldUseQuickValidationMode helper to activate the quick path on LEGACYSYNCING, CATCHINGBLOCKS (gated by legacy_quickValidationInCatchingBlocks, default true), and unconditionally when legacy_forceQuickValidation=true. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…Subtrees A subtree written to the local store may carry either FileType — one of `FileTypeSubtreeToCheck` (downloaded from a peer, pending validation) or `FileTypeSubtree` (already validated; produced by block assembly, block persister, or legacy catch-up's quickValidationMode). Both files carry the same tx-hash list. CheckBlockSubtrees only consulted `FileTypeSubtreeToCheck`. If we already had `FileTypeSubtree` locally, the lookup still missed and we fell through to an HTTP fetch from `request.BaseUrl`. For `BaseUrl == "legacy"` the synthesised URL has no scheme, so the fetch errors with "unsupported protocol scheme" and the whole block fails validation. In practice this made every legacy-catch-up block fail the first time — retries eventually succeed via a different path, turning what should be a fast sync into a slow retry loop. Broadening quickValidationMode to cover CATCHINGBLOCKS exposed this latent bug: pre-existing LEGACYSYNCING-only gating hid it because mid-sync nodes rarely hit that state. Consult both file types via a small helper before falling back to HTTP. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`extendTransactions` previously issued one PreviousOutputsDecorate DB call
per transaction. On a 20,981-tx testnet block we observed this step take
3m 32s — 94% of total block-processing time — because the non-local-parent
path scaled linearly with block size: one IN-clause query per tx, serialised
only by the outpoint-batcher goroutine limit.
Split extendTransactions into two phases:
1. Same-block parents via txMap, in per-tx goroutines (unchanged behaviour,
waits for the parent tx to finish being extended). Pulled out into
`extendFromTxMap`.
2. Non-local parents via a single `BatchPreviousOutputsDecorate` call. The
batch skips inputs already populated by phase 1, so only still-unfilled
inputs hit the database. For a 20k-tx block that's one chunked query
instead of 20k.
On a batch-decoration error that resembles "parent pruned / missing", fall
back to the original per-tx path so the DAH'd-parent recovery
(`utxoStore.Get(txHash, fields.Tx)`) still applies.
Also convert `BatchPreviousOutputsDecorate` to use a composite
`(t.hash, o.idx)` IN predicate — previously flagged as a TODO in the file
because the bulk path wasn't on the legacy-sync hot path. Now that it is,
the composite predicate avoids scanning every output of every referenced
parent, which is the behaviour that made PreviousOutputsDecorate dominate
heap on data-carrier-heavy blocks.
Expected impact on testnet 20k-tx block: extendTransactions 3m 32s → <10s.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous implementation polled prevTxWrapper.Tx.IsExtended() on a 10ms tick before reading the parent's Outputs. That's unnecessary: Outputs are populated at wire-parse time and never mutated afterwards; IsExtended only checks whether the parent's *inputs* have PreviousTxScript set, which is unrelated to the parent's outputs that the child needs. The wait also introduced a deadlock under the new two-phase extendTransactions flow: a pure-non-local-parent tx only becomes "extended" after phase 2 (batch DB call). Phase 1 goroutines waiting for such a parent would time out at 120s and fail the block. Observed on testnet block 1630240. Drop the poll and read the parent's outputs directly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
quickValidationMode skips checkSubtreeFromBlock, which is only safe when the block being validated is covered by the legacy headers-first checkpoint anchor. LEGACYSYNCING carries that guarantee by construction (bsvd netsync only enters the state below the final checkpoint). CATCHINGBLOCKS does not: the state flips purely on "peer is >10 blocks ahead", which also fires past the final checkpoint on a restart near tip, on a brief disconnect, or against a forked peer. In that regime, skipping subtree validation would accept subtree content on trust. Tighten the helper: CATCHINGBLOCKS activation additionally requires the block being processed to be at or below chainParams.Checkpoints' final entry. legacy_forceQuickValidation remains the explicit opt-out for trusted networks. Tests updated to cover the past-final-checkpoint CATCHINGBLOCKS case. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
99c27bf to
c37e522
Compare
| // pure-non-local-parent tx only becomes "extended" after phase 2 runs. | ||
| // | ||
| // No lock needed — each goroutine writes to a unique input index. | ||
| tx.Inputs[i].PreviousTxSatoshis = prevTxWrapper.Tx.Outputs[input.PreviousTxOutIndex].Satoshis |
There was a problem hiding this comment.
✅ Resolved
[Critical] Missing bounds check before array access - This issue has been fixed.
The code now includes proper bounds checking at services/legacy/netsync/handle_block.go:1041-1049:
- Line 1041-1043: Checks if prevTxWrapper.Tx is nil
- Line 1045-1047: Validates PreviousTxOutIndex is in range
- Line 1049-1054: Checks for nil output and nil locking script
Malformed transactions will now return a processing error instead of panicking the node.
| var inLegacySync, inCatchingBlocks bool | ||
| currentState, fsmErr := sm.blockchainClient.GetFSMCurrentState(sm.ctx) | ||
| if fsmErr != nil { | ||
| sm.logger.Errorf("[prepareSubtrees] Failed to get FSM state: %s", fsmErr) |
There was a problem hiding this comment.
Out of scope / no longer applicable
The FSM query code mentioned in this comment doesn't appear in the current version of the PR. This issue appears to have been addressed through refactoring or is no longer relevant to the current implementation.
With the final-checkpoint gate in place, the only scenario the force flag covered was "skip subtree validation on tip blocks on trust," which is a security regression with no defensible production use case. Remove the setting and the forceFlag input on shouldUseQuickValidationMode. Teranode has no precedent for blanket trust overrides of validation; narrow knobs like legacy_config_DisableCheckpoints and blockvalidation_skipCheckParentMined target specific, well-understood scenarios. The checkpoint anchor is load-bearing for the quick path's safety argument, so it stays mandatory. legacy_quickValidationInCatchingBlocks is preserved as the disable knob for the CATCHINGBLOCKS path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
With CATCHINGBLOCKS gated on belowFinalCheckpoint, the post-gating code path is identical to LEGACYSYNCING (which has no disable knob). There's no FSM-state-dependent behavior left that would warrant a separate kill switch for CATCHINGBLOCKS. Collapse shouldUseQuickValidationMode to three inputs (inLegacySync, inCatchingBlocks, belowFinalCheckpoint) and remove the setting and wiring. The setting was introduced on this branch, so no migration needed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
[Major] PR description is stale The PR description documents two settings that were removed in later commits:
The current implementation gates quick-validation mode purely on FSM state and checkpoint position, with no configurable overrides. The PR description should be updated to match the final implementation. Impact: Misleading for reviewers and future readers trying to understand what changed. |
Run gci on settings.go and legacy_settings.go to fix field-alignment drift from the removed ForceQuickValidation and QuickValidationCatchupInCatchingBlocks fields. No functional change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR extends legacy netsync’s quickValidationMode eligibility to include CATCHINGBLOCKS when the block being validated is at/below the chain’s final hardcoded checkpoint, and adds related performance improvements to reduce hot-path subtree and UTXO-decoration overhead during catch-up.
Changes:
- Broaden
quickValidationModegating viashouldUseQuickValidationMode(...), using a singleGetFSMCurrentStatecall and checkpoint-height anchoring. - Avoid unnecessary subtree HTTP fetches by checking both
FileTypeSubtreeToCheckandFileTypeSubtreelocally first. - Improve catch-up performance by batching previous-output decoration across transactions and tightening SQL lookups to exact
(tx_hash, vout)pairs.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
stores/utxo/sql/sql.go |
Switch bulk previous-output lookup to a composite (t.hash, o.idx) IN (...) to avoid scanning unreferenced outputs and improve decoration throughput. |
services/subtreevalidation/check_block_subtrees.go |
Prefer local subtree reads (validated or pending-validation file types) before attempting HTTP fetches. |
services/legacy/netsync/handle_block.go |
Add shouldUseQuickValidationMode, compute below-final-checkpoint gating, and refactor transaction extension to a 2-phase (txMap then batched DB) flow. |
services/legacy/netsync/handle_block_test.go |
Add unit tests for shouldUseQuickValidationMode. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| belowFinalCheckpoint := false | ||
| if cps := sm.chainParams.Checkpoints; len(cps) > 0 { | ||
| belowFinalCheckpoint = block.Height() <= cps[len(cps)-1].Height | ||
| } | ||
|
|
||
| // quick validation mode is used when we are in legacy mode | ||
| // we can skip some of the processing since we assume the block is valid | ||
| quickValidationMode := legacyMode | ||
| quickValidationMode := shouldUseQuickValidationMode( | ||
| inLegacySync, | ||
| inCatchingBlocks, | ||
| belowFinalCheckpoint, | ||
| ) |
Copilot
AI
Apr 22, 2026
•
There was a problem hiding this comment.
✅ Resolved - The PR now includes the disableCheckpoints field on SyncManager and gates the belowFinalCheckpoint check accordingly (services/legacy/netsync/handle_block.go:364-368).
There was a problem hiding this comment.
Addressed: plumbed DisableCheckpoints from Config onto SyncManager.disableCheckpoints and in prepareSubtrees we now force belowFinalCheckpoint=false when checkpoints are disabled, so the CATCHINGBLOCKS path cannot skip subtree validation in that mode.
| g.Go(func() error { | ||
| txWrapper.SomeParentsInBlock = true | ||
|
|
Copilot
AI
Apr 22, 2026
•
There was a problem hiding this comment.
✅ Resolved - The data race is fixed. txWrapper.SomeParentsInBlock = true is now set once from the caller goroutine before spawning the per-input goroutines (services/legacy/netsync/handle_block.go:1020).
There was a problem hiding this comment.
Good catch — moved the txWrapper.SomeParentsInBlock = true write out of the goroutine. It's now set once from the caller goroutine before g.Go(...), so concurrent writes from the per-input goroutines are gone. Verified via go test -race on the netsync package.
| // block (available via txMap). It waits up to 120 seconds for each same-block parent | ||
| // to be fully extended itself, which is necessary because child and parent may be | ||
| // processed by different goroutines in the enclosing errgroup. |
Copilot
AI
Apr 22, 2026
•
There was a problem hiding this comment.
✅ Resolved - The doc comment has been updated to match the current implementation, which reads parent outputs immediately without waiting (services/legacy/netsync/handle_block.go:980-987).
There was a problem hiding this comment.
Updated the doc comment to match the new behavior: we no longer wait/poll for parent extension — outputs are available at parse time, so we read them immediately.
- Plumb DisableCheckpoints from Config onto SyncManager; force belowFinalCheckpoint=false when checkpoints are disabled so the CATCHINGBLOCKS quickValidationMode path cannot skip subtree validation when the operator passes --nocheckpoints. - Move txWrapper.SomeParentsInBlock = true out of the per-input goroutine in extendFromTxMap to eliminate a data race under -race. - Update extendFromTxMap doc comment to reflect that parent outputs are now read immediately (no 120s wait).
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // served from the in-memory txMap, so no DB work is needed here. We run per-tx | ||
| // goroutines (bounded by OutpointBatcherSize) because each tx's own inputs are | ||
| // populated independently and may need to wait for a same-block parent to finish | ||
| // being extended first. |
There was a problem hiding this comment.
The Phase 1 comment says per-tx goroutines "may need to wait for a same-block parent to finish being extended first", but the new implementation no longer waits/polls on parent extension (extendFromTxMap reads parent outputs immediately). Please update/remove the comment to match the current behavior so future changes don't reintroduce the old deadlock workaround.
| // served from the in-memory txMap, so no DB work is needed here. We run per-tx | |
| // goroutines (bounded by OutpointBatcherSize) because each tx's own inputs are | |
| // populated independently and may need to wait for a same-block parent to finish | |
| // being extended first. | |
| // served directly from the in-memory txMap, so no DB work is needed here. We run | |
| // per-tx goroutines (bounded by OutpointBatcherSize) because each transaction's | |
| // inputs can be populated independently from txMap lookups. |
There was a problem hiding this comment.
Updated the Phase 1 comment to remove the stale "wait for a same-block parent" wording. Now reads: goroutines are bounded by OutpointBatcherSize and each tx's inputs are populated independently from txMap lookups — no waiting/polling.
| // Inline composite-IN construction keeps this function standalone; | ||
| // mirrors the pattern used in PreviousOutputsDecorate above. |
There was a problem hiding this comment.
The inline placeholder construction is described as "mirrors the pattern used in PreviousOutputsDecorate above", but PreviousOutputsDecorate currently uses buildINClause with a simple t.hash IN (...) predicate (not composite tuple placeholders). Consider either switching PreviousOutputsDecorate to the same composite-IN helper/pattern or rewording this comment to avoid implying the two implementations are structurally identical.
| // Inline composite-IN construction keeps this function standalone; | |
| // mirrors the pattern used in PreviousOutputsDecorate above. | |
| // Build a composite (t.hash, o.idx) IN clause inline here to keep this | |
| // function standalone. |
There was a problem hiding this comment.
Reworded the comment as suggested. It no longer implies structural equivalence with PreviousOutputsDecorate — it just states that a composite (t.hash, o.idx) IN clause is built inline to keep the function standalone. I'll leave PreviousOutputsDecorate as-is in this PR (its single-column t.hash IN (...) form is fine for that call site's semantics).
| // A subtree may be available locally under either: | ||
| // - FileTypeSubtreeToCheck — fetched from a peer, pending validation | ||
| // - FileTypeSubtree — already validated (e.g. legacy catch-up's | ||
| // quickValidationMode validated txs inline | ||
| // before writing the subtree) | ||
| // We must consult both before falling back to an HTTP fetch. Otherwise | ||
| // CheckBlockSubtrees will try to HTTP-download a subtree we already have | ||
| // — and for baseURL="legacy" the synthetic URL has no scheme, so the | ||
| // request fails outright. | ||
| localFileType, localExists, err := u.findLocalSubtreeFile(gCtx, subtreeHash) | ||
| if err != nil { | ||
| return errors.NewProcessingError("[CheckBlockSubtrees][%s] failed to check if subtree exists in store", subtreeHash.String(), err) | ||
| } | ||
|
|
||
| var subtreeToCheck *subtreepkg.Subtree | ||
|
|
||
| if subtreeToCheckExists { | ||
| // get the subtreeToCheck from the store | ||
| subtreeReader, err := u.subtreeStore.GetIoReader(gCtx, subtreeHash[:], fileformat.FileTypeSubtreeToCheck) | ||
| if localExists { | ||
| // read from whichever local file we found | ||
| subtreeReader, err := u.subtreeStore.GetIoReader(gCtx, subtreeHash[:], localFileType) | ||
| if err != nil { | ||
| return errors.NewStorageError("[CheckBlockSubtrees][%s] failed to get subtree from store", subtreeHash.String(), err) |
There was a problem hiding this comment.
New behavior: CheckBlockSubtrees now reads the subtree structure from FileTypeSubtree as well as FileTypeSubtreeToCheck. The existing unit tests only cover the FileTypeSubtreeToCheck path; please add a test case where only FileTypeSubtree exists (and FileTypeSubtreeToCheck is absent) to ensure we don’t regress back to the HTTP fetch path (especially for baseURL="legacy").
There was a problem hiding this comment.
Added TestFindLocalSubtreeFile with three subtests: (1) only FileTypeSubtreeToCheck present, (2) only FileTypeSubtree present (the new legacy quickValidationMode path), and (3) neither present. The FileTypeSubtree-only case is the regression guard you asked for — it fails if we ever drop back to FileTypeSubtreeToCheck-only lookup which would reintroduce the HTTP fallback for baseURL="legacy".
| return "", false, err | ||
| } | ||
| if exists { | ||
| return fileformat.FileTypeSubtreeToCheck, true, nil | ||
| } | ||
| exists, err = u.subtreeStore.Exists(ctx, subtreeHash[:], fileformat.FileTypeSubtree) | ||
| if err != nil { | ||
| return "", false, err | ||
| } | ||
| if exists { | ||
| return fileformat.FileTypeSubtree, true, nil | ||
| } | ||
| return "", false, nil |
There was a problem hiding this comment.
findLocalSubtreeFile returns a raw empty string ("") for the FileType on error/not-found paths. Since FileTypeUnknown is defined as the empty value, returning fileformat.FileTypeUnknown would make the intent clearer and keep return values consistently typed.
| return "", false, err | |
| } | |
| if exists { | |
| return fileformat.FileTypeSubtreeToCheck, true, nil | |
| } | |
| exists, err = u.subtreeStore.Exists(ctx, subtreeHash[:], fileformat.FileTypeSubtree) | |
| if err != nil { | |
| return "", false, err | |
| } | |
| if exists { | |
| return fileformat.FileTypeSubtree, true, nil | |
| } | |
| return "", false, nil | |
| return fileformat.FileTypeUnknown, false, err | |
| } | |
| if exists { | |
| return fileformat.FileTypeSubtreeToCheck, true, nil | |
| } | |
| exists, err = u.subtreeStore.Exists(ctx, subtreeHash[:], fileformat.FileTypeSubtree) | |
| if err != nil { | |
| return fileformat.FileTypeUnknown, false, err | |
| } | |
| if exists { | |
| return fileformat.FileTypeSubtree, true, nil | |
| } | |
| return fileformat.FileTypeUnknown, false, nil |
There was a problem hiding this comment.
Good suggestion — switched all three not-found / error paths in findLocalSubtreeFile to return fileformat.FileTypeUnknown explicitly instead of the raw empty string. Keeps intent clear and return values consistently typed.
- handle_block.go: remove stale "wait for same-block parent" wording from the Phase 1 comment in extendTransactions; the new flow reads parent outputs immediately from txMap with no polling. - check_block_subtrees.go: return fileformat.FileTypeUnknown instead of a raw empty string from findLocalSubtreeFile on not-found/error paths. - check_block_subtrees_test.go: add TestFindLocalSubtreeFile covering FileTypeSubtreeToCheck-present, FileTypeSubtree-only (legacy quick validation path), and neither-present cases. - stores/utxo/sql/sql.go: reword inline composite-IN comment so it describes what the code does instead of implying structural equivalence with PreviousOutputsDecorate.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // - belowFinalCheckpoint: the block being validated is at or below the | ||
| // chain's final hardcoded checkpoint height. When true, the headers-first | ||
| // chain guarantees subtree-content integrity and skipping subtree-level | ||
| // validation is safe. Above the final checkpoint (or when checkpoints are | ||
| // disabled) no such anchor exists and full subtree validation is required. | ||
| // | ||
| // CATCHINGBLOCKS activation additionally requires belowFinalCheckpoint so the | ||
| // checkpoint-anchor safety argument matches LEGACYSYNCING. There is no | ||
| // override for running quickValidationMode past the final checkpoint: subtree | ||
| // validation on tip blocks protects against malicious or forked peers and | ||
| // must not be skipped. | ||
| func shouldUseQuickValidationMode(inLegacySync, inCatchingBlocks, belowFinalCheckpoint bool) bool { | ||
| if inLegacySync { | ||
| return true | ||
| } |
There was a problem hiding this comment.
The doc comment says that when checkpoints are disabled, full subtree validation is required (no checkpoint anchor), but shouldUseQuickValidationMode currently returns true whenever inLegacySync is true regardless of belowFinalCheckpoint (which the caller forces to false when checkpoints are disabled). Please align the comment and logic: either (a) update the comment to clarify that the checkpoints-disabled restriction only applies to the CATCHINGBLOCKS branch, or (b) incorporate the checkpoints-disabled/belowFinalCheckpoint guard into the LEGACYSYNCING branch as well.
There was a problem hiding this comment.
Expanded the doc comment: (a) called out that the caller forces belowFinalCheckpoint=false when checkpoints are disabled, and (b) added a paragraph explaining why LEGACYSYNCING doesn't re-check it here — the DisableCheckpoints guard in manager.go (around line 2302) means nextCheckpoint stays nil and headers-first mode never activates, so the checkpoints-disabled guard is applied upstream. Keeping the LEGACYSYNCING branch unconditional matches bsvd's model; the comment now makes that reasoning explicit instead of implicit.
| tx.Inputs[i].PreviousTxSatoshis = prevTxWrapper.Tx.Outputs[input.PreviousTxOutIndex].Satoshis | ||
| tx.Inputs[i].PreviousTxScript = bscript.NewFromBytes(*prevTxWrapper.Tx.Outputs[input.PreviousTxOutIndex].LockingScript) | ||
|
|
There was a problem hiding this comment.
extendFromTxMap indexes prevTxWrapper.Tx.Outputs[input.PreviousTxOutIndex] without validating that PreviousTxOutIndex is in-range. A malformed/hostile block could trigger an out-of-bounds panic here (DoS) before the block is rejected. Please add a bounds check (and a nil output/script check if applicable) and return a validation/processing error instead of panicking.
| tx.Inputs[i].PreviousTxSatoshis = prevTxWrapper.Tx.Outputs[input.PreviousTxOutIndex].Satoshis | |
| tx.Inputs[i].PreviousTxScript = bscript.NewFromBytes(*prevTxWrapper.Tx.Outputs[input.PreviousTxOutIndex].LockingScript) | |
| if prevTxWrapper == nil || prevTxWrapper.Tx == nil { | |
| return errors.NewProcessingError("failed to extend transaction %s: previous transaction %s is missing", tx.TxIDChainHash(), prevTxHash) | |
| } | |
| prevOutIdx := int(input.PreviousTxOutIndex) | |
| if prevOutIdx < 0 || prevOutIdx >= len(prevTxWrapper.Tx.Outputs) { | |
| return errors.NewProcessingError("failed to extend transaction %s: previous output index %d out of range for transaction %s", tx.TxIDChainHash(), input.PreviousTxOutIndex, prevTxHash) | |
| } | |
| prevOutput := prevTxWrapper.Tx.Outputs[prevOutIdx] | |
| if prevOutput == nil { | |
| return errors.NewProcessingError("failed to extend transaction %s: previous output %d is nil for transaction %s", tx.TxIDChainHash(), input.PreviousTxOutIndex, prevTxHash) | |
| } | |
| if prevOutput.LockingScript == nil { | |
| return errors.NewProcessingError("failed to extend transaction %s: previous output %d has nil locking script for transaction %s", tx.TxIDChainHash(), input.PreviousTxOutIndex, prevTxHash) | |
| } | |
| tx.Inputs[i].PreviousTxSatoshis = prevOutput.Satoshis | |
| tx.Inputs[i].PreviousTxScript = bscript.NewFromBytes(*prevOutput.LockingScript) |
There was a problem hiding this comment.
Good defensive hardening — added the checks inside the goroutine: (1) prevTxWrapper.Tx nil, (2) PreviousTxOutIndex out of range vs len(prevTxWrapper.Tx.Outputs), (3) prevOutput nil or LockingScript nil. Each returns a NewProcessingError so the block is rejected cleanly instead of panicking. Tests still green under -race.
| // maxINClauseSize bounds the total parameter count. Each pair contributes | ||
| // two parameters, so chunk size must be halved. |
There was a problem hiding this comment.
The comment says maxINClauseSize bounds the total parameter count, but the constant is defined as the maximum number of placeholders in a single IN clause (chosen to stay under SQLite's 999 bind-param limit with headroom). Since this code now uses two bind params per tuple, reword the comment to reflect the actual constraint (e.g., maxINClauseSize is an item-count limit derived from the bind-param cap, so we use maxINClauseSize/2 tuples).
| // maxINClauseSize bounds the total parameter count. Each pair contributes | |
| // two parameters, so chunk size must be halved. | |
| // maxINClauseSize is the per-IN-clause placeholder limit derived from the | |
| // database bind-parameter cap. Each (hash, idx) pair uses two parameters, | |
| // so each chunk can contain at most maxINClauseSize/2 composite tuples. |
There was a problem hiding this comment.
Applied the suggested wording. The comment now states that maxINClauseSize is the per-IN-clause placeholder limit derived from the bind-parameter cap, and that each composite (hash, idx) pair uses two params so each chunk can hold at most maxINClauseSize/2 tuples.
- handle_block.go: expand shouldUseQuickValidationMode doc comment to clarify that the checkpoints-disabled guard is applied by the caller (for the CATCHINGBLOCKS path) and upstream in manager.go for LEGACYSYNCING (nextCheckpoint stays nil + headers-first never activates when DisableCheckpoints is true). - handle_block.go: bounds-check PreviousTxOutIndex and nil-check prevTxWrapper.Tx / prevOutput / LockingScript in extendFromTxMap so a malformed or hostile block returns a processing error instead of panicking with a slice OOB. - stores/utxo/sql/sql.go: reword the pairChunkSize comment to state that maxINClauseSize is a placeholder/bind-param limit, and that each (hash, idx) tuple uses two params.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| tx := txWrapper.Tx | ||
| txs = append(txs, tx) | ||
|
|
||
| g.Go(func() error { | ||
| if err := sm.extendFromTxMap(gCtx, tx, txMap); err != nil { | ||
| return errors.NewTxError("failed to extend transaction from txMap", err) | ||
| } | ||
| return nil | ||
| }) |
There was a problem hiding this comment.
In extendTransactions phase 1, the goroutine closure captures the loop-local variable tx without re-binding it per iteration. Because tx is assigned each loop iteration, goroutines can end up all operating on the last transaction (classic loop-variable capture bug). Rebind tx inside the loop (e.g., tx := tx) or pass it as a parameter to the closure before calling extendFromTxMap.
There was a problem hiding this comment.
Thanks, but I believe this is actually safe. Two reasons:
-
txis not a loop variable — it's declared inside the loop body viatx := txWrapper.Tx(line 953). Each iteration creates a fresh variable, so the closure on line 956 captures that iteration'stx. The classic gotcha applies to the loop variables in theforheader (idx,wireTx), not to:=declarations inside the loop body. -
Even if it were a loop variable, this repo is on Go 1.26 (
go.modline 3), so per-iteration scoping has been the default since the Go 1.22LoopVarchange.
For safety I did run the netsync suite under -race after my round 1 fix and saw no races, and the integration tests exercise this path with blocks containing same-block parent/child dependencies.
Leaving as-is unless you can point out a specific case where the current code misbehaves.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // Phase 2: for inputs whose parents are NOT same-block, batch the decoration into | ||
| // a single IN-clause DB query per chunk instead of issuing one per tx. For a | ||
| // 20k-tx block this collapses ~20k round-trips into O(N / maxINClauseSize). | ||
| // | ||
| // BatchPreviousOutputsDecorate skips inputs that already have PreviousTxScript set, | ||
| // so Phase 1's work is preserved. If it returns a processing/not-found error the | ||
| // most likely cause is a parent that's been pruned (DAH'd) because the child | ||
| // already had a prior processing pass. Fall back to per-tx decoration so the | ||
| // existing recovery path (utxoStore.Get on the child itself) can still kick in. | ||
| if batchErr := sm.utxoStore.BatchPreviousOutputsDecorate(ctx, txs); batchErr != nil { | ||
| if errors.Is(batchErr, errors.ErrProcessing) || errors.Is(batchErr, errors.ErrTxNotFound) { | ||
| return sm.extendPerTxFallback(ctx, txs) | ||
| } | ||
| return errors.NewProcessingError("failed to batch-decorate previous outputs", batchErr) |
There was a problem hiding this comment.
extendTransactions phase 2 now calls utxoStore.BatchPreviousOutputsDecorate(ctx, txs) once for the whole block. For the Aerospike UTXO store, BatchPreviousOutputsDecorate currently just loops PreviousOutputsDecorate per-tx sequentially (stores/utxo/aerospike/get.go:1183-1189), which removes the previous per-tx concurrency and prevents the outpoint batcher from coalescing requests across transactions. Consider either (a) making the Aerospike BatchPreviousOutputsDecorate enqueue all inputs across all txs before waiting, or (b) keeping a concurrent per-tx call path here when the backing store doesn’t implement a true bulk operation, to avoid a performance regression on large blocks.
| // Phase 2: for inputs whose parents are NOT same-block, batch the decoration into | |
| // a single IN-clause DB query per chunk instead of issuing one per tx. For a | |
| // 20k-tx block this collapses ~20k round-trips into O(N / maxINClauseSize). | |
| // | |
| // BatchPreviousOutputsDecorate skips inputs that already have PreviousTxScript set, | |
| // so Phase 1's work is preserved. If it returns a processing/not-found error the | |
| // most likely cause is a parent that's been pruned (DAH'd) because the child | |
| // already had a prior processing pass. Fall back to per-tx decoration so the | |
| // existing recovery path (utxoStore.Get on the child itself) can still kick in. | |
| if batchErr := sm.utxoStore.BatchPreviousOutputsDecorate(ctx, txs); batchErr != nil { | |
| if errors.Is(batchErr, errors.ErrProcessing) || errors.Is(batchErr, errors.ErrTxNotFound) { | |
| return sm.extendPerTxFallback(ctx, txs) | |
| } | |
| return errors.NewProcessingError("failed to batch-decorate previous outputs", batchErr) | |
| // Phase 2: for inputs whose parents are NOT same-block, decorate previous outputs | |
| // with bounded per-tx concurrency. This preserves the pre-existing concurrency | |
| // characteristics for stores whose BatchPreviousOutputsDecorate implementation is | |
| // only a sequential wrapper around PreviousOutputsDecorate, while still leaving | |
| // Phase 1's same-block population intact. | |
| if err = sm.extendPreviousOutputsConcurrent(ctx, txs, outpointBatcherSize); err != nil { | |
| return err | |
| } | |
| return nil | |
| } | |
| func (sm *SyncManager) extendPreviousOutputsConcurrent(ctx context.Context, txs []*bt.Tx, limit int) error { | |
| g, gCtx := errgroup.WithContext(ctx) | |
| util.SafeSetLimit(g, limit) | |
| for _, tx := range txs { | |
| tx := tx | |
| g.Go(func() error { | |
| if err := sm.utxoStore.PreviousOutputsDecorate(gCtx, tx); err != nil { | |
| return err | |
| } | |
| return nil | |
| }) | |
| } | |
| if err := g.Wait(); err != nil { | |
| if errors.Is(err, errors.ErrProcessing) || errors.Is(err, errors.ErrTxNotFound) { | |
| return sm.extendPerTxFallback(ctx, txs) | |
| } | |
| return errors.NewProcessingError("failed to decorate previous outputs", err) |
There was a problem hiding this comment.
Good catch — fixed by parallelizing BatchPreviousOutputsDecorate inside the Aerospike store itself (rather than in the caller) so the SQL store keeps its true bulk IN-clause path and Aerospike regains cross-tx batching via its shared outpointBatcher.
Details:
- Wrapped the per-tx
PreviousOutputsDecoratecalls in anerrgroupbounded bys.settings.UtxoStore.OutpointBatcherSize(default 100) so goroutine count stays bounded on large blocks while still letting many txs' inputs pipeline into the same batcher flush. BatchPreviousOutputsDecorate's contract (dedup via PreviousTxScript != nil, error semantics) is unchanged.
This preserves the single-pass DB read for SQL and recovers the cross-tx outpoint coalescing for Aerospike without having to special-case the call site.
| if int(input.PreviousTxOutIndex) >= len(prevTxWrapper.Tx.Outputs) { | ||
| return errors.NewProcessingError("failed to extend transaction %s: previous output index %d out of range for transaction %s (outputs=%d)", tx.TxIDChainHash(), input.PreviousTxOutIndex, prevTxHash, len(prevTxWrapper.Tx.Outputs)) | ||
| } | ||
|
|
||
| prevOutput := prevTxWrapper.Tx.Outputs[input.PreviousTxOutIndex] |
There was a problem hiding this comment.
extendFromTxMap bounds-checkes PreviousTxOutIndex by casting it to int (if int(input.PreviousTxOutIndex) >= len(...)), but then indexes the slice using the original uint32 value. On 32-bit builds, the uint32 -> int cast can overflow to a negative value and bypass the check, leading to an out-of-bounds panic (DoS) for a hostile/malformed block. Compare using unsigned types (e.g., if input.PreviousTxOutIndex >= uint32(len(outputs))) or otherwise guard the cast before indexing.
| if int(input.PreviousTxOutIndex) >= len(prevTxWrapper.Tx.Outputs) { | |
| return errors.NewProcessingError("failed to extend transaction %s: previous output index %d out of range for transaction %s (outputs=%d)", tx.TxIDChainHash(), input.PreviousTxOutIndex, prevTxHash, len(prevTxWrapper.Tx.Outputs)) | |
| } | |
| prevOutput := prevTxWrapper.Tx.Outputs[input.PreviousTxOutIndex] | |
| if input.PreviousTxOutIndex >= uint32(len(prevTxWrapper.Tx.Outputs)) { | |
| return errors.NewProcessingError("failed to extend transaction %s: previous output index %d out of range for transaction %s (outputs=%d)", tx.TxIDChainHash(), input.PreviousTxOutIndex, prevTxHash, len(prevTxWrapper.Tx.Outputs)) | |
| } | |
| outputIndex := int(input.PreviousTxOutIndex) | |
| prevOutput := prevTxWrapper.Tx.Outputs[outputIndex] |
There was a problem hiding this comment.
Fair point — switched the bounds check to a uint32 comparison: if input.PreviousTxOutIndex >= uint32(len(prevTxWrapper.Tx.Outputs)). Removes the 32-bit-build int() overflow hole. Also dropped the subsequent int() cast since we index with the uint32 directly.
- stores/utxo/aerospike/get.go: parallelize BatchPreviousOutputsDecorate so per-tx PreviousOutputsDecorate calls run concurrently (bounded by UtxoStore.OutpointBatcherSize), letting the shared outpointBatcher coalesce inputs across txs on large blocks. The previous sequential loop regressed cross-tx batching under the new netsync two-phase flow. - services/legacy/netsync/handle_block.go: compare PreviousTxOutIndex as uint32 against len(Outputs) instead of casting to int, so 32-bit builds can't overflow past the bounds check. Indexing uses the uint32 value directly.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…alidation-catchup
2fcc259 to
f5072d3
Compare
| // (IsExtended checks the parent's *inputs*, not its outputs) and caused | ||
| // a deadlock under the two-phase flow in extendTransactions, where a | ||
| // pure-non-local-parent tx only becomes "extended" after phase 2 runs. | ||
| tx.Inputs[i].PreviousTxSatoshis = prevTxWrapper.Tx.Outputs[input.PreviousTxOutIndex].Satoshis |
There was a problem hiding this comment.
Out of scope for this PR
This issue exists in pre-existing code (ExtendTransaction function) that is not modified by this PR. The PR correctly adds defensive nil-safety checks to the new extendFromTxMap function (lines 1064-1078), which is the code path this PR introduces/modifies.
The ExtendTransaction function (lines 1286-1377) should have the same defensive checks, but that's a separate issue to be addressed in a dedicated security hardening PR, not bundled into this performance optimization work.
Recommendation: File a separate issue to harden ExtendTransaction with the same defensive checks (nil Tx, bounds check using uint32 comparison, nil output/LockingScript checks) to maintain security consistency across both code paths.
…eckpoint Extends the legacy catch-up quick path to also fire in CATCHINGBLOCKS when the block being validated is at or below the chain's final hardcoded checkpoint. The checkpoint anchor gives the same headers-first safety guarantee as LEGACYSYNCING, so skipping subtree-level re-validation for those blocks is safe. Past the final checkpoint we fall back to the full path — subtree validation on tip blocks must not be skipped. --nocheckpoints disables the anchor entirely; in that mode quickValidationMode can only activate via LEGACYSYNCING (which bsvd itself gates on checkpoint coverage upstream in manager.go). Inlines the branching directly in prepareSubtrees rather than introducing a helper — the condition is a two-case OR. Also: - Wire Config.DisableCheckpoints into SyncManager so prepareSubtrees can honour --nocheckpoints. - subtreevalidation: accept FileTypeSubtree as a local source in CheckBlockSubtrees. legacy catch-up's quickValidationMode writes the subtree under FileTypeSubtree (not FileTypeSubtreeToCheck) because the txs were already validated inline, so the old Exists-check missed it and fell back to an HTTP download that cannot succeed for the synthetic baseURL="legacy". - aerospike: run BatchPreviousOutputsDecorate per-tx in parallel via errgroup so the shared outpointBatcher can coalesce inputs across txs. The sequential loop regressed throughput on large blocks.
f5072d3 to
bff8321
Compare
…alidation-catchup # Conflicts: # services/legacy/netsync/handle_block.go # stores/utxo/sql/sql.go
…chup' into stu/netsync-quick-validation-catchup # Conflicts: # services/legacy/netsync/handle_block.go
oskarszoon
left a comment
There was a problem hiding this comment.
Has a merge conflict with main — please rebase before review.
…alidation-catchup
|
|
Closing — the core change here (broadening The three independent improvements bundled here will be re-opened as focused PRs:
|


Problem
Legacy netsync has a
quickValidationMode(parallel UTXO create-then-spend, skipcheckSubtreeFromBlock) that activates inLEGACYSYNCING. The safety argument: bsvd netsync only enters that state while catching up to a hardcoded checkpoint via headers-first sync, so blocks in that window are anchored to the checkpoint hash.When teranode runs both legacy and p2p services and the p2p path drives catch-up, the FSM transitions to
CATCHINGBLOCKSinstead. If catch-up is still within the checkpoint-covered range, the same safety argument holds — but the gating never activated the quick path there.Live profiling on 2026-04-17 showed the quick path ran on <1% of catch-up blocks, leaving
checkSubtreeFromBlock(~337 ms/block mainnet, 2.4 s/block testnet) on the hot path.Fix
Activate
quickValidationModeinCATCHINGBLOCKStoo, with the same checkpoint-anchor requirement: the block being validated must be at or below the chain's final hardcoded checkpoint. Past the final checkpoint, full subtree validation always runs — tip blocks must not be fast-validated on trust.New helper
shouldUseQuickValidationMode(inLegacySync, inCatchingBlocks, belowFinalCheckpoint);prepareSubtreesnow uses a singleGetFSMCurrentStatecall instead of two sequentialIsFSMCurrentStatecalls.Behaviour
No new operator settings — the checkpoint anchor is load-bearing and isn't overridable.
Testing
Unit tests for
shouldUseQuickValidationMode; existingservices/legacy/netsynctests pass under `-race`; `make lint` clean.