fix(subtreevalidation): consult FileTypeSubtree before HTTP fallback#863
Conversation
CheckBlockSubtrees only checked for FileTypeSubtreeToCheck (the "downloaded from peer, pending validation" marker) before deciding the subtree was missing locally and falling back to an HTTP fetch. On the legacy catch-up / quickValidationMode path the subtree is already validated inline and is written as FileTypeSubtree (the "validated" marker), not FileTypeSubtreeToCheck. The HTTP fallback then tries to fetch from a synthetic baseURL="legacy" whose URL "legacy/subtree/<hash>" has no scheme — the request fails outright, even though the file is right there in the store. New helper findLocalSubtreeFile consults both file types and reports which one (if any) holds the subtree. CheckBlockSubtrees uses its return value to read from the right file type and skip the HTTP fallback when local data is present. Tests: - TestFindLocalSubtreeFile/FileTypeSubtreeToCheck present - TestFindLocalSubtreeFile/FileTypeSubtree only (quickValidationMode/legacy) - TestFindLocalSubtreeFile/neither present Originally bundled in bsv-blockchain#723; pulled out as a focused PR after bsv-blockchain#723's core change (CATCHINGBLOCKS quick-validation gating) was superseded by bsv-blockchain#847. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
🤖 Claude Code Review Status: Complete Current Review: This PR introduces a well-designed fix for a subtle bug where CheckBlockSubtrees and related functions only looked for FileTypeSubtreeToCheck when checking for local subtrees, missing FileTypeSubtree files written by legacy catch-up paths. This caused unnecessary HTTP fallback attempts that would fail outright for baseURL="legacy". Code Quality: The implementation is clean, properly tested, and includes appropriate error handling. The author has addressed all feedback from previous review rounds:
No issues found. The PR follows AGENTS.md guidelines with minimal changes, comprehensive tests, and clear documentation. Ready for merge pending human approval. |
Benchmark Comparison ReportBaseline: Current: Summary
All benchmark results (sec/op)
Threshold: >10% with p < 0.05 | Generated: 2026-05-21 13:16 UTC |
…on-find-local-subtree
There was a problem hiding this comment.
Pull request overview
This PR fixes CheckBlockSubtrees incorrectly falling back to HTTP subtree downloads when the subtree is already present locally under FileTypeSubtree (the “already validated” marker used by legacy catch-up / quick validation), by checking both relevant file types before attempting the HTTP fetch.
Changes:
- Add
findLocalSubtreeFilehelper to detect whether a subtree exists locally underFileTypeSubtreeToCheckorFileTypeSubtree. - Update
CheckBlockSubtreesto read the subtree from whichever local file type exists, avoiding the HTTP fallback when local data is present. - Add unit tests covering all
findLocalSubtreeFilepresence/absence combinations.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| services/subtreevalidation/check_block_subtrees.go | Consult both subtree file markers before HTTP fallback; read subtree using the discovered local file type. |
| services/subtreevalidation/check_block_subtrees_test.go | Add targeted unit tests for findLocalSubtreeFile to prevent regressions (including legacy/quick-validation scenario). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
oskarszoon
left a comment
There was a problem hiding this comment.
Approve. findLocalSubtreeFile cleanly extracts the both-types check before HTTP fallback. Content-addressed store = no stale-data risk. Both-present priority (FileTypeSubtreeToCheck wins) matches normal p2p ordering.
Nits:
- New tests use
assert.Equal— repo convention (testing.md) isrequire. Lines 2668, 2681, 2691 ofcheck_block_subtrees_test.go. - Missing test for both file types simultaneously present — documents the priority.
Same root cause as bsv-blockchain#863: callers that look up a subtree by a single file type miss the case where a sibling code path has already promoted it to FileTypeSubtree. They then needlessly re-fetch from the peer (or, on the legacy catch-up path where baseURL="legacy" has no scheme, fail outright). Apply the dual-check pattern in three more places: - get_blocks.go:fetchAndStoreSubtree — catch-up path - quick_validate.go:readSubtree — retry path that already documents "for retry scenarios" in its FileTypeSubtree existence check - SubtreeValidation.go:getSubtreeTxHashes — uses existing findLocalSubtreeFile helper The blockvalidation package gets a sibling helper (free function over blob.Store) to avoid a cross-service import. Regression tests cover the FileTypeSubtree-only path for each call site.
| u.logger.Debugf("[getSubtreeTxHashes][%s] found subtreeToCheck file in store, using it instead of network request", subtreeHash.String()) | ||
| // Use the local subtree file (under either FileTypeSubtreeToCheck or | ||
| // FileTypeSubtree) instead of a network request — see findLocalSubtreeFile. | ||
| if localFileType, localExists, _ := u.findLocalSubtreeFile(spanCtx, *subtreeHash); localExists { |
There was a problem hiding this comment.
[Minor] Error silently discarded in inline condition
The inline if statement on line 966 discards the error returned by findLocalSubtreeFile:
if localFileType, localExists, _ := u.findLocalSubtreeFile(spanCtx, *subtreeHash); localExists {If findLocalSubtreeFile returns an error (storage access failure), the code silently proceeds to attempt an HTTP fetch rather than reporting the storage error to the caller.
Suggested fix:
localFileType, localExists, err := u.findLocalSubtreeFile(spanCtx, *subtreeHash)
if err != nil {
return nil, errors.NewStorageError("[getSubtreeTxHashes][%s] failed to check local store", subtreeHash.String(), err)
}
if localExists {
// existing logic...
}This ensures storage errors are surfaced rather than masked by a fallback fetch.
There was a problem hiding this comment.
Good catch — fixed in the next push. findLocalSubtreeFile errors are now surfaced as a StorageError from getSubtreeTxHashes instead of silently falling through to the HTTP path (which would be especially misleading for baseURL="legacy").
…on-find-local-subtree
| if localFileType, localExists, _ := u.findLocalSubtreeFile(spanCtx, *subtreeHash); localExists { | ||
| localBytes, getErr := u.subtreeStore.Get(spanCtx, subtreeHash[:], localFileType) | ||
| if getErr == nil && localBytes != nil { |
There was a problem hiding this comment.
Agreed — fixed. getSubtreeTxHashes now propagates errors from findLocalSubtreeFile as a StorageError instead of silently falling through to HTTP (which would be especially misleading for baseURL="legacy").
| // No HTTP responder registered; if the implementation falls through | ||
| // to the network this call must fail. | ||
| stat := gocore.NewStat("test") | ||
| hashes, err := server.getSubtreeTxHashes(context.Background(), stat, &subtreeHash, testPeerURL) |
There was a problem hiding this comment.
Good defensive suggestion — applied. The test now activates httpmock against util.HTTPClient() with no responders (so any stray HTTP fallback fails fast instead of waiting on the default 60s timeout) and wraps the call in a 5s context timeout as a second line of defence.
…reeTxHashes Don't discard the error from findLocalSubtreeFile when the subtree store returns a real failure — surface it as a StorageError instead of silently falling through to the HTTP path. The fallback would be especially misleading for baseURL="legacy" (synthetic scheme-less URL). Also harden TestGetSubtreeTxHashes_LocalFile: activate httpmock against util.HTTPClient() with no responders and wrap the call in a 5s context timeout so any future regression that falls back to HTTP fails fast rather than hanging on the default 60s client timeout.
| if localExists { | ||
| localBytes, getErr := u.subtreeStore.Get(spanCtx, subtreeHash[:], localFileType) | ||
| if getErr == nil && localBytes != nil { | ||
| u.logger.Debugf("[getSubtreeTxHashes][%s] found local subtree file in store (%s), using it instead of network request", subtreeHash.String(), localFileType.String()) |
There was a problem hiding this comment.
Fair point — same class of bug as the previous round. getErr is now propagated as a StorageError instead of being swallowed, and a nil-byte response despite Exists=true is treated as a storage error rather than silently falling through to HTTP.
| // request fails outright. | ||
| localFileType, localExists, err := u.findLocalSubtreeFile(gCtx, subtreeHash) | ||
| if err != nil { | ||
| return errors.NewProcessingError("[CheckBlockSubtrees][%s] failed to check if subtree exists in store", subtreeHash.String(), err) |
There was a problem hiding this comment.
Agreed — switched to NewStorageError to match ValidateSubtreeInternal's convention for the same "failed to check if subtree exists in store" condition.
| // Use the local subtree file (under either FileTypeSubtreeToCheck or | ||
| // FileTypeSubtree) instead of a network request — see findLocalSubtreeFile. | ||
| localFileType, localExists, err := u.findLocalSubtreeFile(spanCtx, *subtreeHash) | ||
| if err != nil { |
There was a problem hiding this comment.
Reasonable observation about the extra round-trip on remote backends, but declining for this PR — refactoring findLocalSubtreeFile to return a reader/bytes (rather than just the located file type) touches four call sites with different consumption patterns (one wants bytes, two want GetIoReader, one wants both file type + later Get). That's beyond the scope of this fix and risks lifecycle/close-handling regressions. Happy to do it in a follow-up PR focused on the helper signature.
| func (u *BlockValidation) readSubtree(ctx context.Context, block *model.Block, subtreeIdx int, subtreeHash *chainhash.Hash) subtreeResult { | ||
| // get the subtree from disk, should be in .subtreeToCheck | ||
| subtreeReader, err := u.subtreeStore.GetIoReader(ctx, subtreeHash[:], fileformat.FileTypeSubtreeToCheck) | ||
| // On retry the subtree may already be promoted to FileTypeSubtree (the | ||
| // "already validated" marker) and FileTypeSubtreeToCheck cleaned up, so | ||
| // consult both file types — see findLocalSubtreeFile. | ||
| localFileType, localExists, err := findLocalSubtreeFile(ctx, u.subtreeStore, *subtreeHash) | ||
| if err != nil { | ||
| return subtreeResult{err: errors.NewStorageError("[getBlockTransactions][%s] failed to locate subtree %s", block.Hash().String(), subtreeHash.String(), err)} | ||
| } | ||
| if !localExists { | ||
| return subtreeResult{err: errors.NewNotFoundError("[getBlockTransactions][%s] subtree %s not found locally", block.Hash().String(), subtreeHash.String())} | ||
| } |
There was a problem hiding this comment.
Declining — the [getBlockTransactions] prefix is the pre-existing convention throughout this whole function (lines 753, 761, 767, 777, 785, 790 all use it). The two lines this PR added were intentionally matching the surrounding style. Renaming all of them is a separate cleanup that belongs in its own PR per the small-diff rule.
| // See findLocalSubtreeFile for why both must be consulted. | ||
| localFileType, localExists, err := findLocalSubtreeFile(ctx, u.subtreeStore, *subtreeHash) | ||
| if err != nil { | ||
| return nil, errors.NewProcessingError("[catchup:fetchAndStoreSubtree] Error checking subtree existence for %s: %v", subtreeHash.String(), err) |
There was a problem hiding this comment.
Good catch — confirmed by reading errors.New: when the last param is an error it gets popped out as the wrapped error, leaving the format string with one too few args (so %v becomes %!v(MISSING)). Fixed: removed the trailing %v from the format string and changed to NewStorageError (which is the right code for a store Exists failure).
- subtreevalidation/SubtreeValidation.go (getSubtreeTxHashes): propagate the subtreeStore.Get error (and treat a nil-bytes result despite Exists=true) as StorageError, instead of silently falling through to the HTTP path on store failure. - subtreevalidation/check_block_subtrees.go (CheckBlockSubtrees): classify the findLocalSubtreeFile failure as StorageError to match ValidateSubtreeInternal's convention for the same condition. - blockvalidation/get_blocks.go (fetchAndStoreSubtree): the previous format string had "%s: %v" with two trailing params, but the last (err) is consumed by errors.New as the wrapped error, leaving %v unfilled (%!v(MISSING)). Drop the %v and reclassify as StorageError.
|



Summary
CheckBlockSubtreesand three sibling code paths only checked forFileTypeSubtreeToCheck(the "downloaded from peer, pending validation" marker) before deciding the subtree was missing locally and falling back to an HTTP fetch. On the legacy catch-up /quickValidationModepath the subtree has already been validated inline and is written asFileTypeSubtree(the "already validated" marker), notFileTypeSubtreeToCheck.The HTTP fallback then tries to fetch from a synthetic baseURL=
"legacy"whose URLlegacy/subtree/<hash>has no scheme — the request fails outright, even though the file is right there in the store. Outside the legacy path the same gap caused needless network round-trips (or spurious not-found errors) any time a node already had a validated subtree on disk.Fix
Add a
findLocalSubtreeFilehelper that consults both file types and reports which one (if any) holds the subtree. The same logic is added in two places — one per service — kept package-local to avoid a cross-service import:services/subtreevalidation/findLocalSubtreeFile(method onServer)services/blockvalidation/findLocalSubtreeFile(package function, new filefind_local_subtree.go)Callers are switched to use it and to read from whichever file type was found:
services/subtreevalidation/check_block_subtrees.goCheckBlockSubtreesservices/subtreevalidation/SubtreeValidation.gogetSubtreeTxHashesservices/blockvalidation/get_blocks.gofetchAndStoreSubtreeservices/blockvalidation/quick_validate.govalidateSubtrees/readSubtreeIn
quick_validate.gothe existing follow-upExists(FileTypeSubtree)is skipped when the lookup already returnedFileTypeSubtree, saving a store round-trip.Errors from the local-store check are now classified as
StorageErrorrather thanProcessingError, matching the rest of the read paths.Tests
services/subtreevalidation:TestFindLocalSubtreeFile(FileTypeSubtreeToCheck present,FileTypeSubtree only (quickValidationMode/legacy),neither present); plus targeted tests incheck_block_subtrees_test.goandinvalid_subtree_test.go.services/blockvalidation:TestFindLocalSubtreeFilemirror suite, plus aget_blocks_test.gocase covering the catch-up fast path under both file types.go test -race -short ./services/subtreevalidation/ ./services/blockvalidation/— green.go vetclean.Context
Originally bundled in #723; pulled out as a focused PR after #723's core change (broadening
quickValidationModetoCATCHINGBLOCKS) was superseded by #847. The scope was widened from "justCheckBlockSubtrees" to all four sites onceghand code-review surfaced the same anti-pattern ingetSubtreeTxHashes,fetchAndStoreSubtree, and the quick-validate retry path.Note on CI
The current red
testjob is a flake inTestSyncManager_extendFromTxMap_NilParentTx, a test added in #862 that lives inservices/legacy/netsync/— a package this PR does not touch. The fixture builds atxmap.NewSyncedMapwithlimit=2, inserts 2 entries, thenSets on an existing key;go-tx-mapevicts a random item on update once the limit is hit, so ~50% of runs the child is dropped and the SUT can't find it. Tracked separately; not in scope here.