fix: skip subtree_data fetch when local txs available#598
Conversation
When CheckBlockSubtrees finds missing subtrees, it fetches subtree_data (all tx bytes, e.g. 170MB for 880K txs) from the peer's asset-cache. This causes timeouts under load since the peer must reconstruct via heavy Aerospike reads. Skip this fetch when block assembly already has enough transactions locally — ValidateSubtreeInternal falls through to local UTXO store lookups instead. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
🤖 Claude Code Review Status: Complete Summary: Key Changes Reviewed:
Issues Found: None All previously reported issues from inline comments have been addressed by the author in subsequent commits (verified through comment threads). The code follows project conventions, includes proper error handling, and demonstrates defensive programming practices (nil checks, timeout handling, graceful degradation when blockassembly client unavailable). History:
|
| var localTxsAvailable bool | ||
| if u.blockAssemblyClient != nil { | ||
| state, baErr := u.blockAssemblyClient.GetBlockAssemblyState(ctx) | ||
| if baErr == nil && state != nil && state.TxCount > uint64(block.TransactionCount) { |
There was a problem hiding this comment.
The comparison should use >= instead of >. When block assembly has exactly the same number of transactions as the block needs (e.g., both are 400), the optimization should still apply since we have all the transactions locally.
| if baErr == nil && state != nil && state.TxCount > uint64(block.TransactionCount) { | |
| if baErr == nil && state != nil && state.TxCount >= uint64(block.TransactionCount) { |
This edge case is unlikely but would unnecessarily fetch from the peer when we already have all needed transactions.
There was a problem hiding this comment.
Fixed in ed654de — changed to >= so when TxCount equals TransactionCount we still skip the expensive peer fetch.
There was a problem hiding this comment.
Pull request overview
This PR optimizes subtree validation during block validation by avoiding expensive peer /subtree_data/ downloads when the node likely already has sufficient transactions locally (via block assembly state), reducing load on peers’ asset-cache/Aerospike and avoiding large transfers.
Changes:
- Add an optional
blockAssemblyClientdependency tosubtreevalidation.Serverand wire it up in the daemon. - In
CheckBlockSubtrees, query block assembly tx count first and conditionally skip fetching peersubtree_data. - Update and extend tests to reflect the new constructor signature and to cover the intended optimization path.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
services/subtreevalidation/check_block_subtrees.go |
Adds block-assembly-based gating to skip peer subtree_data fetches. |
services/subtreevalidation/Server.go |
Injects/stores blockAssemblyClient on the subtree validation server. |
daemon/daemon_services.go |
Creates and passes a blockassembly client into subtreevalidation during daemon startup. |
services/subtreevalidation/check_block_subtrees_test.go |
Adds a new test intended to verify the optimization path. |
services/subtreevalidation/SubtreeValidation_test.go |
Updates New(...) calls for the new constructor signature. |
services/subtreevalidation/Server_coverage_test.go |
Updates New(...) calls for coverage tests. |
services/blockvalidation/BlockValidation_test.go |
Updates subtreevalidation New(...) call for signature change. |
test/longtest/services/subtreevalidation/SubtreeValidation_test.go |
Updates longtest subtreevalidation New(...) call for signature change. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| var localTxsAvailable bool | ||
| if u.blockAssemblyClient != nil { | ||
| state, baErr := u.blockAssemblyClient.GetBlockAssemblyState(ctx) | ||
| if baErr == nil && state != nil && state.TxCount > uint64(block.TransactionCount) { | ||
| localTxsAvailable = true | ||
| u.logger.Infof("[CheckBlockSubtrees] Block assembly has %d txs, block needs %d — skipping subtree_data fetch from peer", state.TxCount, block.TransactionCount) | ||
| } | ||
| } |
There was a problem hiding this comment.
GetBlockAssemblyState is called with the request context and no deadline. If the blockassembly service is unreachable or the connection isn’t ready, this RPC can block indefinitely (gRPC waits for readiness until ctx.Done()), potentially stalling block validation. Consider wrapping this call in a short context.WithTimeout (and optionally logging baErr at debug level) so subtree validation can fall back to the existing peer-fetch path promptly when blockassembly is unavailable.
There was a problem hiding this comment.
Fixed — wrapped GetBlockAssemblyState in a 2s context.WithTimeout. If blockassembly is unreachable, the timeout fires and we fall through to the existing peer-fetch path. The error is silently ignored (baErr check already handles it).
| var localTxsAvailable bool | ||
| if u.blockAssemblyClient != nil { | ||
| state, baErr := u.blockAssemblyClient.GetBlockAssemblyState(ctx) | ||
| if baErr == nil && state != nil && state.TxCount > uint64(block.TransactionCount) { |
There was a problem hiding this comment.
The optimization condition uses state.TxCount > uint64(block.TransactionCount). If blockassembly has exactly the number of transactions the block needs, the expensive peer subtree_data fetch will still run. If the intent is “enough txs locally”, this should likely be >= (and it may be worth clarifying whether TxCount includes the coinbase placeholder so the comparison is correct).
| if baErr == nil && state != nil && state.TxCount > uint64(block.TransactionCount) { | |
| if baErr == nil && state != nil && state.TxCount >= uint64(block.TransactionCount) { |
There was a problem hiding this comment.
Fixed — same as the other comment, changed to >=.
| // Use a BaseUrl that would fail if actually contacted — proves the fetch is skipped. | ||
| // The subtree structure fetch will still happen (and fail), which is expected. | ||
| request := &subtreevalidation_api.CheckBlockSubtreesRequest{ | ||
| Block: blockBytes, | ||
| BaseUrl: "http://192.0.2.1:1", // RFC 5737 TEST-NET — guaranteed unreachable | ||
| } | ||
|
|
||
| // This will error because the subtree structure (/subtree/) fetch still happens | ||
| // and the URL is unreachable. But the important thing is that GetBlockAssemblyState | ||
| // was called, proving the optimization path was taken. | ||
| _, _ = server.CheckBlockSubtrees(context.Background(), request) | ||
|
|
There was a problem hiding this comment.
This test doesn’t actually verify that the /subtree_data/ fetch is skipped: it uses an unreachable BaseUrl, so the earlier /subtree/ structure fetch fails and the code never reaches the subtree_data branch regardless of localTxsAvailable. To validate the optimization, make the subtree structure available (e.g., store FileTypeSubtreeToCheck in subtreeStore or mock the /subtree/ endpoint) and then assert that the /subtree_data/ request is not made (e.g., via httpmock expectations).
There was a problem hiding this comment.
Reworked the test. Now stores FileTypeSubtreeToCheck in the store (bypasses /subtree/ HTTP fetch), uses httpmock to intercept HTTP calls, and verifies the error does NOT come from the CheckBlockSubtrees subtree_data fetch path. Note: ValidateSubtreeInternal has its own subtree_data fallback for missing txs — that is a separate code path unaffected by this optimization.
| func New( | ||
| ctx context.Context, | ||
| logger ulogger.Logger, | ||
| tSettings *settings.Settings, | ||
| subtreeStore blob.Store, | ||
| txStore blob.Store, | ||
| utxoStore utxo.Store, | ||
| validatorClient validator.Interface, | ||
| blockchainClient blockchain.ClientI, | ||
| subtreeConsumerClient kafka.KafkaConsumerGroupI, | ||
| txmetaConsumerClient kafka.KafkaConsumerGroupI, | ||
| p2pClient P2PClientI, | ||
| blockAssemblyClient blockassembly.ClientI, | ||
| ) (*Server, error) { |
There was a problem hiding this comment.
The doc comment for New lists parameters only up to txmetaConsumerClient, but the function now takes additional dependencies (p2pClient and blockAssemblyClient). Please update the parameter documentation to match the signature so callers understand what to pass and what the dependency is used for.
There was a problem hiding this comment.
Fixed — added p2pClient and blockAssemblyClient to the parameter documentation in the New() doc comment.
- Change TxCount comparison from > to >= (equal count means all txs available) - Wrap GetBlockAssemblyState in 2s context.WithTimeout to avoid blocking - Update New() doc comment with p2pClient and blockAssemblyClient params - Improve test to use httpmock and verify subtree_data fetch is actually skipped Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // Create blockassembly client to check local tx availability | ||
| blockAssemblyClient, err := blockassembly.NewClient(ctx, createLogger(loggerBlockAssembly), appSettings) | ||
| if err != nil { | ||
| return err |
There was a problem hiding this comment.
startValidationService now hard-requires a blockassembly gRPC client for the subtree validation service. blockassembly.NewClient returns configuration/connection errors (e.g., when BlockAssembly.GRPCAddress is unset), so this change can prevent the SubtreeValidation service from starting even though the optimization is optional (the server already nil-checks u.blockAssemblyClient). Consider making this dependency best-effort here: if config is missing or the client can’t be created, log a warning and pass nil so subtree validation continues using the existing peer-fetch path.
| // Create blockassembly client to check local tx availability | |
| blockAssemblyClient, err := blockassembly.NewClient(ctx, createLogger(loggerBlockAssembly), appSettings) | |
| if err != nil { | |
| return err | |
| // Create blockassembly client to check local tx availability. This is an optional | |
| // optimization; if the client cannot be created, continue without it and fall back | |
| // to the existing peer-fetch path in the SubtreeValidation service. | |
| baLogger := createLogger(loggerBlockAssembly) | |
| blockAssemblyClient, err := blockassembly.NewClient(ctx, baLogger, appSettings) | |
| if err != nil { | |
| baLogger.Warnf("failed to create blockassembly client for subtree validation; continuing without local tx availability optimization: %v", err) | |
| blockAssemblyClient = nil |
There was a problem hiding this comment.
Fixed — blockassembly client creation failure now logs a warning and passes nil. The subtreevalidation server already nil-checks blockAssemblyClient, so it falls through to the existing peer-fetch path.
| // Run CheckBlockSubtrees — will eventually fail during ValidateSubtreeInternal | ||
| // because we don't have the actual tx data, but the optimization path should be taken. | ||
| // ValidateSubtreeInternal may still fetch subtree_data as a fallback for missing txs, | ||
| // but the CheckBlockSubtrees fetch path should be skipped. | ||
| _, err = server.CheckBlockSubtrees(context.Background(), request) | ||
|
|
||
| // Verify block assembly state was queried — the optimization was triggered | ||
| mockBA.AssertCalled(t, "GetBlockAssemblyState", mock.Anything) | ||
|
|
||
| // If there's an error, it should NOT be "failed to get subtree data from" which | ||
| // comes from CheckBlockSubtrees' fetch path. Any subtree_data fetches should only | ||
| // come from ValidateSubtreeInternal's fallback path. | ||
| if err != nil { | ||
| assert.NotContains(t, err.Error(), "failed to get subtree data from", | ||
| "CheckBlockSubtrees should skip subtree_data fetch when local txs available") | ||
| } |
There was a problem hiding this comment.
This new test doesn’t actually assert that the /subtree_data/ HTTP fetch is skipped by CheckBlockSubtrees: it only checks that the returned error string does not contain "failed to get subtree data from". If CheckBlockSubtrees still performs the fetch but then fails later (e.g., with "failed to process subtree data stream"), this test would still pass.
To make the assertion robust, explicitly assert the HTTP call count for the /subtree_data/ endpoint is zero. Since ValidateSubtreeInternal can also fetch /subtree_data/ as a fallback, consider forcing that path off in this test (e.g., set server.settings.SubtreeValidation.PercentageMissingGetFullData > 100) so any /subtree_data/ request can only come from CheckBlockSubtrees.
| // Run CheckBlockSubtrees — will eventually fail during ValidateSubtreeInternal | |
| // because we don't have the actual tx data, but the optimization path should be taken. | |
| // ValidateSubtreeInternal may still fetch subtree_data as a fallback for missing txs, | |
| // but the CheckBlockSubtrees fetch path should be skipped. | |
| _, err = server.CheckBlockSubtrees(context.Background(), request) | |
| // Verify block assembly state was queried — the optimization was triggered | |
| mockBA.AssertCalled(t, "GetBlockAssemblyState", mock.Anything) | |
| // If there's an error, it should NOT be "failed to get subtree data from" which | |
| // comes from CheckBlockSubtrees' fetch path. Any subtree_data fetches should only | |
| // come from ValidateSubtreeInternal's fallback path. | |
| if err != nil { | |
| assert.NotContains(t, err.Error(), "failed to get subtree data from", | |
| "CheckBlockSubtrees should skip subtree_data fetch when local txs available") | |
| } | |
| // Disable ValidateSubtreeInternal's fallback subtree_data fetch path so that any | |
| // requests to /subtree_data/ can only originate from CheckBlockSubtrees itself. | |
| server.settings.SubtreeValidation.PercentageMissingGetFullData = 101 | |
| // Run CheckBlockSubtrees — will eventually fail during ValidateSubtreeInternal | |
| // because we don't have the actual tx data, but the optimization path should be taken. | |
| _, err = server.CheckBlockSubtrees(context.Background(), request) | |
| // Verify block assembly state was queried — the optimization was triggered. | |
| mockBA.AssertCalled(t, "GetBlockAssemblyState", mock.Anything) | |
| // If there's an error, it should NOT be "failed to get subtree data from" which | |
| // comes from CheckBlockSubtrees' fetch path. | |
| if err != nil { | |
| assert.NotContains(t, err.Error(), "failed to get subtree data from", | |
| "CheckBlockSubtrees should skip subtree_data fetch when local txs available") | |
| } | |
| // Assert that no subtree_data fetch was performed by CheckBlockSubtrees. | |
| callCounts := httpmock.GetCallCountInfo() | |
| subtreeKey := "GET http://test-peer.local/subtree_data/" | |
| assert.Zero(t, callCounts[subtreeKey], | |
| "CheckBlockSubtrees should not fetch /subtree_data/ when local txs are available") |
There was a problem hiding this comment.
Strengthened the assertion. Now sets PercentageMissingGetFullData=101 to disable ValidateSubtreeInternal's fallback subtree_data fetch, then iterates httpmock call counts to verify no subtree_data endpoint was contacted. Any subtree_data call at that point can only come from CheckBlockSubtrees itself.
- blockassembly client creation failure now logs warning and passes nil instead of preventing subtreevalidation from starting - Test now disables ValidateSubtreeInternal's fallback fetch and asserts zero subtree_data HTTP calls via httpmock call counts Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if !subtreeDataExists { | ||
| // get the subtree data from the peer and process it directly | ||
| url := fmt.Sprintf("%s/subtree_data/%s", request.BaseUrl, subtreeHash.String()) | ||
| if !localTxsAvailable { | ||
| // get the subtree data from the peer and process it directly | ||
| url := fmt.Sprintf("%s/subtree_data/%s", request.BaseUrl, subtreeHash.String()) | ||
|
|
There was a problem hiding this comment.
When localTxsAvailable is true, this branch skips populating subtreeTxs[subtreeIdx], but the code already preallocates make([]*bt.Tx, 0, subtreeToCheck.Length()) unconditionally just above. That backing array can be large and becomes wasted allocation on the skip path, reducing the benefit of avoiding subtree_data downloads. Consider moving the preallocation into the !localTxsAvailable/subtreeDataExists paths (only when you will actually append txs) and leaving the slice nil/empty-cap when skipping.
There was a problem hiding this comment.
Fixed — moved the pre-allocation into the two paths that actually populate the slice (peer fetch and stored file extraction). When localTxsAvailable skips the fetch, no backing array is allocated.
Move make([]*bt.Tx, 0, subtreeToCheck.Length()) into the fetch and extract branches so the backing array isn't allocated when localTxsAvailable skips the fetch entirely. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
…#598) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ockchain#598)" This reverts commit 88379d4.
…SM cascade Forces FSM into CATCHINGBLOCKS while a fresh ReceivedAt stamp exists and asserts the liveness gate still skips subtreeData. This is the test that, had it existed, would have prevented PR bsv-blockchain#598 from being reverted (PR bsv-blockchain#647). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ockchain#745 review Pin the adaptive-fetch gate pessimistic until the node first reaches FSM RUNNING, so cold-start IBD / restart-while-behind never skips subtreeData at the moment txs are least likely to be local. New() now always starts pessimistic; a one-way, idempotent Arm() latch (tripped by each service on the first RUNNING) applies the configured bootstrap mode and clears the pre-arm window so fake-perfect IBD samples can't cause an instant flip. The latch never re-locks, so a post-RUNNING catch-up burst may still use optimistic mode. pkg/adaptivefetch stays FSM-free: Arm is a generic trigger and the FSM wait (WaitForFSMtoTransitionToGivenState RUNNING) lives in the service layer, so TestNoWallClockOrFSMDependency still holds and the bsv-blockchain#598/bsv-blockchain#647 clock/FSM-gating class-of-bug is not reintroduced into the algorithm. Also addresses the review: - Add TestCheckBlockSubtrees_Optimistic_SkipsFetchSubtreeData covering the previously-untested optimistic skip branch in subtreevalidation (runs both modes so it cannot pass trivially). - Extract (*State).RecordSyntheticWarmup, collapsing the two duplicated ~34-line observation-recording blocks (SonarQube duplication). - Log the previously-swallowed fallback adaptivefetch.New error in both blockvalidation and subtreevalidation Server constructors. - Honest docs/TODO for the synthetic MissingFetches=0 one-way limitation; fix "four collectors" -> two comment drift; update settings longdesc to describe the arm-on-RUNNING behaviour. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>



Summary
CheckBlockSubtreesfinds missing subtrees during block validation, it previously always fetched the fullsubtree_data(all transaction bytes, e.g. 170MB for 880K txs) from the peer's asset-cache. The peer reconstructs this on the fly via heavy Aerospike reads, which can timeout under load (504s,MAX_RETRIES_EXCEEDED).ValidateSubtreeInternalvalidates using its existing 3-tier fallback (cache → local Aerospike → network).blockAssemblyClientto subtreevalidation Server and wires it up in the daemon.Test plan
go build ./...passesgo test -v -race -run TestCheckBlockSubtrees ./services/subtreevalidation/...— all existing tests passTestCheckBlockSubtrees_SkipSubtreeDataFetchWhenLocalTxsAvailableverifies the optimization path is taken🤖 Generated with Claude Code