Skip to content

fix: skip subtree_data fetch when local txs available#598

Merged
freemans13 merged 4 commits into
bsv-blockchain:mainfrom
freemans13:stu/skip-subtree-data-fetch
Mar 23, 2026
Merged

fix: skip subtree_data fetch when local txs available#598
freemans13 merged 4 commits into
bsv-blockchain:mainfrom
freemans13:stu/skip-subtree-data-fetch

Conversation

@freemans13

Copy link
Copy Markdown
Collaborator

Summary

  • When CheckBlockSubtrees finds missing subtrees during block validation, it previously always fetched the full subtree_data (all transaction bytes, e.g. 170MB for 880K txs) from the peer's asset-cache. The peer reconstructs this on the fly via heavy Aerospike reads, which can timeout under load (504s, MAX_RETRIES_EXCEEDED).
  • Now queries block assembly for local tx count first — when we have more txs locally than the block needs, skips the expensive peer fetch entirely. ValidateSubtreeInternal validates using its existing 3-tier fallback (cache → local Aerospike → network).
  • Adds blockAssemblyClient to subtreevalidation Server and wires it up in the daemon.

Test plan

  • go build ./... passes
  • go test -v -race -run TestCheckBlockSubtrees ./services/subtreevalidation/... — all existing tests pass
  • New test TestCheckBlockSubtrees_SkipSubtreeDataFetchWhenLocalTxsAvailable verifies the optimization path is taken
  • Deploy to dev-scale-2 and verify node stays in sync under load

🤖 Generated with Claude Code

When CheckBlockSubtrees finds missing subtrees, it fetches subtree_data
(all tx bytes, e.g. 170MB for 880K txs) from the peer's asset-cache.
This causes timeouts under load since the peer must reconstruct via
heavy Aerospike reads. Skip this fetch when block assembly already has
enough transactions locally — ValidateSubtreeInternal falls through to
local UTXO store lookups instead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions

github-actions Bot commented Mar 20, 2026

Copy link
Copy Markdown
Contributor

🤖 Claude Code Review

Status: Complete

Summary:
This PR implements a performance optimization to skip expensive subtree_data fetches from peers when transactions are already available locally in block assembly. The implementation is well-structured with proper error handling, timeout protection, and comprehensive test coverage.

Key Changes Reviewed:

  • Block assembly client integration with graceful degradation
  • Optimization logic with 2-second timeout and >= comparison
  • Memory allocation moved inside conditional branches (performance improvement)
  • Test coverage for the optimization path with proper mocking
  • Documentation updated for new parameter

Issues Found: None

All previously reported issues from inline comments have been addressed by the author in subsequent commits (verified through comment threads). The code follows project conventions, includes proper error handling, and demonstrates defensive programming practices (nil checks, timeout handling, graceful degradation when blockassembly client unavailable).


History:

  • Previous inline comments from GitHub Actions and Copilot were addressed in commits ed654de and later
  • All concerns about >= comparison, timeout handling, pre-allocation, and test assertions have been resolved

var localTxsAvailable bool
if u.blockAssemblyClient != nil {
state, baErr := u.blockAssemblyClient.GetBlockAssemblyState(ctx)
if baErr == nil && state != nil && state.TxCount > uint64(block.TransactionCount) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comparison should use >= instead of >. When block assembly has exactly the same number of transactions as the block needs (e.g., both are 400), the optimization should still apply since we have all the transactions locally.

Suggested change
if baErr == nil && state != nil && state.TxCount > uint64(block.TransactionCount) {
if baErr == nil && state != nil && state.TxCount >= uint64(block.TransactionCount) {

This edge case is unlikely but would unnecessarily fetch from the peer when we already have all needed transactions.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in ed654de — changed to >= so when TxCount equals TransactionCount we still skip the expensive peer fetch.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR optimizes subtree validation during block validation by avoiding expensive peer /subtree_data/ downloads when the node likely already has sufficient transactions locally (via block assembly state), reducing load on peers’ asset-cache/Aerospike and avoiding large transfers.

Changes:

  • Add an optional blockAssemblyClient dependency to subtreevalidation.Server and wire it up in the daemon.
  • In CheckBlockSubtrees, query block assembly tx count first and conditionally skip fetching peer subtree_data.
  • Update and extend tests to reflect the new constructor signature and to cover the intended optimization path.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
services/subtreevalidation/check_block_subtrees.go Adds block-assembly-based gating to skip peer subtree_data fetches.
services/subtreevalidation/Server.go Injects/stores blockAssemblyClient on the subtree validation server.
daemon/daemon_services.go Creates and passes a blockassembly client into subtreevalidation during daemon startup.
services/subtreevalidation/check_block_subtrees_test.go Adds a new test intended to verify the optimization path.
services/subtreevalidation/SubtreeValidation_test.go Updates New(...) calls for the new constructor signature.
services/subtreevalidation/Server_coverage_test.go Updates New(...) calls for coverage tests.
services/blockvalidation/BlockValidation_test.go Updates subtreevalidation New(...) call for signature change.
test/longtest/services/subtreevalidation/SubtreeValidation_test.go Updates longtest subtreevalidation New(...) call for signature change.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +104 to +111
var localTxsAvailable bool
if u.blockAssemblyClient != nil {
state, baErr := u.blockAssemblyClient.GetBlockAssemblyState(ctx)
if baErr == nil && state != nil && state.TxCount > uint64(block.TransactionCount) {
localTxsAvailable = true
u.logger.Infof("[CheckBlockSubtrees] Block assembly has %d txs, block needs %d — skipping subtree_data fetch from peer", state.TxCount, block.TransactionCount)
}
}

Copilot AI Mar 20, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetBlockAssemblyState is called with the request context and no deadline. If the blockassembly service is unreachable or the connection isn’t ready, this RPC can block indefinitely (gRPC waits for readiness until ctx.Done()), potentially stalling block validation. Consider wrapping this call in a short context.WithTimeout (and optionally logging baErr at debug level) so subtree validation can fall back to the existing peer-fetch path promptly when blockassembly is unavailable.

Copilot uses AI. Check for mistakes.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — wrapped GetBlockAssemblyState in a 2s context.WithTimeout. If blockassembly is unreachable, the timeout fires and we fall through to the existing peer-fetch path. The error is silently ignored (baErr check already handles it).

var localTxsAvailable bool
if u.blockAssemblyClient != nil {
state, baErr := u.blockAssemblyClient.GetBlockAssemblyState(ctx)
if baErr == nil && state != nil && state.TxCount > uint64(block.TransactionCount) {

Copilot AI Mar 20, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The optimization condition uses state.TxCount > uint64(block.TransactionCount). If blockassembly has exactly the number of transactions the block needs, the expensive peer subtree_data fetch will still run. If the intent is “enough txs locally”, this should likely be >= (and it may be worth clarifying whether TxCount includes the coinbase placeholder so the comparison is correct).

Suggested change
if baErr == nil && state != nil && state.TxCount > uint64(block.TransactionCount) {
if baErr == nil && state != nil && state.TxCount >= uint64(block.TransactionCount) {

Copilot uses AI. Check for mistakes.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — same as the other comment, changed to >=.

Comment on lines +2056 to +2067
// Use a BaseUrl that would fail if actually contacted — proves the fetch is skipped.
// The subtree structure fetch will still happen (and fail), which is expected.
request := &subtreevalidation_api.CheckBlockSubtreesRequest{
Block: blockBytes,
BaseUrl: "http://192.0.2.1:1", // RFC 5737 TEST-NET — guaranteed unreachable
}

// This will error because the subtree structure (/subtree/) fetch still happens
// and the URL is unreachable. But the important thing is that GetBlockAssemblyState
// was called, proving the optimization path was taken.
_, _ = server.CheckBlockSubtrees(context.Background(), request)

Copilot AI Mar 20, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test doesn’t actually verify that the /subtree_data/ fetch is skipped: it uses an unreachable BaseUrl, so the earlier /subtree/ structure fetch fails and the code never reaches the subtree_data branch regardless of localTxsAvailable. To validate the optimization, make the subtree structure available (e.g., store FileTypeSubtreeToCheck in subtreeStore or mock the /subtree/ endpoint) and then assert that the /subtree_data/ request is not made (e.g., via httpmock expectations).

Copilot uses AI. Check for mistakes.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reworked the test. Now stores FileTypeSubtreeToCheck in the store (bypasses /subtree/ HTTP fetch), uses httpmock to intercept HTTP calls, and verifies the error does NOT come from the CheckBlockSubtrees subtree_data fetch path. Note: ValidateSubtreeInternal has its own subtree_data fallback for missing txs — that is a separate code path unaffected by this optimization.

Comment on lines 168 to 181
func New(
ctx context.Context,
logger ulogger.Logger,
tSettings *settings.Settings,
subtreeStore blob.Store,
txStore blob.Store,
utxoStore utxo.Store,
validatorClient validator.Interface,
blockchainClient blockchain.ClientI,
subtreeConsumerClient kafka.KafkaConsumerGroupI,
txmetaConsumerClient kafka.KafkaConsumerGroupI,
p2pClient P2PClientI,
blockAssemblyClient blockassembly.ClientI,
) (*Server, error) {

Copilot AI Mar 20, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The doc comment for New lists parameters only up to txmetaConsumerClient, but the function now takes additional dependencies (p2pClient and blockAssemblyClient). Please update the parameter documentation to match the signature so callers understand what to pass and what the dependency is used for.

Copilot uses AI. Check for mistakes.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — added p2pClient and blockAssemblyClient to the parameter documentation in the New() doc comment.

- Change TxCount comparison from > to >= (equal count means all txs available)
- Wrap GetBlockAssemblyState in 2s context.WithTimeout to avoid blocking
- Update New() doc comment with p2pClient and blockAssemblyClient params
- Improve test to use httpmock and verify subtree_data fetch is actually skipped

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread daemon/daemon_services.go Outdated
Comment on lines +787 to +790
// Create blockassembly client to check local tx availability
blockAssemblyClient, err := blockassembly.NewClient(ctx, createLogger(loggerBlockAssembly), appSettings)
if err != nil {
return err

Copilot AI Mar 20, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

startValidationService now hard-requires a blockassembly gRPC client for the subtree validation service. blockassembly.NewClient returns configuration/connection errors (e.g., when BlockAssembly.GRPCAddress is unset), so this change can prevent the SubtreeValidation service from starting even though the optimization is optional (the server already nil-checks u.blockAssemblyClient). Consider making this dependency best-effort here: if config is missing or the client can’t be created, log a warning and pass nil so subtree validation continues using the existing peer-fetch path.

Suggested change
// Create blockassembly client to check local tx availability
blockAssemblyClient, err := blockassembly.NewClient(ctx, createLogger(loggerBlockAssembly), appSettings)
if err != nil {
return err
// Create blockassembly client to check local tx availability. This is an optional
// optimization; if the client cannot be created, continue without it and fall back
// to the existing peer-fetch path in the SubtreeValidation service.
baLogger := createLogger(loggerBlockAssembly)
blockAssemblyClient, err := blockassembly.NewClient(ctx, baLogger, appSettings)
if err != nil {
baLogger.Warnf("failed to create blockassembly client for subtree validation; continuing without local tx availability optimization: %v", err)
blockAssemblyClient = nil

Copilot uses AI. Check for mistakes.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — blockassembly client creation failure now logs a warning and passes nil. The subtreevalidation server already nil-checks blockAssemblyClient, so it falls through to the existing peer-fetch path.

Comment on lines +2092 to +2107
// Run CheckBlockSubtrees — will eventually fail during ValidateSubtreeInternal
// because we don't have the actual tx data, but the optimization path should be taken.
// ValidateSubtreeInternal may still fetch subtree_data as a fallback for missing txs,
// but the CheckBlockSubtrees fetch path should be skipped.
_, err = server.CheckBlockSubtrees(context.Background(), request)

// Verify block assembly state was queried — the optimization was triggered
mockBA.AssertCalled(t, "GetBlockAssemblyState", mock.Anything)

// If there's an error, it should NOT be "failed to get subtree data from" which
// comes from CheckBlockSubtrees' fetch path. Any subtree_data fetches should only
// come from ValidateSubtreeInternal's fallback path.
if err != nil {
assert.NotContains(t, err.Error(), "failed to get subtree data from",
"CheckBlockSubtrees should skip subtree_data fetch when local txs available")
}

Copilot AI Mar 20, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This new test doesn’t actually assert that the /subtree_data/ HTTP fetch is skipped by CheckBlockSubtrees: it only checks that the returned error string does not contain "failed to get subtree data from". If CheckBlockSubtrees still performs the fetch but then fails later (e.g., with "failed to process subtree data stream"), this test would still pass.

To make the assertion robust, explicitly assert the HTTP call count for the /subtree_data/ endpoint is zero. Since ValidateSubtreeInternal can also fetch /subtree_data/ as a fallback, consider forcing that path off in this test (e.g., set server.settings.SubtreeValidation.PercentageMissingGetFullData > 100) so any /subtree_data/ request can only come from CheckBlockSubtrees.

Suggested change
// Run CheckBlockSubtrees — will eventually fail during ValidateSubtreeInternal
// because we don't have the actual tx data, but the optimization path should be taken.
// ValidateSubtreeInternal may still fetch subtree_data as a fallback for missing txs,
// but the CheckBlockSubtrees fetch path should be skipped.
_, err = server.CheckBlockSubtrees(context.Background(), request)
// Verify block assembly state was queried — the optimization was triggered
mockBA.AssertCalled(t, "GetBlockAssemblyState", mock.Anything)
// If there's an error, it should NOT be "failed to get subtree data from" which
// comes from CheckBlockSubtrees' fetch path. Any subtree_data fetches should only
// come from ValidateSubtreeInternal's fallback path.
if err != nil {
assert.NotContains(t, err.Error(), "failed to get subtree data from",
"CheckBlockSubtrees should skip subtree_data fetch when local txs available")
}
// Disable ValidateSubtreeInternal's fallback subtree_data fetch path so that any
// requests to /subtree_data/ can only originate from CheckBlockSubtrees itself.
server.settings.SubtreeValidation.PercentageMissingGetFullData = 101
// Run CheckBlockSubtrees — will eventually fail during ValidateSubtreeInternal
// because we don't have the actual tx data, but the optimization path should be taken.
_, err = server.CheckBlockSubtrees(context.Background(), request)
// Verify block assembly state was queried — the optimization was triggered.
mockBA.AssertCalled(t, "GetBlockAssemblyState", mock.Anything)
// If there's an error, it should NOT be "failed to get subtree data from" which
// comes from CheckBlockSubtrees' fetch path.
if err != nil {
assert.NotContains(t, err.Error(), "failed to get subtree data from",
"CheckBlockSubtrees should skip subtree_data fetch when local txs available")
}
// Assert that no subtree_data fetch was performed by CheckBlockSubtrees.
callCounts := httpmock.GetCallCountInfo()
subtreeKey := "GET http://test-peer.local/subtree_data/"
assert.Zero(t, callCounts[subtreeKey],
"CheckBlockSubtrees should not fetch /subtree_data/ when local txs are available")

Copilot uses AI. Check for mistakes.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strengthened the assertion. Now sets PercentageMissingGetFullData=101 to disable ValidateSubtreeInternal's fallback subtree_data fetch, then iterates httpmock call counts to verify no subtree_data endpoint was contacted. Any subtree_data call at that point can only come from CheckBlockSubtrees itself.

- blockassembly client creation failure now logs warning and passes nil
  instead of preventing subtreevalidation from starting
- Test now disables ValidateSubtreeInternal's fallback fetch and asserts
  zero subtree_data HTTP calls via httpmock call counts

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 266 to 270
if !subtreeDataExists {
// get the subtree data from the peer and process it directly
url := fmt.Sprintf("%s/subtree_data/%s", request.BaseUrl, subtreeHash.String())
if !localTxsAvailable {
// get the subtree data from the peer and process it directly
url := fmt.Sprintf("%s/subtree_data/%s", request.BaseUrl, subtreeHash.String())

Copilot AI Mar 20, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When localTxsAvailable is true, this branch skips populating subtreeTxs[subtreeIdx], but the code already preallocates make([]*bt.Tx, 0, subtreeToCheck.Length()) unconditionally just above. That backing array can be large and becomes wasted allocation on the skip path, reducing the benefit of avoiding subtree_data downloads. Consider moving the preallocation into the !localTxsAvailable/subtreeDataExists paths (only when you will actually append txs) and leaving the slice nil/empty-cap when skipping.

Copilot uses AI. Check for mistakes.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — moved the pre-allocation into the two paths that actually populate the slice (peer fetch and stored file extraction). When localTxsAvailable skips the fetch, no backing array is allocated.

Move make([]*bt.Tx, 0, subtreeToCheck.Length()) into the fetch and
extract branches so the backing array isn't allocated when
localTxsAvailable skips the fetch entirely.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@sonarqubecloud

Copy link
Copy Markdown

@freemans13 freemans13 merged commit 88379d4 into bsv-blockchain:main Mar 23, 2026
15 checks passed
galt-tr pushed a commit to galt-tr/teranode that referenced this pull request Mar 24, 2026
…#598)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
freemans13 added a commit to freemans13/teranode that referenced this pull request Mar 30, 2026
freemans13 added a commit to freemans13/teranode that referenced this pull request Apr 17, 2026
…SM cascade

Forces FSM into CATCHINGBLOCKS while a fresh ReceivedAt stamp exists and
asserts the liveness gate still skips subtreeData. This is the test that,
had it existed, would have prevented PR bsv-blockchain#598 from being reverted (PR bsv-blockchain#647).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
freemans13 added a commit to freemans13/teranode that referenced this pull request Jun 5, 2026
…ockchain#745 review

Pin the adaptive-fetch gate pessimistic until the node first reaches FSM
RUNNING, so cold-start IBD / restart-while-behind never skips subtreeData at
the moment txs are least likely to be local. New() now always starts
pessimistic; a one-way, idempotent Arm() latch (tripped by each service on the
first RUNNING) applies the configured bootstrap mode and clears the pre-arm
window so fake-perfect IBD samples can't cause an instant flip. The latch never
re-locks, so a post-RUNNING catch-up burst may still use optimistic mode.

pkg/adaptivefetch stays FSM-free: Arm is a generic trigger and the FSM wait
(WaitForFSMtoTransitionToGivenState RUNNING) lives in the service layer, so
TestNoWallClockOrFSMDependency still holds and the bsv-blockchain#598/bsv-blockchain#647 clock/FSM-gating
class-of-bug is not reintroduced into the algorithm.

Also addresses the review:
- Add TestCheckBlockSubtrees_Optimistic_SkipsFetchSubtreeData covering the
  previously-untested optimistic skip branch in subtreevalidation (runs both
  modes so it cannot pass trivially).
- Extract (*State).RecordSyntheticWarmup, collapsing the two duplicated
  ~34-line observation-recording blocks (SonarQube duplication).
- Log the previously-swallowed fallback adaptivefetch.New error in both
  blockvalidation and subtreevalidation Server constructors.
- Honest docs/TODO for the synthetic MissingFetches=0 one-way limitation;
  fix "four collectors" -> two comment drift; update settings longdesc to
  describe the arm-on-RUNNING behaviour.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants