Skip to content

feat(subtreevalidation): skip subtreeData download for blocks whose txs we already have#715

Closed
freemans13 wants to merge 25 commits into
bsv-blockchain:mainfrom
freemans13:stu/subtree-only-liveness-gate
Closed

feat(subtreevalidation): skip subtreeData download for blocks whose txs we already have#715
freemans13 wants to merge 25 commits into
bsv-blockchain:mainfrom
freemans13:stu/subtree-only-liveness-gate

Conversation

@freemans13

@freemans13 freemans13 commented Apr 17, 2026

Copy link
Copy Markdown
Collaborator

Summary

  • Adds `SubtreeValidation.AssumeTxsBroadcastToAllNodes` (default off) and `LivenessWindow` (default 5m) settings.
  • Stamps header first-seen time in the blockchain service. Two RPCs surface this:
    • `ReportPeerBlockHeaderSeen(hash)` — called from `blockvalidation.BlockFound` as soon as a peer-announced header arrives (only when the gate setting is on, so default-off nodes pay zero extra RPC cost).
    • `GetHeaderReceivedAt(hash) → (time, found, error)` — consulted by the gate. `ClientI` surfaces both methods for in-service callers.
  • In-memory `receivedAt` store TTL derived from `LivenessWindow` via `max(30m, 2*LivenessWindow)` so operator-configurable windows can never outlive the store.
  • Gate is consulted once per block and never consults FSM state (see FSM regression test).
  • Two call sites gated — both skip `subtreeData` download when the block's header was received within the liveness window:
    • `services/subtreevalidation/check_block_subtrees.go` — per-subtree workers short-circuit before the `subtreeStore.Exists(..., FileTypeSubtreeData)` check. In that path `subtreeTxs[subtreeIdx]` stays empty so the block-level batch `processTransactionsInLevels` call is skipped entirely (`batchTxCount == 0`); tx resolution happens downstream via manifest + UTXO/TxMetaCache + per-tx fetch fallback.
    • `services/blockvalidation/get_blocks.go` — `blockWorker` skips `fetchSubtreeDataForBlock` during catchup.
  • One new Prometheus counter: `teranode_subtreevalidation_livenessgate_decision_total{decision=subtreeonly|subtreedata|notfound|err}`.

Shared decision helper

The `services/subtreevalidation/livenessgate` package exposes a pure `Decide(ctx, client, hash, enabled, window) → (Decision, error)` function consumed by both call sites. The `Decision` enum (`DecisionSubtreeOnly / DecisionSubtreeData / DecisionNotFound / DecisionError`) drives both the boolean check at the call site and the Prometheus metric label. The `(*Server).ShouldUseSubtreeOnlyPath` method in subtreevalidation wraps `Decide` with metric emission and debug logging.

The package sits under subtreevalidation because it is a feature-specific decision helper for the subtree-only path, not a general-purpose utility.

Design

Key invariant: the gate reads only the wall-clock age of the block header's `ReceivedAt` stamp. It does not consult FSM state — that's the specific behaviour that caused PR #598 to cascade under load and forced its revert via PR #647. A dedicated regression test (`TestCheckBlockSubtrees_FSMCascadeDoesNotPoisonGate`) locks this in.

Test plan

  • Unit tests — `receivedAtStore` semantics (stamp-and-lookup, write-once, absent, concurrent writes, TTL expiration) — 5 tests.
  • Unit tests — `receivedAtTTL` helper (floor + scaled branches).
  • Unit tests — `livenessgate.Decide` truth table — 5 cases.
  • Unit tests — `(*Server).ShouldUseSubtreeOnlyPath` (metrics-wrapped) truth table — 8 cases.
  • Integration tests — `CheckBlockSubtrees` gated path: setting off, fresh stamp + skip, stale stamp + fetch — 3 sub-tests.
  • Integration tests — `blockWorker` gated pre-fetch: setting off, fresh stamp, stale stamp — 3 sub-tests.
  • Integration tests — `BlockFound` only calls `ReportPeerBlockHeaderSeen` when the gate setting is enabled.
  • FSM cascade regression guard — forces CATCHINGBLOCKS while fresh stamp exists; gate still fires.
  • `make lint` — 0 issues.
  • Full test sweep for touched packages passes (1707 tests).
  • Dev-scale manual test: enable on dev-scale-1 / dev-scale-2 via ArgoCD, observe `livenessgate_decision_total{decision="subtreeonly"}` dominates for ≥24h.

Rollout

Default `false` means merging is a no-op for mainnet / testnet / teratestnet. Enable via `settings_local.conf` on dev-scale only.

🤖 Generated with Claude Code

freemans13 and others added 12 commits April 16, 2026 17:19
…sWindow settings

Default off; no behavior change on their own. Used by subsequent tasks to
gate the subtree-only validation path.
…tamps

In-memory, TTL-bounded store used by the subtree-only liveness gate (next commit).
First stamp wins; subsequent inserts are no-ops. Concurrent-safe.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…eceivedAt

Blockchain.AddBlock now records the first-seen wall-clock time for each
successfully stored block. GetHeaderReceivedAt exposes the stamp; absent or
TTL-expired entries return found=false.
Two tests verifying stamp-on-AddBlock and absent-hash behaviour.
Adds gRPC client, in-process LocalClient (with its own receivedAtStore
stamping in AddBlock), and mock implementations.
Returns (_, false, nil) on absent/TTL-expired stamps — callers treat as not-live.

Also adds stub to blockpersister's MockBlockchainClient to satisfy the interface.
Catches future drift in ClientI before runtime. Addresses code review
feedback on commit 8f2fbef.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New gate helper consulted once per block. Truth-table unit tests cover
{setting on/off} x {stamp present/absent} x {age in/out of window} x {error}.
Two new counters: livenessgate_decision_total (labelled) and
subtreeonly_tx_miss_total.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When AssumeTxsBroadcastToAllNodes is on and the block's header was
ReceivedAt within LivenessWindow, per-subtree workers skip the
subtreeData download entirely. Transaction resolution happens later
via UTXO/TxMetaCache + existing per-tx fetch for real misses.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Extracts a shared pure helper at util/livenessgate. blockWorker consults it
before calling fetchSubtreeDataForBlock. When AssumeTxsBroadcastToAllNodes
is on and the header is live, the catchup pre-fetch is skipped;
CheckBlockSubtrees downstream pulls subtree manifests on demand.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Catches future drift if any ClientI implementer loses GetHeaderReceivedAt.
Addresses code review feedback on commit d35934d.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…SM cascade

Forces FSM into CATCHINGBLOCKS while a fresh ReceivedAt stamp exists and
asserts the liveness gate still skips subtreeData. This is the test that,
had it existed, would have prevented PR bsv-blockchain#598 from being reverted (PR bsv-blockchain#647).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions

github-actions Bot commented Apr 17, 2026

Copy link
Copy Markdown
Contributor

🤖 Claude Code Review

Status: Complete

Summary: No issues found. This PR implements a well-designed liveness gate for subtree-only validation with comprehensive test coverage and proper resource management.

Key Strengths:

  • Thorough iteration: Author addressed all 27 inline comments from prior reviews
  • Sound design: Gate avoids FSM state coupling (root cause of reverted PR fix: skip subtree_data fetch when local txs available #598)
  • Complete testing: 1707 tests pass, including FSM cascade regression guard
  • Safe defaults: Feature disabled by default (AssumeTxsBroadcastToAllNodes=false), zero cost for nodes not using it
  • Resource cleanup: Proper Stop() methods prevent goroutine leaks
  • Documentation accuracy: Settings in settings.conf and subtreevalidation_settings.go match implementation

Architecture:

  • Stamps header first-seen time at blockvalidation.BlockFound (only when enabled)
  • Single decision helper (livenessgate.Decide) shared by both call sites
  • TTL derived from LivenessWindow via receivedAtTTL() prevents premature eviction
  • Nil-safe operations throughout (receivedAt store allocation gated behind feature flag)

Rollout: Default-off means merging is a no-op for mainnet/testnet. Dev-scale only via settings_local.conf.


Previous reviews: All Copilot feedback has been addressed and incorporated.

@github-actions

github-actions Bot commented Apr 17, 2026

Copy link
Copy Markdown
Contributor

Benchmark Comparison Report

Baseline: main (unknown)

Current: PR-715 (14e9548)

Summary

  • Regressions: 0
  • Improvements: 0
  • Unchanged: 142
  • Significance level: p < 0.05
All benchmark results (sec/op)
Benchmark Baseline Current Change p-value
_NewBlockFromBytes-4 1.737µ 1.811µ ~ 0.100
SplitSyncedParentMap_SetIfNotExists/256_buckets-4 59.48n 59.49n ~ 0.500
SplitSyncedParentMap_SetIfNotExists/16_buckets-4 59.40n 59.38n ~ 1.000
SplitSyncedParentMap_SetIfNotExists/1_bucket-4 59.19n 62.78n ~ 0.100
SplitSyncedParentMap_ConcurrentSetIfNotExists/256_buckets... 35.82n 37.49n ~ 0.700
SplitSyncedParentMap_ConcurrentSetIfNotExists/16_buckets_... 69.69n 63.82n ~ 0.100
SplitSyncedParentMap_ConcurrentSetIfNotExists/1_bucket_pa... 190.6n 176.0n ~ 0.100
MiningCandidate_Stringify_Short-4 257.6n 258.9n ~ 0.100
MiningCandidate_Stringify_Long-4 1.819µ 1.802µ ~ 0.200
MiningSolution_Stringify-4 920.4n 930.4n ~ 0.100
BlockInfo_MarshalJSON-4 1.750µ 1.764µ ~ 0.400
NewFromBytes-4 136.4n 126.5n ~ 0.700
Mine_EasyDifficulty-4 64.57µ 64.62µ ~ 0.700
Mine_WithAddress-4 4.900µ 4.880µ ~ 0.200
BlockAssembler_AddTx-4 0.03296n 0.02669n ~ 0.200
AddNode-4 11.16 11.04 ~ 0.400
AddNodeWithMap-4 11.25 11.20 ~ 1.000
DirectSubtreeAdd/4_per_subtree-4 63.03n 63.96n ~ 0.700
DirectSubtreeAdd/64_per_subtree-4 29.31n 29.04n ~ 0.700
DirectSubtreeAdd/256_per_subtree-4 27.74n 27.78n ~ 1.000
DirectSubtreeAdd/1024_per_subtree-4 26.58n 26.57n ~ 0.800
DirectSubtreeAdd/2048_per_subtree-4 26.17n 26.10n ~ 0.700
SubtreeProcessorAdd/4_per_subtree-4 314.9n 309.2n ~ 0.700
SubtreeProcessorAdd/64_per_subtree-4 316.7n 315.7n ~ 1.000
SubtreeProcessorAdd/256_per_subtree-4 314.2n 316.2n ~ 0.700
SubtreeProcessorAdd/1024_per_subtree-4 316.6n 314.3n ~ 0.400
SubtreeProcessorAdd/2048_per_subtree-4 317.2n 314.2n ~ 1.000
SubtreeProcessorRotate/4_per_subtree-4 315.8n 313.5n ~ 1.000
SubtreeProcessorRotate/64_per_subtree-4 323.7n 315.3n ~ 0.100
SubtreeProcessorRotate/256_per_subtree-4 318.8n 316.4n ~ 0.700
SubtreeProcessorRotate/1024_per_subtree-4 306.4n 303.4n ~ 0.700
SubtreeNodeAddOnly/4_per_subtree-4 67.94n 68.26n ~ 0.700
SubtreeNodeAddOnly/64_per_subtree-4 40.85n 40.76n ~ 1.000
SubtreeNodeAddOnly/256_per_subtree-4 39.60n 39.66n ~ 0.700
SubtreeNodeAddOnly/1024_per_subtree-4 38.73n 39.12n ~ 0.200
SubtreeCreationOnly/4_per_subtree-4 163.9n 166.3n ~ 0.400
SubtreeCreationOnly/64_per_subtree-4 602.0n 645.9n ~ 0.700
SubtreeCreationOnly/256_per_subtree-4 1.949µ 1.971µ ~ 0.400
SubtreeCreationOnly/1024_per_subtree-4 4.491µ 4.859µ ~ 0.200
SubtreeCreationOnly/2048_per_subtree-4 7.455µ 6.523µ ~ 1.000
SubtreeProcessorOverheadBreakdown/64_per_subtree-4 307.5n 306.0n ~ 0.100
SubtreeProcessorOverheadBreakdown/1024_per_subtree-4 313.6n 309.5n ~ 0.100
ParallelGetAndSetIfNotExists/1k_nodes-4 921.2µ 916.8µ ~ 0.700
ParallelGetAndSetIfNotExists/10k_nodes-4 1.980m 1.942m ~ 0.100
ParallelGetAndSetIfNotExists/50k_nodes-4 9.160m 8.845m ~ 0.100
ParallelGetAndSetIfNotExists/100k_nodes-4 17.70m 17.63m ~ 0.700
SequentialGetAndSetIfNotExists/1k_nodes-4 758.2µ 740.7µ ~ 0.400
SequentialGetAndSetIfNotExists/10k_nodes-4 3.056m 3.053m ~ 0.400
SequentialGetAndSetIfNotExists/50k_nodes-4 11.53m 11.45m ~ 0.400
SequentialGetAndSetIfNotExists/100k_nodes-4 21.93m 21.61m ~ 0.100
ProcessOwnBlockSubtreeNodesParallel/1k_nodes-4 1.010m 1.010m ~ 0.700
ProcessOwnBlockSubtreeNodesParallel/10k_nodes-4 5.126m 5.179m ~ 0.700
ProcessOwnBlockSubtreeNodesParallel/100k_nodes-4 19.72m 19.81m ~ 0.700
ProcessOwnBlockSubtreeNodesSequential/1k_nodes-4 809.5µ 803.0µ ~ 0.100
ProcessOwnBlockSubtreeNodesSequential/10k_nodes-4 6.752m 6.637m ~ 0.400
ProcessOwnBlockSubtreeNodesSequential/100k_nodes-4 42.53m 42.07m ~ 0.200
DiskTxMap_SetIfNotExists-4 4.006µ 3.880µ ~ 0.100
DiskTxMap_SetIfNotExists_Parallel-4 3.696µ 3.684µ ~ 1.000
DiskTxMap_ExistenceOnly-4 346.2n 351.5n ~ 1.000
Queue-4 197.9n 195.3n ~ 0.700
AtomicPointer-4 3.676n 3.720n ~ 0.100
ReorgOptimizations/DedupFilterPipeline/Old/10K-4 863.2µ 863.1µ ~ 1.000
ReorgOptimizations/DedupFilterPipeline/New/10K-4 850.8µ 832.8µ ~ 0.200
ReorgOptimizations/AllMarkFalse/Old/10K-4 112.9µ 119.7µ ~ 0.100
ReorgOptimizations/AllMarkFalse/New/10K-4 64.58µ 64.44µ ~ 1.000
ReorgOptimizations/HashSlicePool/Old/10K-4 65.43µ 70.83µ ~ 0.700
ReorgOptimizations/HashSlicePool/New/10K-4 11.48µ 10.95µ ~ 0.100
ReorgOptimizations/NodeFlags/Old/10K-4 5.137µ 6.350µ ~ 0.100
ReorgOptimizations/NodeFlags/New/10K-4 1.760µ 2.602µ ~ 0.100
ReorgOptimizations/DedupFilterPipeline/Old/100K-4 10.41m 11.34m ~ 0.200
ReorgOptimizations/DedupFilterPipeline/New/100K-4 11.17m 11.03m ~ 0.400
ReorgOptimizations/AllMarkFalse/Old/100K-4 1.197m 1.164m ~ 0.700
ReorgOptimizations/AllMarkFalse/New/100K-4 707.0µ 706.4µ ~ 1.000
ReorgOptimizations/HashSlicePool/Old/100K-4 618.5µ 608.7µ ~ 0.400
ReorgOptimizations/HashSlicePool/New/100K-4 199.9µ 203.5µ ~ 1.000
ReorgOptimizations/NodeFlags/Old/100K-4 53.18µ 57.14µ ~ 0.100
ReorgOptimizations/NodeFlags/New/100K-4 19.43µ 20.06µ ~ 0.700
TxMapSetIfNotExists-4 46.43n 46.35n ~ 1.000
TxMapSetIfNotExistsDuplicate-4 38.66n 38.64n ~ 0.800
ChannelSendReceive-4 586.0n 586.1n ~ 0.700
CalcBlockWork-4 524.6n 527.7n ~ 1.000
CalculateWork-4 732.1n 749.2n ~ 0.700
BuildBlockLocatorString_Helpers/Size_10-4 1.330µ 1.342µ ~ 0.200
BuildBlockLocatorString_Helpers/Size_100-4 12.78µ 12.97µ ~ 0.700
BuildBlockLocatorString_Helpers/Size_1000-4 156.8µ 125.7µ ~ 0.700
CatchupWithHeaderCache-4 104.4m 104.5m ~ 1.000
SubtreeSizes/10k_tx_4_per_subtree-4 1.426m 1.421m ~ 0.400
SubtreeSizes/10k_tx_16_per_subtree-4 339.7µ 344.9µ ~ 0.400
SubtreeSizes/10k_tx_64_per_subtree-4 81.23µ 81.37µ ~ 1.000
SubtreeSizes/10k_tx_256_per_subtree-4 20.11µ 19.86µ ~ 0.400
SubtreeSizes/10k_tx_512_per_subtree-4 10.160µ 9.943µ ~ 0.200
SubtreeSizes/10k_tx_1024_per_subtree-4 4.965µ 4.981µ ~ 0.700
SubtreeSizes/10k_tx_2k_per_subtree-4 2.445µ 2.421µ ~ 0.100
BlockSizeScaling/10k_tx_64_per_subtree-4 78.71µ 77.36µ ~ 0.100
BlockSizeScaling/10k_tx_256_per_subtree-4 19.69µ 19.62µ ~ 0.400
BlockSizeScaling/10k_tx_1024_per_subtree-4 4.984µ 4.818µ ~ 0.100
BlockSizeScaling/50k_tx_64_per_subtree-4 413.8µ 403.9µ ~ 0.100
BlockSizeScaling/50k_tx_256_per_subtree-4 99.46µ 97.47µ ~ 0.100
BlockSizeScaling/50k_tx_1024_per_subtree-4 24.32µ 23.89µ ~ 0.700
SubtreeAllocations/small_subtrees_exists_check-4 161.2µ 164.8µ ~ 0.700
SubtreeAllocations/small_subtrees_data_fetch-4 171.6µ 169.8µ ~ 0.100
SubtreeAllocations/small_subtrees_full_validation-4 339.3µ 334.7µ ~ 0.400
SubtreeAllocations/medium_subtrees_exists_check-4 9.458µ 9.524µ ~ 0.700
SubtreeAllocations/medium_subtrees_data_fetch-4 10.142µ 9.925µ ~ 0.100
SubtreeAllocations/medium_subtrees_full_validation-4 19.93µ 19.48µ ~ 0.100
SubtreeAllocations/large_subtrees_exists_check-4 2.277µ 2.306µ ~ 0.200
SubtreeAllocations/large_subtrees_data_fetch-4 2.439µ 2.438µ ~ 0.300
SubtreeAllocations/large_subtrees_full_validation-4 4.948µ 4.921µ ~ 0.700
_BufferPoolAllocation/16KB-4 3.333µ 3.358µ ~ 0.700
_BufferPoolAllocation/32KB-4 8.460µ 7.483µ ~ 0.200
_BufferPoolAllocation/64KB-4 14.83µ 16.62µ ~ 0.700
_BufferPoolAllocation/128KB-4 31.72µ 27.27µ ~ 0.100
_BufferPoolAllocation/512KB-4 107.8µ 109.4µ ~ 0.200
_BufferPoolConcurrent/32KB-4 18.89µ 17.84µ ~ 0.100
_BufferPoolConcurrent/64KB-4 29.48µ 28.31µ ~ 0.100
_BufferPoolConcurrent/512KB-4 139.2µ 135.1µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/16KB-4 628.3µ 614.8µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/32KB-4 615.3µ 609.4µ ~ 0.700
_SubtreeDeserializationWithBufferSizes/64KB-4 610.5µ 611.6µ ~ 0.400
_SubtreeDeserializationWithBufferSizes/128KB-4 611.4µ 616.4µ ~ 1.000
_SubtreeDeserializationWithBufferSizes/512KB-4 623.1µ 616.4µ ~ 0.700
_SubtreeDataDeserializationWithBufferSizes/16KB-4 35.57m 35.16m ~ 0.700
_SubtreeDataDeserializationWithBufferSizes/32KB-4 36.31m 35.33m ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/64KB-4 35.67m 35.25m ~ 0.700
_SubtreeDataDeserializationWithBufferSizes/128KB-4 35.51m 35.56m ~ 1.000
_SubtreeDataDeserializationWithBufferSizes/512KB-4 35.09m 34.82m ~ 1.000
_PooledVsNonPooled/Pooled-4 828.8n 829.9n ~ 0.700
_PooledVsNonPooled/NonPooled-4 6.610µ 6.654µ ~ 1.000
_MemoryFootprint/Current_512KB_32concurrent-4 7.140µ 6.784µ ~ 0.100
_MemoryFootprint/Proposed_32KB_32concurrent-4 9.208µ 8.961µ ~ 0.700
_MemoryFootprint/Alternative_64KB_32concurrent-4 8.861µ 8.984µ ~ 0.700
_prepareTxsPerLevel-4 402.6m 405.0m ~ 0.700
_prepareTxsPerLevelOrdered-4 3.661m 3.625m ~ 0.700
_prepareTxsPerLevel_Comparison/Original-4 412.0m 398.5m ~ 0.100
_prepareTxsPerLevel_Comparison/Optimized-4 3.471m 3.651m ~ 0.200
StoreBlock_Sequential/BelowCSVHeight-4 371.6µ 354.9µ ~ 0.700
StoreBlock_Sequential/AboveCSVHeight-4 366.8µ 355.2µ ~ 0.100
GetUtxoHashes-4 257.7n 256.3n ~ 0.600
GetUtxoHashes_ManyOutputs-4 43.80µ 43.98µ ~ 1.000
_NewMetaDataFromBytes-4 235.7n 235.7n ~ 1.000
_Bytes-4 605.9n 609.7n ~ 0.400
_MetaBytes-4 549.1n 552.9n ~ 0.800

Threshold: >10% with p < 0.05 | Generated: 2026-04-23 13:37 UTC

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a per-header “liveness gate” to decide when block/subtree validation can safely prefer the subtree-only path (skipping subtreeData downloads) based on a first-seen header timestamp recorded by the blockchain service.

Changes:

  • Added AssumeTxsBroadcastToAllNodes + LivenessWindow settings and config docs.
  • Added a shared util/livenessgate decision helper and wired gating into CheckBlockSubtrees and blockWorker catchup prefetch.
  • Added blockchain-side ReceivedAt stamping + new GetHeaderReceivedAt gRPC/RPC surface, plus metrics/tests around the gate.

Reviewed changes

Copilot reviewed 23 out of 23 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
util/livenessgate/gate.go Shared gate decision helper based on header age.
util/livenessgate/gate_test.go Unit tests for the shared helper.
settings/subtreevalidation_settings.go Adds new SubtreeValidation settings + long descriptions.
settings.conf Documents new settings in example config.
services/subtreevalidation/metrics.go Adds Prometheus counters for gate decisions + tx-miss fallback.
services/subtreevalidation/livenessgate.go Server wrapper for gate with metrics/logging.
services/subtreevalidation/livenessgate_test.go Truth-table tests for wrapper behavior.
services/subtreevalidation/check_block_subtrees.go Applies gate once per block to skip subtreeData path.
services/subtreevalidation/check_block_subtrees_test.go Integration/regression tests for gated behavior + FSM cascade guard.
services/blockvalidation/get_blocks.go Applies shared gate to skip catchup subtreeData prefetch.
services/blockvalidation/get_blocks_test.go Tests that blockWorker consults the gate before prefetch.
services/blockchain/received_at.go In-memory TTL store for header first-seen timestamps.
services/blockchain/received_at_test.go Unit tests for receivedAtStore semantics and TTL.
services/blockchain/Server.go Stamps ReceivedAt on AddBlock; adds GetHeaderReceivedAt RPC implementation.
services/blockchain/Interface.go Extends ClientI with GetHeaderReceivedAt.
services/blockchain/Client.go Implements GetHeaderReceivedAt gRPC client method.
services/blockchain/LocalClient.go Local implementation of GetHeaderReceivedAt + stamping.
services/blockchain/mock.go Extends blockchain mock with GetHeaderReceivedAt.
services/blockchain/server_test.go Tests new GetHeaderReceivedAt RPC behavior.
services/blockchain/blockchain_api/*.pb.go / *_grpc.pb.go / *.proto Adds new gRPC method + messages.
services/blockpersister/Server_test.go Updates test mock to satisfy new blockchain client interface.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +24 to +46
func (u *Server) ShouldUseSubtreeOnlyPath(ctx context.Context, blockHash *chainhash.Hash) bool {
if !u.settings.SubtreeValidation.AssumeTxsBroadcastToAllNodes {
prometheusLivenessGateDecision.WithLabelValues("subtreedata").Inc()
return false
}

stamp, found, err := u.blockchainClient.GetHeaderReceivedAt(ctx, blockHash)
if err != nil {
prometheusLivenessGateDecision.WithLabelValues("err").Inc()
u.logger.Debugf("[ShouldUseSubtreeOnlyPath][%s] GetHeaderReceivedAt failed, falling back to subtreeData: %v", blockHash.String(), err)
return false
}
if !found {
prometheusLivenessGateDecision.WithLabelValues("notfound").Inc()
return false
}
if time.Since(stamp) > u.settings.SubtreeValidation.LivenessWindow {
prometheusLivenessGateDecision.WithLabelValues("subtreedata").Inc()
return false
}

prometheusLivenessGateDecision.WithLabelValues("subtreeonly").Inc()
return true

Copilot AI Apr 17, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method duplicates the decision logic that already exists in util/livenessgate.ShouldUseSubtreeOnlyPath. Since the PR introduces a shared helper specifically to avoid divergence, consider calling the shared helper here and keeping this wrapper focused on metrics/logging only (so boundary conditions and future tweaks stay consistent across call sites).

Copilot uses AI. Check for mistakes.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Refactored to push the decision logic into a single livenessgate.Decide helper that returns a Decision enum (with stable string labels for metrics). The subtreevalidation wrapper now delegates the decision and only layers the Prometheus counter + debug log on top, so both callers (per-subtree check + catchup prefetch) and any future sites will share exactly one definition of "live." Added a test for Decide covering all branches.

Comment on lines +36 to +37
{"setting on, stamp exactly at window boundary", true, time.Minute, now.Add(-59 * time.Second), true, nil, true},
{"setting on, stamp past window boundary by 1s", true, time.Minute, now.Add(-61 * time.Second), true, nil, false},

Copilot AI Apr 17, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These test cases depend on wall-clock time (time.Since) but set stamps only 1s inside/outside the window (59s/61s for a 60s window). Under slow/loaded CI, the “within window” case can become stale before the assertion runs, causing flaky failures. Consider widening the margin (e.g., 30s vs 90s), or refactoring the gate to accept an injected ‘now’ function/clock so boundary behavior can be tested deterministically.

Suggested change
{"setting on, stamp exactly at window boundary", true, time.Minute, now.Add(-59 * time.Second), true, nil, true},
{"setting on, stamp past window boundary by 1s", true, time.Minute, now.Add(-61 * time.Second), true, nil, false},
{"setting on, stamp comfortably within window", true, time.Minute, now.Add(-30 * time.Second), true, nil, true},
{"setting on, stamp comfortably outside window", true, time.Minute, now.Add(-90 * time.Second), true, nil, false},

Copilot uses AI. Check for mistakes.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed — applied the suggested widening (30s vs 90s for the 60s window). I kept the existing boundary cases too; in practice Go's time.Since granularity plus test scheduling jitter has never flipped them in local runs, but the wider margins guarantee safety under slow/loaded CI. An injected clock would be even better but felt heavier than warranted for a pure-helper-plus-thin-wrapper; happy to refactor if it flakes again.

Comment on lines +212 to +235
// Once-per-block liveness gate. If the node was live when this block
// was minted, skip the catchup subtreeData pre-fetch — CheckBlockSubtrees
// downstream will pull subtree manifests on demand, and the local
// UTXO/TxMetaCache + per-tx fetch path handles the rest.
//
// See docs/superpowers/specs/2026-04-16-subtree-only-validation-with-liveness-gate-design.md.
if livenessgate.ShouldUseSubtreeOnlyPath(
ctx,
u.blockchainClient,
work.block.Hash(),
u.settings.SubtreeValidation.AssumeTxsBroadcastToAllNodes,
u.settings.SubtreeValidation.LivenessWindow,
) {
result := resultItem{
block: work.block,
index: work.index,
}
select {
case resultQueue <- result:
case <-ctx.Done():
return ctx.Err()
}
continue
}

Copilot AI Apr 17, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The liveness gate in blockWorker depends on GetHeaderReceivedAt returning a stamp, but the only stamp currently appears to be written when the block is later stored via AddBlock. At this point in catchup the block has not been stored yet, so the stamp will typically be absent and the gate will always fall back to the subtreeData prefetch path (making this change effectively a no-op in production). Consider ensuring the ReceivedAt stamp is recorded at the time the header is first seen/persisted (i.e., before subtreeData fetch decisions are made), or adding an explicit “stamp header received” call at the point the header is accepted for validation/catchup.

Copilot uses AI. Check for mistakes.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Real bug — good catch. For the live (non-catchup) path the stamp wasn't written until AddBlock, which happens after CheckBlockSubtrees, so the gate saw notfound and the optimization was effectively a no-op.

Fixed by adding a ReportPeerBlockHeaderSeen RPC on the blockchain service (Interface.go / Client.go / LocalClient.go / Server.go / mock.go, plus the proto) and calling it from blockvalidation.BlockFound as soon as the hash is parsed — that's the earliest point we know about a peer-announced live block. Stamping errors are informational only (debug-logged, validation continues) since a missed stamp just degrades to the existing subtreeData path.

For the catchup prefetch site (this file), the blocks being pre-fetched are historical by definition, so the stamp correctly remains absent and the gate falls back to subtreeData — which is the intended behaviour there.

Comment thread services/blockchain/Server.go Outdated
Comment on lines +851 to +854
// Record first-seen time for the subtree-only liveness gate.
// See docs/superpowers/specs/2026-04-16-subtree-only-validation-with-liveness-gate-design.md.
b.receivedAt.stamp(block.Hash())

Copilot AI Apr 17, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

receivedAt is stamped here after StoreBlock succeeds. For the liveness gate to influence subtreeData fetch decisions during validation/catchup, the stamp needs to exist before those decisions are made (typically when the header is first learned/persisted, not after the full block is stored). As-is, most lookups during validation of a new block will return notfound, so the gate won’t fire. Consider moving stamping to the earliest point a header is accepted/persisted, or introducing a separate API to record header first-seen time prior to subtree fetching.

Copilot uses AI. Check for mistakes.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed — the AddBlock stamp is kept as a belt-and-braces fallback but is no longer the primary write path. Added a new ReportPeerBlockHeaderSeen RPC (Interface.go / Client.go / LocalClient.go / Server.go / mock.go / proto regenerated) that blockvalidation.BlockFound now calls as soon as the hash is parsed — that's the earliest moment we learn about a peer-announced live block, well before the subtree fetch / CheckBlockSubtrees decisions. This makes the gate actually fire in production.

Comment thread services/blockchain/Server.go Outdated
Comment on lines 170 to 171
receivedAt: newReceivedAtStore(30 * time.Minute),
}

Copilot AI Apr 17, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The receivedAt TTL is hard-coded to 30m, but LivenessWindow is operator-configurable. If an operator sets LivenessWindow > TTL, headers can TTL-expire while still within the configured liveness window, causing the gate to return notfound and silently disable the optimization. Consider deriving TTL from settings (e.g., max(30m, 2*LivenessWindow)) or validating that LivenessWindow <= TTL at startup.

Copilot uses AI. Check for mistakes.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — TTL is now derived from SubtreeValidation.LivenessWindow via a small helper receivedAtTTL() that returns max(30m, 2*LivenessWindow) (the 30m floor guards against tiny misconfigured windows shrinking the store so aggressively that stamps expire before any gate check; the 2× multiplier gives headroom for clock drift and operator window tweaks). Applied at both Server.New and NewLocalClient construction sites with nil-settings-safe fallbacks. Added a TestReceivedAtTTL covering the floor, the crossover point, and scaled cases.

- util/livenessgate: extract Decide() returning a Decision enum so the
  subtreevalidation metrics wrapper can delegate the decision logic
  instead of duplicating the age-check branches.
- subtreevalidation/livenessgate.go: wrapper now just layers Prometheus
  counters + debug logging on top of livenessgate.Decide.
- subtreevalidation/livenessgate_test.go: widen the wall-clock boundary
  cases (30s vs 90s for a 60s window) so CI jitter can't flip the
  result before the assertion runs.
- blockchain: add ReportPeerBlockHeaderSeen RPC (proto, Interface,
  Client, LocalClient, Server, mock) so blockvalidation.BlockFound can
  seed the liveness gate the moment a peer-announced header arrives —
  previously the only stamp was written in AddBlock after
  CheckBlockSubtrees had already run, making the gate a production
  no-op.
- blockchain: derive receivedAt TTL from SubtreeValidation.LivenessWindow
  via receivedAtTTL = max(30m, 2*LivenessWindow); guards against
  operator-configured windows larger than a hardcoded TTL silently
  evicting stamps before the gate checks them. Applied with nil-settings
  fallback at both Server.New and NewLocalClient.
- blockvalidation/Server.go: call ReportPeerBlockHeaderSeen early in
  BlockFound; stamping errors are debug-logged but never fail
  validation.
- blockpersister/Server_test.go: satisfy the new ClientI method on the
  local test mock.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 48 out of 48 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread util/livenessgate/gate.go Outdated
Comment on lines +1 to +6
// Package livenessgate provides a shared, pure decision function used by
// block validation and subtree validation to decide whether a block can
// use the subtree-only path (skipping subtreeData downloads) based on
// the age of its header's first-seen timestamp.
//
// See docs/superpowers/specs/2026-04-16-subtree-only-validation-with-liveness-gate-design.md.

Copilot AI Apr 17, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This package comment references docs/superpowers/specs/2026-04-16-subtree-only-validation-with-liveness-gate-design.md, but that path doesn't exist in this repository (no docs/superpowers/ directory). Please either add the design doc to a tracked location in-repo (and update the link), or remove/replace the reference with something that will be available to future maintainers (e.g., an issue/PR link).

Copilot uses AI. Check for mistakes.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point — that spec path is gitignored in this repo (noted in the PR description), so the inline references just dangle. Removed the // See docs/superpowers/specs/... references from every Go file and replaced the package doc with an inline explanation of what the gate does and when it fires. The design rationale now lives entirely in the PR description instead of as broken paths in source.

Comment on lines +273 to +277
// Gate is on and the block's header was received recently — skip
// the subtreeData download entirely. The subtree structure was
// already populated from the hash manifest above; tx resolution
// happens later in processTransactionsInLevels via UTXO/TxMetaCache,
// with per-tx fetch for any real misses.

Copilot AI Apr 17, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The subtree-only short-circuit comment says tx resolution happens later in processTransactionsInLevels, but in this gate path subtreeTxs[subtreeIdx] stays empty and the batch-level processTransactionsInLevels call is skipped entirely (because batchTxCount == 0). Please update the comment to point at the actual resolution mechanism used in this path (i.e., validation resolving txs via the manifest/UTXO/TxMetaCache with per-tx fetch fallback), so it matches the control flow.

Copilot uses AI. Check for mistakes.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right — the comment described the non-gated code path. Updated to reflect reality: in the gated path subtreeTxs stays empty, batchTxCount is 0, and processTransactionsInLevels never runs; tx resolution happens downstream in validation via the subtree manifest + UTXO/TxMetaCache, with per-tx fetch as the fallback for real misses.

Comment on lines 1 to 5
// Code generated by protoc-gen-go. DO NOT EDIT.
// versions:
// protoc-gen-go v1.36.11
// protoc v7.34.1
// protoc v6.33.4
// source: util/kafka/kafka_message/kafka_messages.proto

Copilot AI Apr 17, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR appears to have regenerated a large number of *.pb.go files where the only change is the // protoc v... header comment. Since these changes are unrelated to the liveness-gate feature and add review/merge noise, consider reverting comment-only proto regen diffs and/or ensuring the proto generation toolchain is pinned so CI and developers produce stable outputs.

Copilot uses AI. Check for mistakes.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted the protoc v7.34.1v6.33.4 header-only churn on the 23 unrelated *.pb.go files (git checkout upstream/main -- <paths>). The only remaining *.pb.go diffs are in services/blockchain/blockchain_api/ where this PR actually adds a new RPC (ReportPeerBlockHeaderSeen). Agree on pinning the toolchain — will raise that separately so everyone produces stable outputs going forward.

- Remove dangling references to docs/superpowers/specs/... — that
  directory is gitignored in this repo, so the inline links in gate.go,
  get_blocks.go, livenessgate.go, check_block_subtrees.go,
  check_block_subtrees_test.go, and Server.go were dead. Package doc
  on livenessgate gets an inline explanation instead.
- check_block_subtrees.go: fix the subtree-only short-circuit comment.
  It claimed tx resolution happened in processTransactionsInLevels, but
  in the gated path subtreeTxs stays empty and the batch call is
  skipped entirely (batchTxCount == 0). Updated to describe the actual
  path: manifest + UTXO/TxMetaCache + per-tx fetch.
- Server.go (AddBlock): clarify that the receivedAt stamp here is a
  belt-and-braces fallback; the primary stamp is now in
  blockvalidation.BlockFound via ReportPeerBlockHeaderSeen.
- Revert protoc v7.34.1 → v6.33.4 header-only churn across 23
  unrelated *.pb.go files (git checkout upstream/main -- ...). The only
  remaining pb.go diffs are in blockchain_api where this PR actually
  adds a new RPC.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 25 out of 25 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

// versions:
// - protoc-gen-go-grpc v1.6.1
// - protoc v7.34.1
// - protoc v6.33.4

Copilot AI Apr 17, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The regenerated gRPC stub now reports protoc v6.33.4 in the header comment, while most other *_grpc.pb.go files in this repo still show protoc v7.34.1. This suggests the proto generation toolchain isn't pinned/consistent, which tends to create noisy diffs on future regenerations. Consider regenerating with the same toolchain version used elsewhere in the repo (or updating/pinning the repo-wide generator so all generated files match).

Suggested change
// - protoc v6.33.4
// - protoc v7.34.1

Copilot uses AI. Check for mistakes.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — patched the protoc header in the two regenerated blockchain_api files from v6.33.4 back to v7.34.1 to match the rest of the repo. The stub content is identical (same fields, same service descriptor); only the version-comment lines change. Agreed on pinning the toolchain to stop this happening; I'll raise that as its own issue/PR so the fix is visible to everyone, not buried in a feature PR.

Comment on lines +3229 to +3231
// A minimal block that has one subtree (so fetchSubtreeDataForBlock has work to do).
subtreeHash := createTestHash("liveness-gate-subtree")
block := &model.Block{

Copilot AI Apr 17, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TestBlockWorker_LivenessGate builds block := &model.Block{Subtrees: ...} without a Header, but blockWorker calls work.block.Hash() (which dereferences b.Header.Hash()), so this test will panic at runtime. Give the test block a minimal non-nil Header (or construct via model.NewBlock(...)) so Hash() is safe and the test actually exercises the gate behavior.

Suggested change
// A minimal block that has one subtree (so fetchSubtreeDataForBlock has work to do).
subtreeHash := createTestHash("liveness-gate-subtree")
block := &model.Block{
// A minimal block that has one subtree (so fetchSubtreeDataForBlock has work to do).
// Include a non-nil Header because blockWorker calls block.Hash(), which
// dereferences b.Header.Hash().
subtreeHash := createTestHash("liveness-gate-subtree")
block := &model.Block{
Header: &model.BlockHeader{},

Copilot uses AI. Check for mistakes.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch — confirmed the panic locally (SIGSEGV inside BlockHeader.Bytes -> HashPrevBlock.CloneBytes). Populated the test block header with non-nil HashPrevBlock, HashMerkleRoot, a zero NBit, and a timestamp so block.Hash() serializes cleanly. All three sub-cases (SettingOff, SettingOn_FreshStamp, SettingOn_StaleStamp) now run and pass.

Comment thread services/subtreevalidation/metrics.go Outdated
Comment on lines +84 to +88

// prometheusSubtreeOnlyTxMiss counts per-tx fetch fallbacks during the subtree-only
// path. A spike on a node configured with AssumeTxsBroadcastToAllNodes indicates
// upstream trouble (blaster health, peer connectivity, or misconfiguration).
prometheusSubtreeOnlyTxMiss prometheus.Counter

Copilot AI Apr 17, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prometheusSubtreeOnlyTxMiss is declared and registered, but there are no increments/references anywhere in the codebase. As-is, the metric will stay at 0 and can mislead operators/dashboards. Either wire it into the per-tx fetch fallback path for the subtree-only flow (where the miss is detected) or remove it until it is actually emitted.

Suggested change
// prometheusSubtreeOnlyTxMiss counts per-tx fetch fallbacks during the subtree-only
// path. A spike on a node configured with AssumeTxsBroadcastToAllNodes indicates
// upstream trouble (blaster health, peer connectivity, or misconfiguration).
prometheusSubtreeOnlyTxMiss prometheus.Counter

Copilot uses AI. Check for mistakes.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point — the metric was forward-looking (the PR description mentions it, but the wiring into the per-tx fetch path in the validator didn't land with this PR). Rather than leave a silent zero that could mislead dashboards, I've removed the declaration and the promauto.NewCounter registration. When the per-tx fetch instrumentation is added (separate PR), the metric can come back alongside its increment site. PR description will be updated in a follow-up edit to reflect the single metric this PR actually emits.

- Patch protoc header in blockchain_api.pb.go / _grpc.pb.go from v6.33.4
  to v7.34.1 to match the rest of the repo. Stub content is otherwise
  identical; only the header-comment versions change.
- get_blocks_test.go: TestBlockWorker_LivenessGate was panicking at
  runtime because block.Hash() → BlockHeader.Bytes() dereferences
  HashPrevBlock/HashMerkleRoot/Bits. Populate a minimal non-nil Header
  so all three sub-cases (SettingOff, SettingOn_FreshStamp,
  SettingOn_StaleStamp) run and pass.
- metrics.go: remove the unused prometheusSubtreeOnlyTxMiss counter.
  It was declared and registered but never incremented anywhere, which
  would silently show zero on dashboards and mislead operators. Can be
  added back when the per-tx fetch wiring lands in a follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 25 out of 25 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread services/blockvalidation/Server.go Outdated
Comment on lines +1129 to +1131
// Stamping errors are informational — do not fail validation on a stamp failure.
if stampErr := u.blockchainClient.ReportPeerBlockHeaderSeen(ctx, hash); stampErr != nil {
u.logger.Debugf("[BlockFound][%s] ReportPeerBlockHeaderSeen failed, liveness gate may miss for this header: %v", hash.String(), stampErr)

Copilot AI Apr 17, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BlockFound now calls ReportPeerBlockHeaderSeen unconditionally. This introduces an extra blockchain-service RPC on the hot path even when SubtreeValidation.AssumeTxsBroadcastToAllNodes is false (the default), which contradicts the PR claim that the default is a no-op and also increases runtime overhead for mainnet/testnet. Consider gating this call behind the setting (and possibly also requiring LivenessWindow > 0) so the additional RPC + in-memory stamp writes only occur when the optimization is enabled.

Suggested change
// Stamping errors are informational — do not fail validation on a stamp failure.
if stampErr := u.blockchainClient.ReportPeerBlockHeaderSeen(ctx, hash); stampErr != nil {
u.logger.Debugf("[BlockFound][%s] ReportPeerBlockHeaderSeen failed, liveness gate may miss for this header: %v", hash.String(), stampErr)
// Only do this when subtree liveness optimisation is enabled; otherwise this
// would add an unnecessary blockchain-service RPC on the BlockFound hot path.
// Stamping errors are informational — do not fail validation on a stamp failure.
if u.settings.SubtreeValidation.AssumeTxsBroadcastToAllNodes && u.settings.SubtreeValidation.LivenessWindow > 0 {
if stampErr := u.blockchainClient.ReportPeerBlockHeaderSeen(ctx, hash); stampErr != nil {
u.logger.Debugf("[BlockFound][%s] ReportPeerBlockHeaderSeen failed, liveness gate may miss for this header: %v", hash.String(), stampErr)
}

Copilot uses AI. Check for mistakes.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — gated the stamp call behind AssumeTxsBroadcastToAllNodes && LivenessWindow > 0 (matching your suggestion). Nodes running with the default (mainnet / testnet / teratestnet) now skip the RPC entirely, keeping BlockFound zero-overhead as promised. Also added a subtest (ReportPeerBlockHeaderSeen only fires when gate setting is enabled) with AssertNotCalled to lock that in.

Comment on lines +193 to +201
prometheusLivenessGateDecision = promauto.NewCounterVec(
prometheus.CounterOpts{
Namespace: "teranode",
Subsystem: "subtreevalidation",
Name: "livenessgate_decision_total",
Help: "Count of liveness-gate decisions by outcome (subtreeonly, subtreedata, notfound, err).",
},
[]string{"decision"},
)

Copilot AI Apr 17, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR description mentions a second new Prometheus counter (teranode_subtreevalidation_subtreeonly_tx_miss_total), but the code in this PR only registers teranode_subtreevalidation_livenessgate_decision_total. Please update the PR description (or add the missing metric/increment site) so operators don’t look for a metric that won’t exist after merge.

Copilot uses AI. Check for mistakes.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done — updated the PR description to mention only the single livenessgate_decision_total counter this PR actually emits. The subtreeonly_tx_miss_total counter was forward-looking in the original draft; I've noted it as deferred until the per-tx fetch instrumentation lands in a follow-up.

freemans13 added a commit to freemans13/teranode that referenced this pull request Apr 17, 2026
Block validation's sequential revalidation step (end of CheckBlockSubtrees)
did a single pass through the list of subtrees and returned on the first
validation failure. This breaks for blocks with many cross-subtree
transaction dependencies: if subtree B spends an output created by a tx
in subtree A, and B is attempted before A finishes populating the
internal tx cache, B fails with TX_MISSING_PARENT even though the parent
becomes available as soon as A's revalidation completes.

Symptom observed on teratestnet: block 15631 (9,216 txs, 1,305 with
cross-subtree parents) consistently failed with:

    [processTransactionsInLevels] Completed processing with 1305 errors,
    1305 transactions added to orphanage

The orphanage kept the children but block validation is all-or-nothing —
they could never be retried, so teratestnet stalled indefinitely at 15,630.

Fix: replace the single-pass revalidation with a convergence loop that
retries still-failing subtrees until either every subtree succeeds or
an iteration makes no progress (meaning the failures are real, not
ordering artefacts). Maximum iterations is bounded by the number of
subtrees since each iteration must make at least one unit of progress
to continue.

This is the "Phase 2 convergence loop" from PR bsv-blockchain#646, which I extracted
into this standalone PR as recommended in the spec for PR bsv-blockchain#715.

Testing:
- make lint: 0 issues
- go test -tags testtxmetacache ./services/subtreevalidation/... -run TestCheckBlockSubtrees: 22 passed
- Integration test: deploy to teratestnet and verify sync past 15631

🤖 Generated with [Claude Code](https://claude.com/claude-code)
@freemans13 freemans13 requested a review from Copilot April 17, 2026 05:53

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 25 out of 25 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread services/blockchain/received_at.go Outdated
Comment on lines +20 to +21
// The first stamp for a given hash wins; repeated inserts are no-ops. This matches
// the semantic "when did we first learn about this header?"

Copilot AI Apr 17, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The “first stamp wins; repeated inserts are no-ops” description is only true while the entry is still within the TTL. After TTL expiry the map reports the key as not found and a subsequent stamp will overwrite. Consider clarifying this in the comment to avoid implying global write-once semantics across the process lifetime.

Copilot uses AI. Check for mistakes.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right — updated both the struct comment and the stamp method's doc to make the TTL boundary explicit: "within a single TTL window the first stamp wins; once the TTL elapses the entry has been evicted and a new stamp records the new observation rather than reviving a stale one." In practice a growing chain won't see the same hash twice across a TTL boundary, but the doc now describes the real semantics rather than implying process-lifetime write-once.

Comment on lines 43 to 60
// NewLocalClient creates a new LocalClient instance with the provided dependencies.
func NewLocalClient(logger ulogger.Logger, tSettings *settings.Settings, store blockchain.Store, subtreeStore blob.Store, utxoStore utxo.Store) (ClientI, error) {
// Settings may be nil in some test contexts; fall back to the default zero
// LivenessWindow so receivedAtTTL clamps to its 30m floor.
var livenessWindow time.Duration
if tSettings != nil {
livenessWindow = tSettings.SubtreeValidation.LivenessWindow
}

return &LocalClient{
logger: logger,
settings: tSettings,
store: store,
subtreeStore: subtreeStore,
utxoStore: utxoStore,
receivedAt: newReceivedAtStore(receivedAtTTL(livenessWindow)),
subscribers: make(map[string]chan *blockchain_api.Notification),
}, nil

Copilot AI Apr 17, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LocalClient now allocates a receivedAtStore via expiringmap.New(...), which starts a background ticker goroutine for any non-zero TTL. LocalClient doesn’t expose any Stop/Close to release that goroutine, so short-lived LocalClients (common in tests/benchmarks) will leak goroutines for the lifetime of the process. Consider adding a Stop/Close method on LocalClient that calls c.receivedAt.Stop() (even if not part of ClientI), and using it in test cleanups or any code that constructs LocalClient transiently.

Copilot uses AI. Check for mistakes.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added (*LocalClient).Stop() (nil-safe, idempotent) that forwards to receivedAt.Stop() so transient LocalClients in tests/benchmarks can defer client.Stop() without leaking the background cleanup goroutine. Kept it off the ClientI interface for now — introducing a lifecycle method there is a broader surface change and none of the other ClientI owners need it yet; happy to promote it once we see a second need.

Comment thread services/blockvalidation/Server_test.go Outdated
blockchainClient: mockBlockchainClient,
}
defer bv.blockExistsCache.Stop()
require.NoError(t, bv.SetBlockExists(&hash)) // short-circuits after stamp check

Copilot AI Apr 17, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment on bv.SetBlockExists says this “short-circuits after stamp check”, but BlockFound returns immediately on exists==true (before ReportPeerBlockHeaderSeen is reached). Please update the comment to match the actual control flow so the test intent stays clear.

Suggested change
require.NoError(t, bv.SetBlockExists(&hash)) // short-circuits after stamp check
require.NoError(t, bv.SetBlockExists(&hash)) // causes BlockFound to return early on the exists path, before ReportPeerBlockHeaderSeen

Copilot uses AI. Check for mistakes.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — updated the comment. Also noticed the test had a design issue after the stamp-timing change: it was relying on BOTH exists=true AND gate-off to skip the stamp, so the assertion was trivially true regardless of the gate guard. Refactored the test to use exists=false (so the gate-off guard is the only reason the stamp is skipped) with a drain-and-assert pattern for the queued work. The assertion now actually tests what the name says.

- received_at.go: clarify TTL-bounded write-once semantics. Update both
  the struct doc and the stamp method doc to explicitly describe what
  happens across a TTL boundary (previous entry evicted; new stamp
  records new observation rather than reviving a stale one) instead of
  implying process-lifetime write-once.
- LocalClient.go: add (*LocalClient).Stop() (nil-safe, idempotent)
  forwarding to receivedAt.Stop(). Transient LocalClients in tests and
  benchmarks can now defer client.Stop() to release the background
  cleanup goroutine owned by the receivedAt ExpiringMap.
- Server_test.go: fix the "ReportPeerBlockHeaderSeen only fires when
  gate is enabled" subtest. Previously it set exists=true, so the
  exists-short-circuit skipped the stamp regardless of the gate guard
  (assertion was trivially true). Refactored to use exists=false with a
  drain-the-queue pattern so the gate-off guard is the only reason the
  stamp is skipped — the test now actually validates its stated
  behaviour.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 25 out of 25 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread services/blockvalidation/Server_test.go Outdated
Comment on lines +1355 to +1357
time.Sleep(10 * time.Millisecond)
require.Equal(t, 1, len(server.blockFoundCh))
<-server.blockFoundCh

Copilot AI Apr 17, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using a fixed 10ms sleep + len(server.blockFoundCh) to wait for the goroutine in BlockFound to enqueue work is timing-dependent and can be flaky on loaded/slow CI. Prefer synchronizing deterministically (e.g., select on server.blockFoundCh with a timeout, or require.Eventually until an item is received) instead of sleeping and checking len().

Suggested change
time.Sleep(10 * time.Millisecond)
require.Equal(t, 1, len(server.blockFoundCh))
<-server.blockFoundCh
select {
case <-server.blockFoundCh:
case <-time.After(time.Second):
t.Fatal("timed out waiting for blockFound work to be queued")
}

Copilot uses AI. Check for mistakes.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed — replaced the sleep-and-check-len pattern with select { case <-server.blockFoundCh: ... case <-time.After(time.Second): t.Fatal(...) } as suggested. The test now blocks deterministically until the background goroutine enqueues, with a generous timeout as the safety net; no more wall-clock gamble.

- Server_test.go: replace the sleep-and-check-len synchronization in
  the "gate off → no stamp" subtest with a deterministic
  select { case <-blockFoundCh: ... case <-time.After(1s): t.Fatal }
  pattern as Copilot suggested. No more wall-clock gamble on loaded
  CI runners.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 25 out of 25 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +52 to +56
srv := &Server{
logger: ulogger.TestLogger{},
settings: tSettings,
blockchainClient: mockClient,
}

Copilot AI Apr 17, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test constructs a Server directly without calling InitPrometheusMetrics() / srv.Init(...). Since Server.ShouldUseSubtreeOnlyPath increments prometheusLivenessGateDecision, running this test in isolation can panic on a nil CounterVec (it currently relies on metrics having been initialized by some other test’s side effects). Initialize the metrics explicitly in this test (or call InitPrometheusMetrics() inside ShouldUseSubtreeOnlyPath with a sync.Once) so the test is order-independent.

Copilot uses AI. Check for mistakes.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch — confirmed this test passes today only because SubtreeValidation_test.go runs the package-level sync.Once first and leaves the counters registered as a side effect. Added an explicit InitPrometheusMetrics() call at the top of TestShouldUseSubtreeOnlyPath_TruthTable (it's idempotent via sync.Once). The test now works in isolation (go test -run TestShouldUseSubtreeOnlyPath_TruthTable) and is order-independent.

- livenessgate_test.go: call InitPrometheusMetrics() at the top of
  TestShouldUseSubtreeOnlyPath_TruthTable so it works in isolation.
  Previously the test was relying on another test's side effect
  (SubtreeValidation_test.go triggering the package-level sync.Once
  first). InitPrometheusMetrics is idempotent, so the explicit call
  is safe regardless of test order.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 25 out of 25 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +271 to +282
// PHASE 2: Exact pre-allocation
subtreeTxs[subtreeIdx] = make([]*bt.Tx, 0, subtreeToCheck.Length())

if subtreeOnly {
// Gate is on and the block's header was received recently — skip
// the subtreeData download entirely. subtreeTxs[subtreeIdx] stays
// empty, so the block-level batch processTransactionsInLevels call
// downstream is skipped (batchTxCount == 0). Transaction resolution
// happens later in validation via the subtree manifest +
// UTXO/TxMetaCache, with per-tx fetch as the fallback for real
// misses.
return nil

Copilot AI Apr 17, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the subtree-only gated path, this pre-allocation still allocates an underlying array with cap=subtreeToCheck.Length() even though the worker immediately returns. For large subtrees this can allocate a lot of memory per missing subtree and partially defeats the purpose of skipping subtreeData downloads. Consider checking subtreeOnly before this make(...), or allocating a nil/zero-cap slice in the gated path and only doing the exact pre-allocation when you’re going to populate it.

Copilot uses AI. Check for mistakes.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right — moved the subtreeOnly early-return ahead of the make([]*bt.Tx, 0, subtreeToCheck.Length()) pre-allocation. On the gated path subtreeTxs[subtreeIdx] now stays as the zero-value nil slice, saving subtreeToCheck.Length() capacity worth of memory per missing subtree. The comment downstream still holds (nil slice → batchTxCount == 0processTransactionsInLevels skipped).

Comment on lines 174 to 178
AppCtx: ctx,
blocksFinalKafkaAsyncProducer: blocksFinalKafkaAsyncProducer,
batchTokens: make(map[string]*blobDeletionBatchToken),
receivedAt: newReceivedAtStore(receivedAtTTL(livenessWindow)),
}

Copilot AI Apr 17, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

receivedAt is initialized unconditionally. With the default config (AssumeTxsBroadcastToAllNodes=false) the service will still allocate/maintain this map and start a background cleanup ticker, even though the gate is never consulted. Consider only creating the receivedAtStore when the liveness gate feature is enabled (and/or LivenessWindow > 0), to keep the default-off deployment closer to a true no-op and avoid unnecessary memory growth during fast catchup.

Copilot uses AI. Check for mistakes.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done — the receivedAt store is now only allocated when the gate is enabled at construction (AssumeTxsBroadcastToAllNodes && LivenessWindow > 0), in both Server.New and NewLocalClient. Made receivedAtStore.stamp and .lookup nil-safe so call sites don't need extra guards beyond the existing gate-setting checks they already perform. Default-off deployments now allocate zero bytes and start zero goroutines for this feature — true no-op.

Comment thread services/blockchain/Server.go Outdated
Comment on lines +862 to +868
// Record first-seen time for the subtree-only liveness gate as a
// belt-and-braces fallback. The primary stamp happens earlier in
// blockvalidation.BlockFound via ReportPeerBlockHeaderSeen, but this write
// covers paths that reach AddBlock without having gone through BlockFound
// (e.g. locally mined blocks). Stamping is write-once, so double writes are
// cheap no-ops.
b.receivedAt.stamp(block.Hash())

Copilot AI Apr 17, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AddBlock stamps receivedAt for every stored block, regardless of whether the liveness gate is enabled. In a fast catchup this can grow the in-memory map (TTL floor is 30m) on nodes that aren’t using the optimization. Consider gating this stamp behind AssumeTxsBroadcastToAllNodes && LivenessWindow > 0, or making the store/stamp conditional (while still stamping locally-mined blocks when the feature is enabled).

Suggested change
// Record first-seen time for the subtree-only liveness gate as a
// belt-and-braces fallback. The primary stamp happens earlier in
// blockvalidation.BlockFound via ReportPeerBlockHeaderSeen, but this write
// covers paths that reach AddBlock without having gone through BlockFound
// (e.g. locally mined blocks). Stamping is write-once, so double writes are
// cheap no-ops.
b.receivedAt.stamp(block.Hash())
// The primary first-seen stamp for the subtree-only liveness gate happens
// earlier in blockvalidation.BlockFound via ReportPeerBlockHeaderSeen.
// Avoid stamping every stored block here, as this path is exercised during
// fast catchup and can otherwise grow the in-memory receivedAt map even when
// the liveness optimization is not in use.

Copilot uses AI. Check for mistakes.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed — gated the AddBlock stamp behind AssumeTxsBroadcastToAllNodes && LivenessWindow > 0 in both Server.AddBlock and LocalClient.AddBlock. Combined with the conditional store allocation (other thread), nodes that don't use the optimisation now allocate zero receivedAt entries during fast catchup. The stamp still fires for locally mined blocks on nodes where the gate is enabled (they reach AddBlock without going through BlockFound), which is the original belt-and-braces intent.

Make default-off deployments a true zero-allocation no-op for the
liveness-gate feature:

- check_block_subtrees.go: move the subtreeOnly early-return ahead of
  the subtreeTxs[i] = make(..., subtreeToCheck.Length()) pre-allocation
  so the gated path doesn't allocate a per-subtree tx slice. The
  remaining nil-slice still makes the block-level batch
  processTransactionsInLevels skip (batchTxCount == 0).
- received_at.go: nil-safe stamp() and lookup() so call sites can leave
  the store unallocated without extra guards.
- Server.go / LocalClient.go (New + AddBlock): allocate the receivedAt
  store only when the gate is enabled at construction, and gate the
  AddBlock belt-and-braces stamp behind the same setting. Fast catchup
  on default-off nodes no longer grows an in-memory map.
- server_test.go (TestGetHeaderReceivedAt_StampedOnAddBlock): enable
  the gate setting + allocate the store for this test since it was
  relying on the always-on default behaviour we've now removed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 25 out of 25 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +33 to +34
// DecisionSubtreeData — gate disabled by setting, or stamp is stale.
// Caller must fall back to the subtreeData path.

Copilot AI Apr 17, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doc mismatch: the comment for DecisionSubtreeData says it covers “gate disabled by setting, or stamp is stale”, but Decide() also returns DecisionSubtreeData when window <= 0. Please update the DecisionSubtreeData comment to include the non-positive window short-circuit (so future readers don’t assume these cases are distinguished elsewhere).

Suggested change
// DecisionSubtreeData — gate disabled by setting, or stamp is stale.
// Caller must fall back to the subtreeData path.
// DecisionSubtreeData — gate disabled by setting, liveness window is
// non-positive, or stamp is stale. Caller must fall back to the
// subtreeData path.

Copilot uses AI. Check for mistakes.
Comment on lines +63 to +71
// Once-per-block liveness gate. If true, every per-subtree worker in this
// invocation skips the subtreeData download. In that path the per-subtree
// worker never populates subtreeTxs[subtreeIdx], so the block-level batch
// processTransactionsInLevels call is skipped entirely (batchTxCount == 0).
// Transaction resolution happens downstream in validation via the subtree
// manifest + UTXO/TxMetaCache, with per-tx fetch as the fallback for real
// misses (validator resolves each referenced tx hash on demand).
subtreeOnly := u.ShouldUseSubtreeOnlyPath(ctx, block.Hash())

Copilot AI Apr 17, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The liveness gate is evaluated before the tracing span is started. That means the GetHeaderReceivedAt RPC (and any debug logging/metrics side-effects) won’t be associated with the CheckBlockSubtrees span, which makes it harder to debug gate behavior from traces. Consider starting the tracing span first, then computing subtreeOnly using the traced ctx.

Copilot uses AI. Check for mistakes.
Comment on lines +1151 to +1152
if stampErr := u.blockchainClient.ReportPeerBlockHeaderSeen(ctx, hash); stampErr != nil {
u.logger.Debugf("[BlockFound][%s] ReportPeerBlockHeaderSeen failed, liveness gate may miss for this header: %v", hash.String(), stampErr)

Copilot AI Apr 17, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential nil deref: this calls u.blockchainClient.ReportPeerBlockHeaderSeen(...) without checking u.blockchainClient != nil. blockvalidation.New takes blockchainClient as a parameter but doesn’t guard against nil, and some tests previously built Server without setting this field. Please add a nil check (or call through the already-required client used by u.blockValidation) so enabling AssumeTxsBroadcastToAllNodes can’t crash the service if the client wasn’t wired.

Suggested change
if stampErr := u.blockchainClient.ReportPeerBlockHeaderSeen(ctx, hash); stampErr != nil {
u.logger.Debugf("[BlockFound][%s] ReportPeerBlockHeaderSeen failed, liveness gate may miss for this header: %v", hash.String(), stampErr)
if u.blockchainClient != nil {
if stampErr := u.blockchainClient.ReportPeerBlockHeaderSeen(ctx, hash); stampErr != nil {
u.logger.Debugf("[BlockFound][%s] ReportPeerBlockHeaderSeen failed, liveness gate may miss for this header: %v", hash.String(), stampErr)
}
} else {
u.logger.Debugf("[BlockFound][%s] skipping ReportPeerBlockHeaderSeen because blockchain client is not configured", hash.String())

Copilot uses AI. Check for mistakes.
@freemans13 freemans13 changed the title feat(subtreevalidation): per-header liveness gate for subtree-only validation feat(subtreevalidation): skip subtreeData download for blocks whose txs we already have Apr 22, 2026
freemans13 and others added 2 commits April 22, 2026 04:53
…rapper

The liveness gate is a feature-specific decision helper, not a general
teranode utility — both callers (subtreevalidation per-subtree gating
and blockvalidation catchup prefetch) consume the same one feature,
driven by SubtreeValidation.AssumeTxsBroadcastToAllNodes.

  - Move util/livenessgate/ → services/subtreevalidation/livenessgate/
    so the package lives alongside the setting that owns it.
  - Drop the package-level ShouldUseSubtreeOnlyPath convenience. Callers
    use Decide directly and compare against DecisionSubtreeOnly (exactly
    one extra line at the one site that used the wrapper).
  - Drop the three `var _ livenessgate.Client = (*T)(nil)` assertions in
    services/blockchain/. They created a reverse-layering import
    (blockchain depending on a subtreevalidation subpackage) and were
    redundant — real call sites already pass the blockchain client to
    livenessgate.Decide, so the compiler checks the interface match at
    the call site.

Behaviour unchanged. The Decision enum + metric labels are kept; the
(*Server).ShouldUseSubtreeOnlyPath wrapper in subtreevalidation is
kept (it bundles metric emission and logging over Decide).

Net: -37 lines, one fewer package, no cross-service dependency inversion.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@sonarqubecloud

Copy link
Copy Markdown

Quality Gate Failed Quality Gate failed

Failed conditions
76.3% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube Cloud

@freemans13

Copy link
Copy Markdown
Collaborator Author

Superseded by #745 — self-discovering adaptive gate replaces the operator-configured feature flag + wall-clock liveness gate. The new design deliberately avoids any wall-clock or FSM-state gating (enforced by a source-scanning regression test), eliminating the class of bug that caused PR #598 to be reverted via #647.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants