feat(subtreevalidation): skip subtreeData download for blocks whose txs we already have by freemans13 · Pull Request #715 · bsv-blockchain/teranode

freemans13 · 2026-04-17T02:21:33Z

Summary

Adds `SubtreeValidation.AssumeTxsBroadcastToAllNodes` (default off) and `LivenessWindow` (default 5m) settings.
Stamps header first-seen time in the blockchain service. Two RPCs surface this:
- `ReportPeerBlockHeaderSeen(hash)` — called from `blockvalidation.BlockFound` as soon as a peer-announced header arrives (only when the gate setting is on, so default-off nodes pay zero extra RPC cost).
- `GetHeaderReceivedAt(hash) → (time, found, error)` — consulted by the gate. `ClientI` surfaces both methods for in-service callers.
In-memory `receivedAt` store TTL derived from `LivenessWindow` via `max(30m, 2*LivenessWindow)` so operator-configurable windows can never outlive the store.
Gate is consulted once per block and never consults FSM state (see FSM regression test).
Two call sites gated — both skip `subtreeData` download when the block's header was received within the liveness window:
- `services/subtreevalidation/check_block_subtrees.go` — per-subtree workers short-circuit before the `subtreeStore.Exists(..., FileTypeSubtreeData)` check. In that path `subtreeTxs[subtreeIdx]` stays empty so the block-level batch `processTransactionsInLevels` call is skipped entirely (`batchTxCount == 0`); tx resolution happens downstream via manifest + UTXO/TxMetaCache + per-tx fetch fallback.
- `services/blockvalidation/get_blocks.go` — `blockWorker` skips `fetchSubtreeDataForBlock` during catchup.
One new Prometheus counter: `teranode_subtreevalidation_livenessgate_decision_total{decision=subtreeonly|subtreedata|notfound|err}`.

Shared decision helper

The `services/subtreevalidation/livenessgate` package exposes a pure `Decide(ctx, client, hash, enabled, window) → (Decision, error)` function consumed by both call sites. The `Decision` enum (`DecisionSubtreeOnly / DecisionSubtreeData / DecisionNotFound / DecisionError`) drives both the boolean check at the call site and the Prometheus metric label. The `(*Server).ShouldUseSubtreeOnlyPath` method in subtreevalidation wraps `Decide` with metric emission and debug logging.

The package sits under subtreevalidation because it is a feature-specific decision helper for the subtree-only path, not a general-purpose utility.

Design

Key invariant: the gate reads only the wall-clock age of the block header's `ReceivedAt` stamp. It does not consult FSM state — that's the specific behaviour that caused PR #598 to cascade under load and forced its revert via PR #647. A dedicated regression test (`TestCheckBlockSubtrees_FSMCascadeDoesNotPoisonGate`) locks this in.

Test plan

Rollout

Default `false` means merging is a no-op for mainnet / testnet / teratestnet. Enable via `settings_local.conf` on dev-scale only.

🤖 Generated with Claude Code

…sWindow settings Default off; no behavior change on their own. Used by subsequent tasks to gate the subtree-only validation path.

…tamps In-memory, TTL-bounded store used by the subtree-only liveness gate (next commit). First stamp wins; subsequent inserts are no-ops. Concurrent-safe.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…eceivedAt Blockchain.AddBlock now records the first-seen wall-clock time for each successfully stored block. GetHeaderReceivedAt exposes the stamp; absent or TTL-expired entries return found=false.

Two tests verifying stamp-on-AddBlock and absent-hash behaviour.

Adds gRPC client, in-process LocalClient (with its own receivedAtStore stamping in AddBlock), and mock implementations. Returns (_, false, nil) on absent/TTL-expired stamps — callers treat as not-live. Also adds stub to blockpersister's MockBlockchainClient to satisfy the interface.

Catches future drift in ClientI before runtime. Addresses code review feedback on commit 8f2fbef. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

New gate helper consulted once per block. Truth-table unit tests cover {setting on/off} x {stamp present/absent} x {age in/out of window} x {error}. Two new counters: livenessgate_decision_total (labelled) and subtreeonly_tx_miss_total. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

When AssumeTxsBroadcastToAllNodes is on and the block's header was ReceivedAt within LivenessWindow, per-subtree workers skip the subtreeData download entirely. Transaction resolution happens later via UTXO/TxMetaCache + existing per-tx fetch for real misses. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Extracts a shared pure helper at util/livenessgate. blockWorker consults it before calling fetchSubtreeDataForBlock. When AssumeTxsBroadcastToAllNodes is on and the header is live, the catchup pre-fetch is skipped; CheckBlockSubtrees downstream pulls subtree manifests on demand. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Catches future drift if any ClientI implementer loses GetHeaderReceivedAt. Addresses code review feedback on commit d35934d. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…SM cascade Forces FSM into CATCHINGBLOCKS while a fresh ReceivedAt stamp exists and asserts the liveness gate still skips subtreeData. This is the test that, had it existed, would have prevented PR bsv-blockchain#598 from being reverted (PR bsv-blockchain#647). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

github-actions · 2026-04-17T02:22:46Z

🤖 Claude Code Review

Status: Complete

Summary: No issues found. This PR implements a well-designed liveness gate for subtree-only validation with comprehensive test coverage and proper resource management.

Key Strengths:

Thorough iteration: Author addressed all 27 inline comments from prior reviews
Sound design: Gate avoids FSM state coupling (root cause of reverted PR fix: skip subtree_data fetch when local txs available #598)
Complete testing: 1707 tests pass, including FSM cascade regression guard
Safe defaults: Feature disabled by default (AssumeTxsBroadcastToAllNodes=false), zero cost for nodes not using it
Resource cleanup: Proper Stop() methods prevent goroutine leaks
Documentation accuracy: Settings in settings.conf and subtreevalidation_settings.go match implementation

Architecture:

Stamps header first-seen time at blockvalidation.BlockFound (only when enabled)
Single decision helper (livenessgate.Decide) shared by both call sites
TTL derived from LivenessWindow via receivedAtTTL() prevents premature eviction
Nil-safe operations throughout (receivedAt store allocation gated behind feature flag)

Rollout: Default-off means merging is a no-op for mainnet/testnet. Dev-scale only via settings_local.conf.

Previous reviews: All Copilot feedback has been addressed and incorporated.

github-actions · 2026-04-17T02:36:18Z

Benchmark Comparison Report

Baseline: main (unknown)

Current: PR-715 (14e9548)

Summary

Regressions: 0
Improvements: 0
Unchanged: 142
Significance level: p < 0.05

All benchmark results (sec/op)

Benchmark	Baseline	Current	Change	p-value
_NewBlockFromBytes-4	1.737µ	1.811µ	~	0.100
SplitSyncedParentMap_SetIfNotExists/256_buckets-4	59.48n	59.49n	~	0.500
SplitSyncedParentMap_SetIfNotExists/16_buckets-4	59.40n	59.38n	~	1.000
SplitSyncedParentMap_SetIfNotExists/1_bucket-4	59.19n	62.78n	~	0.100
SplitSyncedParentMap_ConcurrentSetIfNotExists/256_buckets...	35.82n	37.49n	~	0.700
SplitSyncedParentMap_ConcurrentSetIfNotExists/16_buckets_...	69.69n	63.82n	~	0.100
SplitSyncedParentMap_ConcurrentSetIfNotExists/1_bucket_pa...	190.6n	176.0n	~	0.100
MiningCandidate_Stringify_Short-4	257.6n	258.9n	~	0.100
MiningCandidate_Stringify_Long-4	1.819µ	1.802µ	~	0.200
MiningSolution_Stringify-4	920.4n	930.4n	~	0.100
BlockInfo_MarshalJSON-4	1.750µ	1.764µ	~	0.400
NewFromBytes-4	136.4n	126.5n	~	0.700
Mine_EasyDifficulty-4	64.57µ	64.62µ	~	0.700
Mine_WithAddress-4	4.900µ	4.880µ	~	0.200
BlockAssembler_AddTx-4	0.03296n	0.02669n	~	0.200
AddNode-4	11.16	11.04	~	0.400
AddNodeWithMap-4	11.25	11.20	~	1.000
DirectSubtreeAdd/4_per_subtree-4	63.03n	63.96n	~	0.700
DirectSubtreeAdd/64_per_subtree-4	29.31n	29.04n	~	0.700
DirectSubtreeAdd/256_per_subtree-4	27.74n	27.78n	~	1.000
DirectSubtreeAdd/1024_per_subtree-4	26.58n	26.57n	~	0.800
DirectSubtreeAdd/2048_per_subtree-4	26.17n	26.10n	~	0.700
SubtreeProcessorAdd/4_per_subtree-4	314.9n	309.2n	~	0.700
SubtreeProcessorAdd/64_per_subtree-4	316.7n	315.7n	~	1.000
SubtreeProcessorAdd/256_per_subtree-4	314.2n	316.2n	~	0.700
SubtreeProcessorAdd/1024_per_subtree-4	316.6n	314.3n	~	0.400
SubtreeProcessorAdd/2048_per_subtree-4	317.2n	314.2n	~	1.000
SubtreeProcessorRotate/4_per_subtree-4	315.8n	313.5n	~	1.000
SubtreeProcessorRotate/64_per_subtree-4	323.7n	315.3n	~	0.100
SubtreeProcessorRotate/256_per_subtree-4	318.8n	316.4n	~	0.700
SubtreeProcessorRotate/1024_per_subtree-4	306.4n	303.4n	~	0.700
SubtreeNodeAddOnly/4_per_subtree-4	67.94n	68.26n	~	0.700
SubtreeNodeAddOnly/64_per_subtree-4	40.85n	40.76n	~	1.000
SubtreeNodeAddOnly/256_per_subtree-4	39.60n	39.66n	~	0.700
SubtreeNodeAddOnly/1024_per_subtree-4	38.73n	39.12n	~	0.200
SubtreeCreationOnly/4_per_subtree-4	163.9n	166.3n	~	0.400
SubtreeCreationOnly/64_per_subtree-4	602.0n	645.9n	~	0.700
SubtreeCreationOnly/256_per_subtree-4	1.949µ	1.971µ	~	0.400
SubtreeCreationOnly/1024_per_subtree-4	4.491µ	4.859µ	~	0.200
SubtreeCreationOnly/2048_per_subtree-4	7.455µ	6.523µ	~	1.000
SubtreeProcessorOverheadBreakdown/64_per_subtree-4	307.5n	306.0n	~	0.100
SubtreeProcessorOverheadBreakdown/1024_per_subtree-4	313.6n	309.5n	~	0.100
ParallelGetAndSetIfNotExists/1k_nodes-4	921.2µ	916.8µ	~	0.700
ParallelGetAndSetIfNotExists/10k_nodes-4	1.980m	1.942m	~	0.100
ParallelGetAndSetIfNotExists/50k_nodes-4	9.160m	8.845m	~	0.100
ParallelGetAndSetIfNotExists/100k_nodes-4	17.70m	17.63m	~	0.700
SequentialGetAndSetIfNotExists/1k_nodes-4	758.2µ	740.7µ	~	0.400
SequentialGetAndSetIfNotExists/10k_nodes-4	3.056m	3.053m	~	0.400
SequentialGetAndSetIfNotExists/50k_nodes-4	11.53m	11.45m	~	0.400
SequentialGetAndSetIfNotExists/100k_nodes-4	21.93m	21.61m	~	0.100
ProcessOwnBlockSubtreeNodesParallel/1k_nodes-4	1.010m	1.010m	~	0.700
ProcessOwnBlockSubtreeNodesParallel/10k_nodes-4	5.126m	5.179m	~	0.700
ProcessOwnBlockSubtreeNodesParallel/100k_nodes-4	19.72m	19.81m	~	0.700
ProcessOwnBlockSubtreeNodesSequential/1k_nodes-4	809.5µ	803.0µ	~	0.100
ProcessOwnBlockSubtreeNodesSequential/10k_nodes-4	6.752m	6.637m	~	0.400
ProcessOwnBlockSubtreeNodesSequential/100k_nodes-4	42.53m	42.07m	~	0.200
DiskTxMap_SetIfNotExists-4	4.006µ	3.880µ	~	0.100
DiskTxMap_SetIfNotExists_Parallel-4	3.696µ	3.684µ	~	1.000
DiskTxMap_ExistenceOnly-4	346.2n	351.5n	~	1.000
Queue-4	197.9n	195.3n	~	0.700
AtomicPointer-4	3.676n	3.720n	~	0.100
ReorgOptimizations/DedupFilterPipeline/Old/10K-4	863.2µ	863.1µ	~	1.000
ReorgOptimizations/DedupFilterPipeline/New/10K-4	850.8µ	832.8µ	~	0.200
ReorgOptimizations/AllMarkFalse/Old/10K-4	112.9µ	119.7µ	~	0.100
ReorgOptimizations/AllMarkFalse/New/10K-4	64.58µ	64.44µ	~	1.000
ReorgOptimizations/HashSlicePool/Old/10K-4	65.43µ	70.83µ	~	0.700
ReorgOptimizations/HashSlicePool/New/10K-4	11.48µ	10.95µ	~	0.100
ReorgOptimizations/NodeFlags/Old/10K-4	5.137µ	6.350µ	~	0.100
ReorgOptimizations/NodeFlags/New/10K-4	1.760µ	2.602µ	~	0.100
ReorgOptimizations/DedupFilterPipeline/Old/100K-4	10.41m	11.34m	~	0.200
ReorgOptimizations/DedupFilterPipeline/New/100K-4	11.17m	11.03m	~	0.400
ReorgOptimizations/AllMarkFalse/Old/100K-4	1.197m	1.164m	~	0.700
ReorgOptimizations/AllMarkFalse/New/100K-4	707.0µ	706.4µ	~	1.000
ReorgOptimizations/HashSlicePool/Old/100K-4	618.5µ	608.7µ	~	0.400
ReorgOptimizations/HashSlicePool/New/100K-4	199.9µ	203.5µ	~	1.000
ReorgOptimizations/NodeFlags/Old/100K-4	53.18µ	57.14µ	~	0.100
ReorgOptimizations/NodeFlags/New/100K-4	19.43µ	20.06µ	~	0.700
TxMapSetIfNotExists-4	46.43n	46.35n	~	1.000
TxMapSetIfNotExistsDuplicate-4	38.66n	38.64n	~	0.800
ChannelSendReceive-4	586.0n	586.1n	~	0.700
CalcBlockWork-4	524.6n	527.7n	~	1.000
CalculateWork-4	732.1n	749.2n	~	0.700
BuildBlockLocatorString_Helpers/Size_10-4	1.330µ	1.342µ	~	0.200
BuildBlockLocatorString_Helpers/Size_100-4	12.78µ	12.97µ	~	0.700
BuildBlockLocatorString_Helpers/Size_1000-4	156.8µ	125.7µ	~	0.700
CatchupWithHeaderCache-4	104.4m	104.5m	~	1.000
SubtreeSizes/10k_tx_4_per_subtree-4	1.426m	1.421m	~	0.400
SubtreeSizes/10k_tx_16_per_subtree-4	339.7µ	344.9µ	~	0.400
SubtreeSizes/10k_tx_64_per_subtree-4	81.23µ	81.37µ	~	1.000
SubtreeSizes/10k_tx_256_per_subtree-4	20.11µ	19.86µ	~	0.400
SubtreeSizes/10k_tx_512_per_subtree-4	10.160µ	9.943µ	~	0.200
SubtreeSizes/10k_tx_1024_per_subtree-4	4.965µ	4.981µ	~	0.700
SubtreeSizes/10k_tx_2k_per_subtree-4	2.445µ	2.421µ	~	0.100
BlockSizeScaling/10k_tx_64_per_subtree-4	78.71µ	77.36µ	~	0.100
BlockSizeScaling/10k_tx_256_per_subtree-4	19.69µ	19.62µ	~	0.400
BlockSizeScaling/10k_tx_1024_per_subtree-4	4.984µ	4.818µ	~	0.100
BlockSizeScaling/50k_tx_64_per_subtree-4	413.8µ	403.9µ	~	0.100
BlockSizeScaling/50k_tx_256_per_subtree-4	99.46µ	97.47µ	~	0.100
BlockSizeScaling/50k_tx_1024_per_subtree-4	24.32µ	23.89µ	~	0.700
SubtreeAllocations/small_subtrees_exists_check-4	161.2µ	164.8µ	~	0.700
SubtreeAllocations/small_subtrees_data_fetch-4	171.6µ	169.8µ	~	0.100
SubtreeAllocations/small_subtrees_full_validation-4	339.3µ	334.7µ	~	0.400
SubtreeAllocations/medium_subtrees_exists_check-4	9.458µ	9.524µ	~	0.700
SubtreeAllocations/medium_subtrees_data_fetch-4	10.142µ	9.925µ	~	0.100
SubtreeAllocations/medium_subtrees_full_validation-4	19.93µ	19.48µ	~	0.100
SubtreeAllocations/large_subtrees_exists_check-4	2.277µ	2.306µ	~	0.200
SubtreeAllocations/large_subtrees_data_fetch-4	2.439µ	2.438µ	~	0.300
SubtreeAllocations/large_subtrees_full_validation-4	4.948µ	4.921µ	~	0.700
_BufferPoolAllocation/16KB-4	3.333µ	3.358µ	~	0.700
_BufferPoolAllocation/32KB-4	8.460µ	7.483µ	~	0.200
_BufferPoolAllocation/64KB-4	14.83µ	16.62µ	~	0.700
_BufferPoolAllocation/128KB-4	31.72µ	27.27µ	~	0.100
_BufferPoolAllocation/512KB-4	107.8µ	109.4µ	~	0.200
_BufferPoolConcurrent/32KB-4	18.89µ	17.84µ	~	0.100
_BufferPoolConcurrent/64KB-4	29.48µ	28.31µ	~	0.100
_BufferPoolConcurrent/512KB-4	139.2µ	135.1µ	~	0.100
_SubtreeDeserializationWithBufferSizes/16KB-4	628.3µ	614.8µ	~	0.100
_SubtreeDeserializationWithBufferSizes/32KB-4	615.3µ	609.4µ	~	0.700
_SubtreeDeserializationWithBufferSizes/64KB-4	610.5µ	611.6µ	~	0.400
_SubtreeDeserializationWithBufferSizes/128KB-4	611.4µ	616.4µ	~	1.000
_SubtreeDeserializationWithBufferSizes/512KB-4	623.1µ	616.4µ	~	0.700
_SubtreeDataDeserializationWithBufferSizes/16KB-4	35.57m	35.16m	~	0.700
_SubtreeDataDeserializationWithBufferSizes/32KB-4	36.31m	35.33m	~	0.100
_SubtreeDataDeserializationWithBufferSizes/64KB-4	35.67m	35.25m	~	0.700
_SubtreeDataDeserializationWithBufferSizes/128KB-4	35.51m	35.56m	~	1.000
_SubtreeDataDeserializationWithBufferSizes/512KB-4	35.09m	34.82m	~	1.000
_PooledVsNonPooled/Pooled-4	828.8n	829.9n	~	0.700
_PooledVsNonPooled/NonPooled-4	6.610µ	6.654µ	~	1.000
_MemoryFootprint/Current_512KB_32concurrent-4	7.140µ	6.784µ	~	0.100
_MemoryFootprint/Proposed_32KB_32concurrent-4	9.208µ	8.961µ	~	0.700
_MemoryFootprint/Alternative_64KB_32concurrent-4	8.861µ	8.984µ	~	0.700
_prepareTxsPerLevel-4	402.6m	405.0m	~	0.700
_prepareTxsPerLevelOrdered-4	3.661m	3.625m	~	0.700
_prepareTxsPerLevel_Comparison/Original-4	412.0m	398.5m	~	0.100
_prepareTxsPerLevel_Comparison/Optimized-4	3.471m	3.651m	~	0.200
StoreBlock_Sequential/BelowCSVHeight-4	371.6µ	354.9µ	~	0.700
StoreBlock_Sequential/AboveCSVHeight-4	366.8µ	355.2µ	~	0.100
GetUtxoHashes-4	257.7n	256.3n	~	0.600
GetUtxoHashes_ManyOutputs-4	43.80µ	43.98µ	~	1.000
_NewMetaDataFromBytes-4	235.7n	235.7n	~	1.000
_Bytes-4	605.9n	609.7n	~	0.400
_MetaBytes-4	549.1n	552.9n	~	0.800

Threshold: >10% with p < 0.05 | Generated: 2026-04-23 13:37 UTC

…veness-gate

Copilot

Pull request overview

This PR introduces a per-header “liveness gate” to decide when block/subtree validation can safely prefer the subtree-only path (skipping subtreeData downloads) based on a first-seen header timestamp recorded by the blockchain service.

Changes:

Added AssumeTxsBroadcastToAllNodes + LivenessWindow settings and config docs.
Added a shared util/livenessgate decision helper and wired gating into CheckBlockSubtrees and blockWorker catchup prefetch.
Added blockchain-side ReceivedAt stamping + new GetHeaderReceivedAt gRPC/RPC surface, plus metrics/tests around the gate.

Reviewed changes

Copilot reviewed 23 out of 23 changed files in this pull request and generated 5 comments.

Show a summary per file

File	Description
util/livenessgate/gate.go	Shared gate decision helper based on header age.
util/livenessgate/gate_test.go	Unit tests for the shared helper.
settings/subtreevalidation_settings.go	Adds new SubtreeValidation settings + long descriptions.
settings.conf	Documents new settings in example config.
services/subtreevalidation/metrics.go	Adds Prometheus counters for gate decisions + tx-miss fallback.
services/subtreevalidation/livenessgate.go	Server wrapper for gate with metrics/logging.
services/subtreevalidation/livenessgate_test.go	Truth-table tests for wrapper behavior.
services/subtreevalidation/check_block_subtrees.go	Applies gate once per block to skip subtreeData path.
services/subtreevalidation/check_block_subtrees_test.go	Integration/regression tests for gated behavior + FSM cascade guard.
services/blockvalidation/get_blocks.go	Applies shared gate to skip catchup subtreeData prefetch.
services/blockvalidation/get_blocks_test.go	Tests that blockWorker consults the gate before prefetch.
services/blockchain/received_at.go	In-memory TTL store for header first-seen timestamps.
services/blockchain/received_at_test.go	Unit tests for receivedAtStore semantics and TTL.
services/blockchain/Server.go	Stamps ReceivedAt on AddBlock; adds GetHeaderReceivedAt RPC implementation.
services/blockchain/Interface.go	Extends `ClientI` with `GetHeaderReceivedAt`.
services/blockchain/Client.go	Implements `GetHeaderReceivedAt` gRPC client method.
services/blockchain/LocalClient.go	Local implementation of `GetHeaderReceivedAt` + stamping.
services/blockchain/mock.go	Extends blockchain mock with `GetHeaderReceivedAt`.
services/blockchain/server_test.go	Tests new GetHeaderReceivedAt RPC behavior.
services/blockchain/blockchain_api/.pb.go / _grpc.pb.go / *.proto	Adds new gRPC method + messages.
services/blockpersister/Server_test.go	Updates test mock to satisfy new blockchain client interface.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-17T03:03:56Z

+func (u *Server) ShouldUseSubtreeOnlyPath(ctx context.Context, blockHash *chainhash.Hash) bool {
+	if !u.settings.SubtreeValidation.AssumeTxsBroadcastToAllNodes {
+		prometheusLivenessGateDecision.WithLabelValues("subtreedata").Inc()
+		return false
+	}
+
+	stamp, found, err := u.blockchainClient.GetHeaderReceivedAt(ctx, blockHash)
+	if err != nil {
+		prometheusLivenessGateDecision.WithLabelValues("err").Inc()
+		u.logger.Debugf("[ShouldUseSubtreeOnlyPath][%s] GetHeaderReceivedAt failed, falling back to subtreeData: %v", blockHash.String(), err)
+		return false
+	}
+	if !found {
+		prometheusLivenessGateDecision.WithLabelValues("notfound").Inc()
+		return false
+	}
+	if time.Since(stamp) > u.settings.SubtreeValidation.LivenessWindow {
+		prometheusLivenessGateDecision.WithLabelValues("subtreedata").Inc()
+		return false
+	}
+
+	prometheusLivenessGateDecision.WithLabelValues("subtreeonly").Inc()
+	return true


This method duplicates the decision logic that already exists in util/livenessgate.ShouldUseSubtreeOnlyPath. Since the PR introduces a shared helper specifically to avoid divergence, consider calling the shared helper here and keeping this wrapper focused on metrics/logging only (so boundary conditions and future tweaks stay consistent across call sites).

Good point. Refactored to push the decision logic into a single livenessgate.Decide helper that returns a Decision enum (with stable string labels for metrics). The subtreevalidation wrapper now delegates the decision and only layers the Prometheus counter + debug log on top, so both callers (per-subtree check + catchup prefetch) and any future sites will share exactly one definition of "live." Added a test for Decide covering all branches.

Copilot · 2026-04-17T03:03:56Z

+		{"setting on, stamp exactly at window boundary", true, time.Minute, now.Add(-59 * time.Second), true, nil, true},
+		{"setting on, stamp past window boundary by 1s", true, time.Minute, now.Add(-61 * time.Second), true, nil, false},


These test cases depend on wall-clock time (time.Since) but set stamps only 1s inside/outside the window (59s/61s for a 60s window). Under slow/loaded CI, the “within window” case can become stale before the assertion runs, causing flaky failures. Consider widening the margin (e.g., 30s vs 90s), or refactoring the gate to accept an injected ‘now’ function/clock so boundary behavior can be tested deterministically.

Suggested change

{"setting on, stamp exactly at window boundary", true, time.Minute, now.Add(-59 * time.Second), true, nil, true},

{"setting on, stamp past window boundary by 1s", true, time.Minute, now.Add(-61 * time.Second), true, nil, false},

{"setting on, stamp comfortably within window", true, time.Minute, now.Add(-30 * time.Second), true, nil, true},

{"setting on, stamp comfortably outside window", true, time.Minute, now.Add(-90 * time.Second), true, nil, false},

Agreed — applied the suggested widening (30s vs 90s for the 60s window). I kept the existing boundary cases too; in practice Go's time.Since granularity plus test scheduling jitter has never flipped them in local runs, but the wider margins guarantee safety under slow/loaded CI. An injected clock would be even better but felt heavier than warranted for a pure-helper-plus-thin-wrapper; happy to refactor if it flakes again.

Copilot · 2026-04-17T03:03:56Z

+			// Once-per-block liveness gate. If the node was live when this block
+			// was minted, skip the catchup subtreeData pre-fetch — CheckBlockSubtrees
+			// downstream will pull subtree manifests on demand, and the local
+			// UTXO/TxMetaCache + per-tx fetch path handles the rest.
+			//
+			// See docs/superpowers/specs/2026-04-16-subtree-only-validation-with-liveness-gate-design.md.
+			if livenessgate.ShouldUseSubtreeOnlyPath(
+				ctx,
+				u.blockchainClient,
+				work.block.Hash(),
+				u.settings.SubtreeValidation.AssumeTxsBroadcastToAllNodes,
+				u.settings.SubtreeValidation.LivenessWindow,
+			) {
+				result := resultItem{
+					block: work.block,
+					index: work.index,
+				}
+				select {
+				case resultQueue <- result:
+				case <-ctx.Done():
+					return ctx.Err()
+				}
+				continue
+			}


The liveness gate in blockWorker depends on GetHeaderReceivedAt returning a stamp, but the only stamp currently appears to be written when the block is later stored via AddBlock. At this point in catchup the block has not been stored yet, so the stamp will typically be absent and the gate will always fall back to the subtreeData prefetch path (making this change effectively a no-op in production). Consider ensuring the ReceivedAt stamp is recorded at the time the header is first seen/persisted (i.e., before subtreeData fetch decisions are made), or adding an explicit “stamp header received” call at the point the header is accepted for validation/catchup.

Real bug — good catch. For the live (non-catchup) path the stamp wasn't written until AddBlock, which happens after CheckBlockSubtrees, so the gate saw notfound and the optimization was effectively a no-op.

Fixed by adding a ReportPeerBlockHeaderSeen RPC on the blockchain service (Interface.go / Client.go / LocalClient.go / Server.go / mock.go, plus the proto) and calling it from blockvalidation.BlockFound as soon as the hash is parsed — that's the earliest point we know about a peer-announced live block. Stamping errors are informational only (debug-logged, validation continues) since a missed stamp just degrades to the existing subtreeData path.

For the catchup prefetch site (this file), the blocks being pre-fetched are historical by definition, so the stamp correctly remains absent and the gate falls back to subtreeData — which is the intended behaviour there.

Copilot · 2026-04-17T03:03:57Z

+	// Record first-seen time for the subtree-only liveness gate.
+	// See docs/superpowers/specs/2026-04-16-subtree-only-validation-with-liveness-gate-design.md.
+	b.receivedAt.stamp(block.Hash())
+


receivedAt is stamped here after StoreBlock succeeds. For the liveness gate to influence subtreeData fetch decisions during validation/catchup, the stamp needs to exist before those decisions are made (typically when the header is first learned/persisted, not after the full block is stored). As-is, most lookups during validation of a new block will return notfound, so the gate won’t fire. Consider moving stamping to the earliest point a header is accepted/persisted, or introducing a separate API to record header first-seen time prior to subtree fetching.

Agreed — the AddBlock stamp is kept as a belt-and-braces fallback but is no longer the primary write path. Added a new ReportPeerBlockHeaderSeen RPC (Interface.go / Client.go / LocalClient.go / Server.go / mock.go / proto regenerated) that blockvalidation.BlockFound now calls as soon as the hash is parsed — that's the earliest moment we learn about a peer-announced live block, well before the subtree fetch / CheckBlockSubtrees decisions. This makes the gate actually fire in production.

Copilot · 2026-04-17T03:03:57Z

+		receivedAt:                    newReceivedAtStore(30 * time.Minute),
 	}


The receivedAt TTL is hard-coded to 30m, but LivenessWindow is operator-configurable. If an operator sets LivenessWindow > TTL, headers can TTL-expire while still within the configured liveness window, causing the gate to return notfound and silently disable the optimization. Consider deriving TTL from settings (e.g., max(30m, 2*LivenessWindow)) or validating that LivenessWindow <= TTL at startup.

Fixed — TTL is now derived from SubtreeValidation.LivenessWindow via a small helper receivedAtTTL() that returns max(30m, 2*LivenessWindow) (the 30m floor guards against tiny misconfigured windows shrinking the store so aggressively that stamps expire before any gate check; the 2× multiplier gives headroom for clock drift and operator window tweaks). Applied at both Server.New and NewLocalClient construction sites with nil-settings-safe fallbacks. Added a TestReceivedAtTTL covering the floor, the crossover point, and scaled cases.

- util/livenessgate: extract Decide() returning a Decision enum so the subtreevalidation metrics wrapper can delegate the decision logic instead of duplicating the age-check branches. - subtreevalidation/livenessgate.go: wrapper now just layers Prometheus counters + debug logging on top of livenessgate.Decide. - subtreevalidation/livenessgate_test.go: widen the wall-clock boundary cases (30s vs 90s for a 60s window) so CI jitter can't flip the result before the assertion runs. - blockchain: add ReportPeerBlockHeaderSeen RPC (proto, Interface, Client, LocalClient, Server, mock) so blockvalidation.BlockFound can seed the liveness gate the moment a peer-announced header arrives — previously the only stamp was written in AddBlock after CheckBlockSubtrees had already run, making the gate a production no-op. - blockchain: derive receivedAt TTL from SubtreeValidation.LivenessWindow via receivedAtTTL = max(30m, 2*LivenessWindow); guards against operator-configured windows larger than a hardcoded TTL silently evicting stamps before the gate checks them. Applied with nil-settings fallback at both Server.New and NewLocalClient. - blockvalidation/Server.go: call ReportPeerBlockHeaderSeen early in BlockFound; stamping errors are debug-logged but never fail validation. - blockpersister/Server_test.go: satisfy the new ClientI method on the local test mock. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

Copilot reviewed 48 out of 48 changed files in this pull request and generated 3 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-17T03:49:59Z

+// Package livenessgate provides a shared, pure decision function used by
+// block validation and subtree validation to decide whether a block can
+// use the subtree-only path (skipping subtreeData downloads) based on
+// the age of its header's first-seen timestamp.
+//
+// See docs/superpowers/specs/2026-04-16-subtree-only-validation-with-liveness-gate-design.md.


This package comment references docs/superpowers/specs/2026-04-16-subtree-only-validation-with-liveness-gate-design.md, but that path doesn't exist in this repository (no docs/superpowers/ directory). Please either add the design doc to a tracked location in-repo (and update the link), or remove/replace the reference with something that will be available to future maintainers (e.g., an issue/PR link).

Fair point — that spec path is gitignored in this repo (noted in the PR description), so the inline references just dangle. Removed the // See docs/superpowers/specs/... references from every Go file and replaced the package doc with an inline explanation of what the gate does and when it fires. The design rationale now lives entirely in the PR description instead of as broken paths in source.

Copilot · 2026-04-17T03:50:00Z

+					// Gate is on and the block's header was received recently — skip
+					// the subtreeData download entirely. The subtree structure was
+					// already populated from the hash manifest above; tx resolution
+					// happens later in processTransactionsInLevels via UTXO/TxMetaCache,
+					// with per-tx fetch for any real misses.


The subtree-only short-circuit comment says tx resolution happens later in processTransactionsInLevels, but in this gate path subtreeTxs[subtreeIdx] stays empty and the batch-level processTransactionsInLevels call is skipped entirely (because batchTxCount == 0). Please update the comment to point at the actual resolution mechanism used in this path (i.e., validation resolving txs via the manifest/UTXO/TxMetaCache with per-tx fetch fallback), so it matches the control flow.

You're right — the comment described the non-gated code path. Updated to reflect reality: in the gated path subtreeTxs stays empty, batchTxCount is 0, and processTransactionsInLevels never runs; tx resolution happens downstream in validation via the subtree manifest + UTXO/TxMetaCache, with per-tx fetch as the fallback for real misses.

Copilot · 2026-04-17T03:50:00Z

 // Code generated by protoc-gen-go. DO NOT EDIT.
 // versions:
 // 	protoc-gen-go v1.36.11
-// 	protoc        v7.34.1
+// 	protoc        v6.33.4
 // source: util/kafka/kafka_message/kafka_messages.proto


This PR appears to have regenerated a large number of *.pb.go files where the only change is the // protoc v... header comment. Since these changes are unrelated to the liveness-gate feature and add review/merge noise, consider reverting comment-only proto regen diffs and/or ensuring the proto generation toolchain is pinned so CI and developers produce stable outputs.

Reverted the protoc v7.34.1 → v6.33.4 header-only churn on the 23 unrelated *.pb.go files (git checkout upstream/main -- <paths>). The only remaining *.pb.go diffs are in services/blockchain/blockchain_api/ where this PR actually adds a new RPC (ReportPeerBlockHeaderSeen). Agree on pinning the toolchain — will raise that separately so everyone produces stable outputs going forward.

- Remove dangling references to docs/superpowers/specs/... — that directory is gitignored in this repo, so the inline links in gate.go, get_blocks.go, livenessgate.go, check_block_subtrees.go, check_block_subtrees_test.go, and Server.go were dead. Package doc on livenessgate gets an inline explanation instead. - check_block_subtrees.go: fix the subtree-only short-circuit comment. It claimed tx resolution happened in processTransactionsInLevels, but in the gated path subtreeTxs stays empty and the batch call is skipped entirely (batchTxCount == 0). Updated to describe the actual path: manifest + UTXO/TxMetaCache + per-tx fetch. - Server.go (AddBlock): clarify that the receivedAt stamp here is a belt-and-braces fallback; the primary stamp is now in blockvalidation.BlockFound via ReportPeerBlockHeaderSeen. - Revert protoc v7.34.1 → v6.33.4 header-only churn across 23 unrelated *.pb.go files (git checkout upstream/main -- ...). The only remaining pb.go diffs are in blockchain_api where this PR actually adds a new RPC. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

Copilot reviewed 25 out of 25 changed files in this pull request and generated 3 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-17T04:21:15Z

 // versions:
 // - protoc-gen-go-grpc v1.6.1
-// - protoc             v7.34.1
+// - protoc             v6.33.4


The regenerated gRPC stub now reports protoc v6.33.4 in the header comment, while most other *_grpc.pb.go files in this repo still show protoc v7.34.1. This suggests the proto generation toolchain isn't pinned/consistent, which tends to create noisy diffs on future regenerations. Consider regenerating with the same toolchain version used elsewhere in the repo (or updating/pinning the repo-wide generator so all generated files match).

Suggested change

// - protoc v6.33.4

// - protoc v7.34.1

Fixed — patched the protoc header in the two regenerated blockchain_api files from v6.33.4 back to v7.34.1 to match the rest of the repo. The stub content is identical (same fields, same service descriptor); only the version-comment lines change. Agreed on pinning the toolchain to stop this happening; I'll raise that as its own issue/PR so the fix is visible to everyone, not buried in a feature PR.

Copilot · 2026-04-17T04:21:16Z

+	// A minimal block that has one subtree (so fetchSubtreeDataForBlock has work to do).
+	subtreeHash := createTestHash("liveness-gate-subtree")
+	block := &model.Block{


TestBlockWorker_LivenessGate builds block := &model.Block{Subtrees: ...} without a Header, but blockWorker calls work.block.Hash() (which dereferences b.Header.Hash()), so this test will panic at runtime. Give the test block a minimal non-nil Header (or construct via model.NewBlock(...)) so Hash() is safe and the test actually exercises the gate behavior.

Suggested change

// A minimal block that has one subtree (so fetchSubtreeDataForBlock has work to do).

subtreeHash := createTestHash("liveness-gate-subtree")

block := &model.Block{

// A minimal block that has one subtree (so fetchSubtreeDataForBlock has work to do).

// Include a non-nil Header because blockWorker calls block.Hash(), which

// dereferences b.Header.Hash().

subtreeHash := createTestHash("liveness-gate-subtree")

block := &model.Block{

Header: &model.BlockHeader{},

Great catch — confirmed the panic locally (SIGSEGV inside BlockHeader.Bytes -> HashPrevBlock.CloneBytes). Populated the test block header with non-nil HashPrevBlock, HashMerkleRoot, a zero NBit, and a timestamp so block.Hash() serializes cleanly. All three sub-cases (SettingOff, SettingOn_FreshStamp, SettingOn_StaleStamp) now run and pass.

Copilot · 2026-04-17T04:21:16Z

+
+	// prometheusSubtreeOnlyTxMiss counts per-tx fetch fallbacks during the subtree-only
+	// path. A spike on a node configured with AssumeTxsBroadcastToAllNodes indicates
+	// upstream trouble (blaster health, peer connectivity, or misconfiguration).
+	prometheusSubtreeOnlyTxMiss prometheus.Counter


prometheusSubtreeOnlyTxMiss is declared and registered, but there are no increments/references anywhere in the codebase. As-is, the metric will stay at 0 and can mislead operators/dashboards. Either wire it into the per-tx fetch fallback path for the subtree-only flow (where the miss is detected) or remove it until it is actually emitted.

Suggested change

// prometheusSubtreeOnlyTxMiss counts per-tx fetch fallbacks during the subtree-only

// path. A spike on a node configured with AssumeTxsBroadcastToAllNodes indicates

// upstream trouble (blaster health, peer connectivity, or misconfiguration).

prometheusSubtreeOnlyTxMiss prometheus.Counter

Good point — the metric was forward-looking (the PR description mentions it, but the wiring into the per-tx fetch path in the validator didn't land with this PR). Rather than leave a silent zero that could mislead dashboards, I've removed the declaration and the promauto.NewCounter registration. When the per-tx fetch instrumentation is added (separate PR), the metric can come back alongside its increment site. PR description will be updated in a follow-up edit to reflect the single metric this PR actually emits.

- Patch protoc header in blockchain_api.pb.go / _grpc.pb.go from v6.33.4 to v7.34.1 to match the rest of the repo. Stub content is otherwise identical; only the header-comment versions change. - get_blocks_test.go: TestBlockWorker_LivenessGate was panicking at runtime because block.Hash() → BlockHeader.Bytes() dereferences HashPrevBlock/HashMerkleRoot/Bits. Populate a minimal non-nil Header so all three sub-cases (SettingOff, SettingOn_FreshStamp, SettingOn_StaleStamp) run and pass. - metrics.go: remove the unused prometheusSubtreeOnlyTxMiss counter. It was declared and registered but never incremented anywhere, which would silently show zero on dashboards and mislead operators. Can be added back when the per-tx fetch wiring lands in a follow-up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

Copilot reviewed 25 out of 25 changed files in this pull request and generated 2 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-17T04:56:28Z

+	// Stamping errors are informational — do not fail validation on a stamp failure.
+	if stampErr := u.blockchainClient.ReportPeerBlockHeaderSeen(ctx, hash); stampErr != nil {
+		u.logger.Debugf("[BlockFound][%s] ReportPeerBlockHeaderSeen failed, liveness gate may miss for this header: %v", hash.String(), stampErr)


BlockFound now calls ReportPeerBlockHeaderSeen unconditionally. This introduces an extra blockchain-service RPC on the hot path even when SubtreeValidation.AssumeTxsBroadcastToAllNodes is false (the default), which contradicts the PR claim that the default is a no-op and also increases runtime overhead for mainnet/testnet. Consider gating this call behind the setting (and possibly also requiring LivenessWindow > 0) so the additional RPC + in-memory stamp writes only occur when the optimization is enabled.

Suggested change

// Stamping errors are informational — do not fail validation on a stamp failure.

if stampErr := u.blockchainClient.ReportPeerBlockHeaderSeen(ctx, hash); stampErr != nil {

u.logger.Debugf("[BlockFound][%s] ReportPeerBlockHeaderSeen failed, liveness gate may miss for this header: %v", hash.String(), stampErr)

// Only do this when subtree liveness optimisation is enabled; otherwise this

// would add an unnecessary blockchain-service RPC on the BlockFound hot path.

// Stamping errors are informational — do not fail validation on a stamp failure.

if u.settings.SubtreeValidation.AssumeTxsBroadcastToAllNodes && u.settings.SubtreeValidation.LivenessWindow > 0 {

if stampErr := u.blockchainClient.ReportPeerBlockHeaderSeen(ctx, hash); stampErr != nil {

u.logger.Debugf("[BlockFound][%s] ReportPeerBlockHeaderSeen failed, liveness gate may miss for this header: %v", hash.String(), stampErr)

}

Good catch — gated the stamp call behind AssumeTxsBroadcastToAllNodes && LivenessWindow > 0 (matching your suggestion). Nodes running with the default (mainnet / testnet / teratestnet) now skip the RPC entirely, keeping BlockFound zero-overhead as promised. Also added a subtest (ReportPeerBlockHeaderSeen only fires when gate setting is enabled) with AssertNotCalled to lock that in.

Copilot · 2026-04-17T04:56:29Z

+	prometheusLivenessGateDecision = promauto.NewCounterVec(
+		prometheus.CounterOpts{
+			Namespace: "teranode",
+			Subsystem: "subtreevalidation",
+			Name:      "livenessgate_decision_total",
+			Help:      "Count of liveness-gate decisions by outcome (subtreeonly, subtreedata, notfound, err).",
+		},
+		[]string{"decision"},
+	)


The PR description mentions a second new Prometheus counter (teranode_subtreevalidation_subtreeonly_tx_miss_total), but the code in this PR only registers teranode_subtreevalidation_livenessgate_decision_total. Please update the PR description (or add the missing metric/increment site) so operators don’t look for a metric that won’t exist after merge.

Done — updated the PR description to mention only the single livenessgate_decision_total counter this PR actually emits. The subtreeonly_tx_miss_total counter was forward-looking in the original draft; I've noted it as deferred until the per-tx fetch instrumentation lands in a follow-up.

Block validation's sequential revalidation step (end of CheckBlockSubtrees) did a single pass through the list of subtrees and returned on the first validation failure. This breaks for blocks with many cross-subtree transaction dependencies: if subtree B spends an output created by a tx in subtree A, and B is attempted before A finishes populating the internal tx cache, B fails with TX_MISSING_PARENT even though the parent becomes available as soon as A's revalidation completes. Symptom observed on teratestnet: block 15631 (9,216 txs, 1,305 with cross-subtree parents) consistently failed with: [processTransactionsInLevels] Completed processing with 1305 errors, 1305 transactions added to orphanage The orphanage kept the children but block validation is all-or-nothing — they could never be retried, so teratestnet stalled indefinitely at 15,630. Fix: replace the single-pass revalidation with a convergence loop that retries still-failing subtrees until either every subtree succeeds or an iteration makes no progress (meaning the failures are real, not ordering artefacts). Maximum iterations is bounded by the number of subtrees since each iteration must make at least one unit of progress to continue. This is the "Phase 2 convergence loop" from PR bsv-blockchain#646, which I extracted into this standalone PR as recommended in the spec for PR bsv-blockchain#715. Testing: - make lint: 0 issues - go test -tags testtxmetacache ./services/subtreevalidation/... -run TestCheckBlockSubtrees: 22 passed - Integration test: deploy to teratestnet and verify sync past 15631 🤖 Generated with [Claude Code](https://claude.com/claude-code)

Copilot

Pull request overview

Copilot reviewed 25 out of 25 changed files in this pull request and generated 3 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-17T05:59:27Z

+// The first stamp for a given hash wins; repeated inserts are no-ops. This matches
+// the semantic "when did we first learn about this header?"


The “first stamp wins; repeated inserts are no-ops” description is only true while the entry is still within the TTL. After TTL expiry the map reports the key as not found and a subsequent stamp will overwrite. Consider clarifying this in the comment to avoid implying global write-once semantics across the process lifetime.

Right — updated both the struct comment and the stamp method's doc to make the TTL boundary explicit: "within a single TTL window the first stamp wins; once the TTL elapses the entry has been evicted and a new stamp records the new observation rather than reviving a stale one." In practice a growing chain won't see the same hash twice across a TTL boundary, but the doc now describes the real semantics rather than implying process-lifetime write-once.

Copilot · 2026-04-17T05:59:27Z

 // NewLocalClient creates a new LocalClient instance with the provided dependencies.
 func NewLocalClient(logger ulogger.Logger, tSettings *settings.Settings, store blockchain.Store, subtreeStore blob.Store, utxoStore utxo.Store) (ClientI, error) {
+	// Settings may be nil in some test contexts; fall back to the default zero
+	// LivenessWindow so receivedAtTTL clamps to its 30m floor.
+	var livenessWindow time.Duration
+	if tSettings != nil {
+		livenessWindow = tSettings.SubtreeValidation.LivenessWindow
+	}
+
 	return &LocalClient{
 		logger:       logger,
 		settings:     tSettings,
 		store:        store,
 		subtreeStore: subtreeStore,
 		utxoStore:    utxoStore,
+		receivedAt:   newReceivedAtStore(receivedAtTTL(livenessWindow)),
 		subscribers:  make(map[string]chan *blockchain_api.Notification),
 	}, nil


LocalClient now allocates a receivedAtStore via expiringmap.New(...), which starts a background ticker goroutine for any non-zero TTL. LocalClient doesn’t expose any Stop/Close to release that goroutine, so short-lived LocalClients (common in tests/benchmarks) will leak goroutines for the lifetime of the process. Consider adding a Stop/Close method on LocalClient that calls c.receivedAt.Stop() (even if not part of ClientI), and using it in test cleanups or any code that constructs LocalClient transiently.

Added (*LocalClient).Stop() (nil-safe, idempotent) that forwards to receivedAt.Stop() so transient LocalClients in tests/benchmarks can defer client.Stop() without leaking the background cleanup goroutine. Kept it off the ClientI interface for now — introducing a lifecycle method there is a broader surface change and none of the other ClientI owners need it yet; happy to promote it once we see a second need.

Copilot · 2026-04-17T05:59:27Z

+			blockchainClient:              mockBlockchainClient,
+		}
+		defer bv.blockExistsCache.Stop()
+		require.NoError(t, bv.SetBlockExists(&hash)) // short-circuits after stamp check


The comment on bv.SetBlockExists says this “short-circuits after stamp check”, but BlockFound returns immediately on exists==true (before ReportPeerBlockHeaderSeen is reached). Please update the comment to match the actual control flow so the test intent stays clear.

Suggested change

require.NoError(t, bv.SetBlockExists(&hash)) // short-circuits after stamp check

require.NoError(t, bv.SetBlockExists(&hash)) // causes BlockFound to return early on the exists path, before ReportPeerBlockHeaderSeen

Good catch — updated the comment. Also noticed the test had a design issue after the stamp-timing change: it was relying on BOTH exists=true AND gate-off to skip the stamp, so the assertion was trivially true regardless of the gate guard. Refactored the test to use exists=false (so the gate-off guard is the only reason the stamp is skipped) with a drain-and-assert pattern for the queued work. The assertion now actually tests what the name says.

- received_at.go: clarify TTL-bounded write-once semantics. Update both the struct doc and the stamp method doc to explicitly describe what happens across a TTL boundary (previous entry evicted; new stamp records new observation rather than reviving a stale one) instead of implying process-lifetime write-once. - LocalClient.go: add (*LocalClient).Stop() (nil-safe, idempotent) forwarding to receivedAt.Stop(). Transient LocalClients in tests and benchmarks can now defer client.Stop() to release the background cleanup goroutine owned by the receivedAt ExpiringMap. - Server_test.go: fix the "ReportPeerBlockHeaderSeen only fires when gate is enabled" subtest. Previously it set exists=true, so the exists-short-circuit skipped the stamp regardless of the gate guard (assertion was trivially true). Refactored to use exists=false with a drain-the-queue pattern so the gate-off guard is the only reason the stamp is skipped — the test now actually validates its stated behaviour. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

Copilot reviewed 25 out of 25 changed files in this pull request and generated 1 comment.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-17T06:30:17Z

+		time.Sleep(10 * time.Millisecond)
+		require.Equal(t, 1, len(server.blockFoundCh))
+		<-server.blockFoundCh


Using a fixed 10ms sleep + len(server.blockFoundCh) to wait for the goroutine in BlockFound to enqueue work is timing-dependent and can be flaky on loaded/slow CI. Prefer synchronizing deterministically (e.g., select on server.blockFoundCh with a timeout, or require.Eventually until an item is received) instead of sleeping and checking len().

Suggested change

time.Sleep(10 * time.Millisecond)

require.Equal(t, 1, len(server.blockFoundCh))

<-server.blockFoundCh

select {

case <-server.blockFoundCh:

case <-time.After(time.Second):

t.Fatal("timed out waiting for blockFound work to be queued")

}

Agreed — replaced the sleep-and-check-len pattern with select { case <-server.blockFoundCh: ... case <-time.After(time.Second): t.Fatal(...) } as suggested. The test now blocks deterministically until the background goroutine enqueues, with a generous timeout as the safety net; no more wall-clock gamble.

- Server_test.go: replace the sleep-and-check-len synchronization in the "gate off → no stamp" subtest with a deterministic select { case <-blockFoundCh: ... case <-time.After(1s): t.Fatal } pattern as Copilot suggested. No more wall-clock gamble on loaded CI runners. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

Copilot reviewed 25 out of 25 changed files in this pull request and generated 1 comment.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-17T06:55:54Z

+			srv := &Server{
+				logger:           ulogger.TestLogger{},
+				settings:         tSettings,
+				blockchainClient: mockClient,
+			}


This test constructs a Server directly without calling InitPrometheusMetrics() / srv.Init(...). Since Server.ShouldUseSubtreeOnlyPath increments prometheusLivenessGateDecision, running this test in isolation can panic on a nil CounterVec (it currently relies on metrics having been initialized by some other test’s side effects). Initialize the metrics explicitly in this test (or call InitPrometheusMetrics() inside ShouldUseSubtreeOnlyPath with a sync.Once) so the test is order-independent.

Great catch — confirmed this test passes today only because SubtreeValidation_test.go runs the package-level sync.Once first and leaves the counters registered as a side effect. Added an explicit InitPrometheusMetrics() call at the top of TestShouldUseSubtreeOnlyPath_TruthTable (it's idempotent via sync.Once). The test now works in isolation (go test -run TestShouldUseSubtreeOnlyPath_TruthTable) and is order-independent.

- livenessgate_test.go: call InitPrometheusMetrics() at the top of TestShouldUseSubtreeOnlyPath_TruthTable so it works in isolation. Previously the test was relying on another test's side effect (SubtreeValidation_test.go triggering the package-level sync.Once first). InitPrometheusMetrics is idempotent, so the explicit call is safe regardless of test order. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

Copilot reviewed 25 out of 25 changed files in this pull request and generated 3 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-17T07:20:10Z

 				// PHASE 2: Exact pre-allocation
 				subtreeTxs[subtreeIdx] = make([]*bt.Tx, 0, subtreeToCheck.Length())

+				if subtreeOnly {
+					// Gate is on and the block's header was received recently — skip
+					// the subtreeData download entirely. subtreeTxs[subtreeIdx] stays
+					// empty, so the block-level batch processTransactionsInLevels call
+					// downstream is skipped (batchTxCount == 0). Transaction resolution
+					// happens later in validation via the subtree manifest +
+					// UTXO/TxMetaCache, with per-tx fetch as the fallback for real
+					// misses.
+					return nil


In the subtree-only gated path, this pre-allocation still allocates an underlying array with cap=subtreeToCheck.Length() even though the worker immediately returns. For large subtrees this can allocate a lot of memory per missing subtree and partially defeats the purpose of skipping subtreeData downloads. Consider checking subtreeOnly before this make(...), or allocating a nil/zero-cap slice in the gated path and only doing the exact pre-allocation when you’re going to populate it.

Right — moved the subtreeOnly early-return ahead of the make([]*bt.Tx, 0, subtreeToCheck.Length()) pre-allocation. On the gated path subtreeTxs[subtreeIdx] now stays as the zero-value nil slice, saving subtreeToCheck.Length() capacity worth of memory per missing subtree. The comment downstream still holds (nil slice → batchTxCount == 0 → processTransactionsInLevels skipped).

Copilot · 2026-04-17T07:20:11Z

 		AppCtx:                        ctx,
 		blocksFinalKafkaAsyncProducer: blocksFinalKafkaAsyncProducer,
 		batchTokens:                   make(map[string]*blobDeletionBatchToken),
+		receivedAt:                    newReceivedAtStore(receivedAtTTL(livenessWindow)),
 	}


receivedAt is initialized unconditionally. With the default config (AssumeTxsBroadcastToAllNodes=false) the service will still allocate/maintain this map and start a background cleanup ticker, even though the gate is never consulted. Consider only creating the receivedAtStore when the liveness gate feature is enabled (and/or LivenessWindow > 0), to keep the default-off deployment closer to a true no-op and avoid unnecessary memory growth during fast catchup.

Done — the receivedAt store is now only allocated when the gate is enabled at construction (AssumeTxsBroadcastToAllNodes && LivenessWindow > 0), in both Server.New and NewLocalClient. Made receivedAtStore.stamp and .lookup nil-safe so call sites don't need extra guards beyond the existing gate-setting checks they already perform. Default-off deployments now allocate zero bytes and start zero goroutines for this feature — true no-op.

Copilot · 2026-04-17T07:20:11Z

+	// Record first-seen time for the subtree-only liveness gate as a
+	// belt-and-braces fallback. The primary stamp happens earlier in
+	// blockvalidation.BlockFound via ReportPeerBlockHeaderSeen, but this write
+	// covers paths that reach AddBlock without having gone through BlockFound
+	// (e.g. locally mined blocks). Stamping is write-once, so double writes are
+	// cheap no-ops.
+	b.receivedAt.stamp(block.Hash())


AddBlock stamps receivedAt for every stored block, regardless of whether the liveness gate is enabled. In a fast catchup this can grow the in-memory map (TTL floor is 30m) on nodes that aren’t using the optimization. Consider gating this stamp behind AssumeTxsBroadcastToAllNodes && LivenessWindow > 0, or making the store/stamp conditional (while still stamping locally-mined blocks when the feature is enabled).

Suggested change

// Record first-seen time for the subtree-only liveness gate as a

// belt-and-braces fallback. The primary stamp happens earlier in

// blockvalidation.BlockFound via ReportPeerBlockHeaderSeen, but this write

// covers paths that reach AddBlock without having gone through BlockFound

// (e.g. locally mined blocks). Stamping is write-once, so double writes are

// cheap no-ops.

b.receivedAt.stamp(block.Hash())

// The primary first-seen stamp for the subtree-only liveness gate happens

// earlier in blockvalidation.BlockFound via ReportPeerBlockHeaderSeen.

// Avoid stamping every stored block here, as this path is exercised during

// fast catchup and can otherwise grow the in-memory receivedAt map even when

// the liveness optimization is not in use.

Agreed — gated the AddBlock stamp behind AssumeTxsBroadcastToAllNodes && LivenessWindow > 0 in both Server.AddBlock and LocalClient.AddBlock. Combined with the conditional store allocation (other thread), nodes that don't use the optimisation now allocate zero receivedAt entries during fast catchup. The stamp still fires for locally mined blocks on nodes where the gate is enabled (they reach AddBlock without going through BlockFound), which is the original belt-and-braces intent.

Make default-off deployments a true zero-allocation no-op for the liveness-gate feature: - check_block_subtrees.go: move the subtreeOnly early-return ahead of the subtreeTxs[i] = make(..., subtreeToCheck.Length()) pre-allocation so the gated path doesn't allocate a per-subtree tx slice. The remaining nil-slice still makes the block-level batch processTransactionsInLevels skip (batchTxCount == 0). - received_at.go: nil-safe stamp() and lookup() so call sites can leave the store unallocated without extra guards. - Server.go / LocalClient.go (New + AddBlock): allocate the receivedAt store only when the gate is enabled at construction, and gate the AddBlock belt-and-braces stamp behind the same setting. Fast catchup on default-off nodes no longer grows an in-memory map. - server_test.go (TestGetHeaderReceivedAt_StampedOnAddBlock): enable the gate setting + allocate the store for this test since it was relying on the always-on default behaviour we've now removed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

Copilot reviewed 25 out of 25 changed files in this pull request and generated 3 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-17T07:58:22Z

+	// DecisionSubtreeData — gate disabled by setting, or stamp is stale.
+	// Caller must fall back to the subtreeData path.


Doc mismatch: the comment for DecisionSubtreeData says it covers “gate disabled by setting, or stamp is stale”, but Decide() also returns DecisionSubtreeData when window <= 0. Please update the DecisionSubtreeData comment to include the non-positive window short-circuit (so future readers don’t assume these cases are distinguished elsewhere).

Suggested change

// DecisionSubtreeData — gate disabled by setting, or stamp is stale.

// Caller must fall back to the subtreeData path.

// DecisionSubtreeData — gate disabled by setting, liveness window is

// non-positive, or stamp is stale. Caller must fall back to the

// subtreeData path.

Copilot · 2026-04-17T07:58:23Z

+	// Once-per-block liveness gate. If true, every per-subtree worker in this
+	// invocation skips the subtreeData download. In that path the per-subtree
+	// worker never populates subtreeTxs[subtreeIdx], so the block-level batch
+	// processTransactionsInLevels call is skipped entirely (batchTxCount == 0).
+	// Transaction resolution happens downstream in validation via the subtree
+	// manifest + UTXO/TxMetaCache, with per-tx fetch as the fallback for real
+	// misses (validator resolves each referenced tx hash on demand).
+	subtreeOnly := u.ShouldUseSubtreeOnlyPath(ctx, block.Hash())
+


The liveness gate is evaluated before the tracing span is started. That means the GetHeaderReceivedAt RPC (and any debug logging/metrics side-effects) won’t be associated with the CheckBlockSubtrees span, which makes it harder to debug gate behavior from traces. Consider starting the tracing span first, then computing subtreeOnly using the traced ctx.

Copilot · 2026-04-17T07:58:23Z

+		if stampErr := u.blockchainClient.ReportPeerBlockHeaderSeen(ctx, hash); stampErr != nil {
+			u.logger.Debugf("[BlockFound][%s] ReportPeerBlockHeaderSeen failed, liveness gate may miss for this header: %v", hash.String(), stampErr)


Potential nil deref: this calls u.blockchainClient.ReportPeerBlockHeaderSeen(...) without checking u.blockchainClient != nil. blockvalidation.New takes blockchainClient as a parameter but doesn’t guard against nil, and some tests previously built Server without setting this field. Please add a nil check (or call through the already-required client used by u.blockValidation) so enabling AssumeTxsBroadcastToAllNodes can’t crash the service if the client wasn’t wired.

Suggested change

if stampErr := u.blockchainClient.ReportPeerBlockHeaderSeen(ctx, hash); stampErr != nil {

u.logger.Debugf("[BlockFound][%s] ReportPeerBlockHeaderSeen failed, liveness gate may miss for this header: %v", hash.String(), stampErr)

if u.blockchainClient != nil {

if stampErr := u.blockchainClient.ReportPeerBlockHeaderSeen(ctx, hash); stampErr != nil {

u.logger.Debugf("[BlockFound][%s] ReportPeerBlockHeaderSeen failed, liveness gate may miss for this header: %v", hash.String(), stampErr)

}

} else {

u.logger.Debugf("[BlockFound][%s] skipping ReportPeerBlockHeaderSeen because blockchain client is not configured", hash.String())

…rapper The liveness gate is a feature-specific decision helper, not a general teranode utility — both callers (subtreevalidation per-subtree gating and blockvalidation catchup prefetch) consume the same one feature, driven by SubtreeValidation.AssumeTxsBroadcastToAllNodes. - Move util/livenessgate/ → services/subtreevalidation/livenessgate/ so the package lives alongside the setting that owns it. - Drop the package-level ShouldUseSubtreeOnlyPath convenience. Callers use Decide directly and compare against DecisionSubtreeOnly (exactly one extra line at the one site that used the wrapper). - Drop the three `var _ livenessgate.Client = (*T)(nil)` assertions in services/blockchain/. They created a reverse-layering import (blockchain depending on a subtreevalidation subpackage) and were redundant — real call sites already pass the blockchain client to livenessgate.Decide, so the compiler checks the interface match at the call site. Behaviour unchanged. The Decision enum + metric labels are kept; the (*Server).ShouldUseSubtreeOnlyPath wrapper in subtreevalidation is kept (it bundles metric emission and logging over Decide). Net: -37 lines, one fewer package, no cross-service dependency inversion. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

sonarqubecloud · 2026-04-22T12:23:15Z

Quality Gate failed

Failed conditions
76.3% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube Cloud

…veness-gate

freemans13 · 2026-04-24T00:36:43Z

Superseded by #745 — self-discovering adaptive gate replaces the operator-configured feature flag + wall-clock liveness gate. The new design deliberately avoids any wall-clock or FSM-state gating (enforced by a source-scanning regression test), eliminating the class of bug that caused PR #598 to be reverted via #647.

freemans13 and others added 12 commits April 16, 2026 17:19

feat(subtreevalidation): add AssumeTxsBroadcastToAllNodes and Livenes…

bd5ddbb

…sWindow settings Default off; no behavior change on their own. Used by subsequent tasks to gate the subtree-only validation path.

feat(blockchain): add receivedAtStore for per-header first-seen times…

638f156

…tamps In-memory, TTL-bounded store used by the subtree-only liveness gate (next commit). First stamp wins; subsequent inserts are no-ops. Concurrent-safe.

feat(blockchain): add GetHeaderReceivedAt RPC to proto

afa2094

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat(blockchain): stamp ReceivedAt in AddBlock + implement GetHeaderR…

b63eae6

…eceivedAt Blockchain.AddBlock now records the first-seen wall-clock time for each successfully stored block. GetHeaderReceivedAt exposes the stamp; absent or TTL-expired entries return found=false.

test(blockchain): add GetHeaderReceivedAt tests

cfc206c

Two tests verifying stamp-on-AddBlock and absent-hash behaviour.

chore(blockchain): add compile-time ClientI interface assertions

a5ba990

Catches future drift in ClientI before runtime. Addresses code review feedback on commit 8f2fbef. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

chore(blockchain): assert livenessgate.Client at compile time

f333f1e

Catches future drift if any ClientI implementer loses GetHeaderReceivedAt. Addresses code review feedback on commit d35934d. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Merge remote-tracking branch 'upstream/main' into stu/subtree-only-li…

271852b

…veness-gate

freemans13 requested a review from Copilot April 17, 2026 02:56

Copilot started reviewing on behalf of freemans13 April 17, 2026 02:57 View session