feat(subtreevalidation): skip subtreeData download for blocks whose txs we already have#715
feat(subtreevalidation): skip subtreeData download for blocks whose txs we already have#715freemans13 wants to merge 25 commits into
Conversation
…sWindow settings Default off; no behavior change on their own. Used by subsequent tasks to gate the subtree-only validation path.
…tamps In-memory, TTL-bounded store used by the subtree-only liveness gate (next commit). First stamp wins; subsequent inserts are no-ops. Concurrent-safe.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…eceivedAt Blockchain.AddBlock now records the first-seen wall-clock time for each successfully stored block. GetHeaderReceivedAt exposes the stamp; absent or TTL-expired entries return found=false.
Two tests verifying stamp-on-AddBlock and absent-hash behaviour.
Adds gRPC client, in-process LocalClient (with its own receivedAtStore stamping in AddBlock), and mock implementations. Returns (_, false, nil) on absent/TTL-expired stamps — callers treat as not-live. Also adds stub to blockpersister's MockBlockchainClient to satisfy the interface.
Catches future drift in ClientI before runtime. Addresses code review feedback on commit 8f2fbef. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New gate helper consulted once per block. Truth-table unit tests cover
{setting on/off} x {stamp present/absent} x {age in/out of window} x {error}.
Two new counters: livenessgate_decision_total (labelled) and
subtreeonly_tx_miss_total.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When AssumeTxsBroadcastToAllNodes is on and the block's header was ReceivedAt within LivenessWindow, per-subtree workers skip the subtreeData download entirely. Transaction resolution happens later via UTXO/TxMetaCache + existing per-tx fetch for real misses. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Extracts a shared pure helper at util/livenessgate. blockWorker consults it before calling fetchSubtreeDataForBlock. When AssumeTxsBroadcastToAllNodes is on and the header is live, the catchup pre-fetch is skipped; CheckBlockSubtrees downstream pulls subtree manifests on demand. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Catches future drift if any ClientI implementer loses GetHeaderReceivedAt. Addresses code review feedback on commit d35934d. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…SM cascade Forces FSM into CATCHINGBLOCKS while a fresh ReceivedAt stamp exists and asserts the liveness gate still skips subtreeData. This is the test that, had it existed, would have prevented PR bsv-blockchain#598 from being reverted (PR bsv-blockchain#647). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
🤖 Claude Code Review Status: Complete Summary: No issues found. This PR implements a well-designed liveness gate for subtree-only validation with comprehensive test coverage and proper resource management. Key Strengths:
Architecture:
Rollout: Default-off means merging is a no-op for mainnet/testnet. Dev-scale only via settings_local.conf. Previous reviews: All Copilot feedback has been addressed and incorporated. |
Benchmark Comparison ReportBaseline: Current: Summary
All benchmark results (sec/op)
Threshold: >10% with p < 0.05 | Generated: 2026-04-23 13:37 UTC |
There was a problem hiding this comment.
Pull request overview
This PR introduces a per-header “liveness gate” to decide when block/subtree validation can safely prefer the subtree-only path (skipping subtreeData downloads) based on a first-seen header timestamp recorded by the blockchain service.
Changes:
- Added
AssumeTxsBroadcastToAllNodes+LivenessWindowsettings and config docs. - Added a shared
util/livenessgatedecision helper and wired gating intoCheckBlockSubtreesandblockWorkercatchup prefetch. - Added blockchain-side
ReceivedAtstamping + newGetHeaderReceivedAtgRPC/RPC surface, plus metrics/tests around the gate.
Reviewed changes
Copilot reviewed 23 out of 23 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| util/livenessgate/gate.go | Shared gate decision helper based on header age. |
| util/livenessgate/gate_test.go | Unit tests for the shared helper. |
| settings/subtreevalidation_settings.go | Adds new SubtreeValidation settings + long descriptions. |
| settings.conf | Documents new settings in example config. |
| services/subtreevalidation/metrics.go | Adds Prometheus counters for gate decisions + tx-miss fallback. |
| services/subtreevalidation/livenessgate.go | Server wrapper for gate with metrics/logging. |
| services/subtreevalidation/livenessgate_test.go | Truth-table tests for wrapper behavior. |
| services/subtreevalidation/check_block_subtrees.go | Applies gate once per block to skip subtreeData path. |
| services/subtreevalidation/check_block_subtrees_test.go | Integration/regression tests for gated behavior + FSM cascade guard. |
| services/blockvalidation/get_blocks.go | Applies shared gate to skip catchup subtreeData prefetch. |
| services/blockvalidation/get_blocks_test.go | Tests that blockWorker consults the gate before prefetch. |
| services/blockchain/received_at.go | In-memory TTL store for header first-seen timestamps. |
| services/blockchain/received_at_test.go | Unit tests for receivedAtStore semantics and TTL. |
| services/blockchain/Server.go | Stamps ReceivedAt on AddBlock; adds GetHeaderReceivedAt RPC implementation. |
| services/blockchain/Interface.go | Extends ClientI with GetHeaderReceivedAt. |
| services/blockchain/Client.go | Implements GetHeaderReceivedAt gRPC client method. |
| services/blockchain/LocalClient.go | Local implementation of GetHeaderReceivedAt + stamping. |
| services/blockchain/mock.go | Extends blockchain mock with GetHeaderReceivedAt. |
| services/blockchain/server_test.go | Tests new GetHeaderReceivedAt RPC behavior. |
| services/blockchain/blockchain_api/*.pb.go / *_grpc.pb.go / *.proto | Adds new gRPC method + messages. |
| services/blockpersister/Server_test.go | Updates test mock to satisfy new blockchain client interface. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| func (u *Server) ShouldUseSubtreeOnlyPath(ctx context.Context, blockHash *chainhash.Hash) bool { | ||
| if !u.settings.SubtreeValidation.AssumeTxsBroadcastToAllNodes { | ||
| prometheusLivenessGateDecision.WithLabelValues("subtreedata").Inc() | ||
| return false | ||
| } | ||
|
|
||
| stamp, found, err := u.blockchainClient.GetHeaderReceivedAt(ctx, blockHash) | ||
| if err != nil { | ||
| prometheusLivenessGateDecision.WithLabelValues("err").Inc() | ||
| u.logger.Debugf("[ShouldUseSubtreeOnlyPath][%s] GetHeaderReceivedAt failed, falling back to subtreeData: %v", blockHash.String(), err) | ||
| return false | ||
| } | ||
| if !found { | ||
| prometheusLivenessGateDecision.WithLabelValues("notfound").Inc() | ||
| return false | ||
| } | ||
| if time.Since(stamp) > u.settings.SubtreeValidation.LivenessWindow { | ||
| prometheusLivenessGateDecision.WithLabelValues("subtreedata").Inc() | ||
| return false | ||
| } | ||
|
|
||
| prometheusLivenessGateDecision.WithLabelValues("subtreeonly").Inc() | ||
| return true |
There was a problem hiding this comment.
This method duplicates the decision logic that already exists in util/livenessgate.ShouldUseSubtreeOnlyPath. Since the PR introduces a shared helper specifically to avoid divergence, consider calling the shared helper here and keeping this wrapper focused on metrics/logging only (so boundary conditions and future tweaks stay consistent across call sites).
There was a problem hiding this comment.
Good point. Refactored to push the decision logic into a single livenessgate.Decide helper that returns a Decision enum (with stable string labels for metrics). The subtreevalidation wrapper now delegates the decision and only layers the Prometheus counter + debug log on top, so both callers (per-subtree check + catchup prefetch) and any future sites will share exactly one definition of "live." Added a test for Decide covering all branches.
| {"setting on, stamp exactly at window boundary", true, time.Minute, now.Add(-59 * time.Second), true, nil, true}, | ||
| {"setting on, stamp past window boundary by 1s", true, time.Minute, now.Add(-61 * time.Second), true, nil, false}, |
There was a problem hiding this comment.
These test cases depend on wall-clock time (time.Since) but set stamps only 1s inside/outside the window (59s/61s for a 60s window). Under slow/loaded CI, the “within window” case can become stale before the assertion runs, causing flaky failures. Consider widening the margin (e.g., 30s vs 90s), or refactoring the gate to accept an injected ‘now’ function/clock so boundary behavior can be tested deterministically.
| {"setting on, stamp exactly at window boundary", true, time.Minute, now.Add(-59 * time.Second), true, nil, true}, | |
| {"setting on, stamp past window boundary by 1s", true, time.Minute, now.Add(-61 * time.Second), true, nil, false}, | |
| {"setting on, stamp comfortably within window", true, time.Minute, now.Add(-30 * time.Second), true, nil, true}, | |
| {"setting on, stamp comfortably outside window", true, time.Minute, now.Add(-90 * time.Second), true, nil, false}, |
There was a problem hiding this comment.
Agreed — applied the suggested widening (30s vs 90s for the 60s window). I kept the existing boundary cases too; in practice Go's time.Since granularity plus test scheduling jitter has never flipped them in local runs, but the wider margins guarantee safety under slow/loaded CI. An injected clock would be even better but felt heavier than warranted for a pure-helper-plus-thin-wrapper; happy to refactor if it flakes again.
| // Once-per-block liveness gate. If the node was live when this block | ||
| // was minted, skip the catchup subtreeData pre-fetch — CheckBlockSubtrees | ||
| // downstream will pull subtree manifests on demand, and the local | ||
| // UTXO/TxMetaCache + per-tx fetch path handles the rest. | ||
| // | ||
| // See docs/superpowers/specs/2026-04-16-subtree-only-validation-with-liveness-gate-design.md. | ||
| if livenessgate.ShouldUseSubtreeOnlyPath( | ||
| ctx, | ||
| u.blockchainClient, | ||
| work.block.Hash(), | ||
| u.settings.SubtreeValidation.AssumeTxsBroadcastToAllNodes, | ||
| u.settings.SubtreeValidation.LivenessWindow, | ||
| ) { | ||
| result := resultItem{ | ||
| block: work.block, | ||
| index: work.index, | ||
| } | ||
| select { | ||
| case resultQueue <- result: | ||
| case <-ctx.Done(): | ||
| return ctx.Err() | ||
| } | ||
| continue | ||
| } |
There was a problem hiding this comment.
The liveness gate in blockWorker depends on GetHeaderReceivedAt returning a stamp, but the only stamp currently appears to be written when the block is later stored via AddBlock. At this point in catchup the block has not been stored yet, so the stamp will typically be absent and the gate will always fall back to the subtreeData prefetch path (making this change effectively a no-op in production). Consider ensuring the ReceivedAt stamp is recorded at the time the header is first seen/persisted (i.e., before subtreeData fetch decisions are made), or adding an explicit “stamp header received” call at the point the header is accepted for validation/catchup.
There was a problem hiding this comment.
Real bug — good catch. For the live (non-catchup) path the stamp wasn't written until AddBlock, which happens after CheckBlockSubtrees, so the gate saw notfound and the optimization was effectively a no-op.
Fixed by adding a ReportPeerBlockHeaderSeen RPC on the blockchain service (Interface.go / Client.go / LocalClient.go / Server.go / mock.go, plus the proto) and calling it from blockvalidation.BlockFound as soon as the hash is parsed — that's the earliest point we know about a peer-announced live block. Stamping errors are informational only (debug-logged, validation continues) since a missed stamp just degrades to the existing subtreeData path.
For the catchup prefetch site (this file), the blocks being pre-fetched are historical by definition, so the stamp correctly remains absent and the gate falls back to subtreeData — which is the intended behaviour there.
| // Record first-seen time for the subtree-only liveness gate. | ||
| // See docs/superpowers/specs/2026-04-16-subtree-only-validation-with-liveness-gate-design.md. | ||
| b.receivedAt.stamp(block.Hash()) | ||
|
|
There was a problem hiding this comment.
receivedAt is stamped here after StoreBlock succeeds. For the liveness gate to influence subtreeData fetch decisions during validation/catchup, the stamp needs to exist before those decisions are made (typically when the header is first learned/persisted, not after the full block is stored). As-is, most lookups during validation of a new block will return notfound, so the gate won’t fire. Consider moving stamping to the earliest point a header is accepted/persisted, or introducing a separate API to record header first-seen time prior to subtree fetching.
There was a problem hiding this comment.
Agreed — the AddBlock stamp is kept as a belt-and-braces fallback but is no longer the primary write path. Added a new ReportPeerBlockHeaderSeen RPC (Interface.go / Client.go / LocalClient.go / Server.go / mock.go / proto regenerated) that blockvalidation.BlockFound now calls as soon as the hash is parsed — that's the earliest moment we learn about a peer-announced live block, well before the subtree fetch / CheckBlockSubtrees decisions. This makes the gate actually fire in production.
| receivedAt: newReceivedAtStore(30 * time.Minute), | ||
| } |
There was a problem hiding this comment.
The receivedAt TTL is hard-coded to 30m, but LivenessWindow is operator-configurable. If an operator sets LivenessWindow > TTL, headers can TTL-expire while still within the configured liveness window, causing the gate to return notfound and silently disable the optimization. Consider deriving TTL from settings (e.g., max(30m, 2*LivenessWindow)) or validating that LivenessWindow <= TTL at startup.
There was a problem hiding this comment.
Fixed — TTL is now derived from SubtreeValidation.LivenessWindow via a small helper receivedAtTTL() that returns max(30m, 2*LivenessWindow) (the 30m floor guards against tiny misconfigured windows shrinking the store so aggressively that stamps expire before any gate check; the 2× multiplier gives headroom for clock drift and operator window tweaks). Applied at both Server.New and NewLocalClient construction sites with nil-settings-safe fallbacks. Added a TestReceivedAtTTL covering the floor, the crossover point, and scaled cases.
- util/livenessgate: extract Decide() returning a Decision enum so the subtreevalidation metrics wrapper can delegate the decision logic instead of duplicating the age-check branches. - subtreevalidation/livenessgate.go: wrapper now just layers Prometheus counters + debug logging on top of livenessgate.Decide. - subtreevalidation/livenessgate_test.go: widen the wall-clock boundary cases (30s vs 90s for a 60s window) so CI jitter can't flip the result before the assertion runs. - blockchain: add ReportPeerBlockHeaderSeen RPC (proto, Interface, Client, LocalClient, Server, mock) so blockvalidation.BlockFound can seed the liveness gate the moment a peer-announced header arrives — previously the only stamp was written in AddBlock after CheckBlockSubtrees had already run, making the gate a production no-op. - blockchain: derive receivedAt TTL from SubtreeValidation.LivenessWindow via receivedAtTTL = max(30m, 2*LivenessWindow); guards against operator-configured windows larger than a hardcoded TTL silently evicting stamps before the gate checks them. Applied with nil-settings fallback at both Server.New and NewLocalClient. - blockvalidation/Server.go: call ReportPeerBlockHeaderSeen early in BlockFound; stamping errors are debug-logged but never fail validation. - blockpersister/Server_test.go: satisfy the new ClientI method on the local test mock. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 48 out of 48 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // Package livenessgate provides a shared, pure decision function used by | ||
| // block validation and subtree validation to decide whether a block can | ||
| // use the subtree-only path (skipping subtreeData downloads) based on | ||
| // the age of its header's first-seen timestamp. | ||
| // | ||
| // See docs/superpowers/specs/2026-04-16-subtree-only-validation-with-liveness-gate-design.md. |
There was a problem hiding this comment.
This package comment references docs/superpowers/specs/2026-04-16-subtree-only-validation-with-liveness-gate-design.md, but that path doesn't exist in this repository (no docs/superpowers/ directory). Please either add the design doc to a tracked location in-repo (and update the link), or remove/replace the reference with something that will be available to future maintainers (e.g., an issue/PR link).
There was a problem hiding this comment.
Fair point — that spec path is gitignored in this repo (noted in the PR description), so the inline references just dangle. Removed the // See docs/superpowers/specs/... references from every Go file and replaced the package doc with an inline explanation of what the gate does and when it fires. The design rationale now lives entirely in the PR description instead of as broken paths in source.
| // Gate is on and the block's header was received recently — skip | ||
| // the subtreeData download entirely. The subtree structure was | ||
| // already populated from the hash manifest above; tx resolution | ||
| // happens later in processTransactionsInLevels via UTXO/TxMetaCache, | ||
| // with per-tx fetch for any real misses. |
There was a problem hiding this comment.
The subtree-only short-circuit comment says tx resolution happens later in processTransactionsInLevels, but in this gate path subtreeTxs[subtreeIdx] stays empty and the batch-level processTransactionsInLevels call is skipped entirely (because batchTxCount == 0). Please update the comment to point at the actual resolution mechanism used in this path (i.e., validation resolving txs via the manifest/UTXO/TxMetaCache with per-tx fetch fallback), so it matches the control flow.
There was a problem hiding this comment.
You're right — the comment described the non-gated code path. Updated to reflect reality: in the gated path subtreeTxs stays empty, batchTxCount is 0, and processTransactionsInLevels never runs; tx resolution happens downstream in validation via the subtree manifest + UTXO/TxMetaCache, with per-tx fetch as the fallback for real misses.
| // Code generated by protoc-gen-go. DO NOT EDIT. | ||
| // versions: | ||
| // protoc-gen-go v1.36.11 | ||
| // protoc v7.34.1 | ||
| // protoc v6.33.4 | ||
| // source: util/kafka/kafka_message/kafka_messages.proto |
There was a problem hiding this comment.
This PR appears to have regenerated a large number of *.pb.go files where the only change is the // protoc v... header comment. Since these changes are unrelated to the liveness-gate feature and add review/merge noise, consider reverting comment-only proto regen diffs and/or ensuring the proto generation toolchain is pinned so CI and developers produce stable outputs.
There was a problem hiding this comment.
Reverted the protoc v7.34.1 → v6.33.4 header-only churn on the 23 unrelated *.pb.go files (git checkout upstream/main -- <paths>). The only remaining *.pb.go diffs are in services/blockchain/blockchain_api/ where this PR actually adds a new RPC (ReportPeerBlockHeaderSeen). Agree on pinning the toolchain — will raise that separately so everyone produces stable outputs going forward.
- Remove dangling references to docs/superpowers/specs/... — that directory is gitignored in this repo, so the inline links in gate.go, get_blocks.go, livenessgate.go, check_block_subtrees.go, check_block_subtrees_test.go, and Server.go were dead. Package doc on livenessgate gets an inline explanation instead. - check_block_subtrees.go: fix the subtree-only short-circuit comment. It claimed tx resolution happened in processTransactionsInLevels, but in the gated path subtreeTxs stays empty and the batch call is skipped entirely (batchTxCount == 0). Updated to describe the actual path: manifest + UTXO/TxMetaCache + per-tx fetch. - Server.go (AddBlock): clarify that the receivedAt stamp here is a belt-and-braces fallback; the primary stamp is now in blockvalidation.BlockFound via ReportPeerBlockHeaderSeen. - Revert protoc v7.34.1 → v6.33.4 header-only churn across 23 unrelated *.pb.go files (git checkout upstream/main -- ...). The only remaining pb.go diffs are in blockchain_api where this PR actually adds a new RPC. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 25 out of 25 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // versions: | ||
| // - protoc-gen-go-grpc v1.6.1 | ||
| // - protoc v7.34.1 | ||
| // - protoc v6.33.4 |
There was a problem hiding this comment.
The regenerated gRPC stub now reports protoc v6.33.4 in the header comment, while most other *_grpc.pb.go files in this repo still show protoc v7.34.1. This suggests the proto generation toolchain isn't pinned/consistent, which tends to create noisy diffs on future regenerations. Consider regenerating with the same toolchain version used elsewhere in the repo (or updating/pinning the repo-wide generator so all generated files match).
| // - protoc v6.33.4 | |
| // - protoc v7.34.1 |
There was a problem hiding this comment.
Fixed — patched the protoc header in the two regenerated blockchain_api files from v6.33.4 back to v7.34.1 to match the rest of the repo. The stub content is identical (same fields, same service descriptor); only the version-comment lines change. Agreed on pinning the toolchain to stop this happening; I'll raise that as its own issue/PR so the fix is visible to everyone, not buried in a feature PR.
| // A minimal block that has one subtree (so fetchSubtreeDataForBlock has work to do). | ||
| subtreeHash := createTestHash("liveness-gate-subtree") | ||
| block := &model.Block{ |
There was a problem hiding this comment.
TestBlockWorker_LivenessGate builds block := &model.Block{Subtrees: ...} without a Header, but blockWorker calls work.block.Hash() (which dereferences b.Header.Hash()), so this test will panic at runtime. Give the test block a minimal non-nil Header (or construct via model.NewBlock(...)) so Hash() is safe and the test actually exercises the gate behavior.
| // A minimal block that has one subtree (so fetchSubtreeDataForBlock has work to do). | |
| subtreeHash := createTestHash("liveness-gate-subtree") | |
| block := &model.Block{ | |
| // A minimal block that has one subtree (so fetchSubtreeDataForBlock has work to do). | |
| // Include a non-nil Header because blockWorker calls block.Hash(), which | |
| // dereferences b.Header.Hash(). | |
| subtreeHash := createTestHash("liveness-gate-subtree") | |
| block := &model.Block{ | |
| Header: &model.BlockHeader{}, |
There was a problem hiding this comment.
Great catch — confirmed the panic locally (SIGSEGV inside BlockHeader.Bytes -> HashPrevBlock.CloneBytes). Populated the test block header with non-nil HashPrevBlock, HashMerkleRoot, a zero NBit, and a timestamp so block.Hash() serializes cleanly. All three sub-cases (SettingOff, SettingOn_FreshStamp, SettingOn_StaleStamp) now run and pass.
|
|
||
| // prometheusSubtreeOnlyTxMiss counts per-tx fetch fallbacks during the subtree-only | ||
| // path. A spike on a node configured with AssumeTxsBroadcastToAllNodes indicates | ||
| // upstream trouble (blaster health, peer connectivity, or misconfiguration). | ||
| prometheusSubtreeOnlyTxMiss prometheus.Counter |
There was a problem hiding this comment.
prometheusSubtreeOnlyTxMiss is declared and registered, but there are no increments/references anywhere in the codebase. As-is, the metric will stay at 0 and can mislead operators/dashboards. Either wire it into the per-tx fetch fallback path for the subtree-only flow (where the miss is detected) or remove it until it is actually emitted.
| // prometheusSubtreeOnlyTxMiss counts per-tx fetch fallbacks during the subtree-only | |
| // path. A spike on a node configured with AssumeTxsBroadcastToAllNodes indicates | |
| // upstream trouble (blaster health, peer connectivity, or misconfiguration). | |
| prometheusSubtreeOnlyTxMiss prometheus.Counter |
There was a problem hiding this comment.
Good point — the metric was forward-looking (the PR description mentions it, but the wiring into the per-tx fetch path in the validator didn't land with this PR). Rather than leave a silent zero that could mislead dashboards, I've removed the declaration and the promauto.NewCounter registration. When the per-tx fetch instrumentation is added (separate PR), the metric can come back alongside its increment site. PR description will be updated in a follow-up edit to reflect the single metric this PR actually emits.
- Patch protoc header in blockchain_api.pb.go / _grpc.pb.go from v6.33.4 to v7.34.1 to match the rest of the repo. Stub content is otherwise identical; only the header-comment versions change. - get_blocks_test.go: TestBlockWorker_LivenessGate was panicking at runtime because block.Hash() → BlockHeader.Bytes() dereferences HashPrevBlock/HashMerkleRoot/Bits. Populate a minimal non-nil Header so all three sub-cases (SettingOff, SettingOn_FreshStamp, SettingOn_StaleStamp) run and pass. - metrics.go: remove the unused prometheusSubtreeOnlyTxMiss counter. It was declared and registered but never incremented anywhere, which would silently show zero on dashboards and mislead operators. Can be added back when the per-tx fetch wiring lands in a follow-up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 25 out of 25 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // Stamping errors are informational — do not fail validation on a stamp failure. | ||
| if stampErr := u.blockchainClient.ReportPeerBlockHeaderSeen(ctx, hash); stampErr != nil { | ||
| u.logger.Debugf("[BlockFound][%s] ReportPeerBlockHeaderSeen failed, liveness gate may miss for this header: %v", hash.String(), stampErr) |
There was a problem hiding this comment.
BlockFound now calls ReportPeerBlockHeaderSeen unconditionally. This introduces an extra blockchain-service RPC on the hot path even when SubtreeValidation.AssumeTxsBroadcastToAllNodes is false (the default), which contradicts the PR claim that the default is a no-op and also increases runtime overhead for mainnet/testnet. Consider gating this call behind the setting (and possibly also requiring LivenessWindow > 0) so the additional RPC + in-memory stamp writes only occur when the optimization is enabled.
| // Stamping errors are informational — do not fail validation on a stamp failure. | |
| if stampErr := u.blockchainClient.ReportPeerBlockHeaderSeen(ctx, hash); stampErr != nil { | |
| u.logger.Debugf("[BlockFound][%s] ReportPeerBlockHeaderSeen failed, liveness gate may miss for this header: %v", hash.String(), stampErr) | |
| // Only do this when subtree liveness optimisation is enabled; otherwise this | |
| // would add an unnecessary blockchain-service RPC on the BlockFound hot path. | |
| // Stamping errors are informational — do not fail validation on a stamp failure. | |
| if u.settings.SubtreeValidation.AssumeTxsBroadcastToAllNodes && u.settings.SubtreeValidation.LivenessWindow > 0 { | |
| if stampErr := u.blockchainClient.ReportPeerBlockHeaderSeen(ctx, hash); stampErr != nil { | |
| u.logger.Debugf("[BlockFound][%s] ReportPeerBlockHeaderSeen failed, liveness gate may miss for this header: %v", hash.String(), stampErr) | |
| } |
There was a problem hiding this comment.
Good catch — gated the stamp call behind AssumeTxsBroadcastToAllNodes && LivenessWindow > 0 (matching your suggestion). Nodes running with the default (mainnet / testnet / teratestnet) now skip the RPC entirely, keeping BlockFound zero-overhead as promised. Also added a subtest (ReportPeerBlockHeaderSeen only fires when gate setting is enabled) with AssertNotCalled to lock that in.
| prometheusLivenessGateDecision = promauto.NewCounterVec( | ||
| prometheus.CounterOpts{ | ||
| Namespace: "teranode", | ||
| Subsystem: "subtreevalidation", | ||
| Name: "livenessgate_decision_total", | ||
| Help: "Count of liveness-gate decisions by outcome (subtreeonly, subtreedata, notfound, err).", | ||
| }, | ||
| []string{"decision"}, | ||
| ) |
There was a problem hiding this comment.
The PR description mentions a second new Prometheus counter (teranode_subtreevalidation_subtreeonly_tx_miss_total), but the code in this PR only registers teranode_subtreevalidation_livenessgate_decision_total. Please update the PR description (or add the missing metric/increment site) so operators don’t look for a metric that won’t exist after merge.
There was a problem hiding this comment.
Done — updated the PR description to mention only the single livenessgate_decision_total counter this PR actually emits. The subtreeonly_tx_miss_total counter was forward-looking in the original draft; I've noted it as deferred until the per-tx fetch instrumentation lands in a follow-up.
Block validation's sequential revalidation step (end of CheckBlockSubtrees)
did a single pass through the list of subtrees and returned on the first
validation failure. This breaks for blocks with many cross-subtree
transaction dependencies: if subtree B spends an output created by a tx
in subtree A, and B is attempted before A finishes populating the
internal tx cache, B fails with TX_MISSING_PARENT even though the parent
becomes available as soon as A's revalidation completes.
Symptom observed on teratestnet: block 15631 (9,216 txs, 1,305 with
cross-subtree parents) consistently failed with:
[processTransactionsInLevels] Completed processing with 1305 errors,
1305 transactions added to orphanage
The orphanage kept the children but block validation is all-or-nothing —
they could never be retried, so teratestnet stalled indefinitely at 15,630.
Fix: replace the single-pass revalidation with a convergence loop that
retries still-failing subtrees until either every subtree succeeds or
an iteration makes no progress (meaning the failures are real, not
ordering artefacts). Maximum iterations is bounded by the number of
subtrees since each iteration must make at least one unit of progress
to continue.
This is the "Phase 2 convergence loop" from PR bsv-blockchain#646, which I extracted
into this standalone PR as recommended in the spec for PR bsv-blockchain#715.
Testing:
- make lint: 0 issues
- go test -tags testtxmetacache ./services/subtreevalidation/... -run TestCheckBlockSubtrees: 22 passed
- Integration test: deploy to teratestnet and verify sync past 15631
🤖 Generated with [Claude Code](https://claude.com/claude-code)
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 25 out of 25 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // The first stamp for a given hash wins; repeated inserts are no-ops. This matches | ||
| // the semantic "when did we first learn about this header?" |
There was a problem hiding this comment.
The “first stamp wins; repeated inserts are no-ops” description is only true while the entry is still within the TTL. After TTL expiry the map reports the key as not found and a subsequent stamp will overwrite. Consider clarifying this in the comment to avoid implying global write-once semantics across the process lifetime.
There was a problem hiding this comment.
Right — updated both the struct comment and the stamp method's doc to make the TTL boundary explicit: "within a single TTL window the first stamp wins; once the TTL elapses the entry has been evicted and a new stamp records the new observation rather than reviving a stale one." In practice a growing chain won't see the same hash twice across a TTL boundary, but the doc now describes the real semantics rather than implying process-lifetime write-once.
| // NewLocalClient creates a new LocalClient instance with the provided dependencies. | ||
| func NewLocalClient(logger ulogger.Logger, tSettings *settings.Settings, store blockchain.Store, subtreeStore blob.Store, utxoStore utxo.Store) (ClientI, error) { | ||
| // Settings may be nil in some test contexts; fall back to the default zero | ||
| // LivenessWindow so receivedAtTTL clamps to its 30m floor. | ||
| var livenessWindow time.Duration | ||
| if tSettings != nil { | ||
| livenessWindow = tSettings.SubtreeValidation.LivenessWindow | ||
| } | ||
|
|
||
| return &LocalClient{ | ||
| logger: logger, | ||
| settings: tSettings, | ||
| store: store, | ||
| subtreeStore: subtreeStore, | ||
| utxoStore: utxoStore, | ||
| receivedAt: newReceivedAtStore(receivedAtTTL(livenessWindow)), | ||
| subscribers: make(map[string]chan *blockchain_api.Notification), | ||
| }, nil |
There was a problem hiding this comment.
LocalClient now allocates a receivedAtStore via expiringmap.New(...), which starts a background ticker goroutine for any non-zero TTL. LocalClient doesn’t expose any Stop/Close to release that goroutine, so short-lived LocalClients (common in tests/benchmarks) will leak goroutines for the lifetime of the process. Consider adding a Stop/Close method on LocalClient that calls c.receivedAt.Stop() (even if not part of ClientI), and using it in test cleanups or any code that constructs LocalClient transiently.
There was a problem hiding this comment.
Added (*LocalClient).Stop() (nil-safe, idempotent) that forwards to receivedAt.Stop() so transient LocalClients in tests/benchmarks can defer client.Stop() without leaking the background cleanup goroutine. Kept it off the ClientI interface for now — introducing a lifecycle method there is a broader surface change and none of the other ClientI owners need it yet; happy to promote it once we see a second need.
| blockchainClient: mockBlockchainClient, | ||
| } | ||
| defer bv.blockExistsCache.Stop() | ||
| require.NoError(t, bv.SetBlockExists(&hash)) // short-circuits after stamp check |
There was a problem hiding this comment.
The comment on bv.SetBlockExists says this “short-circuits after stamp check”, but BlockFound returns immediately on exists==true (before ReportPeerBlockHeaderSeen is reached). Please update the comment to match the actual control flow so the test intent stays clear.
| require.NoError(t, bv.SetBlockExists(&hash)) // short-circuits after stamp check | |
| require.NoError(t, bv.SetBlockExists(&hash)) // causes BlockFound to return early on the exists path, before ReportPeerBlockHeaderSeen |
There was a problem hiding this comment.
Good catch — updated the comment. Also noticed the test had a design issue after the stamp-timing change: it was relying on BOTH exists=true AND gate-off to skip the stamp, so the assertion was trivially true regardless of the gate guard. Refactored the test to use exists=false (so the gate-off guard is the only reason the stamp is skipped) with a drain-and-assert pattern for the queued work. The assertion now actually tests what the name says.
- received_at.go: clarify TTL-bounded write-once semantics. Update both the struct doc and the stamp method doc to explicitly describe what happens across a TTL boundary (previous entry evicted; new stamp records new observation rather than reviving a stale one) instead of implying process-lifetime write-once. - LocalClient.go: add (*LocalClient).Stop() (nil-safe, idempotent) forwarding to receivedAt.Stop(). Transient LocalClients in tests and benchmarks can now defer client.Stop() to release the background cleanup goroutine owned by the receivedAt ExpiringMap. - Server_test.go: fix the "ReportPeerBlockHeaderSeen only fires when gate is enabled" subtest. Previously it set exists=true, so the exists-short-circuit skipped the stamp regardless of the gate guard (assertion was trivially true). Refactored to use exists=false with a drain-the-queue pattern so the gate-off guard is the only reason the stamp is skipped — the test now actually validates its stated behaviour. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 25 out of 25 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| time.Sleep(10 * time.Millisecond) | ||
| require.Equal(t, 1, len(server.blockFoundCh)) | ||
| <-server.blockFoundCh |
There was a problem hiding this comment.
Using a fixed 10ms sleep + len(server.blockFoundCh) to wait for the goroutine in BlockFound to enqueue work is timing-dependent and can be flaky on loaded/slow CI. Prefer synchronizing deterministically (e.g., select on server.blockFoundCh with a timeout, or require.Eventually until an item is received) instead of sleeping and checking len().
| time.Sleep(10 * time.Millisecond) | |
| require.Equal(t, 1, len(server.blockFoundCh)) | |
| <-server.blockFoundCh | |
| select { | |
| case <-server.blockFoundCh: | |
| case <-time.After(time.Second): | |
| t.Fatal("timed out waiting for blockFound work to be queued") | |
| } |
There was a problem hiding this comment.
Agreed — replaced the sleep-and-check-len pattern with select { case <-server.blockFoundCh: ... case <-time.After(time.Second): t.Fatal(...) } as suggested. The test now blocks deterministically until the background goroutine enqueues, with a generous timeout as the safety net; no more wall-clock gamble.
- Server_test.go: replace the sleep-and-check-len synchronization in
the "gate off → no stamp" subtest with a deterministic
select { case <-blockFoundCh: ... case <-time.After(1s): t.Fatal }
pattern as Copilot suggested. No more wall-clock gamble on loaded
CI runners.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 25 out of 25 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| srv := &Server{ | ||
| logger: ulogger.TestLogger{}, | ||
| settings: tSettings, | ||
| blockchainClient: mockClient, | ||
| } |
There was a problem hiding this comment.
This test constructs a Server directly without calling InitPrometheusMetrics() / srv.Init(...). Since Server.ShouldUseSubtreeOnlyPath increments prometheusLivenessGateDecision, running this test in isolation can panic on a nil CounterVec (it currently relies on metrics having been initialized by some other test’s side effects). Initialize the metrics explicitly in this test (or call InitPrometheusMetrics() inside ShouldUseSubtreeOnlyPath with a sync.Once) so the test is order-independent.
There was a problem hiding this comment.
Great catch — confirmed this test passes today only because SubtreeValidation_test.go runs the package-level sync.Once first and leaves the counters registered as a side effect. Added an explicit InitPrometheusMetrics() call at the top of TestShouldUseSubtreeOnlyPath_TruthTable (it's idempotent via sync.Once). The test now works in isolation (go test -run TestShouldUseSubtreeOnlyPath_TruthTable) and is order-independent.
- livenessgate_test.go: call InitPrometheusMetrics() at the top of TestShouldUseSubtreeOnlyPath_TruthTable so it works in isolation. Previously the test was relying on another test's side effect (SubtreeValidation_test.go triggering the package-level sync.Once first). InitPrometheusMetrics is idempotent, so the explicit call is safe regardless of test order. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 25 out of 25 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // PHASE 2: Exact pre-allocation | ||
| subtreeTxs[subtreeIdx] = make([]*bt.Tx, 0, subtreeToCheck.Length()) | ||
|
|
||
| if subtreeOnly { | ||
| // Gate is on and the block's header was received recently — skip | ||
| // the subtreeData download entirely. subtreeTxs[subtreeIdx] stays | ||
| // empty, so the block-level batch processTransactionsInLevels call | ||
| // downstream is skipped (batchTxCount == 0). Transaction resolution | ||
| // happens later in validation via the subtree manifest + | ||
| // UTXO/TxMetaCache, with per-tx fetch as the fallback for real | ||
| // misses. | ||
| return nil |
There was a problem hiding this comment.
In the subtree-only gated path, this pre-allocation still allocates an underlying array with cap=subtreeToCheck.Length() even though the worker immediately returns. For large subtrees this can allocate a lot of memory per missing subtree and partially defeats the purpose of skipping subtreeData downloads. Consider checking subtreeOnly before this make(...), or allocating a nil/zero-cap slice in the gated path and only doing the exact pre-allocation when you’re going to populate it.
There was a problem hiding this comment.
Right — moved the subtreeOnly early-return ahead of the make([]*bt.Tx, 0, subtreeToCheck.Length()) pre-allocation. On the gated path subtreeTxs[subtreeIdx] now stays as the zero-value nil slice, saving subtreeToCheck.Length() capacity worth of memory per missing subtree. The comment downstream still holds (nil slice → batchTxCount == 0 → processTransactionsInLevels skipped).
| AppCtx: ctx, | ||
| blocksFinalKafkaAsyncProducer: blocksFinalKafkaAsyncProducer, | ||
| batchTokens: make(map[string]*blobDeletionBatchToken), | ||
| receivedAt: newReceivedAtStore(receivedAtTTL(livenessWindow)), | ||
| } |
There was a problem hiding this comment.
receivedAt is initialized unconditionally. With the default config (AssumeTxsBroadcastToAllNodes=false) the service will still allocate/maintain this map and start a background cleanup ticker, even though the gate is never consulted. Consider only creating the receivedAtStore when the liveness gate feature is enabled (and/or LivenessWindow > 0), to keep the default-off deployment closer to a true no-op and avoid unnecessary memory growth during fast catchup.
There was a problem hiding this comment.
Done — the receivedAt store is now only allocated when the gate is enabled at construction (AssumeTxsBroadcastToAllNodes && LivenessWindow > 0), in both Server.New and NewLocalClient. Made receivedAtStore.stamp and .lookup nil-safe so call sites don't need extra guards beyond the existing gate-setting checks they already perform. Default-off deployments now allocate zero bytes and start zero goroutines for this feature — true no-op.
| // Record first-seen time for the subtree-only liveness gate as a | ||
| // belt-and-braces fallback. The primary stamp happens earlier in | ||
| // blockvalidation.BlockFound via ReportPeerBlockHeaderSeen, but this write | ||
| // covers paths that reach AddBlock without having gone through BlockFound | ||
| // (e.g. locally mined blocks). Stamping is write-once, so double writes are | ||
| // cheap no-ops. | ||
| b.receivedAt.stamp(block.Hash()) |
There was a problem hiding this comment.
AddBlock stamps receivedAt for every stored block, regardless of whether the liveness gate is enabled. In a fast catchup this can grow the in-memory map (TTL floor is 30m) on nodes that aren’t using the optimization. Consider gating this stamp behind AssumeTxsBroadcastToAllNodes && LivenessWindow > 0, or making the store/stamp conditional (while still stamping locally-mined blocks when the feature is enabled).
| // Record first-seen time for the subtree-only liveness gate as a | |
| // belt-and-braces fallback. The primary stamp happens earlier in | |
| // blockvalidation.BlockFound via ReportPeerBlockHeaderSeen, but this write | |
| // covers paths that reach AddBlock without having gone through BlockFound | |
| // (e.g. locally mined blocks). Stamping is write-once, so double writes are | |
| // cheap no-ops. | |
| b.receivedAt.stamp(block.Hash()) | |
| // The primary first-seen stamp for the subtree-only liveness gate happens | |
| // earlier in blockvalidation.BlockFound via ReportPeerBlockHeaderSeen. | |
| // Avoid stamping every stored block here, as this path is exercised during | |
| // fast catchup and can otherwise grow the in-memory receivedAt map even when | |
| // the liveness optimization is not in use. |
There was a problem hiding this comment.
Agreed — gated the AddBlock stamp behind AssumeTxsBroadcastToAllNodes && LivenessWindow > 0 in both Server.AddBlock and LocalClient.AddBlock. Combined with the conditional store allocation (other thread), nodes that don't use the optimisation now allocate zero receivedAt entries during fast catchup. The stamp still fires for locally mined blocks on nodes where the gate is enabled (they reach AddBlock without going through BlockFound), which is the original belt-and-braces intent.
Make default-off deployments a true zero-allocation no-op for the liveness-gate feature: - check_block_subtrees.go: move the subtreeOnly early-return ahead of the subtreeTxs[i] = make(..., subtreeToCheck.Length()) pre-allocation so the gated path doesn't allocate a per-subtree tx slice. The remaining nil-slice still makes the block-level batch processTransactionsInLevels skip (batchTxCount == 0). - received_at.go: nil-safe stamp() and lookup() so call sites can leave the store unallocated without extra guards. - Server.go / LocalClient.go (New + AddBlock): allocate the receivedAt store only when the gate is enabled at construction, and gate the AddBlock belt-and-braces stamp behind the same setting. Fast catchup on default-off nodes no longer grows an in-memory map. - server_test.go (TestGetHeaderReceivedAt_StampedOnAddBlock): enable the gate setting + allocate the store for this test since it was relying on the always-on default behaviour we've now removed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 25 out of 25 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // DecisionSubtreeData — gate disabled by setting, or stamp is stale. | ||
| // Caller must fall back to the subtreeData path. |
There was a problem hiding this comment.
Doc mismatch: the comment for DecisionSubtreeData says it covers “gate disabled by setting, or stamp is stale”, but Decide() also returns DecisionSubtreeData when window <= 0. Please update the DecisionSubtreeData comment to include the non-positive window short-circuit (so future readers don’t assume these cases are distinguished elsewhere).
| // DecisionSubtreeData — gate disabled by setting, or stamp is stale. | |
| // Caller must fall back to the subtreeData path. | |
| // DecisionSubtreeData — gate disabled by setting, liveness window is | |
| // non-positive, or stamp is stale. Caller must fall back to the | |
| // subtreeData path. |
| // Once-per-block liveness gate. If true, every per-subtree worker in this | ||
| // invocation skips the subtreeData download. In that path the per-subtree | ||
| // worker never populates subtreeTxs[subtreeIdx], so the block-level batch | ||
| // processTransactionsInLevels call is skipped entirely (batchTxCount == 0). | ||
| // Transaction resolution happens downstream in validation via the subtree | ||
| // manifest + UTXO/TxMetaCache, with per-tx fetch as the fallback for real | ||
| // misses (validator resolves each referenced tx hash on demand). | ||
| subtreeOnly := u.ShouldUseSubtreeOnlyPath(ctx, block.Hash()) | ||
|
|
There was a problem hiding this comment.
The liveness gate is evaluated before the tracing span is started. That means the GetHeaderReceivedAt RPC (and any debug logging/metrics side-effects) won’t be associated with the CheckBlockSubtrees span, which makes it harder to debug gate behavior from traces. Consider starting the tracing span first, then computing subtreeOnly using the traced ctx.
| if stampErr := u.blockchainClient.ReportPeerBlockHeaderSeen(ctx, hash); stampErr != nil { | ||
| u.logger.Debugf("[BlockFound][%s] ReportPeerBlockHeaderSeen failed, liveness gate may miss for this header: %v", hash.String(), stampErr) |
There was a problem hiding this comment.
Potential nil deref: this calls u.blockchainClient.ReportPeerBlockHeaderSeen(...) without checking u.blockchainClient != nil. blockvalidation.New takes blockchainClient as a parameter but doesn’t guard against nil, and some tests previously built Server without setting this field. Please add a nil check (or call through the already-required client used by u.blockValidation) so enabling AssumeTxsBroadcastToAllNodes can’t crash the service if the client wasn’t wired.
| if stampErr := u.blockchainClient.ReportPeerBlockHeaderSeen(ctx, hash); stampErr != nil { | |
| u.logger.Debugf("[BlockFound][%s] ReportPeerBlockHeaderSeen failed, liveness gate may miss for this header: %v", hash.String(), stampErr) | |
| if u.blockchainClient != nil { | |
| if stampErr := u.blockchainClient.ReportPeerBlockHeaderSeen(ctx, hash); stampErr != nil { | |
| u.logger.Debugf("[BlockFound][%s] ReportPeerBlockHeaderSeen failed, liveness gate may miss for this header: %v", hash.String(), stampErr) | |
| } | |
| } else { | |
| u.logger.Debugf("[BlockFound][%s] skipping ReportPeerBlockHeaderSeen because blockchain client is not configured", hash.String()) |
…rapper
The liveness gate is a feature-specific decision helper, not a general
teranode utility — both callers (subtreevalidation per-subtree gating
and blockvalidation catchup prefetch) consume the same one feature,
driven by SubtreeValidation.AssumeTxsBroadcastToAllNodes.
- Move util/livenessgate/ → services/subtreevalidation/livenessgate/
so the package lives alongside the setting that owns it.
- Drop the package-level ShouldUseSubtreeOnlyPath convenience. Callers
use Decide directly and compare against DecisionSubtreeOnly (exactly
one extra line at the one site that used the wrapper).
- Drop the three `var _ livenessgate.Client = (*T)(nil)` assertions in
services/blockchain/. They created a reverse-layering import
(blockchain depending on a subtreevalidation subpackage) and were
redundant — real call sites already pass the blockchain client to
livenessgate.Decide, so the compiler checks the interface match at
the call site.
Behaviour unchanged. The Decision enum + metric labels are kept; the
(*Server).ShouldUseSubtreeOnlyPath wrapper in subtreevalidation is
kept (it bundles metric emission and logging over Decide).
Net: -37 lines, one fewer package, no cross-service dependency inversion.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
|
Superseded by #745 — self-discovering adaptive gate replaces the operator-configured feature flag + wall-clock liveness gate. The new design deliberately avoids any wall-clock or FSM-state gating (enforced by a source-scanning regression test), eliminating the class of bug that caused PR #598 to be reverted via #647. |


Summary
Shared decision helper
The `services/subtreevalidation/livenessgate` package exposes a pure `Decide(ctx, client, hash, enabled, window) → (Decision, error)` function consumed by both call sites. The `Decision` enum (`DecisionSubtreeOnly / DecisionSubtreeData / DecisionNotFound / DecisionError`) drives both the boolean check at the call site and the Prometheus metric label. The `(*Server).ShouldUseSubtreeOnlyPath` method in subtreevalidation wraps `Decide` with metric emission and debug logging.
The package sits under subtreevalidation because it is a feature-specific decision helper for the subtree-only path, not a general-purpose utility.
Design
Key invariant: the gate reads only the wall-clock age of the block header's `ReceivedAt` stamp. It does not consult FSM state — that's the specific behaviour that caused PR #598 to cascade under load and forced its revert via PR #647. A dedicated regression test (`TestCheckBlockSubtrees_FSMCascadeDoesNotPoisonGate`) locks this in.
Test plan
Rollout
Default `false` means merging is a no-op for mainnet / testnet / teratestnet. Enable via `settings_local.conf` on dev-scale only.
🤖 Generated with Claude Code