perf(blockvalidation): prefetch the off-chain block-id set once in checkOldBlockIDs (one round-trip per block on any client topology)#934
Conversation
…en in-memory chain check is enabled
The toggle blockchain_use_in_memory_chain_check (introduced for
CheckBlockIsInCurrentChain) made membership tests against the main
chain O(1) by checking a small in-memory off-chain set of known
non-canonical block IDs. But the toggle had almost no observable
effect on production because CheckBlockIsInCurrentChain is rarely
called from production code — only setTxMined and the slow-path
fallback inside checkOldBlockIDs.
The actual hot path of chain-membership checking lives inside
checkOldBlockIDs itself:
currentChainBlockIDs := GetBlockHeaderIDs(prev, 10_000) // 10k row fetch
build a 10k-entry map for fast lookup
iterate per tx, hit the map, fall back to CheckBlockIsInCurrentChain
on miss.
This was a sensible optimisation when CheckBlockIsInCurrentChain was
a recursive CTE on every call. With the off-chain-set toggle on,
every per-tx call is already O(1) — the upfront 10k fetch + map
build is now pure overhead.
Live profile on betfair-pc mainnet (postgres backend, in-memory
toggle on): checkOldBlockIDs's upfront fetch held 35% of inuse heap
(464 MB), drove 16.5% of CPU, and contributed 3.2 GB of allocs per
30 s — the single largest per-block allocation site once the GC
fixes had landed.
This commit:
- Adds checkOldBlockIDsInMemory which goes straight to
CheckBlockIsInCurrentChain per-tx, with the same string-keyed
dedupe cache the original uses for identical blockID slices.
- Gates it behind blockchain_use_in_memory_chain_check so the
behaviour is opt-in and easy to A/B against the original.
- Leaves the original implementation untouched for the toggle=false
case, so existing deployments that haven't enabled the toggle
see no behavioural change at all.
Tests: 35 blockvalidation tests pass.
…checkOldBlockIDs comments Follow-up to 25d29b8. The original commit's prose framed the toggle as 'use off-chain set instead of main-chain set' without making explicit that both routes answer the SAME question — 'is at least one of this tx's parent block IDs on the current main chain?' — just via opposite data structures: * main-chain set: positive membership in a 10k-deep freshly-fetched set * off-chain (forked) set: negative membership in the small persistent in-memory set of block IDs known to be OFF the main chain By complement the two answers are identical. Reworded the function- level doc and the route-selector comment to lead with that, kept the existing 'off-chain' terminology (matches the underlying store's offChainBlockIDs field and the existing setting key) and added '(forked)' parenthetical for clarity. Renamed the per-call log lines to 'forked-set route' for the same reason. Also folded in the measured before/after numbers from the live A/B on betfair-pc mainnet so the rationale stays attached to the code: RSS -29%, 30s alloc volume -5×, total CPU -22%, sql.GetBlockHeaderIDs out of the inuse heap entirely. No behavioural change.
|
🤖 Claude Code Review Status: Complete This PR introduces a performance optimization for Review Summary: The implementation is solid and has been thoroughly reviewed through multiple rounds by oskarszoon. All critical consensus safety issues from earlier iterations have been addressed: ✅ Consensus Safety: The Architecture: The off-chain prefetch approach is clever: instead of fetching 10k main-chain IDs repeatedly, it fetches the small bounded set of forked block IDs once per block. On a healthy chain this set is near-empty, making the prefetch cheap while enabling local resolution of membership queries. The measured impact on betfair-pc mainnet (22% CPU reduction, 29% heap reduction) validates the approach. No blocking issues found. The toggle remains opt-in ( |
Benchmark Comparison ReportBaseline: Current: Summary
All benchmark results (sec/op)
Threshold: >10% with p < 0.05 | Generated: 2026-06-03 09:27 UTC |
…reframe doc/log as 'prefetch' vs 'no-prefetch'
The previous comments and function name were framed around 'main-chain
set vs off-chain (forked) set', which described what the underlying
blockchain store does, not what this file does. The blockvalidation
file itself never touches the off-chain set — it just calls
CheckBlockIsInCurrentChain, which consults the in-memory forked-block
cache inside the blockchain store when blockchain_use_in_memory_chain_check
is enabled. Reviewers reading only this file were left looking for an
off-chain lookup that isn't here.
Reframe locally as:
Prefetch route (default): GetBlockHeaderIDs(prev, 10_000) + local
map + per-tx fallback RPC.
No-prefetch route (toggle on): just call the per-tx RPC. The RPC
itself is now an O(1) in-memory check inside the blockchain store
(off-chain cache, owned and refreshed there).
Changes:
- Rename the new function checkOldBlockIDsInMemory ->
checkOldBlockIDsWithoutPrefetch.
- Replace the route-selector comment with a side-by-side strategy
description that doesn't depend on the reader knowing about the
store's internals.
- Update the per-route log strings from off-chain-set/forked-set
-> prefetch route / no-prefetch route, so the running log matches
the code's vocabulary.
- No behavioural change.
…OldBlockIDsWithCachedLookup helper, add no-prefetch route tests Addresses code-review follow-ups on PR bsv-blockchain#934: 1. Restored the explanatory comment about why the prefetch route uses HashPrevBlock to anchor GetBlockHeaderIDs (normal vs optimistic-mining paths). The comment had been dropped during an earlier reshuffle; the rationale still applies to the prefetch path that uses it. 2. Extracted iterateOldBlockIDsWithCachedLookup as a shared helper used by both checkOldBlockIDs (prefetch route) and checkOldBlockIDsWithoutPrefetch (no-prefetch route). Pulls the per-tx empty-blockIDs check, the sorted-blockIDs dedupe cache, and the lookup+error-reporting loop into one place. Each route just wraps it: prefetch supplies a fastPath callback that short-circuits against the freshly-loaded main-chain map; no-prefetch passes nil for fastPath. Behaviour is unchanged in both routes. 3. Added Test_checkOldBlockIDs_noPrefetchRoute mirroring the existing prefetch-route subtests: - dispatches to no-prefetch (asserts GetBlockHeaderIDs NOT called when toggle is on) - empty map (no-op, no RPC) - empty parents (ProcessingError) - all parents on chain (NoError) - dedupe cache: identical blockIDs slices share one RPC - not on chain returns BlockInvalidError 4. (Considered relaxing the HashPrevBlock precondition on the no-prefetch path. Not worth it — the tracing/log setup at the top of checkOldBlockIDs calls block.Hash().String() before the dispatch, which itself requires a valid header. Strict check stays for consistency with logging requirements; the failing test that motivated the relaxation was removed.) No behavioural change. The existing 7 Test_checkOldBlockIDs subtests continue to pass against the refactored prefetch route; 6 new subtests cover the no-prefetch route.
…pcCount to lookupCount, add RPC-error test Three small cleanups from a self-review pass on PR bsv-blockchain#934: 1. Log punctuation. The prefetch route's start log uses 'prefetch route:' (colon); the no-prefetch route was 'no-prefetch route,' (comma). Standardise both on colon. Also change the no-prefetch done log key 'rpc=%d' to 'lookup=%d' for consistency with the renamed helper counter (see below). The prefetch done log keeps its historical 'slowPath=%d' field name for grep continuity; a code comment notes the relationship. 2. Rename rpcCount → lookupCount on the helper's named return. The helper now uses a route-neutral name; each caller wraps it in whatever local name suits the per-route log it writes. Doc-comment on the helper explains the mapping. 3. Add a new no-prefetch-route subtest covering the RPC error path. The helper translates a lookup-callback error into a ProcessingError (so the caller retries) rather than a BlockInvalidError (which would invalidate the block). Mock returns errors.NewServiceError; the test asserts the resulting error mentions 'failed to check' and not 'are not from current chain'. Previously this branch had no coverage. No behavioural change. 14 tests pass (7 prefetch + 7 no-prefetch).
…-inmemory-checkold
There was a problem hiding this comment.
Pull request overview
This PR optimizes BlockValidation.checkOldBlockIDs by skipping the 10k main-chain block-ID prefetch when blockchain_use_in_memory_chain_check is enabled, relying instead on the (now O(1) in-memory) CheckBlockIsInCurrentChain RPC, while keeping the legacy prefetch behavior unchanged for the default setting.
Changes:
- Added a settings-gated “no-prefetch” route and preserved the existing “prefetch” route as the default.
- Refactored shared per-tx iteration logic into a helper (
iterateOldBlockIDsWithCachedLookup) while keeping the string-keyed dedupe cache in both routes. - Added focused unit tests to ensure the no-prefetch route dispatches correctly and matches prefetch-route outcomes.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| services/blockvalidation/BlockValidation.go | Adds toggle-based dispatch to a new no-prefetch route, factors shared iteration into a helper, and updates logs/comments for route clarity. |
| services/blockvalidation/BlockValidation_test.go | Adds a new test suite mirroring the existing subtests to validate the no-prefetch route and ensure no GetBlockHeaderIDs calls occur. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // Both routes below dereference block.Header and (transitively, via | ||
| // block.Hash() inside logging) require a usable header — guard once | ||
| // up-front so each route can assume a valid block. |
There was a problem hiding this comment.
Good catch on all three — fixed:
- Panic before guard / in guard: moved the
block.Header == nil || block.Header.HashPrevBlock == nilguard to the very top of the function, beforetracing.Tracer().Start. The guard's error message no longer callsblock.String()(which dereferencesb.Headerviab.Hash()), so the nil-header path can no longer panic. - Swapped tracing args: the debug message now passes
block.Hash().String(), oldBlockIDsMap.Length()in%s/%dorder.block.Hash()is safe there because it now runs only after the nil-header guard has passed.
The no-prefetch sibling's block.Hash().String() calls are reached only after this same guard, so they're safe too. Full ./services/blockvalidation/ suite (601 tests) passes.
Move the block.Header / HashPrevBlock nil guard above tracing.Start so we never dereference a nil header via block.Hash()/block.String(), and fix the swapped %s/%d args in the tracing debug message.
|
[Minor] Potential slice mutation bug at services/blockvalidation/BlockValidation.go:2280 The slices.Sort(blockIDs) call mutates the input slice in-place. This slice may share underlying storage with parentTxMeta.BlockIDs from model/Block.go:990 where oldBlockIDs = append(oldBlockIDs, parentTxMeta.BlockIDs...) is used. If the append reuses the backing array, sorting here will mutate the original metadata, potentially corrupting cached data or causing race conditions. Recommendation: Make a defensive copy before sorting using slices.Clone(blockIDs) to ensure the original blockIDs slice remains unmodified. |
| fastPath func(blockIDs []uint32) bool, | ||
| lookup func(ctx context.Context, blockIDs []uint32) (bool, error), | ||
| ) (iterationError error, fastPathCount, lookupCount, cacheHitCount int) { | ||
| cache := make(map[string]bool, oldBlockIDsMap.Length()) |
There was a problem hiding this comment.
Agreed — capped the size hint. The dedupe cache is keyed by the sorted parent-block-ID slice, so it only ever holds as many entries as there are distinct parent-block sets referenced by the block (a handful in practice), but sizing the hint to oldBlockIDsMap.Length() scaled it with tx count and could reserve a huge map for almost no occupancy. Now hinted at min(oldBlockIDsMap.Length(), oldBlockIDsCacheHintCap) with oldBlockIDsCacheHintCap = 10_000, matching the prefetch route's historical implicit cap (len(currentChainBlockIDs) <= 10k). The map still grows if more distinct sets actually appear.
The shared iterator sized the dedupe cache hint to oldBlockIDsMap.Length() (tx count, up to millions on large blocks), but the cache only holds distinct parent-block-ID sets. Cap the hint at oldBlockIDsCacheHintCap=10_000 to match the prefetch route's historical implicit cap and avoid large per-block map allocations.
oskarszoon
left a comment
There was a problem hiding this comment.
Reviewed the no-prefetch path. Consensus equivalence is clean — both routes use the same CheckBlockIsInCurrentChain as final authority, a fast-path map hit provably always agrees with the RPC, no accept/reject path changes, and dropping the prefetch narrows the reorg stale-accept window (no fetch-vs-RPC snapshot skew). Go side: build/vet/tests pass, nil-header guard fix is complete, RPC-error handling correctly avoids cache poisoning. The slices.Sort aliasing the bot flagged is a non-issue — oldBlockIDsMap is a per-validation throwaway consumed once, parentTxMeta.BlockIDs is copied fresh via append(nil, …), and the sort already lived there pre-PR.
Asks before merge:
UseInMemoryChainCheckis a footgun on gRPC-client deployments (P1). The no-prefetch path is only cheap with an in-process (LocalClient) blockchain store — crossover ≈ 3000 txs. With a gRPC client the crossover is ≈ 6 txs, so on any real block it's a 100–1000× regression vs the single prefetch. The flag name implies the store is in-memory mode, not that the client is local — an operator can enable it on a multi-process node and tank big-block validation. Either type-assert LocalClient before taking the no-prefetch path, or document loudly (flag help +checkOldBlockIDsWithoutPrefetchdoc comment) that it requires co-located blockchain access.- Add the nil-header regression test. The round-1 guard move has no test; a regression would re-introduce the panic silently.
- Commit the perf evidence. The 35%-heap / 22%-CPU mainnet pprof numbers are convincing but live only in the PR description — put them in a code comment or perf-notes, and ideally add a benchmark over
checkOldBlockIDsat realistic tx counts so nothing forces callers back to SQL unnoticed.
Nice-to-haves: a true cross-toggle equivalence test (same fixture through both routes), a multi-element blockIDs ANY-of case, and aligning the "slowPath"/"lookup" log label across routes.
…on any client topology The off-chain (in-memory) chain-check route called CheckBlockIsInCurrentChain once per distinct parent-block-ID set. That is an in-process function call under LocalClient but a network round-trip under the gRPC client, so on a multi-process node it fanned out into N round-trips per block where the on-chain prefetch route does one — a large regression on big blocks, and a footgun because the toggle name describes the store mode, not whether the client is co-located. Mirror the on-chain prefetch route with the set inverted: fetch the complete off-chain (forked) block-id set once via a new GetOffChainBlockIDs RPC, then resolve main-chain membership locally (a parent is on chain iff it is NOT in the off-chain set, ANY-of). The set is complete (not a recent window), so one prefetch resolves every tx with no "too old" miss — exactly one round-trip per block regardless of client topology. The off-chain set is small (bounded by fork activity), so the payload and local map stay tiny. When the store reports rebuilding (in-memory check disabled, or a reorg/startup rebuild in progress) the set is stale, so we fall back to the per-tx RPC path (its own authoritative SQL fallback). CheckBlockIsInCurrentChain remains the lookup authority on a fast-path miss, so a block is only rejected after the store confirms — consensus equivalence unchanged. Adds GetOffChainBlockIDs across proto/store/client/server/mocks. Tests cover the off-chain-prefetch happy path (local resolution, mixed sets, exclusively-off-chain rejection, RPC error), the rebuilding fallback, and a nil-header regression guard.
|
@oskarszoon thanks for the review — addressed your P1 head-on rather than gating it. Instead of skipping the prefetch and calling
Still owed (noted as unchecked in the test plan): a gRPC-topology bench on dev-scale before we default the toggle to |
|
@oskarszoon |
…tegration Merge stu/blockvalidation-inmemory-checkold (ca7ef31) — off-chain prefetch for checkOldBlockIDs plus the upstream/main it carried (bsv-blockchain#1005, bsv-blockchain#993, bsv-blockchain#1022). Conflict resolutions: - services/legacy/netsync/manager.go: kept integration's gap-watchdog self-heal (requestBlocksFromTip / maybeResyncOnMissingParent / missingParentResyncInterval); the incoming side had no content there. - stores/utxo/aerospike/batch_create.go: pass nil for the new GetBinsToStore arena param (main added *bt.Arena; integration-only BatchCreate caller predates it). nil = non-arena route, output-equivalent per create_arena_test.
oskarszoon
left a comment
There was a problem hiding this comment.
Re-reviewed after the rework. The pivot to a single OffChainBlockIDs prefetch resolves the v1 gRPC-deployment footgun — one round-trip per block on any topology, heap win preserved, and the nil-header regression test + multi-element ANY-of case are now covered. But the rework introduced a consensus divergence that blocks merge:
P1 — the off-chain fastPath drops the id > maxBlockID guard the authoritative path enforces (BlockValidation.go:2271-2279 vs CheckBlockIsInCurrentChain.go:42-60). The store skips candidate IDs above maxBlockID (its own comment, sql.go:95-98: "prevent non-existent/garbage block IDs from being incorrectly treated as on-chain"). The new local fastPath has no such guard and short-circuits before the RPC, so a tx whose parent block IDs are all > maxBlockID is accepted locally but rejected by the RPC — contradicting the store's own tested invariant (CheckBlockIsInCurrentChain_test.go:86,263,294). Reachable in normal operation, not just adversarially: GetNextBlockID advances the sequence and the id is written into tx-meta BlockIDs (quick_validate.go:1130-1147) before AddBlock bumps maxBlockID, with mainChainRebuilding==0 → rebuilding=false → the unguarded fastPath runs; and the StoreBlock CAS (StoreBlock.go:187) can lag on the normal path too. A toggled node accepting what an untoggled node rejects is a chain-split risk. Fix: carry max_block_id in the OffChainBlockIDs response (the current ([]uint32, bool, error) has no channel for it) and apply the same id > maxBlockID skip in the fastPath — or fall through to the RPC for above-max IDs.
P1 — add a sqlitememory-backed regression test for the all-ids-above-max case. The new Test_checkOldBlockIDs_* use blockchain.Mock, which the repo testing rules say not to do (use sqlitememory + LocalClient); a real-store test would have caught this divergence directly. No test currently covers a parent ID above maxBlockID through the toggle route.
P1 — rename checkOldBlockIDsWithoutPrefetch. It does prefetch (the off-chain set); within 110 lines it calls itself both "off-chain-prefetch route" and "no-prefetch route".
Non-blocking: OffChainBlockIDs.go holds RLock across the full set copy where the rest of the package uses the O(1) pointer-swap (OffChainBlockIDs.go:35-40); and it has no direct unit tests for its rebuilding/normal branches.
Consensus reorg/rebuild gating, API consistency across all layers, and the regenerated proto all check out. Fix the guard + add the real-store test and this is good to go.
…n prefetch fastPath Oli's re-review caught a consensus divergence in the off-chain-prefetch rework: the local fastPath dropped the `id > maxBlockID` guard that the authoritative CheckBlockIsInCurrentChain enforces. An ID can be allocated (GetNextBlockID) and written into a tx's BlockIDs before AddBlock bumps maxBlockID, with mainChainRebuilding==0; such an above-max ID is not in the off-chain set, so the unguarded fastPath short-circuited to "accepted" while the authoritative RPC rejects it — a toggled node accepting what an untoggled node rejects (chain split). Fix: OffChainBlockIDs now also returns maxBlockID (new proto field max_block_id, threaded through store/interface/LocalClient/gRPC Client/Server/mocks). The fastPath applies the same id > maxBlockID skip, so an above-max ID falls through to the authoritative RPC instead of being accepted locally. A block is rejected identically by both paths. Also: - Add Test_checkOldBlockIDs_offChainPrefetch_realStore — a sqlitememory + LocalClient differential test (no blockchain-store mock, per testing.md): for each candidate ID the off-chain route's accept/reject must equal CheckBlockIsInCurrentChain. Fails if the guard is missing. Plus a maxBlockID guard unit case and Test_OffChainBlockIDs_StoreBranches for the store method. - Rename checkOldBlockIDsWithoutPrefetch -> checkOldBlockIDsOffChainPrefetch (it prefetches the off-chain set); fix the stale "no-prefetch route" doc label. - OffChainBlockIDs reads the set via the package's copy-on-write pointer grab (RLock, release, then iterate) instead of holding RLock across the full copy.
|
@oskarszoon good catch on the consensus divergence — fixed in 6559dd9. P1 — maxBlockID guard (the blocker). P1 — real-store regression test. Added P1 — rename. Non-blocking — locking. Verified: |
oskarszoon
left a comment
There was a problem hiding this comment.
All three round-2 P1s are fixed and the consensus guard is verified:
- maxBlockID guard —
OffChainBlockIDsnow returnsmaxBlockID(newmax_block_idproto field, threaded through every layer), and the fastPath applies the sameid > maxBlockIDskip asCheckBlockIsInCurrentChain, so an above-max id falls through to the authoritative RPC instead of being short-circuited to accept. The two routes are now identical on this input. - Regression test —
Test_checkOldBlockIDs_offChainPrefetch_realStoreis a propersqlitememory+LocalClientdifferential (asserts the off-chain route's accept/reject equalsCheckBlockIsInCurrentChainfor ids at/above maxBlockID). I confirmed it does its job: with the guard removed it fails withconsensus divergence at id 1000: RPC on-chain=false but route accepted=true; with the guard it passes. The store-branch test and the rename are in too. - Lock —
OffChainBlockIDsnow uses the package's copy-on-write pointer-grab instead of holding RLock across the full copy.
Build/vet/race-tests green on my end. The red SonarQube gate isn't a bug: the new_reliability hit is the filename-convention rule flagging OffChainBlockIDs.go as non-snake_case — but the whole sql/ package is PascalCase, so the file matches its siblings; plus two go:S3776 cognitive-complexity smells (the guard pushed checkOldBlockIDsOffChainPrefetch to 17). Worth a glance, not blocking.
Approving.
…og labels Addresses the two remaining items from the first review: - Add BenchmarkCheckOldBlockIDs covering both the on-chain and off-chain prefetch routes at realistic tx counts (1k, 10k) with parents referencing a small set of recent on-chain blocks, so every tx resolves via the local fastPath (the production happy path). It is the guard against a silent regression to the per-tx CheckBlockIsInCurrentChain lookup: if a change stops the fastPath resolving locally, ns/op and allocs/op spike here instead of callers quietly falling back to the RPC/SQL path. Flat low allocs/op (~86-115) confirm no per-tx fan-out. - Align the per-tx lookup log label across routes: the on-chain route logged "slowPath=%d" while the off-chain route logged "lookup=%d" for the same counter. Both now use "lookup" so the logs are directly comparable; doc comments updated.
|
@oskarszoon closing out the last two items from your first review (8ed7786, test-only): Benchmark — added Flat, low allocs/op (~72–114, not scaling with tx count) confirm no per-tx fan-out — every tx resolves on the fastPath. The off-chain route is also a touch cheaper (no 10k positive-map build): ~190µs vs ~252µs and 100 vs 114 allocs at 10k txs. Log label — aligned the per-tx counter label: on-chain route logged No production-logic change (benchmark + log string only). Sorry to bounce your approval on the SHA — re-requesting. PTAL. |
|
ordishs
left a comment
There was a problem hiding this comment.
Approve.
Consensus-equivalence with CheckBlockIsInCurrentChain is the right invariant and it is both reasoned and tested. Verified the central claims:
- The new
OffChainBlockIDsstore method mirrors the authoritative in-memory branch exactly — same gating (!useInMemoryChainCheck || mainChainRebuilding > 0), same maxID-before-set read ordering, same copy-on-write snapshot. - The
id > maxBlockIDguard in the fastPath is present and is the linchpin against a toggled/untoggled chain split (allocated-but-uncommitted IDs fall through to the authoritative RPC). Covered by both a mock test and a real sqlitememory differential test. - Single-snapshot reorg model matches the existing on-chain route;
rebuilding=trueroutes to the per-tx RPC fallback; rejects only ever happen after the authoritative RPC. - nil-header guard moved up-front (round-1 panic fix) with a regression test.
Local checks: go build of the three affected packages clean; go test -tags testtxmetacache for the checkOldBlockIDs/OffChainBlockIDs suites passes.
Non-blocking follow-ups:
- The
protoc-gen-go-grpc v1.6.1 → v1.6.2bump in generated code is incidental to local toolchain — worth confirming the repo's pinned codegen version to avoid churn. GetOffChainBlockIDsresponse is unbounded over the wire (vs the 10k window) — fine in practice (near-empty on a healthy chain); a doc note on expected size would help.Test_checkOldBlockIDs_noPrefetchRouteactually exercises the off-chain route's rebuilding fallback; the name reads slightly off.
Default path is unchanged; leaving the toggle default off and deferring the gRPC-topology bench is the correct sequencing.
oskarszoon
left a comment
There was a problem hiding this comment.
Re-approving on 8ed7786c. Confirmed the only change since the prior approval is test + log-string: the BlockValidation.go diff is purely the slowPath→lookup label rename (same lookupCount value, comments updated), and the maxBlockID guard is untouched (still at the fastPath, line 2287). The consensus guard + its differential regression test are intact and green on my end.
BenchmarkCheckOldBlockIDs reproduces here — flat allocs/op (86/72 at 1k, 115/100 at 10k, not scaling with tx count) confirms every tx resolves on the fastPath with no per-tx fan-out, which is exactly the silent-regression guard. Off-chain route a touch cheaper (no 10k positive-map build). build/vet/tests/bench all pass.
The red SonarQube gate isn't a bug: new_reliability=C is the filename-convention rule flagging OffChainBlockIDs.go as non-snake_case (the whole stores/blockchain/sql/ package is PascalCase, so it matches its siblings) plus two go:S3776 cognitive-complexity smells. Not blocking.
Approving.




Summary
checkOldBlockIDsruns during every block validation and answers, per tx, "is at least one of this tx's parent block IDs still on the current main chain?".The default route prefetches the 10,000 most-recent on-chain block IDs via
GetBlockHeaderIDs, builds a local map, and answers each tx against it. That prefetch was the single largest CPU and memory cost on the box during catch-up (≈16% CPU, ≈35% of inuse heap on betfair-pc mainnet).This PR adds a mirror route, gated on the existing
blockchain_use_in_memory_chain_checktoggle, that prefetches the off-chain set instead — and resolves membership with the set inverted. It costs exactly one round-trip per block on any client topology.The two strategies
Both produce identical pass/fail outcomes for the same input.
On-chain prefetch (default, unchanged):
GetBlockHeaderIDs(prev, 10_000)— fetch the 10,000 most-recent main-chain block IDs.map[uint32]struct{}; a parent ID present ⇒ on chain.CheckBlockIsInCurrentChainRPC.Off-chain prefetch (this PR, when
blockchain_use_in_memory_chain_check=true):GetOffChainBlockIDs()— fetch the complete off-chain (forked) block-id set once.map[uint32]struct{}; a parent ID absent ⇒ on chain (ANY-of).Why this shape
The main-chain set is unbounded and ever-growing, so the on-chain route must either materialise a huge positive set or window it to 10k (incomplete, re-fetched every block). The off-chain set is the small negative set the blockchain store already maintains in memory — bounded by fork activity, near-empty on a healthy chain. Prefetching that set once and inverting the test gives the same answer for a tiny payload and a tiny local map.
Crucially, it is one round-trip per block regardless of client topology:
LocalClient), the prefetch is a cheap function call.This is the key change from the earlier revision of this PR, which skipped the prefetch entirely and called
CheckBlockIsInCurrentChainper distinct parent-id set. That was free in-process but fanned out into N network round-trips per block under a gRPC client (the toggle name describes the store mode, not whether the client is co-located — a footgun on multi-process nodes). The off-chain prefetch removes that fan-out while keeping the small-memory win.Correctness / reorg safety
rebuilding(in-memory check disabled, or a reorg/startup rebuild in progress) the off-chain set is stale, so the route falls back to the per-txCheckBlockIsInCurrentChainpath, which has its own authoritative SQL fallback.CheckBlockIsInCurrentChainremains the lookup authority on a fast-path miss, so a block is only rejected after the store confirms none of a tx's parents are on chain. No accept/reject path changes.Measured impact (live A/B on betfair-pc mainnet,
LocalClient)Same binary, same fresh DB, two 30-second captures with the toggle on — phase A used the on-chain prefetch route (default), phase B used the off-chain route:
checkOldBlockIDsCPU shareGetBlockHeaderIDsCPU sharesql.GetBlockHeaderIDsinuseThe disappearance of
GetBlockHeaderIDsfrom the profile is the whole story — that one upfront fetch was the largest single CPU and memory cost on the box during catch-up. (These numbers are from a single-process node, i.e.LocalClient. The off-chain prefetch is what makes the win safe to take under a gRPC client too — one round-trip per block instead of the per-set fan-out.)What's in the diff
GetOffChainBlockIDsRPC across proto / store /Client/LocalClient/Server/ mocks. The store method returns the in-memory off-chain set, orrebuilding=truewhen it is unavailable/stale.services/blockvalidation/BlockValidation.go:checkOldBlockIDsWithoutPrefetchrewritten to prefetch the off-chain set once and resolve membership locally via an invertedfastPath, reusing the existing dedupe/iterate helper.rebuildingroutes to the per-tx fallback.blockchain_use_in_memory_chain_check=false(default) path is unchanged.Behaviour matrix
blockchain_use_in_memory_chain_checkfalse(default)true, store readyGetOffChainBlockIDs, resolved locally.true, store rebuildingCheckBlockIsInCurrentChainfallback (authoritative SQL underneath).Test plan
go build ./.../go vetclean;golangci-lintclean on changed packages.go test ./services/blockvalidation/...,./services/blockchain/...,./stores/blockchain/...pass.Test_checkOldBlockIDs_offChainPrefetch: all-on-chain resolved locally (no RPC), mixed parent set accepted, exclusively-off-chain confirmed via RPC then rejected,OffChainBlockIDserror →ServiceError.Test_checkOldBlockIDs_noPrefetchRoutenow drives the rebuilding-fallback sub-path.Test_checkOldBlockIDs_nilHeader: nil header / nilHashPrevBlockreturns a clean error, no panic (regression guard).truein a follow-up.What this PR doesn't do
CheckBlockIsInCurrentChain.