fix(blockvalidation): serialize setTxMined via setMinedChan worker #831

Merged
icellan merged 2 commits into feat/teranode-native-ops from fix/serialize-set-tx-mined on Jun 12, 2026

Conversation

@icellan icellan commented May 8, 2026

Summary

Hotfix companion to #828 / #830: serialize SetTxMined operations on the block-validator side so they don't pile up and cause Aerospike-client mutex contention to collapse throughput to zero.

Production evidence (dev-scale-1-scale-1 on commit fdaeadb7e)

Block-validator pod was reported as "stopped doing setTxMined". It had not actually stopped — it had stacked up:

| Time (UTC) | Event |
|---|---|
| 08:55:57 | setTxMined start for block A |
| 09:01:57 | setTxMined start for block B (A still running) |
| 09:08:57 | setTxMined start for block C (A, B still running, block ~387M txs) |
| 09:15:57 | setTxMined start for block D |
| 09:17:57 | 7 blocks pending; no DONE log for any in 22 min |

Live pod state at the time:

  • 5.68 CPU cores, 107 GB live heap
  • pprof: many goroutines blocked in aerospike-client-go/v8/internal/atomic/map.Map.Set on sync.RWMutex.Lock() — the per-node nodeStats map mutex
  • All 6 Aerospike pods healthy, no errors in client logs

Root cause

processBlockMinedNotSet (in BlockValidation.go) spawns one goroutine per block via errgroup.Go for every block the blockchain reports as still needing mined_set. It runs:

  • once at startup (recovery)
  • every minute via the periodic ticker

Each SetTxMined operation fans out into per-subtree workers and 1024-key Aerospike batches. The Aerospike Go client serializes every batch result through a single per-node RWMutex.Lock() updating its nodeStats histogram. With multiple parallel SetTxMined operations layered on top, you get a thundering herd on that one mutex and effective throughput collapses.
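
The contention shape described above can be illustrated with a small sketch. This is not the Aerospike client's code: nodeStats, record, and recordAll are hypothetical stand-ins, assuming only what the pprof output suggests, namely that every batch result takes an exclusive lock on one shared map.

```go
package main

import (
	"fmt"
	"sync"
)

// nodeStats is a hypothetical stand-in for a per-node statistics map guarded
// by a single RWMutex. It is NOT the Aerospike client's implementation.
type nodeStats struct {
	mu     sync.RWMutex
	counts map[int]int // result code -> number of occurrences
}

// record takes the exclusive lock, so all callers are serialized here no
// matter how many goroutines are reporting results.
func (s *nodeStats) record(code int) {
	s.mu.Lock()
	s.counts[code]++
	s.mu.Unlock()
}

// recordAll simulates `operations` parallel operations, each reporting
// `batches` batch results into the same nodeStats, and returns the total.
func recordAll(operations, batches int) int {
	stats := &nodeStats{counts: make(map[int]int)}
	var wg sync.WaitGroup
	for op := 0; op < operations; op++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for b := 0; b < batches; b++ {
				stats.record(0) // every result funnels through one lock
			}
		}()
	}
	wg.Wait()
	return stats.counts[0]
}

func main() {
	fmt.Println("results recorded:", recordAll(8, 1000))
}
```

Adding more parallel operations in this sketch adds more goroutines waiting on the same lock; it does not add throughput, which is the amplification the root-cause analysis points at.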

Meanwhile a separate setMinedChan worker (BlockValidation.go:501) already exists and processes blocks serially — including the MinedSet guard, tryClaimBlockForSetMined dedup, retry-on-error, and cleanup. The polling path in processBlockMinedNotSet was duplicating all of that logic in parallel goroutines, defeating the worker's serialization.
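
As a rough illustration of the dedup guard mentioned above, a claim-then-release pattern could look like the following. The type and method names are hypothetical, and a sync.Map keyed by a string is an assumption made for brevity; the real tryClaimBlockForSetMined may be implemented differently.

```go
package main

import (
	"fmt"
	"sync"
)

// claims tracks which block hashes are currently being processed.
// Hypothetical sketch: the key is a plain string rather than a hash type.
type claims struct {
	inFlight sync.Map
}

// tryClaim returns true only for the first caller that claims a given hash;
// later callers see the existing entry and get false.
func (c *claims) tryClaim(hash string) bool {
	_, loaded := c.inFlight.LoadOrStore(hash, struct{}{})
	return !loaded
}

// release makes the hash claimable again, for example after the work
// finished or failed and needs to be retried.
func (c *claims) release(hash string) {
	c.inFlight.Delete(hash)
}

func main() {
	c := &claims{}
	fmt.Println(c.tryClaim("block-a")) // true: first claim wins
	fmt.Println(c.tryClaim("block-a")) // false: already in flight
	c.release("block-a")
	fmt.Println(c.tryClaim("block-a")) // true: claimable again after release
}
```

With a guard like this inside the worker, enqueuing a block that is already in flight is harmless, because the duplicate is dropped at claim time.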

Change

processBlockMinedNotSet now enqueues block hashes onto setMinedChan instead of spawning goroutines. The worker handles everything else, naturally serializing operations.

- for _, block := range blocksMinedNotSet {
-     blockHash := block.Hash()
-     if !u.tryClaimBlockForSetMined(blockHash) { continue }
-     g.Go(func() error {
-         defer func() { u.blockHashesCurrentlyValidated.Delete(*blockHash) }()
-         // ... fetch header, call setTxMinedStatus, retry on err ...
-     })
- }
+ for _, block := range blocksMinedNotSet {
+     blockHash := block.Hash()
+     select {
+     case <-ctx.Done():
+         return
+     case u.setMinedChan <- blockHash:
+     }
+ }

Net: 1 file, -50 / +34 lines (more deletions than additions because the duplicated worker logic collapses).

Behavior changes

  • At most one SetTxMined runs at a time per pod. This is the fix.
  • Startup recovery is now asynchronous — start() returns before queued blocks have completed. No callers depend on synchronous startup completion (verified — NewBlockValidation runs start() in a fire-and-forget goroutine).
  • The errgroup.Group parameter remains in the signature for caller-symmetry with processSubtreesNotSet (called from the same ticker), but it is unused — marked _ to make that explicit.
  • setMinedChan capacity is 1000; a backlog beyond that would block the ticker. The existing worker-error retry path already has the same characteristic, so this isn't a new failure mode.

Why not just disable expressions / cut over the expressions path?

This fix is complementary to those, not a replacement. From the production analysis on PR #828:

  1. Disable aerospike_enable_setmined_filter_expressions, which moves to the native-op-cutover'd path from PR #821 (aerospike: optional native operate-path for mod-teranode UDFs). Might reduce per-batch latency, but doesn't eliminate the parallel thundering herd.
  2. Cut over SetMinedMultiWithExpressions to native ops — bigger change, future PR.
  3. (Upstream) patch aerospike-client-go/v8's nodeStats to be lock-free — the real fix, separate effort.

This PR addresses the amplifier (parallelism). Even if the per-op cost stays the same, serializing means:

  • Only 1 SetTxMined competes for the Aerospike mutex at a time.
  • Memory stays bounded (no piled-up batch records — we observed 107 GB heap from accumulated parallel state).
  • Operations complete in deterministic order — which matters because they retry on failure by re-queuing into setMinedChan.

Test plan

  • go build ./... — clean
  • go vet ./services/blockvalidation/... — clean
  • Pre-commit hooks (gofmt, gci, golangci-lint) — pass
  • Existing TestBlockValidation* tests pass except one pre-existing timeout (TestBlockValidation_InvalidParentBlock, also fails on unmodified branch — unrelated)
  • Deploy to dev-scale-1 and confirm:
    • Block-validator memory stays bounded (no 107 GB heaps)
    • Each [setTxMined][...] setting tx mined log line pairs with a corresponding completion line
    • Pending blocks_mined_not_set count drains over time

Risks

  • Async startup: if some operator or test depended on BlockValidation.start() not returning until startup recovery completed, that assumption breaks. Mitigation: I checked the only caller (NewBlockValidation) and it runs start() in a fire-and-forget goroutine.
  • Channel buffer overflow: at startup with >1000 pending blocks, the loop blocks until the worker drains. Worker starts shortly after g.Wait() returns. Acceptable for any realistic backlog.
  • Rollback: pure revert of one commit.

Targeting

Base: feat/teranode-native-ops (where the deployed pods are running). Will need to also apply to main once #828 merges.

github-actions Bot commented May 8, 2026

🤖 Claude Code Review

Status: Complete

Current Review:

I reviewed the three changed files: model/Block.go, model/Block_test.go, and services/blockvalidation/BlockValidation.go.

  • model/Block.go (capacity clamp) — Correct. Clamping FilterCapacity/expectedInpoints to a floor of 1 for empty (coinbase-only, TransactionCount == 0) blocks is sound, since the mmap-backed maps reject a zero capacity. The new TestBlock_EmptyBlock_DiskMapDirs covers in-memory, single-disk, and multi-disk paths. No issues.

  • BlockValidation.go (serialize via setMinedChan) — The serialization approach is reasonable, and the worker already owns the dedup/retry/cleanup lifecycle. However, one inline finding remains valid and unresolved: during startup processBlockMinedNotSet (line 516) does blocking sends into the 1000-capacity setMinedChan before the draining worker is started (line 550, after g.Wait() at line 519). With >1000 blocks returned by GetBlocksMinedNotSet (the SQL query has no LIMIT), the channel fills, the send blocks forever, start() never reaches the worker launch, and startup deadlocks. This is a regression vs. the old code, where the happy path called setTxMinedStatus directly and only error paths touched the channel. See the inline comment for the suggested fix (start the worker before enqueuing, or enqueue from a goroutine).

The PR description acknowledges the >1000-block scenario as "acceptable for any realistic backlog" — but since this PR specifically targets a backlog/contention incident, a startup backlog exceeding 1000 is plausible, and the failure mode is a hard deadlock rather than graceful degradation.

Severity: [Major] — edge-conditioned on >1000 pending blocks at startup, but a real regression with a hard-deadlock failure mode.

select {
case <-ctx.Done():
    return
case u.setMinedChan <- blockHash:

[Critical] Potential deadlock during startup with >1000 pending blocks

The startup sequence has a subtle blocking issue:

  1. Line 470: processBlockMinedNotSet(gCtx, g) is called, which now enqueues blocks onto setMinedChan (lines 663-671)
  2. Line 473: g.Wait() blocks until all errgroup tasks complete
  3. Lines 501-574: The setMinedChan worker starts AFTER g.Wait() returns

The problem: If there are >1000 blocks to enqueue during startup, the channel buffer fills up and line 668 (u.setMinedChan <- blockHash) will block indefinitely, since the worker that drains the channel hasn't started yet. This creates a deadlock — processBlockMinedNotSet can't return, so g.Wait() never completes, so the worker never starts.

Current flow (problematic):

start() calls processBlockMinedNotSet(gCtx, g)  // line 470
  → processBlockMinedNotSet tries to enqueue >1000 blocks
  → Channel fills, send blocks on line 668
g.Wait() waits for processBlockMinedNotSet to return  // line 473
  → Never returns because line 668 is blocked
setMinedChan worker never starts  // line 501
  → Channel never drains

Fix: Start the setMinedChan worker before calling processBlockMinedNotSet. Move lines 501-574 to execute before line 467.

Alternatively, launch processBlockMinedNotSet in a separate goroutine during startup (not using the errgroup), though this would deviate from the "synchronous startup recovery" pattern used by processSubtreesNotSet.

Evidence from PR description: the scenario is plausible. Production had "7 blocks pending", including a block of ~387M txs, and the test plan includes deploying with pending blocks_mined_not_set to verify the queue drains. If each block gets re-queued during recovery and there are >1000 historical blocks, this deadlock will occur.

sonarqubecloud Bot commented May 8, 2026

Quality Gate failed

Failed conditions
60.0% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube Cloud

github-actions Bot commented May 8, 2026

Benchmark Comparison Report

Baseline: main (unknown)

Current: PR-831 (fa0657a)

Summary

  • Regressions: 0
  • Improvements: 0
  • Unchanged: 138
  • Significance level: p < 0.05

All benchmark results (sec/op)

Benchmark Baseline Current Change p-value
_NewBlockFromBytes-4 1.242µ 1.230µ ~ 0.200
Block_ValidOrderAndBlessed_DiskVsMemory/leaves=1024/memory-4 11.77m 11.82m ~ 0.100
Block_ValidOrderAndBlessed_DiskVsMemory/leaves=1024/disk_1-4 11.79m 11.75m ~ 0.100
Block_ValidOrderAndBlessed_DiskVsMemory/leaves=1024/disk_2-4 11.75m 12.20m ~ 0.200
Block_ValidOrderAndBlessed_DiskVsMemory/leaves=16384/memo... 24.65m 25.01m ~ 0.100
Block_ValidOrderAndBlessed_DiskVsMemory/leaves=16384/disk... 31.07m 31.08m ~ 1.000
Block_ValidOrderAndBlessed_DiskVsMemory/leaves=16384/disk... 30.92m 31.39m ~ 0.100
SplitSyncedParentMap_SetIfNotExists/256_buckets-4 54.79n 54.76n ~ 1.000
SplitSyncedParentMap_SetIfNotExists/16_buckets-4 54.83n 54.81n ~ 0.400
SplitSyncedParentMap_SetIfNotExists/1_bucket-4 54.83n 54.84n ~ 0.800
SplitSyncedParentMap_ConcurrentSetIfNotExists/256_buckets... 24.94n 25.38n ~ 0.400
SplitSyncedParentMap_ConcurrentSetIfNotExists/16_buckets_... 43.38n 43.41n ~ 0.500
SplitSyncedParentMap_ConcurrentSetIfNotExists/1_bucket_pa... 93.54n 95.15n ~ 0.100
MiningCandidate_Stringify_Short-4 158.9n 155.8n ~ 0.100
MiningCandidate_Stringify_Long-4 1.075µ 1.069µ ~ 0.400
MiningSolution_Stringify-4 569.3n 570.2n ~ 1.000
BlockInfo_MarshalJSON-4 1.221µ 1.238µ ~ 0.100
NewFromBytes-4 126.4n 125.6n ~ 0.700
AddTxBatchColumnar_Validation-4 2.452µ 2.613µ ~ 0.200
OffsetValidationLoop-4 718.5n 543.6n ~ 0.100
Mine_EasyDifficulty-4 66.91µ 69.12µ ~ 0.100
Mine_WithAddress-4 7.832µ 7.130µ ~ 0.700
BlockAssembler_AddTx-4 0.02819n 0.02950n ~ 0.700
AddNode-4 10.88 11.16 ~ 0.700
AddNodeWithMap-4 10.86 11.75 ~ 0.200
DirectSubtreeAdd/4_per_subtree-4 57.93n 62.53n ~ 0.100
DirectSubtreeAdd/64_per_subtree-4 29.45n 31.99n ~ 0.100
DirectSubtreeAdd/256_per_subtree-4 27.80n 30.62n ~ 0.100
DirectSubtreeAdd/1024_per_subtree-4 26.59n 29.42n ~ 0.100
DirectSubtreeAdd/2048_per_subtree-4 26.35n 29.00n ~ 0.100
SubtreeProcessorAdd/4_per_subtree-4 247.9n 242.8n ~ 0.700
SubtreeProcessorAdd/64_per_subtree-4 241.1n 239.1n ~ 0.700
SubtreeProcessorAdd/256_per_subtree-4 241.2n 237.1n ~ 0.200
SubtreeProcessorAdd/1024_per_subtree-4 233.6n 235.2n ~ 1.000
SubtreeProcessorAdd/2048_per_subtree-4 234.4n 234.1n ~ 0.700
SubtreeProcessorRotate/4_per_subtree-4 234.6n 237.7n ~ 0.200
SubtreeProcessorRotate/64_per_subtree-4 231.5n 233.0n ~ 0.800
SubtreeProcessorRotate/256_per_subtree-4 234.2n 232.9n ~ 0.700
SubtreeProcessorRotate/1024_per_subtree-4 232.5n 232.7n ~ 0.400
SubtreeNodeAddOnly/4_per_subtree-4 54.73n 55.56n ~ 0.100
SubtreeNodeAddOnly/64_per_subtree-4 34.55n 34.94n ~ 0.100
SubtreeNodeAddOnly/256_per_subtree-4 33.88n 33.90n ~ 0.700
SubtreeNodeAddOnly/1024_per_subtree-4 33.23n 32.93n ~ 0.200
SubtreeCreationOnly/4_per_subtree-4 116.9n 115.9n ~ 0.700
SubtreeCreationOnly/64_per_subtree-4 409.5n 412.6n ~ 0.700
SubtreeCreationOnly/256_per_subtree-4 1.399µ 1.472µ ~ 0.700
SubtreeCreationOnly/1024_per_subtree-4 4.536µ 4.475µ ~ 0.700
SubtreeCreationOnly/2048_per_subtree-4 8.532µ 8.492µ ~ 0.700
SubtreeProcessorOverheadBreakdown/64_per_subtree-4 233.9n 233.3n ~ 0.800
SubtreeProcessorOverheadBreakdown/1024_per_subtree-4 235.1n 232.4n ~ 0.200
ParallelGetAndSetIfNotExists/1k_nodes-4 9.124m 10.685m ~ 0.100
ParallelGetAndSetIfNotExists/10k_nodes-4 14.48m 13.30m ~ 0.100
ParallelGetAndSetIfNotExists/50k_nodes-4 17.61m 16.55m ~ 0.100
ParallelGetAndSetIfNotExists/100k_nodes-4 19.69m 20.57m ~ 0.400
SequentialGetAndSetIfNotExists/1k_nodes-4 9.002m 10.635m ~ 0.100
SequentialGetAndSetIfNotExists/10k_nodes-4 13.22m 15.26m ~ 0.100
SequentialGetAndSetIfNotExists/50k_nodes-4 22.32m 25.21m ~ 0.200
SequentialGetAndSetIfNotExists/100k_nodes-4 29.13m 30.28m ~ 0.700
ProcessOwnBlockSubtreeNodesParallel/1k_nodes-4 9.327m 12.986m ~ 0.100
ProcessOwnBlockSubtreeNodesParallel/10k_nodes-4 17.36m 18.38m ~ 0.200
ProcessOwnBlockSubtreeNodesParallel/100k_nodes-4 21.33m 20.57m ~ 0.400
ProcessOwnBlockSubtreeNodesSequential/1k_nodes-4 12.12m 13.34m ~ 0.100
ProcessOwnBlockSubtreeNodesSequential/10k_nodes-4 16.31m 18.01m ~ 0.100
ProcessOwnBlockSubtreeNodesSequential/100k_nodes-4 54.80m 64.16m ~ 0.100
DiskTxMap_SetIfNotExists-4 3.304µ 3.283µ ~ 1.000
DiskTxMap_SetIfNotExists_Parallel-4 3.136µ 3.495µ ~ 0.100
DiskTxMap_ExistenceOnly-4 315.8n 321.6n ~ 1.000
Queue-4 150.2n 149.8n ~ 1.000
AtomicPointer-4 2.504n 2.930n ~ 0.100
TxMapSetIfNotExists-4 38.06n 38.08n ~ 1.000
TxMapSetIfNotExistsDuplicate-4 31.93n 31.85n ~ 0.100
ChannelSendReceive-4 419.2n 418.5n ~ 1.000
CalcBlockWork-4 527.4n 525.1n ~ 1.000
CalculateWork-4 712.5n 704.7n ~ 0.100
CheckOldBlockIDs/on-chain-prefetch/1000-4 59.54µ 58.44µ ~ 0.100
CheckOldBlockIDs/off-chain-prefetch/1000-4 48.11µ 52.00µ ~ 0.400
CheckOldBlockIDs/on-chain-prefetch/10000-4 430.5µ 430.8µ ~ 1.000
CheckOldBlockIDs/off-chain-prefetch/10000-4 342.3µ 346.1µ ~ 0.200
BuildBlockLocatorString_Helpers/Size_10-4 1.384µ 1.370µ ~ 0.100
BuildBlockLocatorString_Helpers/Size_100-4 13.28µ 13.09µ ~ 0.100
BuildBlockLocatorString_Helpers/Size_1000-4 131.2µ 130.3µ ~ 0.400
CatchupWithHeaderCache-4 104.7m 104.6m ~ 1.000
_BufferPoolAllocation/16KB-4 5.481µ 4.147µ ~ 0.200
_BufferPoolAllocation/32KB-4 9.646µ 10.480µ ~ 0.100
_BufferPoolAllocation/64KB-4 19.22µ 19.54µ ~ 1.000
_BufferPoolAllocation/128KB-4 36.79µ 31.19µ ~ 0.100
_BufferPoolAllocation/512KB-4 135.1µ 112.1µ ~ 0.100
_BufferPoolConcurrent/32KB-4 21.43µ 20.04µ ~ 0.200
_BufferPoolConcurrent/64KB-4 30.77µ 33.52µ ~ 0.100
_BufferPoolConcurrent/512KB-4 152.2µ 156.3µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/16KB-4 688.2µ 668.9µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/32KB-4 650.9µ 637.5µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/64KB-4 651.9µ 636.9µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/128KB-4 664.6µ 640.1µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/512KB-4 689.9µ 653.9µ ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/16KB-4 36.45m 36.24m ~ 0.200
_SubtreeDataDeserializationWithBufferSizes/32KB-4 35.94m 36.65m ~ 0.200
_SubtreeDataDeserializationWithBufferSizes/64KB-4 36.17m 36.30m ~ 0.700
_SubtreeDataDeserializationWithBufferSizes/128KB-4 36.25m 36.00m ~ 0.400
_SubtreeDataDeserializationWithBufferSizes/512KB-4 36.15m 36.16m ~ 1.000
_PooledVsNonPooled/Pooled-4 741.8n 744.7n ~ 0.400
_PooledVsNonPooled/NonPooled-4 8.626µ 7.891µ ~ 0.100
_MemoryFootprint/Current_512KB_32concurrent-4 6.771µ 7.153µ ~ 0.100
_MemoryFootprint/Proposed_32KB_32concurrent-4 10.09µ 11.34µ ~ 0.100
_MemoryFootprint/Alternative_64KB_32concurrent-4 9.841µ 11.528µ ~ 0.100
_prepareTxsPerLevel-4 411.9m 407.9m ~ 1.000
_prepareTxsPerLevelOrdered-4 3.731m 4.225m ~ 0.700
_prepareTxsPerLevel_Comparison/Original-4 403.6m 404.6m ~ 1.000
_prepareTxsPerLevel_Comparison/Optimized-4 3.857m 3.799m ~ 0.400
SubtreeSizes/10k_tx_4_per_subtree-4 1.425m 1.392m ~ 0.700
SubtreeSizes/10k_tx_16_per_subtree-4 333.8µ 326.0µ ~ 0.100
SubtreeSizes/10k_tx_64_per_subtree-4 82.69µ 78.68µ ~ 0.100
SubtreeSizes/10k_tx_256_per_subtree-4 20.37µ 19.87µ ~ 0.100
SubtreeSizes/10k_tx_512_per_subtree-4 10.240µ 9.777µ ~ 0.100
SubtreeSizes/10k_tx_1024_per_subtree-4 4.965µ 4.902µ ~ 0.100
SubtreeSizes/10k_tx_2k_per_subtree-4 2.514µ 2.423µ ~ 0.100
BlockSizeScaling/10k_tx_64_per_subtree-4 79.27µ 77.13µ ~ 0.100
BlockSizeScaling/10k_tx_256_per_subtree-4 20.30µ 19.61µ ~ 0.100
BlockSizeScaling/10k_tx_1024_per_subtree-4 5.023µ 4.826µ ~ 0.100
BlockSizeScaling/50k_tx_64_per_subtree-4 398.2µ 406.4µ ~ 0.200
BlockSizeScaling/50k_tx_256_per_subtree-4 100.74µ 96.42µ ~ 0.200
BlockSizeScaling/50k_tx_1024_per_subtree-4 25.03µ 24.20µ ~ 0.100
SubtreeAllocations/small_subtrees_exists_check-4 165.1µ 159.3µ ~ 0.100
SubtreeAllocations/small_subtrees_data_fetch-4 172.3µ 168.3µ ~ 0.100
SubtreeAllocations/small_subtrees_full_validation-4 328.1µ 323.2µ ~ 0.700
SubtreeAllocations/medium_subtrees_exists_check-4 9.998µ 9.873µ ~ 0.200
SubtreeAllocations/medium_subtrees_data_fetch-4 10.86µ 10.46µ ~ 0.100
SubtreeAllocations/medium_subtrees_full_validation-4 20.63µ 20.22µ ~ 0.200
SubtreeAllocations/large_subtrees_exists_check-4 2.482µ 2.439µ ~ 0.100
SubtreeAllocations/large_subtrees_data_fetch-4 2.686µ 2.606µ ~ 0.100
SubtreeAllocations/large_subtrees_full_validation-4 5.202µ 5.001µ ~ 0.100
StoreBlock_Sequential/BelowCSVHeight-4 334.8µ 339.3µ ~ 0.100
StoreBlock_Sequential/AboveCSVHeight-4 338.0µ 352.8µ ~ 0.100
GetUtxoHashes-4 266.2n 273.0n ~ 0.700
GetUtxoHashes_ManyOutputs-4 49.36µ 43.44µ ~ 0.100
_NewMetaDataFromBytes-4 227.7n 229.0n ~ 0.300
_Bytes-4 405.2n 400.6n ~ 0.100
_MetaBytes-4 139.0n 138.3n ~ 0.500

Threshold: >10% with p < 0.05 | Generated: 2026-06-12 14:32 UTC

@icellan icellan self-assigned this Jun 10, 2026
icellan added 2 commits June 12, 2026 15:27
Production observation: under load, multiple SetTxMined operations stack up
running concurrently for 20+ minutes with zero completions. Goroutine
profiles show every batch result blocked on a single sync.RWMutex.Lock()
inside aerospike-client-go/v8's per-node nodeStats map, which serializes
all batch result-code updates regardless of cluster size.

The pre-existing setMinedChan worker already does setTxMinedStatus
serially, including MinedSet guard, tryClaim dedup, retry-on-error, and
cleanup. processBlockMinedNotSet was duplicating that logic in parallel
goroutines spawned via errgroup.Go — at startup AND on every periodic
ticker firing. Each parallel SetTxMined operation fans out into
per-subtree workers and 1024-key Aerospike batches, multiplying contention
on the client's nodeStats mutex into a thundering herd. Throughput
collapses to near zero.

Change: have processBlockMinedNotSet enqueue blocks onto setMinedChan
instead of spawning goroutines. The worker dedups via its own MinedSet
and tryClaim guards, so re-queuing in-flight or completed blocks is
harmless. Honors ctx cancellation on the channel send.

Net effect:
- At most one setTxMined operation runs at a time per pod.
- Eliminates duplicate claim/setTxMined/retry logic; -16 lines net.
- The errgroup parameter is retained for caller-signature stability;
  it is no longer used by this function.

Tradeoffs:
- Startup setMined recovery is now async (returns to start() before
  completing). Existing callers don't depend on synchronous completion.
- A backlog larger than setMinedChan capacity (1000) would block the
  ticker, the same characteristic as the existing worker-error retry path.

GetAndValidateSubtrees derives TransactionCount from the subtree lengths,
so a coinbase-only (empty) block has TransactionCount == 0. The mmap-backed
maps introduced in #1053 reject FilterCapacity == 0, so with
block_diskMapDirs configured checkDuplicateTransactions failed with
'DiskTxMapUint64: FilterCapacity must be > 0' on every empty block,
halting validation of valid blocks. validOrderAndBlessed had the same
problem sizing the parent-spends map (0 * multiplier = 0).

Clamp the derived capacity to a floor of 1 at both call sites, mirroring
the existing 'multiplier 0 is treated as 1' convention. The constructors
keep their strict zero-capacity guard for direct misuse.
@icellan icellan force-pushed the fix/serialize-set-tx-mined branch from f51fec9 to 4d1f6d4 Compare June 12, 2026 14:17
@icellan icellan merged commit f683d6e into feat/teranode-native-ops Jun 12, 2026
34 checks passed
@icellan icellan deleted the fix/serialize-set-tx-mined branch June 12, 2026 14:37

2 participants