fix(blockvalidation): serialize setTxMined via setMinedChan worker#831
Conversation
|
🤖 Claude Code Review Status: Complete Current Review: I reviewed the three changed files:
The PR description acknowledges the >1000-block scenario as "acceptable for any realistic backlog" — but since this PR specifically targets a backlog/contention incident, a startup backlog exceeding 1000 is plausible, and the failure mode is a hard deadlock rather than graceful degradation. Severity: [Major] — edge-conditioned on >1000 pending blocks at startup, but a real regression with a hard-deadlock failure mode. |
| select { | ||
| case <-ctx.Done(): | ||
| return | ||
| case u.setMinedChan <- blockHash: |
There was a problem hiding this comment.
[Critical] Potential deadlock during startup with >1000 pending blocks
The startup sequence has a subtle blocking issue:
- Line 470:
processBlockMinedNotSet(gCtx, g)is called, which now enqueues blocks ontosetMinedChan(lines 663-671) - Line 473:
g.Wait()blocks until all errgroup tasks complete - Lines 501-574: The
setMinedChanworker starts AFTERg.Wait()returns
The problem: If there are >1000 blocks to enqueue during startup, the channel buffer fills up and line 668 (u.setMinedChan <- blockHash) will block indefinitely, since the worker that drains the channel hasn't started yet. This creates a deadlock — processBlockMinedNotSet can't return, so g.Wait() never completes, so the worker never starts.
Current flow (problematic):
start() calls processBlockMinedNotSet(gCtx, g) // line 470
→ processBlockMinedNotSet tries to enqueue >1000 blocks
→ Channel fills, send blocks on line 668
g.Wait() waits for processBlockMinedNotSet to return // line 473
→ Never returns because line 668 is blocked
setMinedChan worker never starts // line 501
→ Channel never drains
Fix: Start the setMinedChan worker before calling processBlockMinedNotSet. Move lines 501-574 to execute before line 467.
Alternatively, launch processBlockMinedNotSet in a separate goroutine during startup (not using the errgroup), though this would deviate from the "synchronous startup recovery" pattern used by processSubtreesNotSet.
Evidence from PR description: The PR mentions this scenario is realistic — production had "7 blocks pending" with 387M txs, and the test plan includes deploying with pending blocks_mined_not_set to verify the queue drains. If each block gets re-queued during recovery and there are >1000 historical blocks, this deadlock will occur.
|
Benchmark Comparison ReportBaseline: Current: Summary
All benchmark results (sec/op)
Threshold: >10% with p < 0.05 | Generated: 2026-06-12 14:32 UTC |
Production observation: under load, multiple SetTxMined operations stack up running concurrently for 20+ minutes with zero completions. Goroutine profiles show every batch result blocked on a single sync.RWMutex.Lock() inside aerospike-client-go/v8's per-node nodeStats map, which serializes all batch result-code updates regardless of cluster size. The pre-existing setMinedChan worker already does setTxMinedStatus serially, including MinedSet guard, tryClaim dedup, retry-on-error, and cleanup. processBlockMinedNotSet was duplicating that logic in parallel goroutines spawned via errgroup.Go — at startup AND on every periodic ticker firing. Each parallel SetTxMined operation fans out into per- subtree workers and 1024-key Aerospike batches, multiplying contention on the client's nodeStats mutex into a thundering herd. Throughput collapses to near zero. Change: have processBlockMinedNotSet enqueue blocks onto setMinedChan instead of spawning goroutines. The worker dedups via its own MinedSet and tryClaim guards, so re-queuing in-flight or completed blocks is harmless. Honors ctx cancellation on the channel send. Net effect: - At most one setTxMined operation runs at a time per pod. - Eliminates duplicate claim/setTxMined/retry logic; -16 lines net. - The errgroup parameter is retained for caller-signature stability; it is no longer used by this function. Tradeoffs: - Startup setMined recovery is now async (returns to start() before completing). Existing callers don't depend on synchronous completion. - A backlog larger than setMinedChan capacity (1000) would block the ticker — same characteristic as the existing worker-error retry path.
GetAndValidateSubtrees derives TransactionCount from the subtree lengths, so a coinbase-only (empty) block has TransactionCount == 0. The mmap-backed maps introduced in #1053 reject FilterCapacity == 0, so with block_diskMapDirs configured checkDuplicateTransactions failed with 'DiskTxMapUint64: FilterCapacity must be > 0' on every empty block, halting validation of valid blocks. validOrderAndBlessed had the same problem sizing the parent-spends map (0 * multiplier = 0). Clamp the derived capacity to a floor of 1 at both call sites, mirroring the existing 'multiplier 0 is treated as 1' convention. The constructors keep their strict zero-capacity guard for direct misuse.
f51fec9 to
4d1f6d4
Compare
|





Summary
Hotfix companion to #828 / #830: serialize
SetTxMinedoperations on the block-validator side so they don't pile up and cause Aerospike-client mutex contention to collapse throughput to zero.Production evidence (
dev-scale-1-scale-1on commitfdaeadb7e)Block-validator pod was reported as "stopped doing setTxMined". It had not actually stopped — it had stacked up:
setTxMinedstart for block AsetTxMinedstart for block B (A still running)setTxMinedstart for block C (A, B still running, block ~387M txs)setTxMinedstart for block DLive pod state at the time:
aerospike-client-go/v8/internal/atomic/map.Map.Setonsync.RWMutex.Lock()— the per-nodenodeStatsmap mutexRoot cause
processBlockMinedNotSet(inBlockValidation.go) spawns one goroutine per block viaerrgroup.Gofor every block the blockchain reports as still needingmined_set. It runs:Each
SetTxMinedoperation fans out into per-subtree workers and 1024-key Aerospike batches. The Aerospike Go client serializes every batch result through a single per-nodeRWMutex.Lock()updating itsnodeStatshistogram. With multiple parallelSetTxMinedoperations layered on top, you get a thundering herd on that one mutex and effective throughput collapses.Meanwhile a separate
setMinedChanworker (BlockValidation.go:501) already exists and processes blocks serially — including theMinedSetguard,tryClaimBlockForSetMineddedup, retry-on-error, and cleanup. The polling path inprocessBlockMinedNotSetwas duplicating all of that logic in parallel goroutines, defeating the worker's serialization.Change
processBlockMinedNotSetnow enqueues block hashes ontosetMinedChaninstead of spawning goroutines. The worker handles everything else, naturally serializing operations.Net: 1 file, -50 / +34 lines (more deletions than additions because the duplicated worker logic collapses).
Behavior changes
SetTxMinedruns at a time per pod. This is the fix.start()returns before queued blocks have completed. No callers depend on synchronous startup completion (verified —NewBlockValidationrunsstart()in a fire-and-forget goroutine).errgroup.Groupparameter remains in the signature for caller-symmetry withprocessSubtreesNotSet(called from the same ticker), but it is unused — marked_to make that explicit.setMinedChancapacity is 1000; a backlog beyond that would block the ticker. The existing worker-error retry path already has the same characteristic, so this isn't a new failure mode.Why not just disable expressions / cut over the expressions path?
This fix is complementary to those, not a replacement. From the production analysis on PR #828:
aerospike_enable_setmined_filter_expressions— moves to the PR aerospike: optional native operate-path for mod-teranode UDFs #821 native-op-cutover'd path. Might reduce per-batch latency, but doesn't eliminate the parallel-thundering-herd.SetMinedMultiWithExpressionsto native ops — bigger change, future PR.aerospike-client-go/v8'snodeStatsto be lock-free — the real fix, separate effort.This PR addresses the amplifier (parallelism). Even if the per-op cost stays the same, serializing means:
SetTxMinedcompetes for the Aerospike mutex at a time.setMinedChan.Test plan
go build ./...— cleango vet ./services/blockvalidation/...— cleanTestBlockValidation*tests pass except one pre-existing timeout (TestBlockValidation_InvalidParentBlock, also fails on unmodified branch — unrelated)dev-scale-1and confirm:[setTxMined][...] setting tx minedlog lines pair with corresponding completionblocks_mined_not_setcount drains over timeRisks
BlockValidation.start()not returning until startup recovery completed, that assumption breaks. Mitigation: I checked the only caller (NewBlockValidation) and it runsstart()in a fire-and-forget goroutine.g.Wait()returns. Acceptable for any realistic backlog.Targeting
Base:
feat/teranode-native-ops(where the deployed pods are running). Will need to also apply tomainonce #828 merges.