fix(blockassembly): purge-conflicting-unmined + FSM IDLE enforcement by icellan · Pull Request #704 · bsv-blockchain/teranode

icellan · 2026-04-15T13:04:54Z

Summary

Fixes BlockAssembler startup failures caused by unmined transactions in a locally-inconsistent state (Conflicting=true + UnminedSince>0 records, plus non-conflicting children referencing them). The iterator filters Conflicting=true, so the parent is absent from the processing list and validateParentChain rejects the child, parking the FSM in IDLE.

The branch started out as a repair tool that tried to reconstruct intended state via classification (Case A / C / D). Every iteration on mainnet uncovered a new graph shape and either added hours of runtime or still left the offending tx stuck — because the writers (SetConflicting, reorg handlers) don't clean up after themselves (stale conflictingCs, nil SpendingData, etc).

Replaced with a surgical purge: the unmined set is ephemeral by design (propagation re-arrives valid txs; next block sweeps them up), so a single-pass delete of every (Conflicting=true, UnminedSince>0) record is all BA needs to start cleanly. validateParentChain was relaxed to tolerate missing parents — non-conflicting children whose parents just got purged are harmlessly skipped, and get mined or pruned on their own.
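The relaxed check can be pictured with a small sketch. This is not the real validateParentChain: the `Tx` and `ParentState` types and the `lookup` callback are hypothetical stand-ins, and it assumes that a missing parent simply means the check for that input is skipped. Whether the real code also drops the child from the list is not shown here.

```go
package main

import "fmt"

// Tx is a hypothetical, simplified view of an unmined transaction.
type Tx struct {
	Hash    string
	Parents []string // hashes of the transactions whose outputs this tx spends
}

// ParentState is a hypothetical lookup result for a parent in the UTXO store.
type ParentState int

const (
	ParentMissing ParentState = iota // no record, e.g. it was just purged
	ParentMined
	ParentUnmined
)

// validateParentChainSketch walks the processing list in order.
// Missing and mined parents are tolerated; an unmined parent must already
// have appeared earlier in the list, otherwise the chain is inconsistent.
func validateParentChainSketch(list []Tx, lookup func(hash string) ParentState) error {
	seen := make(map[string]bool, len(list))
	for _, tx := range list {
		for _, p := range tx.Parents {
			switch lookup(p) {
			case ParentMissing, ParentMined:
				continue // nothing to order against
			case ParentUnmined:
				if !seen[p] {
					return fmt.Errorf("tx %s: parent %s is unmined but not in processing list", tx.Hash, p)
				}
			}
		}
		seen[tx.Hash] = true
	}
	return nil
}
```

With this shape, a child whose parent was purged passes the check, while a child whose unmined parent is filtered out of the list still fails.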

New: `teranode-cli purge-conflicting-unmined`

Online CLI, run while the node is up and the FSM has parked in IDLE:

teranode-cli purge-conflicting-unmined [--dry-run] [--skip-unmined-since-scan]

Step 0 — full-store consistency scan, re-marks mined-on-best-chain txs that still carry UnminedSince. Reused from the repair era. --skip-unmined-since-scan skips it on re-runs once it has completed cleanly.
Step 1 — same single scan (no second pass) collects every record with Conflicting=true + UnminedSince>0.
Step 2 — batch Delete over the collected set. Aerospike record goes; external .tx/.outputs blobs are cleaned by the existing pruner on delete_at_height.
--dry-run counts without writing.
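Steps 1 and 2 can be sketched against an in-memory map. The `Record` and `PurgeReport` types and the function name are hypothetical, step 0 is left out, and the real command works against the UTXO store in batches rather than a Go map.

```go
package main

// Record is a hypothetical, minimal stand-in for a UTXO-store record.
type Record struct {
	Conflicting  bool
	UnminedSince uint32 // 0 means the transaction is mined
}

// PurgeReport is a hypothetical summary of one run.
type PurgeReport struct {
	Scanned    int
	Candidates int
	Deleted    int
}

// purgeConflictingUnminedSketch collects every record with
// Conflicting=true and UnminedSince>0 in a single scan (step 1), then
// deletes the collected set unless dryRun is set (step 2).
func purgeConflictingUnminedSketch(store map[string]Record, dryRun bool) PurgeReport {
	var report PurgeReport
	var candidates []string

	for hash, rec := range store {
		report.Scanned++
		if rec.Conflicting && rec.UnminedSince > 0 {
			candidates = append(candidates, hash)
		}
	}
	report.Candidates = len(candidates)

	if dryRun {
		return report // count only, no writes
	}
	for _, hash := range candidates {
		delete(store, hash)
		report.Deleted++
	}
	return report
}
```

Because the delete predicate is a pure function of the record, a second run over the same store finds no candidates, which is the idempotency property the test plan checks.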

Dropped ~850 lines of classification / cascade / chase-up / cache machinery.

FSM IDLE enforcement (from earlier work on this branch)

validateParentChain sets the FSM to IDLE when it detects integrity problems. Four service hot paths early-return when FSM is IDLE so no new work reaches a half-initialised block assembler:

Service	Function
Block validation	`blockHandler` (Kafka consumer)
Subtree validation	`CheckSubtreeFromBlock`
Propagation	`processTransaction`
Legacy	`HandleBlockDirect`

The blockchain FSM now accepts STOP from CATCHINGBLOCKS so a repair-needed error detected mid-catchup can actually park the node in IDLE (previously the transition was rejected and the node crash-looped).

Block assembly freezes its gRPC entry points via an atomic frozenForRepair flag and spawns a watcher goroutine that retries loadUnminedTransactions the next time the FSM leaves IDLE — so after the purge completes the operator flips the FSM out of IDLE and BA resumes live without a node restart.

Operator flow

BA startup hits validateParentChain → trips on parent is unmined but not in processing list → FSM → IDLE. Log: Run 'teranode-cli purge-conflicting-unmined' to fix.
Operator runs teranode-cli purge-conflicting-unmined (the first run without --skip-unmined-since-scan; later iterations can add the flag to skip step 0).
Operator calls teranode-cli setfsmstate --fsmstate RUNNING. BA's watcher retries, parent is gone, child is harmlessly null-skipped, BA unfreezes.
Mempool state re-populates over propagation.

Test plan

Purge suite (stores/utxo/tests/purge_conflicting_unmined_test.go):

TestPurgeConflictingUnmined_CleanState — empty store yields zeroed report
TestPurgeConflictingUnmined_DeletesConflictingUnmined — (Conflicting=true, UnminedSince>0) record deleted
TestPurgeConflictingUnmined_LeavesNonConflictingUnminedAlone — child with dangling parent ref untouched
TestPurgeConflictingUnmined_LeavesMinedTxAlone — UnminedSince=0 records protected
TestPurgeConflictingUnmined_DryRun — candidates counted, no writes
TestPurgeConflictingUnmined_SkipUnminedSinceScan — step 0 skipped, steps 1+2 still run
TestPurgeConflictingUnmined_UnminedSinceFix — step 0 clears stray UnminedSince on mined-on-best-chain
TestPurgeConflictingUnmined_Idempotent — second run finds nothing
TestPurgeConflictingUnmined_DeleteForwardsThroughStore — store wrapper sees Delete per purged hash (TxMetaCache eviction hook)

Infrastructure:

TestValidateParentChain_* — the parent-missing case is now tolerated and returns nil (the expected post-purge shape); all other integrity checks still trip idleAndError
Test_NewFiniteStateMachine — STOP from CATCHINGBLOCKS allowed
All propagation, subtreevalidation, legacy/netsync, blockassembly unit tests pass (553 in blockassembly packages alone)

Commits in the purge pivot

b510ebc00 — rename files + exports (repair_conflicts.go → purge_conflicting_unmined.go, RepairConflictingChains → PurgeConflictingUnmined, etc.). Pure rename, no logic change.
eea0468b2 — replace Case A/C/D classification with surgical purge. Extends InconsistentTxRecord with Conflicting bool so one scan seeds both step 0 and step 1. SQL ScanInconsistentUnminedTxs now implemented (was a no-op). validateParentChain skips missing parents.
701287ebf — 9-test purge suite.
3aebb81e6 — CLI subcommand rename, 11 operator-facing strings, settings doc, new docs/howto/recovery-from-idle.md runbook.

Earlier commits in the branch (repair era) are kept rather than rebased away — full history is useful post-mortem on why the classification approach was abandoned.

Not addressed / out of scope

Orphaned subtree blobs in subtree-store — existing pruner concern, not touched.
External .tx/.outputs blob cleanup — handled by existing pruner on delete_at_height.
Stale ConflictingChildren back-refs on purged parents — unread by any consumer after purge; no-op.
ErrRepairNeeded error type kept as-is (neutral semantics, ripple cost too high for marginal clarity).

…tection

…nvalid chains

Update test for in-memory sort path to use no-input transactions so validateParentChain passes trivially, since the function now hard-fails instead of silently filtering transactions with unknown parents.

…rentChain

- stores/utxo/tests/repair_conflicts_test.go: 5 tests exercising RepairConflictingChains with a file-based SQLite store (WAL mode required to avoid SetConflicting deadlock in the SQL store). Covers clean state, Case A detection+fix, cascade to children, dry run, and step-0 UnminedSince no-op for SQL.
- services/blockassembly/validate_parent_chain_test.go: 3 tests for validateParentChain using a real sqlitememory UTXO store and blockchain.Mock for FSM event assertions. Covers hard-fail+FSMEventIDLE on unmined parent not in list, clean mined-parent success, and success when unmined parent precedes child in the processing list.

…ests

…ion, propagation, legacy

github-actions · 2026-04-15T13:05:57Z

🤖 Claude Code Review

Status: Complete

Summary

This PR implements a surgical cleanup mechanism for inconsistent unmined transaction state that causes BlockAssembler startup failures. The approach is sound: instead of attempting complex state reconstruction, it deletes ephemeral unmined records that can be re-propagated, while preserving mined data. The implementation is well-tested, thoroughly documented, and integrates cleanly with the existing FSM IDLE infrastructure.

Findings

No critical issues found. The implementation demonstrates good engineering practices:

Strengths

Clear separation of concerns: Cleanup logic (stores/utxo/cleanup_unmined.go) is decoupled from command interface (cmd/cleanupunmined/) and service integration
Comprehensive test coverage: 9 tests covering clean state, deletion logic, dry-run, idempotency, and store integration
Accurate documentation: recovery-from-idle.md correctly documents the operator workflow and safety guarantees
Robust FSM integration: All four entry points (blockvalidation, subtreevalidation, propagation, legacy) correctly check FSM IDLE and pause processing
Live repair without restart: watchForRepairCompletion goroutine enables resume after cleanup without node restart
Safe deletion policy: Missing parents are tolerated post-cleanup; orphan-parent classification uses BA's CurrentBlock as anchor to prevent chain-drift issues

Documentation Accuracy Verified

Line 56 claim about SubtreeValidation.processMissingTransactions refetch safety: Accurate (services/subtreevalidation/processTxMetaUsingStore.go:140-148 treats TX_NOT_FOUND as miss counter, not fatal)
recovery-from-idle.md operator flow: Accurate (matches BlockAssembler.watchForRepairCompletion implementation)
blockassembly_settings.md FSM IDLE guidance: Accurate (teranode-cli cleanup-unmined is correct command name)

Architecture Notes

Three-step cleanup (unmined_since fix, conflicting-unmined purge, orphan-parent deletion) is well-justified in PR description
blockchainAdapter uses BA's CurrentBlock rather than blockchain service's best chain—correct choice to avoid reorg drift
SQL iterator implementation (stores/utxo/sql/unmined_iterator.go) matches Aerospike 1024-record batch size for consistency

Review Complete

No inline comments required. All previously reported issues have been resolved.

github-actions · 2026-04-15T13:19:58Z

Benchmark Comparison Report

Baseline: main (unknown)

Current: PR-704 (aed97f2)

Summary

Regressions: 0
Improvements: 0
Unchanged: 142
Significance level: p < 0.05

All benchmark results (sec/op)

Benchmark	Baseline	Current	Change	p-value
_NewBlockFromBytes-4	1.671µ	1.697µ	~	0.100
SplitSyncedParentMap_SetIfNotExists/256_buckets-4	59.42n	59.51n	~	1.000
SplitSyncedParentMap_SetIfNotExists/16_buckets-4	62.47n	59.40n	~	0.700
SplitSyncedParentMap_SetIfNotExists/1_bucket-4	59.34n	59.36n	~	1.000
SplitSyncedParentMap_ConcurrentSetIfNotExists/256_buckets...	33.83n	34.14n	~	0.700
SplitSyncedParentMap_ConcurrentSetIfNotExists/16_buckets_...	59.43n	57.71n	~	0.200
SplitSyncedParentMap_ConcurrentSetIfNotExists/1_bucket_pa...	146.3n	149.2n	~	0.100
MiningCandidate_Stringify_Short-4	248.5n	247.0n	~	0.200
MiningCandidate_Stringify_Long-4	1.714µ	1.720µ	~	0.200
MiningSolution_Stringify-4	888.1n	889.6n	~	0.700
BlockInfo_MarshalJSON-4	1.713µ	1.718µ	~	1.000
NewFromBytes-4	129.2n	142.8n	~	0.100
Mine_EasyDifficulty-4	58.43µ	58.49µ	~	0.700
Mine_WithAddress-4	4.749µ	4.791µ	~	0.700
BlockAssembler_AddTx-4	0.02909n	0.03044n	~	0.400
AddNode-4	11.69	11.29	~	0.100
AddNodeWithMap-4	11.72	11.46	~	0.100
DirectSubtreeAdd/4_per_subtree-4	60.10n	61.78n	~	0.400
DirectSubtreeAdd/64_per_subtree-4	31.14n	28.63n	~	0.100
DirectSubtreeAdd/256_per_subtree-4	30.44n	27.07n	~	0.100
DirectSubtreeAdd/1024_per_subtree-4	28.91n	26.09n	~	0.100
DirectSubtreeAdd/2048_per_subtree-4	28.53n	25.75n	~	0.100
SubtreeProcessorAdd/4_per_subtree-4	311.1n	315.7n	~	0.100
SubtreeProcessorAdd/64_per_subtree-4	306.1n	317.4n	~	0.100
SubtreeProcessorAdd/256_per_subtree-4	308.0n	320.8n	~	0.100
SubtreeProcessorAdd/1024_per_subtree-4	310.0n	316.4n	~	0.200
SubtreeProcessorAdd/2048_per_subtree-4	307.8n	316.3n	~	0.200
SubtreeProcessorRotate/4_per_subtree-4	311.9n	316.7n	~	1.000
SubtreeProcessorRotate/64_per_subtree-4	314.9n	317.8n	~	0.300
SubtreeProcessorRotate/256_per_subtree-4	317.8n	319.1n	~	1.000
SubtreeProcessorRotate/1024_per_subtree-4	309.6n	302.2n	~	0.100
SubtreeNodeAddOnly/4_per_subtree-4	67.37n	66.61n	~	0.200
SubtreeNodeAddOnly/64_per_subtree-4	40.16n	39.01n	~	0.100
SubtreeNodeAddOnly/256_per_subtree-4	37.62n	37.72n	~	1.000
SubtreeNodeAddOnly/1024_per_subtree-4	37.11n	37.24n	~	0.400
SubtreeCreationOnly/4_per_subtree-4	165.7n	165.2n	~	1.000
SubtreeCreationOnly/64_per_subtree-4	634.8n	631.9n	~	0.400
SubtreeCreationOnly/256_per_subtree-4	2.010µ	2.003µ	~	0.700
SubtreeCreationOnly/1024_per_subtree-4	5.206µ	5.204µ	~	0.700
SubtreeCreationOnly/2048_per_subtree-4	8.211µ	9.307µ	~	0.700
SubtreeProcessorOverheadBreakdown/64_per_subtree-4	308.8n	304.7n	~	0.200
SubtreeProcessorOverheadBreakdown/1024_per_subtree-4	309.4n	309.5n	~	0.700
ParallelGetAndSetIfNotExists/1k_nodes-4	958.6µ	969.7µ	~	0.200
ParallelGetAndSetIfNotExists/10k_nodes-4	1.932m	1.903m	~	0.100
ParallelGetAndSetIfNotExists/50k_nodes-4	8.872m	8.640m	~	0.200
ParallelGetAndSetIfNotExists/100k_nodes-4	17.62m	17.36m	~	0.200
SequentialGetAndSetIfNotExists/1k_nodes-4	765.1µ	753.8µ	~	0.100
SequentialGetAndSetIfNotExists/10k_nodes-4	2.958m	2.935m	~	0.700
SequentialGetAndSetIfNotExists/50k_nodes-4	10.88m	10.83m	~	0.700
SequentialGetAndSetIfNotExists/100k_nodes-4	20.64m	20.34m	~	0.100
ProcessOwnBlockSubtreeNodesParallel/1k_nodes-4	1.042m	1.016m	~	0.700
ProcessOwnBlockSubtreeNodesParallel/10k_nodes-4	4.737m	4.695m	~	0.400
ProcessOwnBlockSubtreeNodesParallel/100k_nodes-4	19.40m	18.94m	~	0.100
ProcessOwnBlockSubtreeNodesSequential/1k_nodes-4	840.6µ	830.4µ	~	0.100
ProcessOwnBlockSubtreeNodesSequential/10k_nodes-4	6.027m	6.035m	~	1.000
ProcessOwnBlockSubtreeNodesSequential/100k_nodes-4	39.86m	39.53m	~	0.100
DiskTxMap_SetIfNotExists-4	3.462µ	3.601µ	~	1.000
DiskTxMap_SetIfNotExists_Parallel-4	3.311µ	3.289µ	~	0.100
DiskTxMap_ExistenceOnly-4	298.6n	298.3n	~	1.000
Queue-4	194.5n	191.7n	~	0.100
AtomicPointer-4	4.901n	4.883n	~	1.000
ReorgOptimizations/DedupFilterPipeline/Old/10K-4	847.6µ	843.1µ	~	1.000
ReorgOptimizations/DedupFilterPipeline/New/10K-4	813.1µ	817.2µ	~	1.000
ReorgOptimizations/AllMarkFalse/Old/10K-4	115.0µ	113.2µ	~	0.700
ReorgOptimizations/AllMarkFalse/New/10K-4	62.02µ	61.71µ	~	0.100
ReorgOptimizations/HashSlicePool/Old/10K-4	68.09µ	72.07µ	~	0.700
ReorgOptimizations/HashSlicePool/New/10K-4	11.40µ	11.44µ	~	1.000
ReorgOptimizations/NodeFlags/Old/10K-4	5.522µ	6.130µ	~	0.100
ReorgOptimizations/NodeFlags/New/10K-4	1.809µ	2.472µ	~	0.100
ReorgOptimizations/DedupFilterPipeline/Old/100K-4	9.460m	10.136m	~	0.400
ReorgOptimizations/DedupFilterPipeline/New/100K-4	9.500m	10.088m	~	0.700
ReorgOptimizations/AllMarkFalse/Old/100K-4	1.119m	1.179m	~	0.200
ReorgOptimizations/AllMarkFalse/New/100K-4	679.5µ	681.5µ	~	1.000
ReorgOptimizations/HashSlicePool/Old/100K-4	711.2µ	666.1µ	~	0.100
ReorgOptimizations/HashSlicePool/New/100K-4	306.4µ	338.2µ	~	0.100
ReorgOptimizations/NodeFlags/Old/100K-4	52.57µ	56.32µ	~	0.100
ReorgOptimizations/NodeFlags/New/100K-4	19.69µ	19.41µ	~	0.700
TxMapSetIfNotExists-4	51.33n	51.54n	~	0.400
TxMapSetIfNotExistsDuplicate-4	38.53n	37.91n	~	0.700
ChannelSendReceive-4	621.6n	589.6n	~	0.100
CalcBlockWork-4	468.2n	469.8n	~	0.400
CalculateWork-4	631.6n	632.5n	~	1.000
BuildBlockLocatorString_Helpers/Size_10-4	1.652µ	1.618µ	~	1.000
BuildBlockLocatorString_Helpers/Size_100-4	12.37µ	12.52µ	~	0.100
BuildBlockLocatorString_Helpers/Size_1000-4	123.0µ	122.3µ	~	0.700
CatchupWithHeaderCache-4	104.2m	104.1m	~	0.700
_BufferPoolAllocation/16KB-4	3.349µ	3.437µ	~	0.100
_BufferPoolAllocation/32KB-4	7.429µ	8.171µ	~	0.100
_BufferPoolAllocation/64KB-4	16.82µ	16.59µ	~	0.700
_BufferPoolAllocation/128KB-4	28.21µ	32.63µ	~	0.100
_BufferPoolAllocation/512KB-4	111.7µ	106.4µ	~	0.100
_BufferPoolConcurrent/32KB-4	19.15µ	19.23µ	~	1.000
_BufferPoolConcurrent/64KB-4	30.81µ	29.94µ	~	0.400
_BufferPoolConcurrent/512KB-4	147.0µ	147.6µ	~	0.400
_SubtreeDeserializationWithBufferSizes/16KB-4	619.3µ	631.2µ	~	0.100
_SubtreeDeserializationWithBufferSizes/32KB-4	611.0µ	626.5µ	~	0.100
_SubtreeDeserializationWithBufferSizes/64KB-4	610.8µ	620.5µ	~	0.100
_SubtreeDeserializationWithBufferSizes/128KB-4	597.5µ	619.3µ	~	0.100
_SubtreeDeserializationWithBufferSizes/512KB-4	625.5µ	629.0µ	~	0.700
_SubtreeDataDeserializationWithBufferSizes/16KB-4	35.04m	34.96m	~	0.700
_SubtreeDataDeserializationWithBufferSizes/32KB-4	34.57m	35.04m	~	0.200
_SubtreeDataDeserializationWithBufferSizes/64KB-4	34.61m	34.77m	~	0.400
_SubtreeDataDeserializationWithBufferSizes/128KB-4	34.68m	35.19m	~	0.100
_SubtreeDataDeserializationWithBufferSizes/512KB-4	34.77m	34.58m	~	0.700
_PooledVsNonPooled/Pooled-4	736.5n	737.6n	~	0.700
_PooledVsNonPooled/NonPooled-4	7.139µ	7.422µ	~	0.100
_MemoryFootprint/Current_512KB_32concurrent-4	6.585µ	6.632µ	~	0.400
_MemoryFootprint/Proposed_32KB_32concurrent-4	9.846µ	9.929µ	~	0.700
_MemoryFootprint/Alternative_64KB_32concurrent-4	9.089µ	10.113µ	~	0.100
_prepareTxsPerLevel-4	399.9m	417.0m	~	0.200
_prepareTxsPerLevelOrdered-4	3.495m	3.469m	~	0.400
_prepareTxsPerLevel_Comparison/Original-4	412.2m	415.0m	~	0.200
_prepareTxsPerLevel_Comparison/Optimized-4	3.482m	3.530m	~	0.100
SubtreeSizes/10k_tx_4_per_subtree-4	1.264m	1.281m	~	0.200
SubtreeSizes/10k_tx_16_per_subtree-4	295.8µ	298.8µ	~	0.700
SubtreeSizes/10k_tx_64_per_subtree-4	70.75µ	71.79µ	~	0.400
SubtreeSizes/10k_tx_256_per_subtree-4	17.56µ	17.70µ	~	0.400
SubtreeSizes/10k_tx_512_per_subtree-4	8.675µ	8.855µ	~	0.200
SubtreeSizes/10k_tx_1024_per_subtree-4	4.324µ	4.318µ	~	0.400
SubtreeSizes/10k_tx_2k_per_subtree-4	2.176µ	2.177µ	~	0.700
BlockSizeScaling/10k_tx_64_per_subtree-4	68.65µ	69.61µ	~	0.200
BlockSizeScaling/10k_tx_256_per_subtree-4	17.27µ	17.41µ	~	0.400
BlockSizeScaling/10k_tx_1024_per_subtree-4	4.298µ	4.348µ	~	0.200
BlockSizeScaling/50k_tx_64_per_subtree-4	362.5µ	364.3µ	~	0.700
BlockSizeScaling/50k_tx_256_per_subtree-4	86.43µ	86.90µ	~	1.000
BlockSizeScaling/50k_tx_1024_per_subtree-4	21.29µ	21.24µ	~	1.000
SubtreeAllocations/small_subtrees_exists_check-4	147.1µ	147.8µ	~	0.700
SubtreeAllocations/small_subtrees_data_fetch-4	156.2µ	157.7µ	~	0.400
SubtreeAllocations/small_subtrees_full_validation-4	304.8µ	307.7µ	~	1.000
SubtreeAllocations/medium_subtrees_exists_check-4	8.760µ	8.824µ	~	0.700
SubtreeAllocations/medium_subtrees_data_fetch-4	9.162µ	9.201µ	~	0.700
SubtreeAllocations/medium_subtrees_full_validation-4	17.17µ	17.33µ	~	0.100
SubtreeAllocations/large_subtrees_exists_check-4	2.073µ	2.080µ	~	0.700
SubtreeAllocations/large_subtrees_data_fetch-4	2.198µ	2.205µ	~	0.700
SubtreeAllocations/large_subtrees_full_validation-4	4.336µ	4.399µ	~	0.400
StoreBlock_Sequential/BelowCSVHeight-4	315.4µ	306.0µ	~	0.400
StoreBlock_Sequential/AboveCSVHeight-4	306.9µ	307.3µ	~	1.000
GetUtxoHashes-4	207.8n	209.0n	~	1.000
GetUtxoHashes_ManyOutputs-4	36.35µ	39.33µ	~	0.100
_NewMetaDataFromBytes-4	179.1n	178.7n	~	0.200
_Bytes-4	474.1n	473.0n	~	1.000
_MetaBytes-4	446.9n	427.4n	~	0.100

Threshold: >10% with p < 0.05 | Generated: 2026-04-19 15:23 UTC

…rror log before FSM IDLE transition

…ages

…Error_Handling test

… only

…sed metric; add nil-guards to subtreeHandler

validateParentChain no longer filters; it hard-fails with FSM IDLE. The old setting and its prometheus counter were left behind as dead code. subtreeHandler.go lacked nil-guards on blockchainClient and FSM state that other services already had, risking a panic in edge cases.

…add 3 Case C tests

…x reviewer issues

… repair can run

validateParentChain errors were propagating as fatal, killing all services including blockchain gRPC, which made repair-conflicts unreachable. Now the error is caught in Start(), FSM stays IDLE, and the node stays up.

Also switch idleAndError from SendFSMEvent(STOP) to Idle(), which handles already-IDLE state gracefully instead of logging a spurious error.

…rogress

- Replace brittle string matching with typed ErrRepairNeeded error (ERR_REPAIR_NEEDED=102 in proto) for validateParentChain → Start() flow
- Fix "teranodecli" → "teranode-cli" typo across all services
- Add progress logging to RepairConflictingChains so it's not silent for minutes during large UTXO store repairs

… repair progress

Idle() sent gRPC but never updated the local fmsState cache, leaving it stale at RUNNING. GetMiningCandidate then passed the FSM guard and returned empty block templates while the node needed repair.

Also switch repair progress from batch counts to record counts via TotalScanned() for meaningful output on large UTXO stores.

…pair-conflicts

Blocking STOP from CATCHINGBLOCKS traps the node in a crash loop when a data-integrity check fails during catchup: BlockAssembler's validateParentChain calls Idle() to move to IDLE for repair, the FSM rejects the event, BA.Start returns StorageError, ServiceManager stops BA, the node exits, and on restart the FSM is still persisted as CATCHINGBLOCKS, so the same thing happens again. The operator has no window to run teranode-cli repair-conflicts.

Adds CATCHINGBLOCKS to STOP's Src list and widens the SendFSMEvent guard to permit STOP alongside RUN. RUN is still the normal exit when catchup completes; STOP is a safety valve for repair. Updates the state-machine diagram and fsm_test.

The guard's original intent (preventing accidental transitions that would abandon catchup to RUNNING/LEGACYSYNCING) is preserved: those events remain rejected from CATCHINGBLOCKS.
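The widened guard can be illustrated with a tiny transition table. This is only a sketch: the `transitions` map, the `next` function and the LEGACYSYNC event name are assumptions made for illustration, and the source lists other than "STOP and RUN are permitted from CATCHINGBLOCKS" are guesses rather than the real state machine.

```go
package main

import "fmt"

type State string
type Event string

const (
	Idle           State = "IDLE"
	Running        State = "RUNNING"
	CatchingBlocks State = "CATCHINGBLOCKS"
	LegacySyncing  State = "LEGACYSYNCING"
)

const (
	Run        Event = "RUN"
	Stop       Event = "STOP"
	LegacySync Event = "LEGACYSYNC" // assumed event name, for illustration only
)

// transition lists the states an event may fire from and where it leads.
type transition struct {
	src []State
	dst State
}

// transitions is a hypothetical table; the point of interest is that
// CatchingBlocks appears in the source list of both Run and Stop.
var transitions = map[Event]transition{
	Run:        {src: []State{Idle, CatchingBlocks, LegacySyncing}, dst: Running},
	Stop:       {src: []State{Running, CatchingBlocks, LegacySyncing}, dst: Idle},
	LegacySync: {src: []State{Idle}, dst: LegacySyncing},
}

// next returns the new state, or the unchanged state and an error when
// the event is not allowed from the current state.
func next(current State, ev Event) (State, error) {
	t, ok := transitions[ev]
	if !ok {
		return current, fmt.Errorf("unknown event %s", ev)
	}
	for _, s := range t.src {
		if s == current {
			return t.dst, nil
		}
	}
	return current, fmt.Errorf("event %s not allowed from state %s", ev, current)
}
```

Under this table a repair-needed error during catchup can park the node in IDLE, while an event such as LEGACYSYNC is still rejected from CATCHINGBLOCKS.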

After unmarking an orphan-conflicting parent P in step 4, a grandparent is itself an orphan if it:

- (a) is Conflicting=true,
- (b) either names P as the recorded spender per its own SpendingData or has no SpendingData for the relevant vout at all (common when the grandparent was already conflicting when P's Spend ran, so the write was skipped), and
- (c) has no *active* conflicting children (counting only entries still Conflicting=true).

Enqueue grandparents into the step-4 worklist so chains of stacked orphan-conflicting ancestors are resolved in a single repair run.

Also broadens step-1 detection: when the scanned child's parent has SpendingData==nil for the relevant vout but is flagged conflicting with no active conflicting children, treat the parent as a Case D candidate. Without this, the child's input walk would stop at the nil SpendingData and the parent would only be reachable via chase-up started from yet another candidate.

Adds hasActiveConflictingChildren helper to consistently ignore stale back-references in ConflictingChildren left over from unmarking.

Observed on mainnet-eu-1: tx 8dacf3...f464 got unmarked on the first repair pass, but its grandparent 217494d8...17ef stayed conflicting with conflictingCs=[8dacf3] (stale) and spentUtxos=0 (no SpendingData ever recorded), so validateParentChain kept tripping on the next restart.

Mainnet-eu-1 run missed a parent (4557bdc6) that aerospikereader shows matches Case D criteria exactly (Conflicting=true, SD for the relevant vout is nil, ConflictingChildren=[5d12221c], and 5d12221c is currently non-conflicting). Step 1 report said only 2 Case D candidates. No report.Errors were raised for that parent, so the path that skipped it is unclear.

Adds targeted logProgress lines inside step 1 that fire only when:

- s.Get(parent) returned an error or nil,
- vout is out of range of parent's SpendingDatas,
- parent.Conflicting is true (rare, so bounded noise),
- hasActiveConflictingChildren bails due to a child Get error or a still-conflicting child (with the child hash).

Next repair run will either show 4557bdc6 being added to the orphan list (implying the real miss is elsewhere), or reveal the exact reason detection skipped it. No behavior change; only logging.

… dedup

A dirty UTXO store is never acceptable in Teranode, so any DB read or write error during repair must halt the run rather than be swallowed and reported as "non-fatal". Every `report.Errors = append(...); continue` path in RepairConflictingChains now returns the error and aborts. The only errors treated as benign are TX_NOT_FOUND responses for external references that were never stored (parent's grandparent, pruned ancestors), which are a legitimate outcome rather than a failure. The RepairReport.Errors slice is removed.

Other correctness / safety changes in the same pass:

- Case C dedup: the previous key (pair.loser) silently dropped distinct winners that happened to share a loser, leaving their Conflicting=true flag set. Dedup is now by winner: each distinct real-winner has ProcessConflicting called exactly once, using a fresh dedup map per call so dry-run no longer mutates shared state.
- cascadeConflictingViaSpendingData: add cascadeMaxVisited cap so a corrupted or pathological SpendingData graph cannot grow the visited set without bound. SetConflicting is now issued once per frontier level instead of once per child, cutting N round trips to one per level.
- hasActiveConflictingChildren returns (bool, error) and is invoked with a logReason callback so step-1 diagnostics can record why a parent wasn't enqueued as a Case D candidate.
- Out-of-range vout is now a ProcessingError (genuine store corruption) rather than a silent skip.
- progressFn is a single parameter, not variadic; the "optional slice that only reads [0]" shape was a footgun.
- Step log tags are consistent at /4 throughout.

CLI: drop the "non-fatal errors" section, nothing to iterate anymore; the single abort error is surfaced via the wrapper ProcessingError.

Tests updated for the new signature and removed field; 13 repair tests still pass and cover Case A, Case C, Case D with dry-run, cascades, chained orphans and legit-conflict safety checks.
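The capped, level-batched cascade described above can be sketched as a breadth-first walk. This is an illustration under assumptions: the `spenders` map stands in for following SpendingData, `setConflicting` stands in for the batched store write, and the function name and signature are hypothetical.

```go
package main

import "fmt"

// cascadeConflictingSketch walks descendants of root level by level.
// spenders maps a tx hash to the hashes of transactions spending its
// outputs. setConflicting is called once per frontier level, and the
// walk aborts if the visited set would grow beyond maxVisited.
func cascadeConflictingSketch(
	root string,
	spenders map[string][]string,
	maxVisited int,
	setConflicting func(level []string) error,
) (int, error) {
	visited := map[string]struct{}{root: {}}
	frontier := []string{root}
	marked := 0

	for len(frontier) > 0 {
		var next []string
		for _, parent := range frontier {
			for _, child := range spenders[parent] {
				if _, ok := visited[child]; ok {
					continue // already handled, also protects against cycles
				}
				if len(visited) >= maxVisited {
					return marked, fmt.Errorf("cascade from %s exceeded %d visited transactions", root, maxVisited)
				}
				visited[child] = struct{}{}
				next = append(next, child)
			}
		}
		if len(next) > 0 {
			// One write per level instead of one per child.
			if err := setConflicting(next); err != nil {
				return marked, err
			}
			marked += len(next)
		}
		frontier = next
	}
	return marked, nil
}
```

The visited set doubles as cycle protection, so a corrupted graph that points back at an ancestor terminates instead of looping.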

…very on FSM leave IDLE

When loadUnminedTransactions returns ErrRepairNeeded the assembler used to return nil from Start() but skipped subtreeProcessor.Start and startChannelListeners entirely, leaving the gRPC server accepting calls against a half-initialised assembler. The FSM IDLE guards in upstream services are best-effort and miss some paths, so a call could still reach AddTx / GetMiningCandidate / SubmitMiningSolution and either hang on an unreferenced channel or touch an uninitialised processor.

A new atomic frozenForRepair flag is set in that path and exposed via FrozenForRepair(). The gRPC methods most likely to be invoked (AddTx, AddTxBatch, AddTxBatchColumnar, RemoveTx, GetMiningCandidate, SubmitMiningSolution, GetCandidateBlock, ResetBlockAssembly) now call ba.assertNotFrozenForRepair() up front and return ErrRepairNeeded, as defence-in-depth alongside the FSM IDLE checks in upstream services.

Recovery is live rather than requiring a restart, to match the pause/resume semantics of blockvalidation and subtreevalidation:

- Start() spawns watchForRepairCompletion as a wg-tracked goroutine.
- The watcher blocks on WaitUntilFSMTransitionFromIdleState and retries loadUnminedTransactions once the operator moves the FSM out of IDLE (after running teranode-cli repair-conflicts).
- On success it runs startAfterLoadUnmined (subtreeProcessor.Start, startChannelListeners, height metric) and clears the frozen flag, so gated gRPC methods start accepting traffic without a node restart.
- If loadUnminedTransactions returns ErrRepairNeeded again, idleAndError has already put the FSM back to IDLE; the watcher loops and waits for the next transition.
- Any other error stops the watcher and keeps the assembler frozen; non-repair failures are outside this recovery path's remit.
- Cleanly exits on context.Canceled during shutdown.

Unrelated cleanup in the same area: subtreeprocessor reset's clear-processed-at errgroup now has SetLimit(16) so a reset spanning hundreds of moveBack blocks doesn't launch hundreds of concurrent SetBlockProcessedAt writes against the blockchain store.

…routine

Every handler that checked for FSM IDLE used to log-and-fall-through on a check failure, spawn a fresh resume goroutine per invocation, and (in blockvalidation's case) wait on context.Background() so the goroutine couldn't exit on service shutdown. Under load that's hundreds of routines racing to ResumeAll, and a transient FSM-check error silently bypasses the guard entirely.

Changes applied to blockvalidation/Server.go, subtreevalidation/subtreeHandler.go, subtreevalidation/txmetaHandler.go, legacy/netsync/handle_block.go and propagation/Server.go:

- FSM-check errors now return an error rather than logging and continuing. Fail closed: if we can't confirm the FSM is not IDLE, don't admit the block / subtree / tx while the node may be in repair.
- A new idleConsumerPaused atomic.Bool (on blockvalidation.Server and subtreevalidation.Server) guards the pause/resume transition. Only the first IDLE-observed call PauseAll's the consumers and spawns a single watcher; concurrent handler invocations short-circuit via CompareAndSwap. The watcher defers Store(false) on completion so the next IDLE episode re-arms cleanly.
- blockvalidation's resume goroutine now uses the service context plumbed through from consumerMessageHandler instead of context.Background(), so it exits on shutdown.
- blockHandler returns ErrServiceError when IDLE instead of nil so the Kafka offset is not advanced and the in-hand message is retried after the FSM leaves IDLE, matching what the log claims.
- txmetaHandler operator hint updated to reference the repair CLI, matching the other handlers.

pruner's triggerInitialPruning hash-lookup comment rewritten to be accurate about reorg semantics: GetBlockHeadersByHeight returns the current main-chain hash at the persisted height, which may differ from the hash that was actually persisted on an older fork; pruning is by height so this only affects the log line, not the work performed.

…se D

Mainnet-eu-1 run surfaced a mutual-blocker pathology where a parent's ConflictingChildren list names a child that is itself orphan-conflicting (Conflicting=true with no credible reason: grandparent SpendingData does not show a legit loss for the child either). The old hasActiveConflictingChildren check saw Conflicting=true on the child and classified the parent as having active conflicts, so the parent was never added as a Case D candidate. The orphan child is invisible to the unmined iterator (conflicting filter), so it is never reached either. The pair stays stuck forever across repair runs.

Replace hasActiveConflictingChildren with classifyConflictingChildren, which recursively checks each still-conflicting entry in the list:

- stale back-reference (child.Conflicting=false now) → ignore
- child.Conflicting=true AND some grandparent.SpendingData names a different spender for one of child's inputs → legit loser, parent is not a Case D candidate
- child.Conflicting=true AND every reachable grandparent either names the child itself or has nil SpendingData → orphan, return alongside bool hasLegit=false

Step 1 and step-4 chase-up enqueue any orphan children they find so they are unmarked in the same pass as the parent they were blocking. Legit-loser detection is unchanged.

Adds TestRepairConflictingChains_CaseD_OrphanBlocksParentDetection covering the exact shape from mainnet: parentX (orphan) with conflictingCs=[blocker] where blocker is itself orphan, plus a non-conflicting goodChild spending a different output. One repair pass unmarks both and leaves goodChild untouched.

SetConflicting(false) clears the child's Conflicting flag but leaves the back-reference in every parent's ConflictingChildren list: the SQL updateParentConflictingChildren helper only ever INSERTs. If such a stale sibling is also on the best chain, the Case C scan would enqueue it as the "real winner", ProcessConflicting would reject it with "tx is not conflicting", and, now that DB errors are fatal, the whole repair would abort before Case D even starts.

Filter stale entries by checking sibling.Conflicting=true alongside the best-chain check. A BlockIDs-only Get is not enough.

Observed on mainnet-eu-1: step 1 correctly identified 2716 Case D orphans (after the previous orphan-blocker fix lit up detection) but step 2 aborted on tx 1e541f1… which is on the best chain but had been unmarked in an earlier repair run. Adds TestRepairConflictingChains_CaseC_StaleSiblingSkipped.

… empty

A legit-losing parent whose outputs were never recorded as spent (spentUtxos=0, all SpendingDatas nil) is a real shape on mainnet: the parent was already Conflicting=true at the time its children ran Spend, and some code paths skip the SpendingData write for conflicting parents. cascadeConflictingViaSpendingData(parent) then walks an empty SD list and marks zero descendants, so the real non-conflicting children (whose inputs do name the parent) stay visible to the unmined iterator and validateParentChain keeps tripping across restarts.

Step 1 now records the direct children that spend each orphan-candidate parent in caseDDirectChildren. When step 4 classifies a parent as a legit loser, the cascade seeds from those tracked children in addition to whatever parent.SD turns up. The children's own SpendingDatas are properly populated, so the subsequent walk propagates correctly.

Observed on mainnet-eu-1: parent 4557bdc6 is a legit loser of rootTx grandparent (rootTx.SD[0] names a10bd058, not 4557bdc6), repair correctly identified the legit-conflict path and called the cascade, but 4557bdc6.SD is all nil so child 5d12221c (which spends 4557bdc6[1]) never got its Conflicting=true mark.

Adds TestRepairConflictingChains_CaseD_LegitCascadeWithNilParentSD covering the exact shape (parentLoser with nil SD, childOfLoser whose input still names the parent).

…tep 1

Mainnet run showed step 1 stalling for hours. One parent accumulated 341 entries in its ConflictingChildren list, and ~14k non-conflicting unmined txs had inputs pointing at it. Without caching, each visit refetched the parent (external tx = file store hit) and re-ran classifyConflictingChildren over all 341 entries, each of which does a child Get + a Get per input → grandparent SD check. Tens of millions of Gets, many external.

Step 1 does no writes, so the parent's metadata and the classification result are stable for the duration of the scan. Add two scoped caches:

- parentMetaCache (hash → *meta.Data, plus parentMetaNotFound for negative caching) behind fetchParent, so each distinct parent is fetched once regardless of how many children reference it.
- classificationCache (parent hash → {orphans, hasLegit}) behind classifyCached, reusing the expensive recursion over the ConflictingChildren list across every child visit of the same parent.

Also drops the per-input debug log lines that flooded output when a parent's ConflictingChildren was large (hundreds of hashes per line, >100KB per visit). logProgress callback is still threaded into classifyConflictingChildren for diagnostics from within the helper.

Both caches are local to step 1: step 4 writes invalidate them, so step 4 continues to call classifyConflictingChildren directly without the cache.
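A minimal sketch of a fetch-once cache with negative caching, in the spirit of parentMetaCache / parentMetaNotFound. The types and names here (`parentMeta`, `parentCache`, `demoFetches`) are hypothetical stand-ins, and the backing store is a plain map rather than the real UTXO store.

```go
package main

import "fmt"

// parentMeta stands in for the store's metadata type (hypothetical, simplified).
type parentMeta struct {
	ConflictingChildren []string
}

// parentCache remembers both hits and misses, so each distinct hash reaches
// the underlying fetch function at most once.
type parentCache struct {
	found    map[string]*parentMeta
	notFound map[string]struct{}
	fetch    func(hash string) (*parentMeta, bool)
	fetches  int // number of calls that reached the underlying store
}

func newParentCache(fetch func(string) (*parentMeta, bool)) *parentCache {
	return &parentCache{
		found:    make(map[string]*parentMeta),
		notFound: make(map[string]struct{}),
		fetch:    fetch,
	}
}

func (c *parentCache) get(hash string) (*parentMeta, bool) {
	if m, ok := c.found[hash]; ok {
		return m, true
	}
	if _, ok := c.notFound[hash]; ok {
		return nil, false // negative cache hit
	}
	c.fetches++
	m, ok := c.fetch(hash)
	if !ok {
		c.notFound[hash] = struct{}{}
		return nil, false
	}
	c.found[hash] = m
	return m, true
}

// demoFetches looks up one existing and one missing parent several times and
// returns how many lookups actually reached the store.
func demoFetches() int {
	store := map[string]*parentMeta{"p": {ConflictingChildren: []string{"a", "b"}}}
	c := newParentCache(func(h string) (*parentMeta, bool) {
		m, ok := store[h]
		return m, ok
	})
	for i := 0; i < 3; i++ {
		c.get("p")
	}
	for i := 0; i < 2; i++ {
		c.get("missing")
	}
	return c.fetches
}

func main() {
	fmt.Println("store fetches:", demoFetches()) // prints store fetches: 2
}
```

Such a cache is only safe while nothing writes to the cached records, which matches the commit's note that the caches are scoped to the read-only step 1.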

Step 0 is a full-store consistency scan (hundreds of millions of records on a production node) and is almost always a no-op once it has run cleanly once. Iterating on Case A / C / D fixes currently pays that cost on every run, which has turned into hours per attempt. Add RepairOptions.SkipUnminedSinceScan and a --skip-unmined-since-scan CLI flag on teranode-cli repair-conflicts. When set, step 0 is announced as skipped and the run jumps straight to scanning unmined transactions. Best-chain header data (needed by Case A / Case C) is fetched unconditionally up front, outside the skip gate. Defaults unchanged — a fresh run still does the full scan.

Mainnet run stalled in step 4: each distinct parent triggered classifyConflictingChildren, which did a Get+Tx on every entry in the parent's ConflictingChildren list, plus Gets for each grandparent for every input of every child. Two "blocker" children were shared across ~2700 parents (each parent re-classified them from scratch), and the blockers are external txs with 2001 utxos each, so every Get hit the file store.

Split classifyConflictingChildren into:

- classifyChild (new): per-child classification returning {exists, conflicting, legit}. Stable for the lifetime of a single repair run so long as SetConflicting(h, false) is not later called on the same h; if it is, treating h as still-conflicting in other parents' lists just produces an acceptable stale back-ref, not an incorrect Case D decision.
- classifyConflictingChildren: now takes an optional childCache map and memoizes per-child results across calls.

A single childClassCache is shared by step 1 (via classifyCached) and step 4 (both the fresh-parent legit check and the chase-up grandparent check). Each distinct child is fetched + grandparent-walked once per repair run regardless of how many parents reference it.

Also add appendCaseDOrphan dedup at step-1 append time: the raw slice grew to ~40k entries on mainnet while the unique orphan count was a few hundred, because the same parents and blocker children are pushed for every non-conflicting child that visits them. Step 4's seenCaseD still dedups, but traversing the bloated slice was wasted time.
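The append-time dedup can be illustrated with a small order-preserving set. This is a sketch only: `orphanSet` and `appendOnce` are hypothetical names, and hashes are plain strings here.

```go
package main

import "fmt"

// orphanSet keeps insertion order while rejecting hashes it has already seen.
type orphanSet struct {
	seen  map[string]struct{}
	order []string
}

// appendOnce adds hash if it is new and reports whether it was added.
func (s *orphanSet) appendOnce(hash string) bool {
	if s.seen == nil {
		s.seen = make(map[string]struct{})
	}
	if _, dup := s.seen[hash]; dup {
		return false
	}
	s.seen[hash] = struct{}{}
	s.order = append(s.order, hash)
	return true
}

func main() {
	var s orphanSet
	// The same parent and blocker child are pushed once per visiting child.
	for _, h := range []string{"p1", "blocker", "p1", "blocker", "p2"} {
		s.appendOnce(h)
	}
	fmt.Println(s.order) // prints [p1 blocker p2]
}
```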

Step 1 only logged progress after each iterator batch and only when the aggregate scanned count had moved by 10,000. On mainnet the iterator delivers ~14k unmined txs in a small number of batches, and the first encounter with a big conflicting parent can stall a single tx for minutes while classifyChild populates the cache from external storage — so the whole scan went silent for over an hour with no output at all. Add a time-based gate (30s) and an intra-batch trigger (every 500 txs within a batch). maybeLogProgress fires whichever way the threshold is crossed first. The log line is unchanged; just called more reliably. No behavior change to the classification logic itself.
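A gate that fires on whichever threshold is crossed first could look roughly like this. It is a sketch under assumptions: `progressGate`, `shouldLog`, and `demo` are hypothetical names, and only the 500-tx and 30s thresholds come from the commit message.

```go
package main

import (
	"fmt"
	"time"
)

// progressGate decides when a progress line is due: either enough items have
// been scanned since the last line, or enough time has passed.
type progressGate struct {
	everyN   int
	interval time.Duration
	lastN    int
	lastAt   time.Time
}

// shouldLog returns true and resets both baselines when either threshold is met.
func (g *progressGate) shouldLog(scanned int, now time.Time) bool {
	if scanned-g.lastN >= g.everyN || now.Sub(g.lastAt) >= g.interval {
		g.lastN, g.lastAt = scanned, now
		return true
	}
	return false
}

// demo replays a short, fixed timeline so the behaviour is deterministic.
func demo() []bool {
	start := time.Unix(0, 0)
	g := &progressGate{everyN: 500, interval: 30 * time.Second, lastAt: start}
	return []bool{
		g.shouldLog(100, start.Add(1*time.Second)),  // neither threshold met
		g.shouldLog(500, start.Add(2*time.Second)),  // count threshold met
		g.shouldLog(510, start.Add(3*time.Second)),  // neither threshold met
		g.shouldLog(520, start.Add(33*time.Second)), // time threshold met
	}
}

func main() {
	fmt.Println(demo()) // prints [false true false true]
}
```

Passing `now` in explicitly, rather than calling `time.Now()` inside the gate, keeps the logic testable without sleeping.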

The careful Case D classification is taking hours on mainnet: each first-contact with a big conflicting parent runs hundreds of sequential external-store Gets to populate the child-class cache, and the main goroutine is blocked on futex for the duration. The most recent run unmarked 383 orphans + cascaded 4 but still left tx 4557bdc6 pointed at by 5d12221c (the original offender) stuck, meaning either the direct-children seeding or the cascade path has a subtle bug we haven't tracked down.

Unmined txs are ephemeral: valid txs propagate back in minutes and the next block sweeps them up anyway. A coarse "mark every non-conflicting unmined child of a Conflicting=true+UnminedSince>0 parent Conflicting=true" pass does what BlockAssembler actually needs (descendants of a conflicting ancestor must not be in the iterator) without any classification. Valid children that happen to reference a wrongly-flagged parent get pruned at delete_at_height and re-enter via propagation.

Add RepairOptions.AggressiveCascade and a CLI flag --aggressive-cascade. When set, step 1 collects candidates into aggressiveCascadeChildren and writes them all Conflicting=true in one SetConflicting batch before the Case C sweep. Step 4 is skipped entirely. Case A and Case C detection run as normal; they're cheap and strictly correct.

Also add a heartbeat ticker that prints the current phase every 15s via atomic.Pointer, so a repair stuck in one deep Get still reports liveness. Replaces the per-500-tx progress check that could go silent for tens of minutes when a single tx stalled on external fetches.

Default behavior unchanged.
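The heartbeat idea can be sketched with `sync/atomic`'s generic `atomic.Pointer` (Go 1.19+) and a `time.Ticker`. The `heartbeat` type and its methods are hypothetical names, and the demo uses a 10ms tick instead of the 15s mentioned above so it finishes quickly.

```go
package main

import (
	"context"
	"fmt"
	"sync/atomic"
	"time"
)

// heartbeat publishes the current phase through an atomic pointer so a
// reporter goroutine can read it without locking, even while the worker is
// blocked inside a slow call.
type heartbeat struct {
	phase atomic.Pointer[string]
}

func (h *heartbeat) setPhase(p string) { h.phase.Store(&p) }

func (h *heartbeat) current() string {
	if p := h.phase.Load(); p != nil {
		return *p
	}
	return "starting"
}

// run reports the current phase on every tick until ctx is cancelled.
func (h *heartbeat) run(ctx context.Context, every time.Duration, report func(string)) {
	ticker := time.NewTicker(every)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			report(h.current())
		}
	}
}

func main() {
	var h heartbeat
	ctx, cancel := context.WithCancel(context.Background())
	done := make(chan struct{})
	go func() {
		defer close(done)
		h.run(ctx, 10*time.Millisecond, func(p string) {
			fmt.Println("heartbeat: phase =", p)
		})
	}()

	h.setPhase("step 1: scanning unmined txs")
	time.Sleep(35 * time.Millisecond) // stands in for one long blocking Get
	cancel()
	<-done
}
```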

…ting-unmined

Pure rename commit, no logic change. Sets up the following commit which replaces the classification machinery with a surgical purge of records where Conflicting=true and UnminedSince>0.

- stores/utxo/repair_conflicts.go → purge_conflicting_unmined.go
- stores/utxo/tests/repair_conflicts_test.go → purge_conflicting_unmined_test.go
- cmd/repairconflicts/ → cmd/purgeconflictingunmined/
- RepairConflictingChains → PurgeConflictingUnmined
- RepairReport → PurgeReport
- RepairOptions → PurgeOptions
- RepairProgressFunc → PurgeProgressFunc
- cmd wrapper RepairConflicts → PurgeConflictingUnmined

The "repair-conflicts" CLI subcommand keeps its name here; a later commit renames it to "purge-conflicting-unmined" along with the operator-facing log strings. errors.ErrRepairNeeded / NewRepairNeededError are intentionally retained: the error semantic ("operator intervention required") is unchanged and renaming would ripple through ~10 test files for no functional gain.

… purge

Replaces the Case A/C/D classification machinery with a single-pass delete of every (Conflicting=true, UnminedSince>0) record. The unmined set is ephemeral by design, so propagation and the next block are enough to restore any valid tx the purge removes; there is no need to reverse-engineer correct state from a graph whose writers never fully clean up after themselves.

stores/utxo/purge_conflicting_unmined.go
- Single scan over ScanInconsistentUnminedTxs combines step 0 (unmined_since fixup for mined txs still carrying the marker) and step 1 (collect conflicting-unmined hashes).
- Step 2 batches Delete(ctx, hash) over the collected set.
- PurgeReport fields: UnminedSinceFixed, ConflictingUnminedPurged.
- PurgeOptions fields: SkipUnminedSinceScan (AggressiveCascade removed, moot).
- Drops ~850 lines of classification helpers (classifyChild/classifyConflictingChildren/cascadeConflictingViaSpendingData and their caches).

stores/utxo/UnminedTxIterator.go + aerospike/consistency_scan.go
- InconsistentTxRecord gains a Conflicting bool so the single scan can seed both step 0 and step 1.
- Aerospike scan fetches the conflicting bin and extracts it in parseConsistencyRecord.

stores/utxo/sql/unmined_iterator.go
- ScanInconsistentUnminedTxs is no longer a no-op on SQL; it now iterates every record with unmined_since IS NOT NULL and returns hash, block_ids, unmined_since, conflicting. Required so SQLite-backed tests exercise the purge logic through the same code path.

services/blockassembly/BlockAssembler.go
- validateParentChain now skips parents that are not in the UTXO store instead of parking FSM in IDLE. This is the load-bearing change that makes the surgical purge viable: non-conflicting children whose parents get deleted remain harmlessly in the iterator and get mined or pruned.

cmd/purgeconflictingunmined/purge_conflicting_unmined.go + cmd/teranodecli/teranodecli/cli.go
- Drop --aggressive-cascade flag and rewrite the report output to the two remaining counters.

The "repair-conflicts" CLI subcommand name is kept here; a later commit renames it to "purge-conflicting-unmined" alongside the operator-facing log strings.

Tests: the repair-era Case A/C/D tests are replaced with a single clean-state smoke test in this commit; the full purge test suite lands in the next commit.
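The collect-then-delete shape of the purge (steps 1 and 2, plus dry run) can be sketched against an in-memory map. This is an illustration only: `txRecord`, `purgeReport`, and the function signature are simplified assumptions, the real code scans the UTXO store and calls Delete(ctx, hash), and step 0 is omitted here.

```go
package main

import "fmt"

// txRecord is a hypothetical, simplified view of one scanned record.
type txRecord struct {
	Conflicting  bool
	UnminedSince uint32
}

// purgeReport mirrors one counter named in the commit message.
type purgeReport struct {
	ConflictingUnminedPurged int
}

// purgeConflictingUnmined collects every (Conflicting=true, UnminedSince>0)
// record in one pass, then deletes the collected hashes in batches. With
// dryRun set, candidates are counted but nothing is deleted.
func purgeConflictingUnmined(store map[string]txRecord, dryRun bool, batchSize int) purgeReport {
	if batchSize <= 0 {
		batchSize = 1
	}
	var victims []string
	for hash, rec := range store {
		if rec.Conflicting && rec.UnminedSince > 0 {
			victims = append(victims, hash)
		}
	}
	report := purgeReport{ConflictingUnminedPurged: len(victims)}
	if dryRun {
		return report
	}
	for start := 0; start < len(victims); start += batchSize {
		end := start + batchSize
		if end > len(victims) {
			end = len(victims)
		}
		for _, hash := range victims[start:end] {
			delete(store, hash) // stands in for a batched store Delete
		}
	}
	return report
}

func main() {
	store := map[string]txRecord{
		"conflicting-unmined": {Conflicting: true, UnminedSince: 945000},
		"conflicting-mined":   {Conflicting: true, UnminedSince: 0},
		"plain-unmined":       {Conflicting: false, UnminedSince: 945001},
	}
	for _, dryRun := range []bool{true, false, false} {
		r := purgeConflictingUnmined(store, dryRun, 100)
		remaining := len(store)
		fmt.Println("dryRun:", dryRun, "purged:", r.ConflictingUnminedPurged, "remaining:", remaining)
	}
}
```

The third call in `main` finds nothing left to delete, which is the idempotence property the later test suite checks.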

Replaces the smoke test from the previous commit with comprehensive coverage of PurgeConflictingUnmined behavior against a real SQLite-backed utxo.Store:

- CleanState: empty store yields a zeroed report.
- DeletesConflictingUnmined: (Conflicting=true, UnminedSince>0) record is removed from the store.
- LeavesNonConflictingUnminedAlone: purging a parent does not touch its non-conflicting unmined child; BA's validateParentChain change handles the dangling ref.
- LeavesMinedTxAlone: records with Conflicting=true but UnminedSince=0 are protected by the UnminedSince>0 filter.
- DryRun: candidates are counted but not deleted.
- SkipUnminedSinceScan: step 0 is skipped while step 1 still runs.
- UnminedSinceFix: step 0 re-marks mined-on-best-chain records still carrying UnminedSince.
- Idempotent: second run finds nothing to delete.
- DeleteForwardsThroughStore: a store wrapper sees Delete called once per purged hash. This is the hook a live node's TxMetaCache uses to evict stale entries.

setupSQLiteFileStore now pre-creates the shared parent Tx as mined so SetConflicting on child transactions can resolve the conflicting_children FK.
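The DeleteForwardsThroughStore idea, a wrapper that observes each Delete before forwarding it, can be sketched with interface embedding. Everything here is a simplified assumption: the `deleter` interface drops the context argument, and `memStore`, `countingStore`, and `purgeAll` are hypothetical names.

```go
package main

import "fmt"

// deleter is a hypothetical one-method slice of the store interface.
type deleter interface {
	Delete(hash string) error
}

// memStore is a trivial in-memory backing store.
type memStore struct {
	records map[string]bool
}

func (m *memStore) Delete(hash string) error {
	delete(m.records, hash)
	return nil
}

// countingStore wraps a deleter and records how often each hash is deleted,
// the same position a cache-evicting wrapper would occupy.
type countingStore struct {
	deleter
	calls map[string]int
}

func (c *countingStore) Delete(hash string) error {
	c.calls[hash]++
	return c.deleter.Delete(hash)
}

// purgeAll deletes every hash through whatever store it is handed.
func purgeAll(s deleter, hashes []string) error {
	for _, h := range hashes {
		if err := s.Delete(h); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	inner := &memStore{records: map[string]bool{"a": true, "b": true}}
	wrapped := &countingStore{deleter: inner, calls: map[string]int{}}
	if err := purgeAll(wrapped, []string{"a", "b"}); err != nil {
		fmt.Println("purge failed:", err)
		return
	}
	fmt.Println(wrapped.calls["a"], wrapped.calls["b"], len(inner.records)) // prints 1 1 0
}
```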

- Rename the CLI subcommand from repair-conflicts to purge-conflicting-unmined and update its help text to describe the new behavior (delete, not repair).
- Update all 11 operator-facing log/error strings across blockassembly, blockvalidation, subtreevalidation, propagation, and legacy/netsync so the suggested fix points at the new command.
- Update docs/references/settings/services/blockassembly_settings.md.
- Add docs/howto/recovery-from-idle.md with a short runbook entry describing the IDLE → purge → FSM RUNNING recovery flow and why non-conflicting children are intentionally left alone.
- Update the validateParentChain test comment; the test itself asserts on errors.ErrRepairNeeded (unchanged) so no behavior change is needed.

errors.ErrRepairNeeded and NewRepairNeededError keep their names: the error semantic ("operator intervention required") is unchanged, only the fix command name is renamed, and ripple-renaming would touch ~10 test files for no functional gain.

…pair-conflicts

The command is about to grow beyond deleting conflicting-unmined records: it will also remove non-conflicting unmined transactions whose parents are mined on an orphaned fork (surfaced on mainnet-eu-1 after the first run of purge-conflicting-unmined unfroze BA but left stale orphan-mined parent references tripping validateParentChain). Rename now to keep the next commit's diff focused on the new logic.

- stores/utxo/purge_conflicting_unmined.go → cleanup_unmined.go
- stores/utxo/tests/purge_conflicting_unmined_test.go → cleanup_unmined_test.go
- cmd/purgeconflictingunmined/ → cmd/cleanupunmined/
- PurgeConflictingUnmined → CleanupUnmined
- PurgeReport → CleanupReport
- PurgeOptions → CleanupOptions
- PurgeProgressFunc → CleanupProgressFunc
- CLI subcommand: purge-conflicting-unmined → cleanup-unmined
- All 11 operator log/error strings updated.
- docs/howto/recovery-from-idle.md + blockassembly_settings.md updated.

No logic change in this commit.

The first mainnet-eu-1 run of the prior purge-conflicting-unmined command unfroze Block Assembly long enough to reveal a second inconsistency class the tool had not touched: non-conflicting unmined transactions whose parent is mined on a block that is no longer on the best chain (an orphaned fork). Example: parent 1aebda16... was mined in block id 945137 at height 945052 but that block is off the current best chain, so BA's validateParentChain trips with "parent is on wrong chain" on every load of the unmined set.

Step 3 now iterates GetUnminedTxIterator (non-conflicting unmined) in batches, BatchDecorate-fetches parent BlockIDs + UnminedSince + Conflicting, and deletes children whose parent is:

- Conflicting=true (the child is dangling)
- UnminedSince=0 with empty BlockIDs (inconsistency)
- UnminedSince=0 with BlockIDs all off the best chain (orphan-mined)

Missing parents are tolerated (BA's validateParentChain skips those since the purge rewrite). Parents with UnminedSince>0 remain visible to BA's iterator and are therefore valid, so no child delete.

CleanupReport:

- Renames ConflictingUnminedPurged → ConflictingUnminedDeleted.
- Adds OrphanParentUnminedDeleted counter.

Tests (3 new, alongside updated helpers):

- DeletesOrphanParentChildren: parent mined off-best-chain, non-conflicting unmined child is deleted.
- LeavesMainChainParentChildrenAlone: parent on best chain, child untouched.
- OrphanParentDryRun: dry run counts but does not delete.

Existing tests updated: newQuerier() now publishes the shared parent Tx's block id as on the best chain so step 3 does not accidentally flag newTestTx-derived children in scenarios that are not exercising orphan-mined behavior.

Runbook (docs/howto/recovery-from-idle.md) updated to describe step 3 and to note that unmined subtree blobs are left to the pruner/TTL (content-addressed, unique by hash, stale blob costs only disk).
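The step 3 decision for a single parent can be written as a small pure function. This is a sketch, not the cleanup code: `parentInfo` and `shouldDeleteChild` are hypothetical names, the best chain is modelled as a set of block IDs, and the order in which the rules are checked is an assumption.

```go
package main

import "fmt"

// parentInfo holds the three fields step 3 fetches for each parent
// (hypothetical, simplified type). A nil pointer means the parent is missing.
type parentInfo struct {
	Conflicting  bool
	UnminedSince uint32
	BlockIDs     []uint32
}

// shouldDeleteChild applies the rules listed above and returns the decision
// together with a short reason.
func shouldDeleteChild(p *parentInfo, bestChain map[uint32]bool) (bool, string) {
	switch {
	case p == nil:
		return false, "parent missing (tolerated)"
	case p.Conflicting:
		return true, "parent is conflicting"
	case p.UnminedSince > 0:
		return false, "parent still unmined"
	case len(p.BlockIDs) == 0:
		return true, "parent has no block IDs (inconsistent)"
	}
	for _, id := range p.BlockIDs {
		if bestChain[id] {
			return false, "parent mined on best chain"
		}
	}
	return true, "parent mined only off the best chain"
}

func main() {
	best := map[uint32]bool{945000: true}
	cases := []*parentInfo{
		nil,
		{Conflicting: true},
		{UnminedSince: 945060},
		{},
		{BlockIDs: []uint32{945000}},
		{BlockIDs: []uint32{945137}}, // block id from the example above
	}
	for _, p := range cases {
		del, why := shouldDeleteChild(p, best)
		fmt.Println(del, why)
	}
}
```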

Clarify in the IDLE-recovery runbook that cleanup-unmined's step 3 deletions are safe even if a peer has the deleted tx in a blessed subtree. If a later block arrives referencing such a subtree, block validation does not hard-fail: SubtreeValidation.processMissingTransactions refetches the tx bytes from the peer and reconstructs the UTXO metadata as part of normal validation. BatchDecorate TX_NOT_FOUND is treated as a miss counter, not a fatal error. Cite the relevant source locations so future readers can verify the safety argument without having to re-trace the path.

… anchor

Mainnet-eu-1 run found 0 orphan-parent children even though Block Assembly then tripped validateParentChain with "parent is on wrong chain (blocks: [945137])".

Root cause: cleanup classified against the blockchain service's best chain (via GetBestBlockHeader), but BA uses its own persisted CurrentBlock from the blockchain DB state table on startup. After a reorg the two views diverge; cleanup saw 945137 as on main while BA did not.

Adapter's GetBestBlockHeaderInfo now calls blockassembly.Client.GetBlockAssemblyState and uses CurrentHash / CurrentHeight as the walk anchor for GetBlockHeaderIDs. BA is now a required dependency of cleanup: its gRPC stays reachable in IDLE (only write entry points are gated by frozenForRepair). If BA is unreachable or returns an invalid hash we fail loudly rather than silently cleaning against a drifted chain view.

sonarqubecloud · 2026-04-19T15:19:56Z

Quality Gate failed

Failed conditions
61.5% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube Cloud

oskarszoon · 2026-04-20T08:32:04Z

Even after all the proper cleanup, we still had incorrect conflicting marks in subtrees from already received blocks. Because of the previous bugs, there isn't an easy recovery path for Teranodes in this state, and resetting/reseeding is preferred.

icellan added 8 commits April 15, 2026 13:21

refactor(utxo): export MarkConflictingRecursively for external use

cfd8df6

fix(blockassembly): cascade conflicting mark to all descendants on detection

45ddfd9

feat(utxo): add RepairConflictingChains for offline conflict repair

10dbc76

test(utxo): use t.TempDir for SQLite file store isolation in repair tests

d10a414

feat(cmd): add repair-conflicts teranodecli subcommand

536a85a

fix(services): add FSM IDLE guards to blockvalidation, subtreevalidation, propagation, legacy

7447a06

icellan self-assigned this Apr 15, 2026

icellan added 2 commits April 15, 2026 15:31

fix(blockvalidation): nil-guard blockchainClient in IDLE check; add error log before FSM IDLE transition

d8022cc

fix(services): pause Kafka consumers on IDLE instead of dropping messages

56bf6e4

icellan requested review from ordishs and oskarszoon April 15, 2026 14:09

icellan added 2 commits April 15, 2026 16:44

test(blockvalidation): stub IsFSMCurrentState on mock in Recoverable_Error_Handling test

266cdba

refactor(blockassembly): simplify validateParentChain to return error only

4c5f240

ordishs approved these changes Apr 16, 2026


icellan mentioned this pull request Apr 16, 2026

fix: remove flawed OnRestartRemoveInvalidParentChainTxs setting #702

Closed

3 tasks

oskarszoon force-pushed the fix/repair-conflicts branch from f6d9449 to 2fbd137 Compare April 16, 2026 08:11

This was referenced Apr 16, 2026

fix: reconcile SI-scan-missed parents in validateParentChain #701

Closed

fix: wait for BlockValidation to process invalid moveBack blocks before loading unmined txs #691

Merged

icellan and others added 7 commits April 16, 2026 11:03

fix(utxo): fix Case C detection to use PARENT's ConflictingChildren; add 3 Case C tests

f81a2d2

fix(services): resume Kafka consumers on FSM transition from IDLE; fix reviewer issues

584c850

fix(settings): apply gci formatting to blockassembly_settings.go

ee9d973

Merge branch 'main' of github.com:bsv-blockchain/teranode into fix/repair-conflicts

c161446

oskarszoon and others added 18 commits April 17, 2026 08:50

oskarszoon changed the title ~~fix(blockassembly): repair conflicting tx chains + FSM IDLE enforcement~~ fix(blockassembly): purge-conflicting-unmined + FSM IDLE enforcement Apr 19, 2026

oskarszoon added 4 commits April 19, 2026 14:33

style(cli): gci re-align commandHelp map to longest key

a4cae35

Merge branch 'main' of github.com:bsv-blockchain/teranode into fix/repair-conflicts

58671de

github-actions Bot reviewed Apr 19, 2026


Comment thread docs/references/settings/services/blockassembly_settings.md Outdated

oskarszoon added 3 commits April 19, 2026 16:20

fix docs

037c4af

oskarszoon closed this Apr 20, 2026

fix(blockassembly): purge-conflicting-unmined + FSM IDLE enforcement #704

icellan wants to merge 52 commits into main from fix/repair-conflicts
