Skip to content

test(legacy/netsync): fix _NilParentTx flake from SyncedMap random eviction#931

Closed
freemans13 wants to merge 3 commits into
bsv-blockchain:mainfrom
freemans13:stu/netsync-fixture-flake
Closed

test(legacy/netsync): fix _NilParentTx flake from SyncedMap random eviction#931
freemans13 wants to merge 3 commits into
bsv-blockchain:mainfrom
freemans13:stu/netsync-fixture-flake

Conversation

@freemans13

Copy link
Copy Markdown
Collaborator

Summary

TestSyncManager_extendFromTxMap_NilParentTx (and its ExtendTransaction_NilParentTx sibling) intermittently fail in CI with:

expected TxInvalid error, got PROCESSING (4): tx <hash> not found in txMap

Root cause

The shared fixtures buildOOBFixture and buildInRangeFixture (services/legacy/netsync/handle_block_test.go) construct a txmap.NewSyncedMap[...](2) and immediately insert 2 entries, leaving len == limit. The _NilParentTx tests then do a third Set on the already-present parent key. go-tx-map's setUnlocked runs its eviction branch whenever len(m) >= limit, including on key-update, and evicts a random item via Go's randomized map iteration:

// vendor: bsv-blockchain/go-tx-map synced.go
func (m *SyncedMap[K, V]) setUnlocked(key K, value V) {
    if m.limit > 0 && len(m.m) >= m.limit {
        for k := range m.m { delete(m.m, k); break } // random eviction
    }
    m.m[key] = value
}

~50% of runs the child is evicted, so when the SUT calls extendFromTxMap(ctx, child, txMap) its very first lookup (txMap.Get(*tx.TxIDChainHash())) misses and returns a ProcessingError instead of reaching the nil-parent branch under test.

Fix

Drop the (2) limit on both fixtures — the production path sizes the map by block tx count, not 2 — and add a short comment explaining the eviction trap so the next person who copies the fixture doesn't fall into the same hole.

This is purely a test-fixture change; no production code is touched.

Repro / verification

Before, locally on upstream/main:

go test -count=10 -race -tags testtxmetacache \
  -run "TestSyncManager_(extendFromTxMap|ExtendTransaction)_NilParentTx" \
  ./services/legacy/netsync/
# FAIL: ~50% of iterations

After this patch:

go test -count=20 -race -tags testtxmetacache \
  -run "TestSyncManager_(extendFromTxMap|ExtendTransaction)" \
  ./services/legacy/netsync/
# 120 passed

Note

Strictly speaking the eviction logic in go-tx-map is also wrong (setUnlocked should not evict when the key already exists). Worth chasing upstream as a follow-up, but the fixture is the test-side defect that's blocking CI today.

Surfaced while investigating CI red on #863, which doesn't touch this package — see #863 for context.

buildOOBFixture and buildInRangeFixture created their SyncedMap with
limit=2 and then inserted exactly 2 entries. The _NilParentTx tests
subsequently Set on an existing key; once len == limit, go-tx-map's
setUnlocked evicts a random item from the map before writing — which
~50% of the time drops the child the SUT is about to look up, leaving
the test failing with "tx <hash> not found in txMap". Reproduced
locally with -count=10 under -race.

Use an unbounded SyncedMap in the fixtures (the production code path
sizes by block tx count, not 2) and leave a comment explaining the
eviction trap so future fixtures don't fall into the same hole.
@github-actions

github-actions Bot commented May 22, 2026

Copy link
Copy Markdown
Contributor

🤖 Claude Code Review

Status: Complete


The fix correctly addresses the test flake by removing the artificial size limit from both test fixtures. The root cause analysis is sound: SyncedMap with limit=2 evicts randomly when updating an existing key once len >= limit, causing ~50% failure rate.

Minor Issue

The comment at line 648 in buildOOBFixture states "callers (e.g. the _NilParentTx test)" but this fixture is only used by _OOB tests, not _NilParentTx tests. Consider changing to "_OOB tests" for accuracy. The buildInRangeFixture comment at line 729 correctly references "NilParentTx tests" and is accurate.

Assessment

The implementation is correct and test-only. No production code impact. The comments explain the eviction trap to prevent future regressions.

@github-actions

github-actions Bot commented May 22, 2026

Copy link
Copy Markdown
Contributor

Benchmark Comparison Report

Baseline: main (unknown)

Current: PR-931 (d3829e5)

Summary

  • Regressions: 0
  • Improvements: 0
  • Unchanged: 144
  • Significance level: p < 0.05
All benchmark results (sec/op)
Benchmark Baseline Current Change p-value
_NewBlockFromBytes-4 1.781µ 1.791µ ~ 0.400
SplitSyncedParentMap_SetIfNotExists/256_buckets-4 61.67n 61.47n ~ 0.700
SplitSyncedParentMap_SetIfNotExists/16_buckets-4 61.64n 61.62n ~ 0.700
SplitSyncedParentMap_SetIfNotExists/1_bucket-4 61.54n 61.50n ~ 0.700
SplitSyncedParentMap_ConcurrentSetIfNotExists/256_buckets... 30.24n 31.61n ~ 0.100
SplitSyncedParentMap_ConcurrentSetIfNotExists/16_buckets_... 54.20n 53.36n ~ 0.700
SplitSyncedParentMap_ConcurrentSetIfNotExists/1_bucket_pa... 112.1n 116.1n ~ 1.000
MiningCandidate_Stringify_Short-4 260.1n 261.3n ~ 0.400
MiningCandidate_Stringify_Long-4 1.968µ 1.933µ ~ 0.100
MiningSolution_Stringify-4 986.5n 974.7n ~ 0.200
BlockInfo_MarshalJSON-4 1.830µ 1.804µ ~ 0.100
NewFromBytes-4 129.4n 145.4n ~ 0.100
AddTxBatchColumnar_Validation-4 2.653µ 2.560µ ~ 0.400
OffsetValidationLoop-4 545.9n 546.9n ~ 0.700
Mine_EasyDifficulty-4 60.20µ 61.14µ ~ 0.700
Mine_WithAddress-4 7.001µ 6.751µ ~ 0.100
BlockAssembler_AddTx-4 0.02918n 0.03470n ~ 0.400
AddNode-4 10.85 10.68 ~ 0.700
AddNodeWithMap-4 10.99 11.79 ~ 1.000
DirectSubtreeAdd/4_per_subtree-4 61.70n 58.29n ~ 0.400
DirectSubtreeAdd/64_per_subtree-4 30.17n 29.91n ~ 0.100
DirectSubtreeAdd/256_per_subtree-4 29.13n 29.02n ~ 0.700
DirectSubtreeAdd/1024_per_subtree-4 28.00n 27.84n ~ 0.200
DirectSubtreeAdd/2048_per_subtree-4 27.69n 27.54n ~ 0.200
SubtreeProcessorAdd/4_per_subtree-4 289.5n 282.2n ~ 0.100
SubtreeProcessorAdd/64_per_subtree-4 287.0n 276.8n ~ 0.100
SubtreeProcessorAdd/256_per_subtree-4 290.5n 278.9n ~ 0.100
SubtreeProcessorAdd/1024_per_subtree-4 285.5n 270.4n ~ 0.100
SubtreeProcessorAdd/2048_per_subtree-4 282.8n 272.1n ~ 0.100
SubtreeProcessorRotate/4_per_subtree-4 285.1n 276.4n ~ 0.100
SubtreeProcessorRotate/64_per_subtree-4 278.8n 272.1n ~ 0.100
SubtreeProcessorRotate/256_per_subtree-4 278.8n 275.7n ~ 0.100
SubtreeProcessorRotate/1024_per_subtree-4 279.4n 272.1n ~ 0.100
SubtreeNodeAddOnly/4_per_subtree-4 54.45n 53.72n ~ 0.100
SubtreeNodeAddOnly/64_per_subtree-4 34.28n 34.15n ~ 0.100
SubtreeNodeAddOnly/256_per_subtree-4 33.45n 33.30n ~ 0.100
SubtreeNodeAddOnly/1024_per_subtree-4 32.58n 32.52n ~ 1.000
SubtreeCreationOnly/4_per_subtree-4 114.4n 112.8n ~ 0.100
SubtreeCreationOnly/64_per_subtree-4 407.5n 397.8n ~ 0.100
SubtreeCreationOnly/256_per_subtree-4 1.367µ 1.350µ ~ 0.100
SubtreeCreationOnly/1024_per_subtree-4 4.469µ 4.407µ ~ 0.200
SubtreeCreationOnly/2048_per_subtree-4 8.287µ 8.124µ ~ 0.100
SubtreeProcessorOverheadBreakdown/64_per_subtree-4 278.1n 272.7n ~ 0.200
SubtreeProcessorOverheadBreakdown/1024_per_subtree-4 278.3n 272.5n ~ 0.100
ParallelGetAndSetIfNotExists/1k_nodes-4 2.261m 2.218m ~ 0.100
ParallelGetAndSetIfNotExists/10k_nodes-4 5.512m 5.373m ~ 0.100
ParallelGetAndSetIfNotExists/50k_nodes-4 7.731m 7.179m ~ 0.100
ParallelGetAndSetIfNotExists/100k_nodes-4 10.59m 10.23m ~ 0.100
SequentialGetAndSetIfNotExists/1k_nodes-4 2.005m 1.962m ~ 0.100
SequentialGetAndSetIfNotExists/10k_nodes-4 4.548m 4.592m ~ 1.000
SequentialGetAndSetIfNotExists/50k_nodes-4 13.11m 12.15m ~ 0.100
SequentialGetAndSetIfNotExists/100k_nodes-4 23.78m 22.22m ~ 0.100
ProcessOwnBlockSubtreeNodesParallel/1k_nodes-4 2.286m 2.238m ~ 0.100
ProcessOwnBlockSubtreeNodesParallel/10k_nodes-4 8.447m 8.151m ~ 0.100
ProcessOwnBlockSubtreeNodesParallel/100k_nodes-4 13.57m 12.88m ~ 0.100
ProcessOwnBlockSubtreeNodesSequential/1k_nodes-4 1.992m 1.981m ~ 0.400
ProcessOwnBlockSubtreeNodesSequential/10k_nodes-4 7.946m 7.421m ~ 0.100
ProcessOwnBlockSubtreeNodesSequential/100k_nodes-4 40.87m 42.88m ~ 0.100
DiskTxMap_SetIfNotExists-4 3.885µ 3.778µ ~ 0.100
DiskTxMap_SetIfNotExists_Parallel-4 3.477µ 3.563µ ~ 1.000
DiskTxMap_ExistenceOnly-4 345.7n 336.3n ~ 0.700
Queue-4 190.6n 188.6n ~ 0.100
AtomicPointer-4 3.655n 3.618n ~ 0.200
ReorgOptimizations/DedupFilterPipeline/Old/10K-4 868.2µ 875.2µ ~ 1.000
ReorgOptimizations/DedupFilterPipeline/New/10K-4 812.5µ 799.1µ ~ 0.400
ReorgOptimizations/AllMarkFalse/Old/10K-4 118.0µ 115.9µ ~ 0.700
ReorgOptimizations/AllMarkFalse/New/10K-4 64.48µ 64.93µ ~ 0.200
ReorgOptimizations/HashSlicePool/Old/10K-4 61.44µ 54.67µ ~ 0.100
ReorgOptimizations/HashSlicePool/New/10K-4 10.96µ 11.11µ ~ 0.200
ReorgOptimizations/NodeFlags/Old/10K-4 4.541µ 4.561µ ~ 0.800
ReorgOptimizations/NodeFlags/New/10K-4 2.004µ 1.579µ ~ 0.100
ReorgOptimizations/DedupFilterPipeline/Old/100K-4 9.608m 10.476m ~ 0.100
ReorgOptimizations/DedupFilterPipeline/New/100K-4 9.912m 10.175m ~ 0.100
ReorgOptimizations/AllMarkFalse/Old/100K-4 1.103m 1.111m ~ 0.700
ReorgOptimizations/AllMarkFalse/New/100K-4 707.4µ 705.0µ ~ 0.100
ReorgOptimizations/HashSlicePool/Old/100K-4 582.3µ 525.6µ ~ 0.700
ReorgOptimizations/HashSlicePool/New/100K-4 206.9µ 198.3µ ~ 0.700
ReorgOptimizations/NodeFlags/Old/100K-4 46.63µ 48.62µ ~ 0.200
ReorgOptimizations/NodeFlags/New/100K-4 16.15µ 16.50µ ~ 1.000
TxMapSetIfNotExists-4 49.88n 49.62n ~ 0.100
TxMapSetIfNotExistsDuplicate-4 41.30n 41.23n ~ 0.800
ChannelSendReceive-4 576.2n 579.7n ~ 1.000
CalcBlockWork-4 506.0n 508.8n ~ 0.100
CalculateWork-4 688.6n 695.8n ~ 0.400
BuildBlockLocatorString_Helpers/Size_10-4 1.684µ 1.688µ ~ 1.000
BuildBlockLocatorString_Helpers/Size_100-4 12.98µ 12.98µ ~ 1.000
BuildBlockLocatorString_Helpers/Size_1000-4 128.5µ 127.7µ ~ 0.100
CatchupWithHeaderCache-4 104.6m 104.5m ~ 0.400
SubtreeSizes/10k_tx_4_per_subtree-4 1.370m 1.366m ~ 0.700
SubtreeSizes/10k_tx_16_per_subtree-4 325.7µ 321.6µ ~ 1.000
SubtreeSizes/10k_tx_64_per_subtree-4 77.96µ 77.63µ ~ 1.000
SubtreeSizes/10k_tx_256_per_subtree-4 19.62µ 19.41µ ~ 0.100
SubtreeSizes/10k_tx_512_per_subtree-4 9.595µ 9.644µ ~ 0.700
SubtreeSizes/10k_tx_1024_per_subtree-4 4.834µ 4.800µ ~ 0.400
SubtreeSizes/10k_tx_2k_per_subtree-4 2.387µ 2.389µ ~ 0.700
BlockSizeScaling/10k_tx_64_per_subtree-4 76.15µ 76.51µ ~ 0.400
BlockSizeScaling/10k_tx_256_per_subtree-4 19.35µ 19.20µ ~ 0.100
BlockSizeScaling/10k_tx_1024_per_subtree-4 4.795µ 4.762µ ~ 0.400
BlockSizeScaling/50k_tx_64_per_subtree-4 398.0µ 401.4µ ~ 1.000
BlockSizeScaling/50k_tx_256_per_subtree-4 96.88µ 95.41µ ~ 0.200
BlockSizeScaling/50k_tx_1024_per_subtree-4 23.82µ 23.70µ ~ 0.100
SubtreeAllocations/small_subtrees_exists_check-4 160.2µ 164.0µ ~ 1.000
SubtreeAllocations/small_subtrees_data_fetch-4 166.0µ 167.9µ ~ 0.700
SubtreeAllocations/small_subtrees_full_validation-4 330.9µ 332.9µ ~ 1.000
SubtreeAllocations/medium_subtrees_exists_check-4 9.531µ 9.581µ ~ 0.400
SubtreeAllocations/medium_subtrees_data_fetch-4 9.886µ 9.890µ ~ 0.700
SubtreeAllocations/medium_subtrees_full_validation-4 19.35µ 19.40µ ~ 1.000
SubtreeAllocations/large_subtrees_exists_check-4 2.270µ 2.282µ ~ 0.700
SubtreeAllocations/large_subtrees_data_fetch-4 2.416µ 2.401µ ~ 0.100
SubtreeAllocations/large_subtrees_full_validation-4 4.779µ 4.839µ ~ 0.400
_BufferPoolAllocation/16KB-4 4.586µ 4.037µ ~ 0.700
_BufferPoolAllocation/32KB-4 10.354µ 8.496µ ~ 0.200
_BufferPoolAllocation/64KB-4 19.11µ 18.73µ ~ 1.000
_BufferPoolAllocation/128KB-4 30.38µ 35.36µ ~ 0.100
_BufferPoolAllocation/512KB-4 113.2µ 121.7µ ~ 0.700
_BufferPoolConcurrent/32KB-4 20.07µ 19.31µ ~ 0.400
_BufferPoolConcurrent/64KB-4 32.90µ 29.39µ ~ 0.100
_BufferPoolConcurrent/512KB-4 158.0µ 147.4µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/16KB-4 625.5µ 670.9µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/32KB-4 689.1µ 681.9µ ~ 1.000
_SubtreeDeserializationWithBufferSizes/64KB-4 685.0µ 684.4µ ~ 0.700
_SubtreeDeserializationWithBufferSizes/128KB-4 663.2µ 683.1µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/512KB-4 600.9µ 614.5µ ~ 0.700
_SubtreeDataDeserializationWithBufferSizes/16KB-4 37.45m 38.28m ~ 0.200
_SubtreeDataDeserializationWithBufferSizes/32KB-4 37.18m 37.79m ~ 0.200
_SubtreeDataDeserializationWithBufferSizes/64KB-4 36.89m 37.61m ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/128KB-4 36.62m 37.47m ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/512KB-4 36.61m 37.26m ~ 0.100
_PooledVsNonPooled/Pooled-4 833.2n 835.6n ~ 1.000
_PooledVsNonPooled/NonPooled-4 7.730µ 7.868µ ~ 0.700
_MemoryFootprint/Current_512KB_32concurrent-4 6.561µ 7.693µ ~ 0.100
_MemoryFootprint/Proposed_32KB_32concurrent-4 9.725µ 10.006µ ~ 0.200
_MemoryFootprint/Alternative_64KB_32concurrent-4 9.191µ 10.376µ ~ 0.100
_prepareTxsPerLevel-4 399.0m 405.0m ~ 1.000
_prepareTxsPerLevelOrdered-4 3.559m 4.017m ~ 0.100
_prepareTxsPerLevel_Comparison/Original-4 398.4m 400.9m ~ 0.700
_prepareTxsPerLevel_Comparison/Optimized-4 3.649m 3.836m ~ 0.700
StoreBlock_Sequential/BelowCSVHeight-4 326.2µ 315.7µ ~ 0.100
StoreBlock_Sequential/AboveCSVHeight-4 319.1µ 321.5µ ~ 0.400
GetUtxoHashes-4 270.3n 274.1n ~ 0.200
GetUtxoHashes_ManyOutputs-4 46.78µ 45.53µ ~ 0.100
_NewMetaDataFromBytes-4 214.3n 214.2n ~ 0.800
_Bytes-4 402.5n 400.3n ~ 0.200
_MetaBytes-4 138.6n 144.5n ~ 0.100

Threshold: >10% with p < 0.05 | Generated: 2026-06-02 09:05 UTC

@oskarszoon oskarszoon left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approve. Real bug — SyncedMap.setUnlocked evicts a random entry when full, without first checking if the key already exists. Re-Setting an existing key on a full map can drop a different key. Fixture had NewSyncedMap(2) and re-Set the parent → ~50% the child was evicted → "tx not found in txMap" instead of expected TxInvalid.

Pre-fix reproduction confirmed; post-fix 20× clean.

Worth a follow-up upstream issue on go-tx-map — the bug is dormant in Teranode prod (rejectedTxns has a prior .Get-exists guard; txMap iterates unique hashes) but worth fixing at the source.

Tiny alternative the author could consider: sibling tests _NilOutput and _NilLockingScript mutate the wrapper directly (parentWrapper.Tx.Outputs[0] = nil). _NilParentTx could do parentWrapper.Tx = nil instead of Set(...), keeping the (2) limit and matching sibling style. Current choice is fine — the comment is a useful learning artifact.

icellan added a commit that referenced this pull request May 22, 2026
buildOOBFixture and buildInRangeFixture created their SyncedMap with
limit=2 and then inserted exactly 2 entries. The _NilParentTx tests
subsequently Set on an existing key; once len == limit, go-tx-map's
setUnlocked evicts a random item from the map before writing — which
~50% of the time drops the child the SUT is about to look up, leaving
the test failing with "tx <hash> not found in txMap". Reproduced
locally with -count=20 under -race.

Use an unbounded SyncedMap in the fixtures (the production path sizes
by block tx count, not 2) and leave a comment explaining the eviction
trap so future fixtures don't fall into the same hole.

Duplicates the fix in PR #931 so this PR's CI is unblocked
independently; will fold into theirs if #931 lands first.
icellan added a commit that referenced this pull request May 22, 2026
buildOOBFixture and buildInRangeFixture created their SyncedMap with
limit=2 and then inserted exactly 2 entries. The _NilParentTx tests
subsequently Set on an existing key; once len == limit, go-tx-map's
setUnlocked evicts a random item from the map before writing — which
~50% of the time drops the child the SUT is about to look up, leaving
the test failing with "tx <hash> not found in txMap". Reproduced
locally with -count=20 under -race.

Use an unbounded SyncedMap in the fixtures (the production path sizes
by block tx count, not 2) and leave a comment explaining the eviction
trap so future fixtures don't fall into the same hole.

Duplicates the fix in PR #931 so this PR's CI is unblocked
independently; will fold into theirs if #931 lands first.
icellan added a commit that referenced this pull request May 22, 2026
buildOOBFixture and buildInRangeFixture created their SyncedMap with
limit=2 and then inserted exactly 2 entries. The _NilParentTx tests
subsequently Set on an existing key; once len == limit, go-tx-map's
setUnlocked evicts a random item from the map before writing — which
~50% of the time drops the child the SUT is about to look up, leaving
the test failing with "tx <hash> not found in txMap". Reproduced
locally with -count=20 under -race.

Use an unbounded SyncedMap in the fixtures (the production path sizes
by block tx count, not 2) and leave a comment explaining the eviction
trap so future fixtures don't fall into the same hole.

Duplicates the fix in PR #931 so this PR's CI is unblocked
independently; will fold into theirs if #931 lands first.

(Amended to force a fresh CI trigger — the prior push at 13:09 didn't
fire a pull_request event.)
freemans13 added a commit to freemans13/teranode that referenced this pull request May 26, 2026
@sonarqubecloud

sonarqubecloud Bot commented Jun 2, 2026

Copy link
Copy Markdown

@freemans13 freemans13 requested a review from ordishs June 4, 2026 15:29
@freemans13

Copy link
Copy Markdown
Collaborator Author

Closing — this change is already fully present on main, covered by two separate merged PRs:

Together #767 + #933 make every change in this PR a no-op against current main — both fixtures already construct their SyncedMap without the (2) limit, so the random-eviction-on-key-update trap is gone.

Worth noting the underlying root cause still stands as a potential upstream follow-up: go-tx-map's setUnlocked (v1.3.7) evicts on key update, not just on growth (if m.limit > 0 && len(m.m) >= m.limit), which affects any bounded-SyncedMap consumer. That's a separate go-tx-map issue, not a teranode one.

@freemans13 freemans13 closed this Jun 5, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants