ci: bump test job to 32-core/128GB runner to stop intermittent OOM (#1051) #1052

Merged
oskarszoon merged 1 commit into bsv-blockchain:main from oskarszoon:ci/test-runner-32core
Jun 8, 2026

@oskarszoon

Contributor

Temporary mitigation for the intermittent Error: Process completed with exit code 143 in the test check. Full root-cause analysis and the memory-reduction plan: #1051.

What

  • Bump the test job runner: teranode-runner-16-core-arm (64GB) → teranode-runner-32-core-arm (128GB).
  • Add an end-of-run top-10 peak-memory dump to the test step.

Why

make test runs go test -race -coverpkg=./... across the whole repo. A few test binaries peak very high (measured via VmHWM):

sql.test               34.2G   (stores/blockchain/sql + stores/utxo/sql)
blockassembly.test     31.4G   (services/blockassembly)
subtreeprocessor.test  19.6G   (services/blockassembly/subtreeprocessor)
txmetacache.test       14.8G   (stores/txmetacache)
model.test             10.8G   (model)
aerospike.test          4.9G   (stores/utxo/aerospike)

At -p=16, enough of these overlap to push the job-wide peak to ~110GB+, intermittently OOMing the 64GB runner → SIGTERM → exit code 143 (128 + 15, where 15 is the signal number of SIGTERM). 128GB gives headroom.
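
As a rough check on the ~110GB+ figure, the six peaks listed above can be summed. This is a sketch only: it assumes the worst case where all six binaries reach their peak at the same moment, which -p=16 permits but does not guarantee, and the `sum_peaks` helper name is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sum the size column of "<name> <size>G" lines from stdin.
sum_peaks() {
  awk '{ sub(/G$/, "", $2); total += $2 } END { printf "%.1fG\n", total }'
}

# Peak values copied from the table above.
peaks='sql.test 34.2G
blockassembly.test 31.4G
subtreeprocessor.test 19.6G
txmetacache.test 14.8G
model.test 10.8G
aerospike.test 4.9G'

printf '%s\n' "$peaks" | sum_peaks
```

The total is 115.7G, which is above the 64GB runner's capacity and below 128GB, consistent with the job-wide peak quoted above.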

This is a stopgap, not a fix. The real work — reducing per-package memory — is tracked in #1051. Revert the runner to 16-core once that lands.

Monitoring

The test step now prints, at the end of every run (collapsed Peak memory log group):

PEAKMEM 34.2G sql.test
PEAKMEM 31.4G blockassembly.test
...

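Each line is the per-binary peak read from VmHWM in /proc/<pid>/status (a kernel-maintained monotonic high-water mark, so there are no sampling gaps), and the dump also includes the cgroup job-wide peak. Output is end-only and limited to the top 10, with no per-interval spam, so we can watch the numbers drop as #1051 progresses.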
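
For readers unfamiliar with VmHWM, the following is a minimal sketch of how such a dump could be produced. It is not the script from this PR: the function names `vmhwm_kb` and `peakmem_report` are hypothetical, and it assumes the intermediate data is one `<kB> <name>` pair per line.

```shell
#!/usr/bin/env bash
# Sketch only: not the script from this PR. Function names are hypothetical.

# Print the VmHWM value (in kB) from a /proc/<pid>/status-style file.
# Prints nothing if the file is missing or has no VmHWM line.
vmhwm_kb() {
  awk '$1 == "VmHWM:" { print $2 }' "$1" 2>/dev/null
}

# Turn "<kB> <name>" lines from stdin into PEAKMEM lines,
# largest first, limited to the top 10. 1048576 kB = 1 GiB.
peakmem_report() {
  sort -k1,1nr | head -n 10 |
    awk '{ printf "PEAKMEM %.1fG %s\n", $1 / 1048576, $2 }'
}

# Example: report the peak resident size of the current shell (Linux only).
kb=$(vmhwm_kb "/proc/$$/status")
if [ -n "$kb" ]; then
  printf '%s %s\n' "$kb" "bash" | peakmem_report
fi
```

Because VmHWM never decreases during a process's lifetime, a reading taken at any point reflects the highest resident size reached up to that moment.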

Scope

…sv-blockchain#1051)

`make test` (go test -race -coverpkg=./... over the whole repo) hits exit
143 intermittently: a few test binaries peak very high (sql ~34G,
blockassembly ~31G, subtreeprocessor ~20G) and at -p=16 enough overlap to
exhaust the 16-core/64GB runner (job peak ~110GB+). Temporary bump to
32-core/128GB to stop the bleeding while per-package memory is reduced.

Also adds an end-of-run top-10 peak-memory dump (per-binary VmHWM + cgroup
job peak) to monitor improvements. Revert the runner once bsv-blockchain#1051 lands.

github-actions Bot commented Jun 8, 2026

Contributor

🤖 Claude Code Review

Status: Complete


Current Review:

Found 1 minor issue — see inline comment:

  • Potential file write collision in background memory tracker (low probability, low impact)

Overall Assessment:

This is a reasonable temporary mitigation for the OOM issue described in #1051. The approach is sound:

  • Runner bump (16→32 core, 64→128GB) addresses immediate intermittent failures
  • Memory tracking script provides telemetry to measure future improvements
  • PR clearly documents this as temporary with a revert plan

Shell Script Review:

The memory tracking implementation is generally solid:

  • Uses /proc filesystem correctly for VmHWM extraction
  • Handles missing files gracefully with 2>/dev/null
  • EXIT trap ensures summary runs even on SIGTERM
  • Proper use of set +e to prevent SIGPIPE failures

Minor note: The background track() function appends to $TRACK without atomic writes or locking, creating a theoretical race condition if multiple iterations sample the same PID simultaneously. Given the 10-second interval and fast append operations, collision probability is very low and impact minimal (duplicate/corrupted lines in the summary). Acceptable for diagnostic tooling.
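
One way to make duplicate samples harmless, sketched here under assumptions, is to keep only the largest value seen per binary when summarising, since VmHWM only ever increases. The format of $TRACK is not shown in this thread, so the sketch assumes one `<kB> <name>` pair per line. The `max_per_name` helper is hypothetical, and lines that do not match that shape are skipped.

```shell
#!/usr/bin/env bash
# Hypothetical summariser: keep the largest kB value seen per name.
# Assumes input lines of the form "<kB> <name>"; other lines are ignored.
max_per_name() {
  awk 'NF == 2 && $1 ~ /^[0-9]+$/ {
         if (!($2 in max) || $1 + 0 > max[$2]) max[$2] = $1 + 0
       }
       END { for (n in max) printf "%d %s\n", max[n], n }' | sort -k1,1nr
}

# Duplicate and malformed samples collapse to one line per binary.
printf '%s\n' '100 a.test' '300 a.test' '200 b.test' 'garbage' '300 a.test' |
  max_per_name
```

With this input the summary contains `300 a.test` followed by `200 b.test`. The duplicate sample and the malformed line do not affect the result.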

Comment thread .github/workflows/teranode_pr_tests.yaml
sonarqubecloud Bot commented Jun 8, 2026

github-actions Bot commented Jun 8, 2026

Contributor

Benchmark Comparison Report

Baseline: main (unknown)

Current: PR-1052 (119772c)

Summary

  • Regressions: 0
  • Improvements: 0
  • Unchanged: 132
  • Significance level: p < 0.05

All benchmark results (sec/op)

Benchmark Baseline Current Change p-value
_NewBlockFromBytes-4 1.752µ 1.762µ ~ 0.100
SplitSyncedParentMap_SetIfNotExists/256_buckets-4 61.61n 61.69n ~ 0.300
SplitSyncedParentMap_SetIfNotExists/16_buckets-4 62.12n 61.81n ~ 0.400
SplitSyncedParentMap_SetIfNotExists/1_bucket-4 61.63n 61.57n ~ 1.000
SplitSyncedParentMap_ConcurrentSetIfNotExists/256_buckets... 29.76n 30.34n ~ 0.200
SplitSyncedParentMap_ConcurrentSetIfNotExists/16_buckets_... 50.03n 50.76n ~ 0.400
SplitSyncedParentMap_ConcurrentSetIfNotExists/1_bucket_pa... 108.2n 108.3n ~ 0.800
MiningCandidate_Stringify_Short-4 266.6n 264.7n ~ 0.400
MiningCandidate_Stringify_Long-4 1.923µ 1.899µ ~ 0.100
MiningSolution_Stringify-4 998.1n 979.9n ~ 0.100
BlockInfo_MarshalJSON-4 1.769µ 1.756µ ~ 0.200
NewFromBytes-4 134.8n 134.3n ~ 0.700
AddTxBatchColumnar_Validation-4 2.540µ 2.539µ ~ 0.700
OffsetValidationLoop-4 641.1n 641.4n ~ 0.700
Mine_EasyDifficulty-4 60.92µ 60.60µ ~ 0.200
Mine_WithAddress-4 6.807µ 6.900µ ~ 0.100
BlockAssembler_AddTx-4 0.02191n 0.02347n ~ 0.700
AddNode-4 9.367 9.536 ~ 0.700
AddNodeWithMap-4 10.593 9.866 ~ 0.100
DiskTxMap_SetIfNotExists-4 3.553µ 3.791µ ~ 0.100
DiskTxMap_SetIfNotExists_Parallel-4 3.499µ 3.350µ ~ 0.200
DiskTxMap_ExistenceOnly-4 318.9n 317.4n ~ 1.000
Queue-4 192.7n 191.1n ~ 0.700
AtomicPointer-4 4.351n 4.560n ~ 0.700
TxMapSetIfNotExists-4 52.47n 52.44n ~ 1.000
TxMapSetIfNotExistsDuplicate-4 40.39n 40.28n ~ 0.100
ChannelSendReceive-4 587.9n 588.3n ~ 1.000
DirectSubtreeAdd/4_per_subtree-4 57.83n 59.69n ~ 0.400
DirectSubtreeAdd/64_per_subtree-4 29.26n 29.42n ~ 1.000
DirectSubtreeAdd/256_per_subtree-4 28.11n 28.66n ~ 0.100
DirectSubtreeAdd/1024_per_subtree-4 26.71n 26.73n ~ 0.500
DirectSubtreeAdd/2048_per_subtree-4 26.28n 26.35n ~ 0.300
SubtreeProcessorAdd/4_per_subtree-4 314.8n 305.6n ~ 0.100
SubtreeProcessorAdd/64_per_subtree-4 295.1n 302.1n ~ 1.000
SubtreeProcessorAdd/256_per_subtree-4 293.7n 292.9n ~ 0.400
SubtreeProcessorAdd/1024_per_subtree-4 294.2n 294.4n ~ 1.000
SubtreeProcessorAdd/2048_per_subtree-4 295.8n 294.4n ~ 0.400
SubtreeProcessorRotate/4_per_subtree-4 296.1n 295.5n ~ 0.400
SubtreeProcessorRotate/64_per_subtree-4 295.1n 291.8n ~ 0.700
SubtreeProcessorRotate/256_per_subtree-4 295.5n 294.6n ~ 0.200
SubtreeProcessorRotate/1024_per_subtree-4 296.1n 293.3n ~ 0.100
SubtreeNodeAddOnly/4_per_subtree-4 55.25n 55.29n ~ 1.000
SubtreeNodeAddOnly/64_per_subtree-4 36.36n 36.16n ~ 0.100
SubtreeNodeAddOnly/256_per_subtree-4 35.31n 35.35n ~ 1.000
SubtreeNodeAddOnly/1024_per_subtree-4 34.75n 34.59n ~ 0.100
SubtreeCreationOnly/4_per_subtree-4 96.60n 109.80n ~ 0.100
SubtreeCreationOnly/64_per_subtree-4 426.6n 464.0n ~ 0.700
SubtreeCreationOnly/256_per_subtree-4 1.406µ 1.306µ ~ 0.100
SubtreeCreationOnly/1024_per_subtree-4 4.488µ 4.596µ ~ 0.400
SubtreeCreationOnly/2048_per_subtree-4 7.624µ 8.596µ ~ 0.100
SubtreeProcessorOverheadBreakdown/64_per_subtree-4 295.8n 294.4n ~ 0.400
SubtreeProcessorOverheadBreakdown/1024_per_subtree-4 302.8n 293.5n ~ 0.100
ParallelGetAndSetIfNotExists/1k_nodes-4 11.13m 12.46m ~ 0.100
ParallelGetAndSetIfNotExists/10k_nodes-4 14.20m 14.48m ~ 0.100
ParallelGetAndSetIfNotExists/50k_nodes-4 17.09m 17.29m ~ 1.000
ParallelGetAndSetIfNotExists/100k_nodes-4 19.32m 20.83m ~ 0.100
SequentialGetAndSetIfNotExists/1k_nodes-4 10.58m 12.11m ~ 0.100
SequentialGetAndSetIfNotExists/10k_nodes-4 14.83m 16.23m ~ 0.100
SequentialGetAndSetIfNotExists/50k_nodes-4 20.27m 20.56m ~ 0.700
SequentialGetAndSetIfNotExists/100k_nodes-4 28.12m 27.66m ~ 1.000
ProcessOwnBlockSubtreeNodesParallel/1k_nodes-4 11.58m 11.16m ~ 1.000
ProcessOwnBlockSubtreeNodesParallel/10k_nodes-4 14.86m 14.53m ~ 0.700
ProcessOwnBlockSubtreeNodesParallel/100k_nodes-4 20.38m 19.06m ~ 0.100
ProcessOwnBlockSubtreeNodesSequential/1k_nodes-4 13.48m 13.10m ~ 0.700
ProcessOwnBlockSubtreeNodesSequential/10k_nodes-4 16.62m 16.73m ~ 1.000
ProcessOwnBlockSubtreeNodesSequential/100k_nodes-4 58.05m 51.75m ~ 0.100
CalcBlockWork-4 512.8n 510.1n ~ 0.100
CalculateWork-4 711.6n 712.2n ~ 1.000
CheckOldBlockIDs/on-chain-prefetch/1000-4 49.36µ 42.14µ ~ 0.200
CheckOldBlockIDs/off-chain-prefetch/1000-4 35.54µ 36.28µ ~ 0.100
CheckOldBlockIDs/on-chain-prefetch/10000-4 320.9µ 328.9µ ~ 0.100
CheckOldBlockIDs/off-chain-prefetch/10000-4 258.3µ 266.4µ ~ 0.200
BuildBlockLocatorString_Helpers/Size_10-4 1.096µ 1.175µ ~ 0.100
BuildBlockLocatorString_Helpers/Size_100-4 10.58µ 11.54µ ~ 0.100
BuildBlockLocatorString_Helpers/Size_1000-4 104.1µ 112.8µ ~ 0.100
CatchupWithHeaderCache-4 104.0m 103.9m ~ 0.200
SubtreeSizes/10k_tx_4_per_subtree-4 1.352m 1.328m ~ 1.000
SubtreeSizes/10k_tx_16_per_subtree-4 316.3µ 317.6µ ~ 1.000
SubtreeSizes/10k_tx_64_per_subtree-4 74.55µ 75.54µ ~ 0.100
SubtreeSizes/10k_tx_256_per_subtree-4 18.62µ 18.55µ ~ 1.000
SubtreeSizes/10k_tx_512_per_subtree-4 9.135µ 9.277µ ~ 0.200
SubtreeSizes/10k_tx_1024_per_subtree-4 4.585µ 4.626µ ~ 0.400
SubtreeSizes/10k_tx_2k_per_subtree-4 2.293µ 2.280µ ~ 1.000
BlockSizeScaling/10k_tx_64_per_subtree-4 72.84µ 74.56µ ~ 0.400
BlockSizeScaling/10k_tx_256_per_subtree-4 18.67µ 18.44µ ~ 0.100
BlockSizeScaling/10k_tx_1024_per_subtree-4 4.630µ 4.630µ ~ 1.000
BlockSizeScaling/50k_tx_64_per_subtree-4 386.6µ 383.7µ ~ 1.000
BlockSizeScaling/50k_tx_256_per_subtree-4 92.17µ 92.47µ ~ 1.000
BlockSizeScaling/50k_tx_1024_per_subtree-4 23.43µ 22.98µ ~ 0.400
SubtreeAllocations/small_subtrees_exists_check-4 156.4µ 157.3µ ~ 0.400
SubtreeAllocations/small_subtrees_data_fetch-4 164.5µ 160.0µ ~ 0.100
SubtreeAllocations/small_subtrees_full_validation-4 323.1µ 321.9µ ~ 0.400
SubtreeAllocations/medium_subtrees_exists_check-4 9.291µ 9.265µ ~ 1.000
SubtreeAllocations/medium_subtrees_data_fetch-4 9.610µ 9.507µ ~ 0.100
SubtreeAllocations/medium_subtrees_full_validation-4 18.62µ 18.70µ ~ 0.700
SubtreeAllocations/large_subtrees_exists_check-4 2.189µ 2.198µ ~ 0.700
SubtreeAllocations/large_subtrees_data_fetch-4 2.345µ 2.310µ ~ 0.400
SubtreeAllocations/large_subtrees_full_validation-4 4.624µ 4.636µ ~ 1.000
_BufferPoolAllocation/16KB-4 3.952µ 3.959µ ~ 0.700
_BufferPoolAllocation/32KB-4 9.766µ 11.760µ ~ 0.200
_BufferPoolAllocation/64KB-4 18.84µ 19.68µ ~ 1.000
_BufferPoolAllocation/128KB-4 39.08µ 32.96µ ~ 0.100
_BufferPoolAllocation/512KB-4 131.2µ 130.9µ ~ 1.000
_BufferPoolConcurrent/32KB-4 20.23µ 19.83µ ~ 0.700
_BufferPoolConcurrent/64KB-4 31.79µ 31.19µ ~ 0.700
_BufferPoolConcurrent/512KB-4 157.3µ 153.7µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/16KB-4 693.7µ 651.2µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/32KB-4 689.3µ 645.4µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/64KB-4 687.3µ 656.9µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/128KB-4 679.4µ 656.8µ ~ 0.400
_SubtreeDeserializationWithBufferSizes/512KB-4 629.7µ 619.0µ ~ 0.700
_SubtreeDataDeserializationWithBufferSizes/16KB-4 37.44m 37.37m ~ 0.700
_SubtreeDataDeserializationWithBufferSizes/32KB-4 37.20m 37.70m ~ 0.400
_SubtreeDataDeserializationWithBufferSizes/64KB-4 37.08m 37.20m ~ 0.700
_SubtreeDataDeserializationWithBufferSizes/128KB-4 37.21m 37.02m ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/512KB-4 36.97m 37.05m ~ 1.000
_PooledVsNonPooled/Pooled-4 834.4n 832.6n ~ 0.500
_PooledVsNonPooled/NonPooled-4 8.632µ 8.308µ ~ 0.700
_MemoryFootprint/Current_512KB_32concurrent-4 8.180µ 7.087µ ~ 0.100
_MemoryFootprint/Proposed_32KB_32concurrent-4 10.226µ 9.591µ ~ 0.100
_MemoryFootprint/Alternative_64KB_32concurrent-4 9.518µ 9.313µ ~ 0.700
_prepareTxsPerLevel-4 428.4m 413.3m ~ 0.200
_prepareTxsPerLevelOrdered-4 4.751m 4.370m ~ 0.400
_prepareTxsPerLevel_Comparison/Original-4 418.2m 431.4m ~ 0.100
_prepareTxsPerLevel_Comparison/Optimized-4 4.297m 4.168m ~ 1.000
StoreBlock_Sequential/BelowCSVHeight-4 336.8µ 334.3µ ~ 0.200
StoreBlock_Sequential/AboveCSVHeight-4 338.6µ 340.4µ ~ 0.700
GetUtxoHashes-4 212.0n 213.1n ~ 1.000
GetUtxoHashes_ManyOutputs-4 37.55µ 35.90µ ~ 0.400
_NewMetaDataFromBytes-4 165.9n 165.6n ~ 0.600
_Bytes-4 305.7n 314.2n ~ 0.100
_MetaBytes-4 107.6n 108.7n ~ 0.400

Threshold: >10% with p < 0.05 | Generated: 2026-06-08 12:45 UTC

@oskarszoon oskarszoon merged commit 9ffcd9b into bsv-blockchain:main Jun 8, 2026
46 of 47 checks passed