ci: migrate native-Go + aerospike CI jobs to arm64; bump aerospike 8.0→8.1 #1047

Merged
oskarszoon merged 6 commits into bsv-blockchain:main from oskarszoon:ci/test-on-arm-runner on Jun 8, 2026

oskarszoon commented Jun 7, 2026

Summary

Migrate the CI jobs that can safely run on arm64 to arm runners, and bump aerospike 8.0 → 8.1 (which has an arm64 image). This fixes the x86 make test OOM (arm runners have ample RAM) without -p caps or build-cache churn, and is cheaper.

Why

On x86 teranode-runner-16-core, make test (go test -race -coverpkg=./... over the whole repo) was SIGTERM'd about 3 minutes in, during the compile: the cold race+coverage build peaks at ~13GB and OOM'd the runner. Telemetry from a 32-core arm run showed ~70GB peak memory out of 125GB, and 32 cores gave no speedup (the suite is I/O- and container-bound, not CPU-bound), so 16-core arm is the cost/perf sweet spot. arm runners are cheaper per minute and have the headroom the x86 runner lacked.

Runner matrix

Moved to arm64 (verified arm-safe — compiles incl. CGO/BDK, containers are arm64-available):

Job                                       Runner        Notes
test, go-test (make test)                 16-core-arm   postgres testcontainers (arm64), testtxmetacache tag (no aerospike)
golangci-lint                             8-core-arm    pure Go analysis, no containers
sonar                                     4-core-arm    scanner CLI + artifacts, no build
prunertest (pr + main)                    8-core-arm    aerospike 8.1 (arm64), no SV-node
chainintegrity (pr)                       8-core-arm    teranode multi-arch + aerospike 8.1 + postgres/redpanda
chainintegrity, chainintegrity-3blasters  16-core-arm   same

Kept on x86: smoketest, legacy-sync, sequential. They stand up a real bitcoinsv/bitcoin-sv SV-Node container, which is x86-only (no arm64 image). teranode being multi-arch doesn't unblock these; only an arm64 SV-node image would.
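The eligibility rule above reduces to one check: a job can move to arm64 only if every container image it starts publishes an arm64 variant. A minimal Go sketch, assuming a hand-maintained job-to-image map; the map contents and the helper name armBlockers are hypothetical, filled in from the images named in this PR rather than read from the actual workflows:

```go
package main

import (
	"fmt"
	"sort"
)

// Hypothetical inventory of the container images each CI job starts,
// taken from the images named in the PR description (illustrative only).
var jobImages = map[string][]string{
	"test":           {"postgres"},
	"prunertest":     {"aerospike/aerospike-server:8.1"},
	"chainintegrity": {"teranode", "aerospike/aerospike-server:8.1", "postgres", "redpanda"},
	"smoketest":      {"teranode", "bitcoinsv/bitcoin-sv", "aerospike/aerospike-server:8.1"},
	"legacy-sync":    {"teranode", "bitcoinsv/bitcoin-sv", "aerospike/aerospike-server:8.1"},
	"sequential":     {"teranode", "bitcoinsv/bitcoin-sv", "aerospike/aerospike-server:8.1"},
}

// Images assumed to publish a linux/arm64 variant. Any image missing from
// this map reads as false, i.e. it is treated as x86-only.
var hasARM64 = map[string]bool{
	"postgres":                       true,
	"redpanda":                       true,
	"teranode":                       true, // multi-arch
	"aerospike/aerospike-server:8.1": true,
	"bitcoinsv/bitcoin-sv":           false, // x86-only SV-node image
}

// armBlockers returns the images that keep a job on x86.
// The job can move to an arm runner only when the result is empty.
func armBlockers(job string) []string {
	var blockers []string
	for _, img := range jobImages[job] {
		if !hasARM64[img] {
			blockers = append(blockers, img)
		}
	}
	return blockers
}

func main() {
	jobs := make([]string, 0, len(jobImages))
	for j := range jobImages {
		jobs = append(jobs, j)
	}
	sort.Strings(jobs) // map order is random; sort for stable output
	for _, j := range jobs {
		if b := armBlockers(j); len(b) > 0 {
			fmt.Printf("%-15s x86   (blocked by %v)\n", j, b)
		} else {
			fmt.Printf("%-15s arm64\n", j)
		}
	}
}
```

Treating unknown images as x86-only keeps the check conservative: a new image has to be confirmed arm64-capable before any job that uses it can move.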

aerospike 8.0 → 8.1

Bumped all official aerospike/aerospike-server:8.0 refs (test compose, chainintegrity + 3blasters compose, longtest aerospike8 test, and the gennodes templates used by make gen-multinode). Left the custom ghcr.io/bsv-blockchain/aerospike-server:8.0.0-3 and the already-8.1 deploy manifests as-is.

Test plan

  • test job green on arm64 (no compile-phase SIGTERM)
  • prunertest + chainintegrity green on arm64 — confirms aerospike 8.1 arm64 + the compose stack come up
  • follow-up: move smoketest/legacy-sync/sequential once an arm64 SV-node image exists

The x86 teranode-runner-16-core OOM-killed the make test compile
(go test -race -coverpkg=./... over the whole repo, ~13GB peak) during the
build phase. Move the two make-test jobs to teranode-runner-16-core-arm,
which has more headroom and runs natively on arm64 (verified locally:
full repo compiles incl. CGO/BDK, postgres testcontainers work). make test
uses the testtxmetacache tag (postgres only, no aerospike images), so no
arm64 image gaps. Other jobs stay on x86 (8-core-arm not yet available;
image-pulling jobs would need a multi-arch teranode image).

github-actions Bot commented Jun 7, 2026

🤖 Claude Code Review

Status: Complete

No issues found. The PR makes well-justified infrastructure changes:

Summary:

  • Migrates native-Go and Aerospike-only CI jobs to arm64 runners to resolve OOM issues
  • Updates Aerospike 8.0 → 8.1 consistently across test and compose files
  • Correctly keeps x86-only jobs (smoketest, legacy-sync, sequential) on x86 due to SV-node dependency
  • Adds clear inline comments explaining each runner choice

Changes verified:

  • All workflow runner migrations are consistent with stated rationale
  • Aerospike version bumps cover all test docker-compose files and templates
  • Deploy manifests correctly left as-is (custom ghcr.io image and k8s already at 8.1)
  • Runner selection rationale documented inline in each workflow

Bump the make-test jobs to teranode-runner-32-core-arm (more cores ->
faster -race+coverage build/test, more RAM headroom). Add a temporary
telemetry sampler to the PR test step: logs runner specs and peak
mem/disk during the run, surfaced as a job annotation, to size the
runner. Remove the telemetry before merge.
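The telemetry step was removed before merge and its code is not shown, so the following is only a hypothetical reconstruction in Go of what such a sampler could look like, assuming a Linux runner. It parses MemTotal and MemAvailable from /proc/meminfo, keeps the peak of total minus available, and prints a ::notice workflow command, which GitHub Actions surfaces as a job annotation. Disk sampling is omitted, and the helper names memInfo and notice are illustrative:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
	"time"
)

// memInfo extracts MemTotal and MemAvailable (both in kB) from text in
// /proc/meminfo format, e.g. "MemTotal:       131072000 kB".
func memInfo(text string) (totalKB, availKB int64) {
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) < 2 {
			continue
		}
		v, err := strconv.ParseInt(fields[1], 10, 64)
		if err != nil {
			continue
		}
		switch fields[0] {
		case "MemTotal:":
			totalKB = v
		case "MemAvailable:":
			availKB = v
		}
	}
	return totalKB, availKB
}

// notice formats a GitHub Actions workflow command that appears as a job annotation.
func notice(peakUsedKB, totalKB int64) string {
	const kBPerGiB = 1024 * 1024
	return fmt.Sprintf("::notice title=runner telemetry::peak mem %.1f GiB of %.1f GiB",
		float64(peakUsedKB)/kBPerGiB, float64(totalKB)/kBPerGiB)
}

func main() {
	var peak, total int64
	// Take a few samples; a real sampler would keep running until `make test` exits.
	for i := 0; i < 3; i++ {
		raw, err := os.ReadFile("/proc/meminfo") // Linux only
		if err != nil {
			fmt.Fprintln(os.Stderr, "cannot read /proc/meminfo:", err)
			return
		}
		t, avail := memInfo(string(raw))
		total = t
		if used := t - avail; used > peak {
			peak = used
		}
		time.Sleep(time.Second)
	}
	fmt.Println(notice(peak, total))
}
```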

github-actions Bot commented Jun 7, 2026

Benchmark Comparison Report

Baseline: main (unknown)

Current: PR-1047 (29e9d5d)

Summary

  • Regressions: 0
  • Improvements: 0
  • Unchanged: 132
  • Significance level: p < 0.05
All benchmark results (sec/op)
Benchmark Baseline Current Change p-value
_NewBlockFromBytes-4 1.784µ 1.768µ ~ 0.400
SplitSyncedParentMap_SetIfNotExists/256_buckets-4 61.60n 61.80n ~ 0.400
SplitSyncedParentMap_SetIfNotExists/16_buckets-4 61.59n 61.59n ~ 1.000
SplitSyncedParentMap_SetIfNotExists/1_bucket-4 61.67n 61.51n ~ 0.300
SplitSyncedParentMap_ConcurrentSetIfNotExists/256_buckets... 31.74n 31.15n ~ 0.700
SplitSyncedParentMap_ConcurrentSetIfNotExists/16_buckets_... 53.73n 52.95n ~ 0.400
SplitSyncedParentMap_ConcurrentSetIfNotExists/1_bucket_pa... 125.0n 115.2n ~ 0.100
MiningCandidate_Stringify_Short-4 262.7n 259.7n ~ 0.100
MiningCandidate_Stringify_Long-4 1.930µ 1.912µ ~ 0.700
MiningSolution_Stringify-4 964.5n 998.4n ~ 0.100
BlockInfo_MarshalJSON-4 1.835µ 1.836µ ~ 1.000
NewFromBytes-4 146.3n 129.9n ~ 0.800
AddTxBatchColumnar_Validation-4 2.415µ 2.436µ ~ 1.000
OffsetValidationLoop-4 637.2n 638.9n ~ 0.400
Mine_EasyDifficulty-4 60.16µ 60.12µ ~ 0.400
Mine_WithAddress-4 6.692µ 6.731µ ~ 0.400
DirectSubtreeAdd/4_per_subtree-4 59.25n 59.03n ~ 0.400
DirectSubtreeAdd/64_per_subtree-4 30.09n 30.09n ~ 1.000
DirectSubtreeAdd/256_per_subtree-4 28.96n 28.98n ~ 0.700
DirectSubtreeAdd/1024_per_subtree-4 27.83n 27.88n ~ 0.200
DirectSubtreeAdd/2048_per_subtree-4 27.64n 27.60n ~ 1.000
SubtreeProcessorAdd/4_per_subtree-4 285.5n 286.8n ~ 1.000
SubtreeProcessorAdd/64_per_subtree-4 288.4n 277.3n ~ 0.200
SubtreeProcessorAdd/256_per_subtree-4 272.8n 270.6n ~ 0.400
SubtreeProcessorAdd/1024_per_subtree-4 273.7n 269.6n ~ 0.100
SubtreeProcessorAdd/2048_per_subtree-4 275.1n 271.1n ~ 0.100
SubtreeProcessorRotate/4_per_subtree-4 273.9n 269.6n ~ 0.200
SubtreeProcessorRotate/64_per_subtree-4 324.3n 272.4n ~ 0.100
SubtreeProcessorRotate/256_per_subtree-4 332.6n 271.0n ~ 0.100
SubtreeProcessorRotate/1024_per_subtree-4 348.0n 271.3n ~ 0.100
SubtreeNodeAddOnly/4_per_subtree-4 54.19n 53.85n ~ 0.400
SubtreeNodeAddOnly/64_per_subtree-4 34.39n 34.18n ~ 0.100
SubtreeNodeAddOnly/256_per_subtree-4 33.55n 33.25n ~ 0.700
SubtreeNodeAddOnly/1024_per_subtree-4 32.76n 32.46n ~ 0.100
SubtreeCreationOnly/4_per_subtree-4 101.3n 120.1n ~ 0.200
SubtreeCreationOnly/64_per_subtree-4 472.2n 513.8n ~ 0.700
SubtreeCreationOnly/256_per_subtree-4 1.461µ 1.363µ ~ 0.100
SubtreeCreationOnly/1024_per_subtree-4 4.549µ 5.332µ ~ 0.100
SubtreeCreationOnly/2048_per_subtree-4 8.769µ 9.021µ ~ 0.400
SubtreeProcessorOverheadBreakdown/64_per_subtree-4 454.6n 270.6n ~ 0.100
SubtreeProcessorOverheadBreakdown/1024_per_subtree-4 471.3n 268.2n ~ 0.100
ParallelGetAndSetIfNotExists/1k_nodes-4 11.42m 13.50m ~ 0.100
ParallelGetAndSetIfNotExists/10k_nodes-4 14.76m 15.85m ~ 0.100
ParallelGetAndSetIfNotExists/50k_nodes-4 18.14m 18.68m ~ 0.200
ParallelGetAndSetIfNotExists/100k_nodes-4 20.03m 18.80m ~ 1.000
SequentialGetAndSetIfNotExists/1k_nodes-4 10.98m 11.74m ~ 0.100
SequentialGetAndSetIfNotExists/10k_nodes-4 16.03m 13.47m ~ 0.100
SequentialGetAndSetIfNotExists/50k_nodes-4 27.87m 23.35m ~ 0.100
SequentialGetAndSetIfNotExists/100k_nodes-4 28.61m 31.45m ~ 0.700
ProcessOwnBlockSubtreeNodesParallel/1k_nodes-4 12.27m 13.87m ~ 0.100
ProcessOwnBlockSubtreeNodesParallel/10k_nodes-4 14.24m 14.57m ~ 0.200
ProcessOwnBlockSubtreeNodesParallel/100k_nodes-4 19.21m 19.11m ~ 0.700
ProcessOwnBlockSubtreeNodesSequential/1k_nodes-4 12.47m 15.79m ~ 0.200
ProcessOwnBlockSubtreeNodesSequential/10k_nodes-4 15.91m 15.09m ~ 0.200
ProcessOwnBlockSubtreeNodesSequential/100k_nodes-4 55.23m 61.73m ~ 1.000
DiskTxMap_SetIfNotExists-4 3.517µ 3.496µ ~ 0.400
DiskTxMap_SetIfNotExists_Parallel-4 3.369µ 3.337µ ~ 1.000
DiskTxMap_ExistenceOnly-4 318.6n 318.3n ~ 0.700
Queue-4 190.9n 192.3n ~ 0.700
AtomicPointer-4 4.385n 4.433n ~ 1.000
TxMapSetIfNotExists-4 53.07n 52.52n ~ 0.200
TxMapSetIfNotExistsDuplicate-4 40.11n 39.86n ~ 0.700
ChannelSendReceive-4 579.7n 580.6n ~ 1.000
BlockAssembler_AddTx-4 0.02307n 0.02251n ~ 1.000
AddNode-4 9.639 10.037 ~ 0.200
AddNodeWithMap-4 10.17 10.09 ~ 0.700
CalcBlockWork-4 506.9n 553.2n ~ 0.700
CalculateWork-4 696.0n 691.8n ~ 0.100
CheckOldBlockIDs/on-chain-prefetch/1000-4 56.18µ 56.03µ ~ 0.700
CheckOldBlockIDs/off-chain-prefetch/1000-4 54.76µ 55.30µ ~ 1.000
CheckOldBlockIDs/on-chain-prefetch/10000-4 421.7µ 419.9µ ~ 0.200
CheckOldBlockIDs/off-chain-prefetch/10000-4 339.5µ 336.0µ ~ 0.200
BuildBlockLocatorString_Helpers/Size_10-4 1.365µ 1.351µ ~ 0.100
BuildBlockLocatorString_Helpers/Size_100-4 13.07µ 12.98µ ~ 0.400
BuildBlockLocatorString_Helpers/Size_1000-4 129.2µ 128.2µ ~ 0.100
CatchupWithHeaderCache-4 104.5m 104.5m ~ 1.000
_BufferPoolAllocation/16KB-4 4.060µ 3.981µ ~ 0.200
_BufferPoolAllocation/32KB-4 8.456µ 8.922µ ~ 1.000
_BufferPoolAllocation/64KB-4 16.42µ 20.64µ ~ 0.100
_BufferPoolAllocation/128KB-4 32.43µ 34.64µ ~ 0.100
_BufferPoolAllocation/512KB-4 115.7µ 130.4µ ~ 0.100
_BufferPoolConcurrent/32KB-4 19.52µ 19.65µ ~ 0.400
_BufferPoolConcurrent/64KB-4 31.39µ 29.91µ ~ 0.700
_BufferPoolConcurrent/512KB-4 146.7µ 147.9µ ~ 0.400
_SubtreeDeserializationWithBufferSizes/16KB-4 665.8µ 684.6µ ~ 0.400
_SubtreeDeserializationWithBufferSizes/32KB-4 686.5µ 678.3µ ~ 0.200
_SubtreeDeserializationWithBufferSizes/64KB-4 712.1µ 679.5µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/128KB-4 716.5µ 682.9µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/512KB-4 649.0µ 656.5µ ~ 0.400
_SubtreeDataDeserializationWithBufferSizes/16KB-4 36.47m 36.85m ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/32KB-4 36.50m 37.19m ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/64KB-4 36.41m 36.69m ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/128KB-4 36.42m 36.75m ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/512KB-4 35.93m 36.33m ~ 0.100
_PooledVsNonPooled/Pooled-4 736.4n 737.9n ~ 0.100
_PooledVsNonPooled/NonPooled-4 7.674µ 8.687µ ~ 0.100
_MemoryFootprint/Current_512KB_32concurrent-4 6.539µ 6.817µ ~ 0.100
_MemoryFootprint/Proposed_32KB_32concurrent-4 9.596µ 9.651µ ~ 1.000
_MemoryFootprint/Alternative_64KB_32concurrent-4 8.897µ 9.215µ ~ 0.400
_prepareTxsPerLevel-4 418.6m 415.1m ~ 0.700
_prepareTxsPerLevelOrdered-4 3.729m 4.027m ~ 0.100
_prepareTxsPerLevel_Comparison/Original-4 420.7m 416.7m ~ 0.400
_prepareTxsPerLevel_Comparison/Optimized-4 3.557m 3.784m ~ 0.100
SubtreeSizes/10k_tx_4_per_subtree-4 1.438m 1.435m ~ 0.700
SubtreeSizes/10k_tx_16_per_subtree-4 335.7µ 334.8µ ~ 1.000
SubtreeSizes/10k_tx_64_per_subtree-4 79.47µ 79.48µ ~ 1.000
SubtreeSizes/10k_tx_256_per_subtree-4 19.76µ 19.72µ ~ 1.000
SubtreeSizes/10k_tx_512_per_subtree-4 9.788µ 9.796µ ~ 1.000
SubtreeSizes/10k_tx_1024_per_subtree-4 4.842µ 4.796µ ~ 0.200
SubtreeSizes/10k_tx_2k_per_subtree-4 2.432µ 2.396µ ~ 0.200
BlockSizeScaling/10k_tx_64_per_subtree-4 76.77µ 77.13µ ~ 0.400
BlockSizeScaling/10k_tx_256_per_subtree-4 19.45µ 19.42µ ~ 1.000
BlockSizeScaling/10k_tx_1024_per_subtree-4 4.842µ 4.770µ ~ 0.100
BlockSizeScaling/50k_tx_64_per_subtree-4 403.1µ 396.0µ ~ 0.700
BlockSizeScaling/50k_tx_256_per_subtree-4 96.26µ 96.57µ ~ 0.700
BlockSizeScaling/50k_tx_1024_per_subtree-4 24.14µ 23.77µ ~ 0.700
SubtreeAllocations/small_subtrees_exists_check-4 163.4µ 165.1µ ~ 0.400
SubtreeAllocations/small_subtrees_data_fetch-4 178.4µ 180.7µ ~ 0.100
SubtreeAllocations/small_subtrees_full_validation-4 331.5µ 334.5µ ~ 0.200
SubtreeAllocations/medium_subtrees_exists_check-4 9.558µ 10.046µ ~ 0.100
SubtreeAllocations/medium_subtrees_data_fetch-4 10.61µ 10.81µ ~ 0.100
SubtreeAllocations/medium_subtrees_full_validation-4 19.44µ 19.72µ ~ 0.100
SubtreeAllocations/large_subtrees_exists_check-4 2.305µ 2.375µ ~ 0.100
SubtreeAllocations/large_subtrees_data_fetch-4 2.578µ 2.615µ ~ 0.700
SubtreeAllocations/large_subtrees_full_validation-4 4.880µ 4.920µ ~ 0.700
StoreBlock_Sequential/BelowCSVHeight-4 261.1µ 257.2µ ~ 0.200
StoreBlock_Sequential/AboveCSVHeight-4 261.8µ 260.9µ ~ 0.700
GetUtxoHashes-4 265.4n 266.2n ~ 0.400
GetUtxoHashes_ManyOutputs-4 44.58µ 43.27µ ~ 0.100
_NewMetaDataFromBytes-4 225.6n 226.0n ~ 0.500
_Bytes-4 392.3n 394.8n ~ 0.700
_MetaBytes-4 134.5n 135.3n ~ 0.700

Threshold: >10% with p < 0.05 | Generated: 2026-06-08 08:59 UTC
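The threshold line means a row is flagged only when both conditions hold. A small Go sketch of that rule (the function name classify is hypothetical; the report's generator is not shown), applied to one of the largest apparent moves in the table, SubtreeProcessorOverheadBreakdown/64_per_subtree-4:

```go
package main

import "fmt"

// classify applies the stated rule: a change is reported only if it exceeds
// 10% AND p < 0.05. Values are sec/op, so lower is better.
func classify(baseline, current, p float64) string {
	delta := (current - baseline) / baseline * 100 // percent change
	switch {
	case p >= 0.05:
		return "~" // not statistically significant, whatever the size
	case delta > 10:
		return "regression"
	case delta < -10:
		return "improvement"
	default:
		return "~" // significant but below the 10% threshold
	}
}

func main() {
	// SubtreeProcessorOverheadBreakdown/64_per_subtree-4: 454.6n -> 270.6n, p = 0.100
	fmt.Printf("%+.1f%% %s\n", (270.6-454.6)/454.6*100, classify(454.6, 270.6, 0.100))
	// prints "-40.5% ~"
}
```

A 40% drop still reads as ~ because p = 0.100. If, as the layout suggests, these p-values come from a Mann-Whitney U test with three runs per side (an assumption; the report does not state its method or run count), 0.100 is the smallest two-sided p-value that test can produce, so no row in this report could have cleared p < 0.05.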

Switch the container-free / native-Go jobs to arm64 (cheaper, ample RAM —
fixes the x86 16-core make-test OOM without -p caps or build-cache bulk):
  - test (make test)           16-core -> 16-core-arm  (postgres testcontainers, arm64)
  - go-test (main, make test)  16-core -> 16-core-arm
  - golangci-lint              8-core  -> 8-core-arm   (pure Go analysis, no containers)
  - sonar                      4-core  -> 4-core-arm   (scanner CLI + artifacts, no build)

Telemetry confirmed: 32-core gave no speedup (I/O/container-bound), so
16-core is the cost/perf sweet spot; reverted from the 32-core trial.

Kept on x86 (arm64 image blockers, NOT teranode which is multi-arch):
  - smoketest / sequential / legacy-sync: bitcoinsv/bitcoin-sv (x86-only) + aerospike
  - prunertest / chainintegrity: aerospike/aerospike-server (arm64 unverified)
These can move once SV-node has an arm64 image and aerospike arm64 is confirmed.

aerospike/aerospike-server:8.1 has an arm64 image, so:
- bump 8.0->8.1 in all official refs (test compose files, chainintegrity +
  3blasters compose, longtest aerospike8 test). Custom ghcr 8.0.0-3 and the
  already-8.1 deploy manifests left as-is.
- move the aerospike-only, SV-node-free jobs to arm64:
    prunertest (pr + main)         8-core  -> 8-core-arm
    chainintegrity (pr)            8-core  -> 8-core-arm
    chainintegrity / -3blasters    16-core -> 16-core-arm
  These use aerospike (now arm64), teranode:latest (multi-arch), postgres +
  redpanda (arm64) — no SV-node.

Still on x86 (bitcoinsv/bitcoin-sv is x86-only): smoketest, legacy-sync,
sequential, since they stand up a real SV-Node container.

Address review: the gennodes docker-compose templates (used by
make gen-multinode) still pinned aerospike-server:8.0, leaving generated
compose files inconsistent with the rest of the 8.1 bump.
oskarszoon changed the title from "ci: run the make-test job on arm64 runner" to "ci: migrate native-Go + aerospike CI jobs to arm64; bump aerospike 8.0→8.1" on Jun 7, 2026

sonarqubecloud Bot commented Jun 8, 2026

oskarszoon requested review from icellan, ordishs and sugh01 on June 8, 2026 09:00

ordishs left a comment

Approve — clean, well-documented CI-only change.

Verified against the tree:

  • aerospike bump is complete; only the custom ghcr.io/...:8.0.0-3 and already-8.1 deploy manifests remain (as the PR states).
  • x86 retention is correct: smoketest/legacy-sync/sequential are exactly the SV-node-dependent jobs.
  • Bumping shared compose files doesn't break the still-x86 consumers (nightly chainintegrity, longtest) since 8.1 is multi-arch.

Conditional on the test-plan checkboxes going green — particularly the prunertest/chainintegrity arm runs that prove the aerospike 8.1 arm64 image + compose stack come up.

Minor follow-up (non-blocking): the separate sonarqube job in teranode_main_tests.yaml stays on x86 while sonar-pr-analyze moves to arm. It is the same workload class and could move too, for consistency.

oskarszoon merged commit 39a1580 into bsv-blockchain:main on Jun 8, 2026
34 checks passed

3 participants