ci: cache Go build cache via reusable go-cache action #1048

Merged: oskarszoon merged 1 commit into bsv-blockchain:main from oskarszoon:fix/ci-caching on Jun 8, 2026

@oskarszoon

Contributor

What

CI never cached the Go build cache (~/.cache/go-build). setup-go's cache was disabled and we manually cached only the module cache (~/go/pkg/mod). For our workload — go test -race across the whole tree — the -race compile is the dominant cost, and it was rebuilt from scratch on every job.

This adds a reusable composite action .github/actions/go-cache and wires it into every Go-compiling workflow.

How

  • One authoritative saver, on main. GitHub scopes caches by branch: only the default branch's cache is readable by all PRs. So the main-branch unit-test job (go-test in teranode_main_tests) is the only writer (save gated to refs/heads/main); every other job is restore-only. staging/release/tag builds restore but don't write, leaving the 10 GB repo budget to main.
  • Daily-rotating build-cache key (gobuild-${os}-${hash(go.mod)}-${date}) with prefix restore-key fallback, so the build cache never pins stale for weeks. Module cache keeps the existing gomod-${os}-${hash(go.sum)} scheme.
  • Trim before save (find ~/.cache/go-build -mmin +90 -delete) keeps the uploaded tarball under the 10 GB cap.
  • -race vs not. Go keys the build cache by build flags, so a non-race job restoring the -race saver's multi-GB tarball gets ~zero hits — pure wasted bandwidth. The composite exposes build_cache (default true); non-race jobs set it false and restore the module cache only.

| jobs | Go work | cache |
| --- | --- | --- |
| pr/main tests, smoketests, long tests | -race | module + build |
| golangci, sonar, nightly, chainintegrity builds, benchmarks | plain build/test | module only |
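
To make the key scheme concrete, here is a minimal shell sketch of how such a key and its restore prefix could be derived. The function names, the YYYY-MM-DD date format, and the use of a plain SHA-256 of go.mod (standing in for the workflow's hash expression) are assumptions for illustration, not the action's actual code.

```shell
# Hypothetical helpers; names and date format are illustrative assumptions.

# file_sha256 FILE: print the SHA-256 hex digest of FILE.
file_sha256() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | cut -d' ' -f1
  else
    shasum -a 256 "$1" | cut -d' ' -f1
  fi
}

# build_cache_restore_prefix OS GOMOD: key prefix shared by every day's cache.
build_cache_restore_prefix() {
  printf 'gobuild-%s-%s-\n' "$1" "$(file_sha256 "$2")"
}

# build_cache_key OS GOMOD DAY: exact key, i.e. the prefix plus the date.
build_cache_key() {
  printf '%s%s\n' "$(build_cache_restore_prefix "$1" "$2")" "$3"
}

# Example: compute today's key (UTC) for a go.mod in the current directory.
if [ -f go.mod ]; then
  build_cache_key "$(uname -s)" go.mod "$(date -u +%Y-%m-%d)"
fi
```

Because the restore prefix omits the date, a job on a day with no exact-key match can still fall back to an earlier day's cache for the same go.mod.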

Also

  • benchmark-compare was on Go 1.25.2 while everything else is 1.26.0 — aligned it (added a workflow env: GO_VERSION so it can't drift again). Benches now compile/run under 1.26.0.

Notes

  • First PR after this merges is cold — the cache only warms once main's go-test runs post-merge. Merge first, then watch a follow-up PR for the build-cache hit.
  • Validated with actionlint on all touched workflows (clean; remaining notes are pre-existing shellcheck warnings in unrelated docker-log loops and the known custom-runner-label notes). Cache hit/timing can only be confirmed by a live run.

CI disabled setup-go's cache and manually cached only the module cache
(~/go/pkg/mod). The Go build cache (~/.cache/go-build) — the dominant cost
for our -race test/compile workload — was never cached, so every job
recompiled from scratch.

Add a reusable composite action (.github/actions/go-cache) that restores the
module + build cache, with the single main-branch unit-test job saving them.
GitHub scopes caches by branch, so only the default branch's cache is readable
by all PRs: one authoritative saver on main, everyone else restore-only. The
build-cache key rotates daily so it never goes stale, and a trim step keeps the
saved tarball under the 10 GB repo cache cap.

Wire it into every Go-compiling workflow. -race jobs (pr/main tests,
smoketests, long tests) get module + build cache; non-race jobs (golangci,
sonar, nightly, chainintegrity builds, benchmarks) get module cache only — the
build cache is flag-keyed by Go, so non-race jobs would only waste bandwidth on
a tarball they cannot hit.

Also align benchmark-compare to Go 1.26.0 (was 1.25.2, the only version drift).
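
The saver/restore split and the build_cache switch described above amount to a small decision table. Below is a minimal shell sketch of that logic, assuming a hypothetical cache_plan helper (the real composite action is YAML and is not reproduced here). Inputs are compared as strings because action inputs arrive as strings.

```shell
# cache_plan REF SAVE BUILD_CACHE
# Hypothetical helper: prints the cache mode and the directories to restore.
#   REF          git ref of the run, e.g. refs/heads/main
#   SAVE         "true" only for the designated saver job
#   BUILD_CACHE  "true" for -race jobs, "false" for plain build/test jobs
cache_plan() {
  ref="$1"
  save="${2:-false}"
  build_cache="${3:-true}"

  # Only the designated saver, and only on main, may write the cache.
  if [ "$save" = "true" ] && [ "$ref" = "refs/heads/main" ]; then
    echo "mode=save"
  else
    echo "mode=restore-only"
  fi

  # The module cache is restored by every job.
  echo "path=~/go/pkg/mod"

  # The build cache is restored only when the job can actually hit it.
  if [ "$build_cache" = "true" ]; then
    echo "path=~/.cache/go-build"
  fi
}

cache_plan refs/heads/main true true        # main-branch unit-test job
cache_plan refs/pull/1048/merge false true  # PR -race job
cache_plan refs/heads/main false false      # non-race job on main
```
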
@github-actions

github-actions Bot commented Jun 7, 2026

Contributor

🤖 Claude Code Review

Status: Complete


Current Review:

No issues found. The cache implementation is well-designed:

  • Cache architecture correctly designates the main-branch unit test job as the single authoritative saver, with all other jobs as restore-only consumers
  • Daily cache key rotation prevents stale builds while restore-keys provide fallback to recent caches
  • Selective build cache usage appropriately distinguishes between -race and non-race workloads to avoid bandwidth waste
  • Trim logic correctly executes before the post-job cache save hook
  • Go version alignment (1.26.0) eliminates the benchmark workflow drift mentioned in the PR description

@github-actions

github-actions Bot commented Jun 7, 2026

Contributor

Benchmark Comparison Report

Baseline: main (unknown)

Current: PR-1048 (d46f0af)

Summary

  • Regressions: 0
  • Improvements: 0
  • Unchanged: 132
  • Significance level: p < 0.05
All benchmark results (sec/op)
Benchmark Baseline Current Change p-value
_NewBlockFromBytes-4 1.777µ 1.756µ ~ 0.400
SplitSyncedParentMap_SetIfNotExists/256_buckets-4 61.58n 61.82n ~ 0.800
SplitSyncedParentMap_SetIfNotExists/16_buckets-4 61.98n 61.93n ~ 1.000
SplitSyncedParentMap_SetIfNotExists/1_bucket-4 61.77n 61.56n ~ 0.100
SplitSyncedParentMap_ConcurrentSetIfNotExists/256_buckets... 30.84n 30.66n ~ 0.700
SplitSyncedParentMap_ConcurrentSetIfNotExists/16_buckets_... 51.35n 50.46n ~ 0.700
SplitSyncedParentMap_ConcurrentSetIfNotExists/1_bucket_pa... 105.5n 107.2n ~ 0.700
MiningCandidate_Stringify_Short-4 258.7n 260.9n ~ 0.100
MiningCandidate_Stringify_Long-4 1.876µ 1.896µ ~ 0.100
MiningSolution_Stringify-4 983.7n 986.6n ~ 1.000
BlockInfo_MarshalJSON-4 1.770µ 1.780µ ~ 0.100
NewFromBytes-4 125.2n 123.3n ~ 0.100
AddTxBatchColumnar_Validation-4 2.520µ 2.525µ ~ 1.000
OffsetValidationLoop-4 719.7n 720.0n ~ 0.700
Mine_EasyDifficulty-4 60.12µ 59.93µ ~ 0.700
Mine_WithAddress-4 6.791µ 6.760µ ~ 0.400
BlockAssembler_AddTx-4 0.02225n 0.02210n ~ 1.000
AddNode-4 9.557 9.977 ~ 0.700
AddNodeWithMap-4 10.251 9.980 ~ 1.000
DirectSubtreeAdd/4_per_subtree-4 60.36n 58.04n ~ 0.200
DirectSubtreeAdd/64_per_subtree-4 29.99n 29.96n ~ 0.800
DirectSubtreeAdd/256_per_subtree-4 28.93n 29.01n ~ 0.700
DirectSubtreeAdd/1024_per_subtree-4 27.91n 27.86n ~ 0.500
DirectSubtreeAdd/2048_per_subtree-4 27.49n 27.46n ~ 0.400
SubtreeProcessorAdd/4_per_subtree-4 285.6n 286.9n ~ 0.700
SubtreeProcessorAdd/64_per_subtree-4 269.9n 285.3n ~ 0.600
SubtreeProcessorAdd/256_per_subtree-4 271.6n 273.6n ~ 0.700
SubtreeProcessorAdd/1024_per_subtree-4 273.5n 268.4n ~ 0.400
SubtreeProcessorAdd/2048_per_subtree-4 278.0n 272.5n ~ 0.100
SubtreeProcessorRotate/4_per_subtree-4 273.5n 281.4n ~ 0.700
SubtreeProcessorRotate/64_per_subtree-4 273.3n 280.5n ~ 0.100
SubtreeProcessorRotate/256_per_subtree-4 272.5n 273.8n ~ 0.400
SubtreeProcessorRotate/1024_per_subtree-4 273.5n 286.5n ~ 0.100
SubtreeNodeAddOnly/4_per_subtree-4 53.92n 54.69n ~ 0.100
SubtreeNodeAddOnly/64_per_subtree-4 34.19n 34.25n ~ 0.300
SubtreeNodeAddOnly/256_per_subtree-4 33.07n 33.38n ~ 0.100
SubtreeNodeAddOnly/1024_per_subtree-4 32.45n 32.61n ~ 0.100
SubtreeCreationOnly/4_per_subtree-4 117.7n 120.1n ~ 0.100
SubtreeCreationOnly/64_per_subtree-4 424.8n 419.7n ~ 1.000
SubtreeCreationOnly/256_per_subtree-4 1.368µ 1.333µ ~ 0.100
SubtreeCreationOnly/1024_per_subtree-4 4.609µ 4.984µ ~ 0.100
SubtreeCreationOnly/2048_per_subtree-4 8.096µ 8.327µ ~ 0.100
SubtreeProcessorOverheadBreakdown/64_per_subtree-4 274.0n 278.7n ~ 0.400
SubtreeProcessorOverheadBreakdown/1024_per_subtree-4 283.6n 279.9n ~ 0.700
ParallelGetAndSetIfNotExists/1k_nodes-4 10.69m 12.76m ~ 0.100
ParallelGetAndSetIfNotExists/10k_nodes-4 13.94m 15.40m ~ 0.100
ParallelGetAndSetIfNotExists/50k_nodes-4 16.73m 17.74m ~ 0.100
ParallelGetAndSetIfNotExists/100k_nodes-4 20.35m 22.35m ~ 0.100
SequentialGetAndSetIfNotExists/1k_nodes-4 10.62m 12.03m ~ 0.100
SequentialGetAndSetIfNotExists/10k_nodes-4 14.18m 16.09m ~ 0.100
SequentialGetAndSetIfNotExists/50k_nodes-4 22.40m 22.64m ~ 0.400
SequentialGetAndSetIfNotExists/100k_nodes-4 31.45m 28.85m ~ 0.100
ProcessOwnBlockSubtreeNodesParallel/1k_nodes-4 11.43m 13.61m ~ 0.100
ProcessOwnBlockSubtreeNodesParallel/10k_nodes-4 15.06m 16.17m ~ 0.400
ProcessOwnBlockSubtreeNodesParallel/100k_nodes-4 19.06m 18.84m ~ 1.000
ProcessOwnBlockSubtreeNodesSequential/1k_nodes-4 13.14m 11.60m ~ 0.700
ProcessOwnBlockSubtreeNodesSequential/10k_nodes-4 15.11m 15.30m ~ 0.700
ProcessOwnBlockSubtreeNodesSequential/100k_nodes-4 58.15m 53.56m ~ 0.100
DiskTxMap_SetIfNotExists-4 4.225µ 4.213µ ~ 1.000
DiskTxMap_SetIfNotExists_Parallel-4 3.610µ 3.734µ ~ 0.100
DiskTxMap_ExistenceOnly-4 474.1n 463.1n ~ 1.000
Queue-4 192.6n 192.7n ~ 1.000
AtomicPointer-4 3.310n 3.256n ~ 0.100
TxMapSetIfNotExists-4 49.31n 49.75n ~ 0.100
TxMapSetIfNotExistsDuplicate-4 41.44n 42.03n ~ 0.100
ChannelSendReceive-4 576.5n 563.2n ~ 0.200
CalcBlockWork-4 473.2n 481.9n ~ 0.700
CalculateWork-4 646.2n 647.9n ~ 1.000
CheckOldBlockIDs/on-chain-prefetch/1000-4 59.58µ 58.85µ ~ 0.700
CheckOldBlockIDs/off-chain-prefetch/1000-4 53.23µ 50.30µ ~ 0.400
CheckOldBlockIDs/on-chain-prefetch/10000-4 425.8µ 431.8µ ~ 0.100
CheckOldBlockIDs/off-chain-prefetch/10000-4 355.6µ 351.2µ ~ 0.100
BuildBlockLocatorString_Helpers/Size_10-4 1.378µ 1.396µ ~ 0.100
BuildBlockLocatorString_Helpers/Size_100-4 13.23µ 13.23µ ~ 1.000
BuildBlockLocatorString_Helpers/Size_1000-4 128.4µ 130.2µ ~ 0.100
CatchupWithHeaderCache-4 105.0m 104.9m ~ 0.200
_BufferPoolAllocation/16KB-4 4.193µ 4.404µ ~ 1.000
_BufferPoolAllocation/32KB-4 9.336µ 9.111µ ~ 0.100
_BufferPoolAllocation/64KB-4 18.63µ 19.79µ ~ 0.100
_BufferPoolAllocation/128KB-4 38.62µ 36.96µ ~ 0.200
_BufferPoolAllocation/512KB-4 137.0µ 136.2µ ~ 1.000
_BufferPoolConcurrent/32KB-4 23.54µ 24.88µ ~ 0.100
_BufferPoolConcurrent/64KB-4 35.10µ 38.04µ ~ 0.400
_BufferPoolConcurrent/512KB-4 154.1µ 153.3µ ~ 1.000
_SubtreeDeserializationWithBufferSizes/16KB-4 647.2µ 631.9µ ~ 0.200
_SubtreeDeserializationWithBufferSizes/32KB-4 649.6µ 625.8µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/64KB-4 648.9µ 627.1µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/128KB-4 651.7µ 628.5µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/512KB-4 639.2µ 610.0µ ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/16KB-4 37.37m 37.22m ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/32KB-4 37.12m 36.82m ~ 0.400
_SubtreeDataDeserializationWithBufferSizes/64KB-4 37.26m 36.69m ~ 0.200
_SubtreeDataDeserializationWithBufferSizes/128KB-4 37.28m 36.68m ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/512KB-4 36.85m 36.59m ~ 0.400
_PooledVsNonPooled/Pooled-4 742.1n 741.3n ~ 0.700
_PooledVsNonPooled/NonPooled-4 8.824µ 9.218µ ~ 0.700
_MemoryFootprint/Current_512KB_32concurrent-4 6.774µ 6.832µ ~ 0.400
_MemoryFootprint/Proposed_32KB_32concurrent-4 9.916µ 9.917µ ~ 1.000
_MemoryFootprint/Alternative_64KB_32concurrent-4 9.296µ 9.390µ ~ 0.200
SubtreeSizes/10k_tx_4_per_subtree-4 1.268m 1.382m ~ 0.700
SubtreeSizes/10k_tx_16_per_subtree-4 294.8µ 304.5µ ~ 0.100
SubtreeSizes/10k_tx_64_per_subtree-4 71.33µ 71.80µ ~ 0.400
SubtreeSizes/10k_tx_256_per_subtree-4 17.69µ 18.20µ ~ 0.100
SubtreeSizes/10k_tx_512_per_subtree-4 8.855µ 8.824µ ~ 1.000
SubtreeSizes/10k_tx_1024_per_subtree-4 4.331µ 4.343µ ~ 0.400
SubtreeSizes/10k_tx_2k_per_subtree-4 2.177µ 2.178µ ~ 1.000
BlockSizeScaling/10k_tx_64_per_subtree-4 70.07µ 70.71µ ~ 0.100
BlockSizeScaling/10k_tx_256_per_subtree-4 17.67µ 17.77µ ~ 0.400
BlockSizeScaling/10k_tx_1024_per_subtree-4 4.363µ 4.385µ ~ 0.700
BlockSizeScaling/50k_tx_64_per_subtree-4 366.2µ 375.3µ ~ 0.400
BlockSizeScaling/50k_tx_256_per_subtree-4 86.78µ 88.15µ ~ 0.400
BlockSizeScaling/50k_tx_1024_per_subtree-4 21.64µ 21.93µ ~ 0.400
SubtreeAllocations/small_subtrees_exists_check-4 149.8µ 151.6µ ~ 0.400
SubtreeAllocations/small_subtrees_data_fetch-4 159.0µ 160.5µ ~ 0.100
SubtreeAllocations/small_subtrees_full_validation-4 309.6µ 313.2µ ~ 0.700
SubtreeAllocations/medium_subtrees_exists_check-4 8.963µ 9.078µ ~ 0.100
SubtreeAllocations/medium_subtrees_data_fetch-4 9.502µ 9.459µ ~ 1.000
SubtreeAllocations/medium_subtrees_full_validation-4 17.61µ 17.60µ ~ 1.000
SubtreeAllocations/large_subtrees_exists_check-4 2.122µ 2.136µ ~ 0.200
SubtreeAllocations/large_subtrees_data_fetch-4 2.215µ 2.307µ ~ 0.100
SubtreeAllocations/large_subtrees_full_validation-4 4.366µ 4.402µ ~ 0.700
_prepareTxsPerLevel-4 390.5m 382.9m ~ 0.400
_prepareTxsPerLevelOrdered-4 4.366m 5.097m ~ 0.100
_prepareTxsPerLevel_Comparison/Original-4 389.1m 386.4m ~ 0.400
_prepareTxsPerLevel_Comparison/Optimized-4 4.309m 4.318m ~ 1.000
StoreBlock_Sequential/BelowCSVHeight-4 244.4µ 245.3µ ~ 0.200
StoreBlock_Sequential/AboveCSVHeight-4 245.9µ 246.7µ ~ 0.700
GetUtxoHashes-4 268.9n 267.2n ~ 0.400
GetUtxoHashes_ManyOutputs-4 43.92µ 44.10µ ~ 1.000
_NewMetaDataFromBytes-4 227.5n 229.4n ~ 0.100
_Bytes-4 399.2n 406.9n ~ 0.700
_MetaBytes-4 138.4n 139.4n ~ 0.400

Threshold: >10% with p < 0.05 | Generated: 2026-06-07 08:29 UTC

@sonarqubecloud

sonarqubecloud Bot commented Jun 7, 2026

oskarszoon enabled auto-merge (squash) on June 8, 2026 07:00

ordishs left a comment

Collaborator

Approved. CI-only change with a sound caching model — single authoritative saver on main (correct given GitHub's branch-scoped caches) and the -race vs non-race build_cache split is a sharp call.

Verified: checkout precedes the local composite action in every touched job; the boolean save input flows correctly through the string guards; restore-vs-save variants are wired right (PR jobs never write); SHA-pinned actions/cache.

Minor non-blocking notes for follow-up:

  • Trim Go build cache lacks a branch guard — runs on non-main go-test invocations where nothing is ever saved (harmless, just wasted work).
  • -mmin +90 deletes by mtime (≈ creation time for write-once build objects), so on make test runs exceeding 90 min it may drop still-useful objects before the post-job save.
  • Worth watching the repo cache usage page for a few days — daily-rotating build-cache tarballs accumulating against the 10 GB budget could evict the module cache.
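
The first two notes could be addressed along these lines. This is a sketch only: trim_build_cache and its parameters are hypothetical, and -type f is an addition not present in the PR's find command. The age threshold becomes a parameter so that jobs running longer than 90 minutes can pass a larger value.

```shell
# trim_build_cache DIR MINUTES REF
# Hypothetical helper: delete build-cache files older than MINUTES,
# but only on the branch whose cache is actually saved.
trim_build_cache() {
  dir="$1"
  minutes="$2"
  ref="$3"

  # Branch guard: nothing is saved off main, so trimming there is wasted work.
  [ "$ref" = "refs/heads/main" ] || return 0

  # Nothing to trim if the cache directory does not exist yet.
  [ -d "$dir" ] || return 0

  # -type f limits deletion to files, leaving the directory layout in place.
  find "$dir" -type f -mmin +"$minutes" -delete
}

# Possible use in a workflow step (GITHUB_REF is set by GitHub Actions):
#   trim_build_cache "$HOME/.cache/go-build" 90 "${GITHUB_REF:-}"
```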

oskarszoon merged commit 66ccc37 into bsv-blockchain:main on Jun 8, 2026
42 of 46 checks passed