ci: bump test job to 32-core/128GB runner to stop intermittent OOM (#1051) #1052
…sv-blockchain#1051) `make test` (go test -race -coverpkg=./... over the whole repo) hits exit 143 intermittently: a few test binaries peak very high (sql ~34G, blockassembly ~31G, subtreeprocessor ~20G) and at -p=16 enough overlap to exhaust the 16-core/64GB runner (job peak ~110GB+). Temporary bump to 32-core/128GB to stop the bleeding while per-package memory is reduced. Also adds an end-of-run top-10 peak-memory dump (per-binary VmHWM + cgroup job peak) to monitor improvements. Revert the runner once bsv-blockchain#1051 lands.
🤖 Claude Code Review
Status: Complete
Current Review: Found 1 minor issue (see inline comment).
Overall Assessment: This is a reasonable temporary mitigation for the OOM issue described in #1051. The approach is sound:
Shell Script Review: The memory tracking implementation is generally solid:
Minor note: The background
Benchmark Comparison Report
All benchmark results (sec/op)
Threshold: >10% with p < 0.05 | Generated: 2026-06-08 12:45 UTC



Temporary mitigation for the intermittent `Error: Process completed with exit code 143` in the `test` check. Full root-cause analysis and the memory-reduction plan: #1051.

What

`test` job runner: `teranode-runner-16-core-arm` (64GB) → `teranode-runner-32-core-arm` (128GB).

Why

`make test` runs `go test -race -coverpkg=./...` across the whole repo. A few test binaries peak very high (measured via `VmHWM`): sql ~34G, blockassembly ~31G, subtreeprocessor ~20G.

At `-p=16` enough of these overlap to push the job-wide peak to ~110GB+, intermittently OOMing the 64GB runner → SIGTERM → `exit code 143`. 128GB gives headroom.

This is a stopgap, not a fix. The real work, reducing per-package memory, is tracked in #1051. Revert the runner to 16-core once that lands.
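
The headroom argument can be checked with a small back-of-the-envelope calculation. This is only a sketch: it uses the three per-binary peaks quoted in this PR (sql ~34G, blockassembly ~31G, subtreeprocessor ~20G) and assumes the worst case where all three peak at the same moment. The helper names `sum_peaks` and `fits` are made up for illustration and are not part of the workflow.

```shell
#!/bin/sh
# Peak figures in GB as quoted in this PR: sql, blockassembly, subtreeprocessor.
PEAKS_GB="34 31 20"

# sum_peaks <gb>...: print the sum of the given peaks (hypothetical helper).
sum_peaks() {
  total=0
  for p in "$@"; do
    total=$((total + p))
  done
  echo "$total"
}

# fits <runner_gb> <gb>...: succeed if the summed peaks fit in the runner's RAM.
fits() {
  runner_gb=$1
  shift
  [ "$(sum_peaks "$@")" -le "$runner_gb" ]
}

# PEAKS_GB is intentionally unquoted so it splits into separate arguments.
echo "worst-case overlap of the top 3: $(sum_peaks $PEAKS_GB)GB"
for runner in 64 128; do
  if fits "$runner" $PEAKS_GB; then
    echo "${runner}GB runner: fits"
  else
    echo "${runner}GB runner: does not fit"
  fi
done
```

The three heaviest binaries alone add up to 85GB, which already exceeds 64GB; the other packages running in parallel at `-p=16` account for the rest of the ~110GB+ job peak reported above.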
Monitoring

The test step now prints, at the end of every run (collapsed `Peak memory` log group): per-binary peak via `/proc/<pid>/status` `VmHWM` (kernel monotonic high-water mark, so no sampling gaps), plus the cgroup job-wide peak. End-only, top 10, no per-interval spam, so we can watch the numbers drop as #1051 progresses.

Scope

`test` job only. `teranode_main_tests.yaml` runs the same `make test` on 16-core and has identical exposure, but is left as-is to avoid recurring 32-core cost on every merge to `main` (noted in #1051, "Unit test suite (make test) peaks ~110GB+, reduce per-package memory to fit standard runners").
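
For readers unfamiliar with the two data sources named under Monitoring, the following is a minimal sketch of how such a report could be assembled. It is not the script added by this PR: the function names `vmhwm_kb` and `top_peaks` are hypothetical, and the cgroup path assumes a cgroup v2 hierarchy mounted at `/sys/fs/cgroup` with a kernel that exposes `memory.peak`.

```shell
#!/bin/sh
# vmhwm_kb <status-file>: print the VmHWM value (in kB) from a
# /proc/<pid>/status file. The file only exists while the process is alive.
vmhwm_kb() {
  awk '/^VmHWM:/ { print $2 }' "$1"
}

# top_peaks: read "name kB" lines on stdin and print the 10 largest, in GB.
top_peaks() {
  sort -k2,2nr | head -n 10 |
    awk '{ printf "%-24s %.1f GB\n", $1, $2 / 1048576 }'
}

# Demo: report this shell's own high-water mark (Linux only).
if [ -r "/proc/$$/status" ]; then
  echo "self $(vmhwm_kb "/proc/$$/status")" | top_peaks
fi

# Job-wide peak in bytes. Assumption: cgroup v2 with memory.peak available;
# the file is absent on cgroup v1 and in the root cgroup.
if [ -r /sys/fs/cgroup/memory.peak ]; then
  awk '{ printf "job peak (cgroup): %.1f GB\n", $1 / 1073741824 }' \
    /sys/fs/cgroup/memory.peak
fi
```

Because `VmHWM` is a high-water mark maintained by the kernel, a single read shortly before a test binary exits captures its true peak, which is why no periodic sampling of resident memory is needed.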