fix(ci): retry transient registry fetches in primer generator #462
A single uncached `fetch(registry.npmjs.org/<pkg>)` call during the 2000-package primer run hit a TLS socket close at package 786/2000 mid-flight, crashing the whole script and failing the v1.6.1 macOS release upload. Wrap fetch with up-to-5-attempt exponential backoff, retry on network errors / 5xx / 429, propagate other 4xx as terminal.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
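The retry policy described above can be sketched as two small pure functions. This is only an illustration: the names `isRetryableStatus` and `backoffDelayMs` and the 1000 ms base delay are assumptions, chosen so the schedule matches the 1s/2s/4s/8s figures quoted in the PR summary; they are not taken from `scripts/generate-primer.mjs`.

```javascript
// Hypothetical helper: which HTTP statuses the described policy would retry.
// 429 and 5xx are retried; every other status is returned to the caller.
function isRetryableStatus(status) {
  return status === 429 || (status >= 500 && status <= 599)
}

// Hypothetical helper: delay before the retry that follows failed attempt
// number `attempt` (1-based). The base of 1000 ms is an assumption.
function backoffDelayMs(attempt, baseMs = 1000) {
  return baseMs * 2 ** (attempt - 1)
}

// With 5 attempts there are at most 4 waits between them.
const schedule = [1, 2, 3, 4].map((attempt) => backoffDelayMs(attempt))
console.log(schedule.join(', '))
```

Keeping the classification separate from the loop makes the "other 4xx is terminal" rule easy to check in isolation.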
Greptile Summary
Wraps all
Confidence Score: 4/5
Safe to merge; the retry logic is correct and the two P2 findings are minor quality concerns that do not affect correctness.
No P0 or P1 issues found. Two P2 findings (unconsumed response body on retry path, slightly off log counter) cap the score at 4/5.
scripts/generate-primer.mjs: response body cancellation on retry path
Reviews (1): Last reviewed commit: "fix(ci): retry transient registry fetche..."
      if (res.ok || (res.status >= 400 && res.status < 500 && res.status !== 429)) return res
      if (i === attempts) return res
      console.error(` retry ${i}/${attempts - 1}: HTTP ${res.status}`)
    } catch (err) {
      if (i === attempts) throw err
      console.error(` retry ${i}/${attempts - 1}: ${err.cause?.code ?? err.code ?? err.message}`)
    }
Unconsumed response body leaks connection on retry
When a 5xx or 429 response triggers a retry, `res` is discarded without consuming or cancelling its body. In Node.js's undici-based `fetch`, the underlying socket won't be returned to the connection pool until the body is drained or garbage collected. For a 2000-package sequential run with multiple retries this is unlikely to exhaust the pool, but it's a resource leak that can cause delayed socket reuse.
Add `await res.body?.cancel()` before the sleep to release the connection immediately.
Suggested change:

    const res = await fetch(url, init)
    if (res.ok || (res.status >= 400 && res.status < 500 && res.status !== 429)) return res
    if (i === attempts) return res
    console.error(` retry ${i}/${attempts - 1}: HTTP ${res.status}`)
    await res.body?.cancel()
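As a standalone illustration of the cancellation step, here is a minimal sketch that discards a response before a retry. The helper name `discardResponse` is hypothetical, and the example assumes a runtime with the standard global `Response` class (for example Node.js 18 or later).

```javascript
// Hypothetical helper: release the body of a response that is about to be
// thrown away so the connection can be reused without waiting for GC.
async function discardResponse(res) {
  // `res.body` is null for bodiless responses, hence the optional chaining.
  await res.body?.cancel()
}

// Usage sketch with locally constructed responses (no network involved).
async function demo() {
  await discardResponse(new Response('busy', { status: 503 })) // has a body
  await discardResponse(new Response(null, { status: 503 }))   // body is null
  return 'released'
}
```

The optional chaining matters because `cancel()` only exists on a non-null body stream.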
      console.error(` retry ${i}/${attempts - 1}: HTTP ${res.status}`)
    } catch (err) {
      if (i === attempts) throw err
      console.error(` retry ${i}/${attempts - 1}: ${err.cause?.code ?? err.code ?? err.message}`)
Retry log counter is off by one
The message `retry ${i}/${attempts - 1}` is printed on the first failed attempt (`i=1`) before any retry has occurred, so the first log line reads `retry 1/4` even though zero retries have happened yet. Readers scanning CI logs typically interpret `retry X/Y` as "this is retry #X"; the actual first retry fires on the next loop iteration (`i=2`). Consider rephrasing to `attempt ${i} failed, will retry (${i}/${attempts - 1})` to make the semantics explicit.
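To make the proposed wording concrete, a small sketch of a message formatter follows. The function name `retryMessage` and the `reason` parameter are hypothetical; only the message template comes from the comment above.

```javascript
// Hypothetical formatter for the wording proposed in the review comment.
// `i` is the 1-based attempt that just failed; `attempts` is the total budget.
function retryMessage(i, attempts, reason) {
  return `attempt ${i} failed, will retry (${i}/${attempts - 1}): ${reason}`
}

// First failure out of 5 attempts: the line now says which attempt failed
// and which of the 4 possible retries is about to start.
console.log(retryMessage(1, 5, 'HTTP 503'))
```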
Benchmark changes
Public ratios: warm installs vs Bun 6x -> 5x; warm installs vs pnpm 10x -> 11x.
637800f vs 98cec0f | aube/bun/pnpm | 3 scenarios | 3 runs | 500mbit/50ms | generated by Codex.
Summary
The v1.6.1 release-plz macOS upload-assets job (https://github.com/endevco/aube/actions/runs/25232551216/job/73991667575) failed mid-primer-generation when a single `fetch(registry.npmjs.org/<pkg>)` hit a TLS socket close at package 786/2000.

The script had no retry, so a transient blip during a 2000-package run crashed the whole release. Wrap fetch with up-to-5-attempt exponential backoff (1s/2s/4s/8s) that retries network errors, 5xx, and 429, and propagates other 4xx as terminal.

Linux upload-assets jobs in the same run already pass via the empty-primer fallback from #460; only macOS (which has node and runs the script for real) was hitting the transient blip. Windows builds will benefit from the same retry.

After merge, re-run the failed `Upload assets / upload-assets (aarch64-apple-darwin, ...)` job for v1.6.1 to backfill the macOS tarball.

Test plan
- `node --check scripts/generate-primer.mjs` clean
- `fetch` that throws `UND_ERR_SOCKET` twice; third attempt returns 200, retries logged

🤖 Generated with Claude Code
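The second test-plan item can be approximated offline with a stubbed fetch. The sketch below is not the code from `scripts/generate-primer.mjs`: the options object, the injectable `fetchImpl` and `sleep` parameters, the `makeFlakyFetch` stub, the log wording, and the example URL are all assumptions made so the check runs without network access or real delays.

```javascript
// Sketch of a retry wrapper. `fetchImpl` and `sleep` are injectable
// (an assumption) so the behaviour can be exercised with a stub.
async function fetchWithRetry(url, init, {
  attempts = 5,
  baseMs = 1000,
  fetchImpl = globalThis.fetch,
  sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
} = {}) {
  for (let i = 1; i <= attempts; i++) {
    try {
      const res = await fetchImpl(url, init)
      const terminal4xx = res.status >= 400 && res.status < 500 && res.status !== 429
      if (res.ok || terminal4xx) return res
      if (i === attempts) return res // out of attempts: hand back the last response
      console.error(`attempt ${i} failed: HTTP ${res.status}`)
      await res.body?.cancel() // release the connection before waiting
    } catch (err) {
      if (i === attempts) throw err
      console.error(`attempt ${i} failed: ${err.cause?.code ?? err.code ?? err.message}`)
    }
    await sleep(baseMs * 2 ** (i - 1)) // 1s, 2s, 4s, 8s with the defaults
  }
}

// Hypothetical stub: throws a socket-style error `failures` times, then
// resolves with a minimal response-like object.
function makeFlakyFetch(failures) {
  let calls = 0
  const flaky = async () => {
    calls += 1
    if (calls <= failures) {
      throw new TypeError('fetch failed', { cause: { code: 'UND_ERR_SOCKET' } })
    }
    return { ok: true, status: 200 }
  }
  flaky.callCount = () => calls
  return flaky
}
```

A usage sketch: `fetchWithRetry('https://registry.npmjs.org/example', undefined, { fetchImpl: makeFlakyFetch(2), sleep: async () => {} })` should resolve on the third call, with two failures logged to stderr along the way.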
Note
Low Risk
Low risk: only changes the `scripts/generate-primer.mjs` fetch behavior by adding retries/backoff for transient network/HTTP failures, which may slightly increase run time but should reduce flaky CI failures.

Overview
Improves primer generation robustness by wrapping registry/name-list `fetch` calls in a new `fetchWithRetry` helper with exponential backoff.

The script now retries transient network errors plus HTTP `5xx` and `429`, while treating other `4xx` responses as terminal and preserving existing failure/skip behavior when the final attempt still fails.

Reviewed by Cursor Bugbot for commit 637800f. Bugbot is set up for automated code reviews on this repo.