
fix(ci): retry transient registry fetches in primer generator#462

Merged
jdx merged 1 commit into main from fix-primer-retry on May 1, 2026

Conversation

@jdx
Contributor

@jdx jdx commented May 1, 2026

Summary

The v1.6.1 release-plz macOS upload-assets job (https://github.com/endevco/aube/actions/runs/25232551216/job/73991667575) failed mid-primer-generation when a single fetch(registry.npmjs.org/<pkg>) hit a TLS socket close at package 786/2000:

[TypeError: fetch failed] {
  [cause]: SocketError: other side closed
  ...
  code: 'UND_ERR_SOCKET',
}

The script had no retry, so a transient blip during a 2000-package run crashed the whole release. Wrap fetch with up-to-5-attempt exponential backoff (1s/2s/4s/8s) that retries network errors, 5xx, and 429, and propagates other 4xx as terminal.
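
The retry wrapper described above can be sketched as follows. The control flow and log format mirror the snippet quoted in the review below; the injectable `fetchImpl` and `baseDelayMs` options are assumptions added here for testability, not the PR's exact signature.

```javascript
// Sketch of the fetchWithRetry helper, assuming Node 18+ global fetch.
// `fetchImpl` and `baseDelayMs` are illustrative knobs, not part of the PR.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function fetchWithRetry(url, init, { attempts = 5, baseDelayMs = 1000, fetchImpl = fetch } = {}) {
  for (let i = 1; i <= attempts; i++) {
    try {
      const res = await fetchImpl(url, init);
      // Success, or a terminal 4xx (anything except 429): return as-is.
      if (res.ok || (res.status >= 400 && res.status < 500 && res.status !== 429)) return res;
      if (i === attempts) return res; // last attempt: surface the 5xx/429 response
      console.error(`  retry ${i}/${attempts - 1}: HTTP ${res.status}`);
    } catch (err) {
      if (i === attempts) throw err; // last attempt: propagate the network error
      console.error(`  retry ${i}/${attempts - 1}: ${err.cause?.code ?? err.code ?? err.message}`);
    }
    await sleep(baseDelayMs * 2 ** (i - 1)); // 1s, 2s, 4s, 8s with the defaults
  }
}
```

With the defaults this gives up after five attempts, so the worst-case added latency per package is about 15s of backoff rather than an immediate crash.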

Linux upload-assets jobs in the same run already pass via the empty-primer fallback from #460 — only macOS (which has node and runs the script for real) was hitting the transient blip. Windows builds will benefit from the same retry.

After merge, re-run the failed Upload assets / upload-assets (aarch64-apple-darwin, ...) job for v1.6.1 to backfill the macOS tarball.

Test plan

  • node --check scripts/generate-primer.mjs passes cleanly
  • Smoke-test the retry helper with stubbed fetch that throws UND_ERR_SOCKET twice — third attempt returns 200, retries logged
  • Re-run the v1.6.1 macOS upload-assets job after merge → confirm the primer generates and the tarball lands on the GH release

🤖 Generated with Claude Code


Note

Low risk: only changes the scripts/generate-primer.mjs fetch behavior by adding retries/backoff for transient network/HTTP failures, which may slightly increase run time but should reduce flaky CI failures.

Overview
Improves primer generation robustness by wrapping registry/name-list fetch calls in a new fetchWithRetry helper with exponential backoff.

The script now retries transient network errors plus HTTP 5xx and 429, while treating other 4xx responses as terminal and preserving existing failure/skip behavior when the final attempt still fails.

Reviewed by Cursor Bugbot for commit 637800f.

Single uncached `fetch(registry.npmjs.org/<pkg>)` calls during the
2000-package primer run hit a TLS socket close at package 786/2000
mid-flight, crashing the whole script and failing the v1.6.1 macOS
release upload. Wrap fetch with up-to-5-attempt exponential backoff,
retry on network errors / 5xx / 429, propagate other 4xx as terminal.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@greptile-apps

greptile-apps Bot commented May 1, 2026

Greptile Summary

Wraps all fetch calls in generate-primer.mjs with a fetchWithRetry helper that performs up-to-5 attempts with 1s/2s/4s/8s exponential backoff, retrying network errors, 5xx, and 429 while short-circuiting on other 4xx. The retry logic and terminal-failure semantics are correct; only two minor polish issues were found (response body leak on retry, log counter semantics).

Confidence Score: 4/5

Safe to merge; the retry logic is correct and the two P2 findings are minor quality concerns that do not affect correctness.

No P0 or P1 issues found. Two P2 findings (unconsumed response body on retry path, slightly off log counter) cap the score at 4/5.

scripts/generate-primer.mjs — response body cancellation on retry path

Important Files Changed

Filename Overview
scripts/generate-primer.mjs Adds fetchWithRetry with exponential backoff (1s–8s, 5 attempts) wrapping both packumentSeed and fetchPopularNames; logic is correct but retried responses are not body-cancelled before sleeping, and the retry log counter is slightly misleading.


Reviews (1): Last reviewed commit: "fix(ci): retry transient registry fetche..."

Comment on lines +200 to +206
if (res.ok || (res.status >= 400 && res.status < 500 && res.status !== 429)) return res
if (i === attempts) return res
console.error(` retry ${i}/${attempts - 1}: HTTP ${res.status}`)
} catch (err) {
if (i === attempts) throw err
console.error(` retry ${i}/${attempts - 1}: ${err.cause?.code ?? err.code ?? err.message}`)
}

P2 Unconsumed response body leaks connection on retry

When a 5xx or 429 response triggers a retry, res is discarded without consuming or cancelling its body. In Node.js's undici-based fetch, the underlying socket won't be returned to the connection pool until the body is drained or garbage collected. For a 2000-package sequential run with multiple retries this is unlikely to exhaust the pool, but it's a resource leak that can cause delayed socket reuse.

Add await res.body?.cancel() before the sleep to release the connection immediately.

Suggested change (cancel the discarded body before sleeping):

const res = await fetch(url, init)
if (res.ok || (res.status >= 400 && res.status < 500 && res.status !== 429)) return res
if (i === attempts) return res
console.error(` retry ${i}/${attempts - 1}: HTTP ${res.status}`)
await res.body?.cancel()


Comment on lines +202 to +205
console.error(` retry ${i}/${attempts - 1}: HTTP ${res.status}`)
} catch (err) {
if (i === attempts) throw err
console.error(` retry ${i}/${attempts - 1}: ${err.cause?.code ?? err.code ?? err.message}`)

P2 Retry log counter is off by one

The message retry ${i}/${attempts - 1} is printed on the first failed attempt (i=1) before any retry has occurred, so the first log line reads retry 1/4 even though zero retries have happened yet. Readers scanning CI logs typically interpret retry X/Y as "this is retry #X"; the actual first retry fires on the next loop iteration (i=2). Consider rephrasing to attempt ${i} failed, will retry (${i}/${attempts - 1}) to make the semantics explicit.
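
The rephrased message could be factored into a one-line helper (illustrative only; `i` and `attempts` mirror the reviewed loop's variables, and the helper name is an assumption):

```javascript
// Illustrative helper for the clearer log line suggested above:
// printed after attempt `i` fails, before retry number `i` fires.
function retryLogLine(i, attempts) {
  return `attempt ${i} failed, will retry (${i}/${attempts - 1})`;
}
```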


@jdx jdx merged commit ba7f671 into main May 1, 2026
20 checks passed
@jdx jdx deleted the fix-primer-retry branch May 1, 2026 21:06
@github-actions

github-actions Bot commented May 1, 2026

Benchmark changes

Versions:

  • aube: 1.5.2 -> 1.6.1

Public ratios: warm installs vs Bun 6x -> 5x; warm installs vs pnpm 10x -> 11x.

| Benchmark | aube | bun | pnpm |
| --- | --- | --- | --- |
| Fresh install (warm cache) | 230ms -> 200ms (-13%) | 1488ms -> 981ms (-34%) | 2367ms -> 2226ms (-6%) |
| CI install (warm cache, GVS disabled) | 564ms -> 1052ms (+87%) | 1295ms -> 924ms (-29%) | 2361ms -> 1967ms (-17%) |
| CI install (cold cache, GVS disabled) | 5800ms -> 4290ms (-26%) | 4278ms -> 4362ms (+2%) | 4823ms -> 4777ms (-1%) |

637800f vs 98cec0f | aube/bun/pnpm | 3 scenarios | 3 runs | 500mbit/50ms | generated by Codex.
