test(dev): debug instrumentation for node-dev-server-test CI flake by h-a-n-a · Pull Request #9735 · rolldown/rolldown

h-a-n-a · 2026-06-13T08:20:44Z

What

Temporary, greppable instrumentation to diagnose the intermittent node-dev-server-test timeout (the job exceeds the 15-min ceiling and is canceled). Seen on node 20 while node 22/24 pass the same commit in <2 min, so it's a nondeterministic hang. Can't reproduce locally, hence pushing instrumentation to CI.

Working theory

The browser playground specs mutate a shared on-disk fixture (playground-temp/<name>/) via anchored string replacement, with no per-test/per-retry reset (the playground is copied once in global setup). A flaky first-attempt failure leaves the fixture off-baseline; after that, retries and sibling specs can never recover because editFile silently no-ops when its anchor string is gone — so the expected rebuild/HMR never fires and the test burns its full testTimeout × 3 retries. With specs running sequentially and sharing the dir, one flaky failure snowballs into the 15-min cancellation.

Evidence from the failing run: rebuild-error.spec.ts is the first to fail, then hmr-full-bundle-mode.spec.ts fails all 5 tests (~11 min), with repeated [editFile] No changes detected … warnings throughout.

Instrumentation (grep prefixes)

[envdbg] — node version, cpus/mem/loadavg, resolved vitest config, spec files, copied playgrounds (compare node 20 vs 22/24).
[fixtdbg] ── BEFORE — full fixture-tree snapshot at the start of every attempt, with attempt#/retryCount. Confirms whether retries/sibling specs inherit a corrupted fixture.
[fixtdbg] ── FAILED (onTestFailed) — error + fixture snapshot + server/browser log tails, printed inline so it survives the 15-min cancellation (Vitest's batched summary doesn't).
[editFile] — logs APPLIED before -> after; the NOOP case now prints the current file content.
hanging-process reporter on both suites — surfaces the close timed out after 10000ms / orphaned-process symptom.
RD_TEST_RETRY / RD_TEST_TIMEOUT env overrides — set RD_TEST_RETRY=0 to fail fast and let the whole suite finish inside the job budget on a repro run.

Note

This is diagnostic-only and meant to be reverted (or trimmed to whatever turns out to be load-bearing) once the root cause is confirmed. No production code touched.

🤖 Generated with Claude Code

The node-dev-server-test job intermittently times out at the 15-min ceiling (seen on node 20 while node 22/24 pass the same commit in <2min). The playground specs mutate a shared on-disk fixture (playground-temp/<name>/) via anchored string replacement with no per-test/per-retry reset; the working theory is that a flaky first-attempt failure leaves the fixture off-baseline, after which retries and sibling specs can never recover because editFile silently no-ops on a missing anchor. Can't reproduce locally, so add temporary, greppable instrumentation to confirm it on CI: - [envdbg]: node version, cpus/mem/loadavg, resolved vitest config, spec files - [fixtdbg] BEFORE: full fixture-tree snapshot at the start of every attempt, with attempt#/retryCount - [fixtdbg] FAILED (onTestFailed): error + fixture snapshot + server/browser log tails, printed inline so it survives the 15-min cancellation - [editFile]: log APPLIED before->after, and NOOP now prints current content - hanging-process reporter on both suites (the "close timed out" symptom) - RD_TEST_RETRY / RD_TEST_TIMEOUT env overrides for faster repro runs Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

netlify · 2026-06-13T08:21:00Z

✅ Deploy Preview for rolldown-rs canceled.

Name	Link
🔨 Latest commit	`f1e583b`
🔍 Latest deploy log	https://app.netlify.com/projects/rolldown-rs/deploys/6a2d28ce309e0500080b7d4b

…poll trace The broad [envdbg]/[fixtdbg]/[editFile] snapshot logging confirmed the cascade (rebuild-error.spec leaks flag.txt=broken-1 into the shared playground-temp, poisoning every sibling build). What's still unexplained is the *trigger*: why attempt#1's recovery poll never sees 'rebuild-error: recovered' (rebuild-error.spec.ts:77). Strip the noisy instrumentation and replace it with a focused probe on that one poll: - bump the poll 15s -> 30s to learn whether it recovers late or is truly stuck - each tick print `.rebuild-error` text + overlay count + any new server logs, so server-side rebuild activity lines up against what the browser renders This disambiguates: page never reloaded (overlay clears, text stuck at 'ok') vs recovery build never succeeded (overlay stays, errors keep logging) vs simply too slow (flips to 'recovered' past 15s). Reverts the [envdbg] env dump, [fixtdbg] fixture snapshots, [editFile] APPLIED/NOOP verbosity, and the hanging-process reporters. Keeps the RD_TEST_RETRY / RD_TEST_TIMEOUT knobs (run with RD_TEST_RETRY=0 to keep attempt#1's trace out from under retries + the cascade). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

The [recover] probe showed the failing case has zero server activity for 30s (overlay stuck at 1, value stuck at 'ok', no server logs) while retries prove the recovery write reached disk — i.e. the watched module.js change never triggered a rebuild. Add two more per-tick signals to pin the gap between disk and server: - read rebuild-error/module.js (its value field) and flag.txt from disk each tick, so the trace shows the recovery write landed and stays even while nothing rebuilds - read buildSeq from /_dev/status each tick; a flat buildSeq across the whole poll confirms no build was ever triggered (a dropped file-watch event), vs the working case where it increments when the rebuild lands Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

h-a-n-a · 2026-06-13T10:57:33Z

Failed: https://github.com/rolldown/rolldown/actions/runs/27463457201/job/81182612946#step:9:604

Successful: https://github.com/rolldown/rolldown/actions/runs/27463457201/job/81182294575#step:9:607

Something in the watcher seems not to be working. File edits sometimes do not reflect in watcher updates even in poll mode. Will investigate. cc @sapphi-red @hyf0

github-actions Bot assigned h-a-n-a Jun 13, 2026

h-a-n-a and others added 2 commits June 13, 2026 17:26

This was referenced Jun 13, 2026

chore(generator): fix vp fmt on Windows #9727

Merged

fix(dev): detect same-second rewrites in CI poll watcher #9736

Merged

h-a-n-a closed this Jun 13, 2026

h-a-n-a deleted the test/dev-server-flake-debug-logs branch June 13, 2026 15:53

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

test(dev): debug instrumentation for node-dev-server-test CI flake#9735

test(dev): debug instrumentation for node-dev-server-test CI flake#9735
h-a-n-a wants to merge 3 commits into
mainfrom
test/dev-server-flake-debug-logs

h-a-n-a commented Jun 13, 2026

Uh oh!

netlify Bot commented Jun 13, 2026 •

edited

Loading

Uh oh!

h-a-n-a commented Jun 13, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

h-a-n-a commented Jun 13, 2026

What

Working theory

Instrumentation (grep prefixes)

Note

Uh oh!

netlify Bot commented Jun 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

✅ Deploy Preview for rolldown-rs canceled.

Uh oh!

h-a-n-a commented Jun 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

netlify Bot commented Jun 13, 2026 •

edited

Loading

h-a-n-a commented Jun 13, 2026 •

edited

Loading