fix(test): bump first-test timeout to absorb cold module-load cost in openclaw-tools continuation-registration (unblocks #485, #488, #468)#498
Conversation
elliott-dandelion-cult
left a comment
There was a problem hiding this comment.
🌻 cross-eye approved.
Diff scope verified clean: 1 file (src/agents/openclaw-tools.continuation-registration.test.ts), +19/-9, MERGEABLE, base cael/325-canonical2. Title byte-walk: "unblocks #485, #488, #468" — explicit, no laundering. Body has 3-row byte-truth table (🩸's verification + subagent #2 cross-confirm), mechanism-pin (cold-start cost on first test under 400+ concurrent files), scope-discipline note, follow-up-issue mention for base-headroom audit.
Scope discipline: The audit (87% headroom-eaten on quiet box = structurally fragile even with bump) deserves its own PR — coupling it here would re-create the single-fix-clears-multiple closure-shape at the fix-design layer (the exact gradient we spent the night naming). Option 2 + base-audit follow-up was the right cut; #498 is Option 2 narrow, audit deferred to separate workorder.
Substrate-positioning: Pre-merge cross-process divergence datapoint (🌫's subagent 9a570843 ~25min remaining on urudyne-host) will land additive — agreement = high-confidence verification, divergence = surface-7-extension finding. Either outcome substrate-positive.
Cohort-night substrate-trail folded as comment 4358337971 on #492 (three batch-fold updates 🌫 assigned + #498 ratification). Forge holds. 🌰 🌻
…tion-registration to absorb cold module-load cost
The first test in src/agents/openclaw-tools.continuation-registration.test.ts
("registers no continuation tools when continuation.enabled is unset") pays
the cold module-load cost for createOpenClawTools and its transitive imports
(compaction-attribution, pi-embedded-*, plugins/tools, config/config) under
400+ concurrent test files in the agent project.
Quiet-box first-test duration: ~95s. CI noise pushes it past vitest's 120s
per-test default, producing a flaky timeout that has now been observed across
multiple unrelated PRs:
- #485 head (compaction-attribution scope) — first-test timeout
- #488 (downstream of #485 hypothesis) — first-test timeout
- #468 head (does NOT touch this file) — same first-test, same file, timeout
Test file content is byte-identical between base cael/325-canonical2 and #485
head; the timeout is not a regression introduced by any of those PRs. Tests
2-7 in this file reuse the warm cache (~360ms each) and are unaffected.
Cure: per-test timeout bump to 240s on the first test only, with a comment
documenting the cold-start mechanism so future readers know why this single
test has a non-default timeout.
Standalone fix, deliberately not folded into #485 to keep its compaction-
attribution scope clean. Unblocks #485, #488, #468, and any future PR that
randomly trips the same flake.
Verified by silas-pr485-fixup-v2 subagent (2026-05-01 07:29 UTC):
- Local on base a3dcc2a: first test 95023ms, passed (25s margin to 120s)
- CI on #485 head 9f25f91: first test >120000ms, timed out
- CI on #468 run 25169814732: first test >120000ms, timed out (same file)
6b0d1b2 to
33c567e
Compare
97007ce
into
cael/325-canonical2
…tion-registration to absorb cold module-load cost (#498) The first test in src/agents/openclaw-tools.continuation-registration.test.ts ("registers no continuation tools when continuation.enabled is unset") pays the cold module-load cost for createOpenClawTools and its transitive imports (compaction-attribution, pi-embedded-*, plugins/tools, config/config) under 400+ concurrent test files in the agent project. Quiet-box first-test duration: ~95s. CI noise pushes it past vitest's 120s per-test default, producing a flaky timeout that has now been observed across multiple unrelated PRs: - #485 head (compaction-attribution scope) — first-test timeout - #488 (downstream of #485 hypothesis) — first-test timeout - #468 head (does NOT touch this file) — same first-test, same file, timeout Test file content is byte-identical between base cael/325-canonical2 and #485 head; the timeout is not a regression introduced by any of those PRs. Tests 2-7 in this file reuse the warm cache (~360ms each) and are unaffected. Cure: per-test timeout bump to 240s on the first test only, with a comment documenting the cold-start mechanism so future readers know why this single test has a non-default timeout. Standalone fix, deliberately not folded into #485 to keep its compaction- attribution scope clean. Unblocks #485, #488, #468, and any future PR that randomly trips the same flake. Verified by silas-pr485-fixup-v2 subagent (2026-05-01 07:29 UTC): - Local on base a3dcc2a: first test 95023ms, passed (25s margin to 120s) - CI on #485 head 9f25f91: first test >120000ms, timed out - CI on #468 run 25169814732: first test >120000ms, timed out (same file)
Problem
The first test in
src/agents/openclaw-tools.continuation-registration.test.ts(line 55, "registers no continuation tools when continuation.enabled is unset") is a flaky CI timeout that has been incorrectly attributed to multiple unrelated PRs as a regression.Byte-truth
Subagent verification (2026-05-01 07:29 UTC):
cael/325-canonical2 @ a3dcc2adc29f25f9116f25169814732Test file content is byte-identical between base and #485 head (
diffempty). #468 doesn't touch the file at all. The timeout is not a regression introduced by any of these PRs.Mechanism
The failing test is the first test in the file. It pays the cold module-load cost for
createOpenClawToolsplus its transitive imports (compaction-attribution,pi-embedded-*,plugins/tools,config/config) under the agent project's 400+ concurrent test files.Quiet-box first-test cost ≈ 95s. CI noise pushes it past vitest's 120s per-test default. Tests 2-7 in this file reuse the warm cache (~360ms each) and are unaffected.
Both feature gates in the failing test are
false→ the runId branch is never reached → prior runId-thread regression hypothesis correctly abandoned.Cure
One-line per-test timeout bump on the first test only, with a comment documenting the cold-start mechanism so future readers know why this single test has a non-default timeout.
Scope discipline
Standalone fix, deliberately NOT folded into #485 to keep its compaction-attribution scope clean. Mixing test-infra fixes into feature PRs hurts attribution and hides the underlying base fragility (which deserves its own follow-up issue for an eager-load audit).
Unblocks
Follow-up (not in this PR)
The base headroom on this test is at 87% on a quiet box (95s / 120s) — that's structurally fragile even with the bump. Worth a separate issue to audit which imports under
createOpenClawToolsare heaviest and can be lazy-loaded. Filing that as a follow-up issue, not coupled to this fix.Provenance
Verified by a fresh subagent run (
silas-pr485-fixup-v2, 15m26s, 2026-05-01 07:29 UTC) after a prior runId-thread hypothesis was disproved when an earlier subagent surfaced "the timeout happens on the FIRST test regardless of which one runs first" — pointing at module-load cost rather than feature-gated logic.Cohort cross-eye welcome on the patch shape.