perf(tests): refactor embedded attempt runner helpers#87410
Conversation
|
Codex review: needs maintainer review before merge. Reviewed May 27, 2026, 11:03 PM ET / 03:03 UTC. Summary PR surface: Source +66, Tests +95. Total +161 across 12 files. Reproducibility: not applicable. this is a performance/refactor PR rather than a bug report. The PR body provides before/after local wrapper measurements and focused test results for the changed helper surfaces. Review metrics: 1 noteworthy metric.
Merge readiness Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch. Rank-up moves:
Risk before merge
Maintainer options:
Next step before merge
Security Review detailsBest possible solution: Land the extraction if maintainers accept the scoped proof, keeping the production runner wired to the extracted helpers and preserving runtime behavior. Do we have a high-confidence way to reproduce the issue? Not applicable; this is a performance/refactor PR rather than a bug report. The PR body provides before/after local wrapper measurements and focused test results for the changed helper surfaces. Is this the best way to solve the issue? Yes; the scoped runner policy favors moving helper behavior into production helpers with focused tests, and the diff avoids a test-only copy while keeping the runner call sites on the same decisions. AGENTS.md: found and applied where relevant. Codex review notes: model gpt-5.5, reasoning high; reviewed against bf22893cb6e9. Label changesLabel changes:
Label justifications:
Evidence reviewedPR surface: Source +66, Tests +95. Total +161 across 12 files. View PR surface stats
What I checked:
Likely related people:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics. How this review workflow works
|
|
ClawSweeper PR egg ✨ Hatched: 🌱 uncommon Mossy Lint Imp Hatch commandComment Hatchability rules:
Rarity: 🌱 uncommon. What is this egg doing here?
|
5f85619 to
ef00cf4
Compare
9702dc0 to
e296862
Compare
|
@clawsweeper re-review |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
* refactor: extract embedded attempt runner helpers * fix: remove unused attempt queue type import * fix: restore attempt helper coverage * fix: clear attempt cleanup ci * fix: restore model prompt transform extraction
* refactor: extract embedded attempt runner helpers * fix: remove unused attempt queue type import * fix: restore attempt helper coverage * fix: clear attempt cleanup ci * fix: restore model prompt transform extraction
* refactor: extract embedded attempt runner helpers * fix: remove unused attempt queue type import * fix: restore attempt helper coverage * fix: clear attempt cleanup ci * fix: restore model prompt transform extraction
Motivation for this is test performance:
Summary
This PR is mainly a structural cleanup for the embedded attempt runner: it extracts pure helper decisions out of the 5k-line
run/attempt.tsorchestration hotspot so those decisions can be tested without importing the full runner. It is not intended to delete behavior or change user-visible runtime behavior.run/attempt.tsto consume those helpers directly, without compatibility shims.run/attempt.test.tssurface into focused helper tests.Test Cost / Size Impact
run/attempt.ts: 5339 -> 4755 LOC.run/attempt.test.ts: 4095 -> 3480 LOC.node scripts/run-vitest.mjs src/agents/embedded-agent-runner/run/attempt.test.tsreported Vitest duration 12.72s, max RSS about 1.50 GB.checks-node-agentic-agentsviatest/vitest/vitest.agents-embedded-agent.config.ts. This is not a measured 8-10s reduction across all PR Actions; total PR wall-clock impact depends on whether that shard is on the critical path and will likely be small/noisy.The net diff is modest because the helper logic still exists; the win is that runner orchestration and helper tests are no longer coupled through the full
attempt.tsmodule.Verification
node scripts/run-vitest.mjs src/agents/embedded-agent-runner/run/attempt.test.tsnode scripts/run-vitest.mjs src/agents/embedded-agent-runner/run.incomplete-turn.test.tsnode scripts/run-vitest.mjs src/agents/embedded-agent-runner-extraparams.test.tsnode scripts/run-vitest.mjs src/agents/embedded-agent-runner/run/attempt.llm-boundary.test.ts src/agents/embedded-agent-runner/run/attempt.tool-search-run-plan.test.ts src/agents/embedded-agent-runner/model.test.tsnode scripts/run-vitest.mjs src/agents/embedded-agent-runner/run/attempt.test.ts src/agents/embedded-agent-runner/run.incomplete-turn.test.ts src/agents/embedded-agent-runner-extraparams.test.ts src/agents/embedded-agent-runner/run/attempt.run-decisions.test.ts src/agents/embedded-agent-runner/run/attempt.tool-search-run-plan.test.ts src/agents/embedded-agent-runner/run/attempt.bootstrap-context.test.ts src/agents/embedded-agent-runner/run/attempt.llm-boundary.test.ts src/agents/embedded-agent-runner/run/attempt.queue-message.test.ts src/agents/embedded-agent-runner/model.test.tsnode scripts/run-tsgo.mjs -p test/tsconfig/tsconfig.test.src.json --incremental --tsBuildInfoFile .artifacts/tsgo-cache/test-src.tsbuildinfonode scripts/run-tsgo.mjs -p tsconfig.core.json --incremental --tsBuildInfoFile .artifacts/tsgo-cache/core.tsbuildinfonode scripts/run-oxlint.mjs --tsconfig config/tsconfig/oxlint.json src/agents/embedded-agent-runner/run/attempt.llm-boundary.ts src/agents/embedded-agent-runner/run/attempt.llm-boundary.test.ts src/agents/embedded-agent-runner/run/attempt.tool-search-run-plan.test.ts src/agents/embedded-agent-runner/model.provider-runtime.test-support.tspnpm protocol:gen && pnpm protocol:gen:swiftgit diff --check.agents/skills/autoreview/scripts/autoreview --mode localReal behavior proof
Behavior addressed: Embedded attempt runner helper extraction and test slimming; no user-visible runtime behavior change intended.
Real environment tested: Local Codex worktree on macOS with repo Node/Vitest wrappers.
Exact steps or command run after this patch: The Verification commands listed above.
Evidence after fix: Grouped Vitest validation passed 14 files and 949 tests; focused ClawSweeper/CI fix validation passed 4 files and 211 tests; tsgo prod/test-source passes exited clean; focused oxlint passed; protocol generation produced the checked-in Swift protocol output; autoreview reported no accepted/actionable findings.
Observed result after fix: Runner helper decisions are covered by focused tests, LLM-boundary metadata/runtime-context sanitization coverage is restored, Tool Search bad-allowlist/client-tool coverage is restored, and
run/attempt.test.tsno longer importsrun/attempt.tsas a broad helper barrel.What was not tested: Full
pnpm check/ full suite and live provider behavior; this is a scoped refactor of existing helper logic.