perf(tests): refactor embedded attempt runner helpers by RomneyDa · Pull Request #87410 · openclaw/openclaw

RomneyDa · 2026-05-27T20:54:52Z

Motivation for this is test performance:

before: 12.7s and 1.50GB max RSS
after: 3.84s, 1.05GB max RSS

Summary

This PR is mainly a structural cleanup for the embedded attempt runner: it extracts pure helper decisions out of the 5k-line run/attempt.ts orchestration hotspot so those decisions can be tested without importing the full runner. It is not intended to delete behavior or change user-visible runtime behavior.

Extract bootstrap-context, LLM-boundary message cleanup, queued steering, run-decision, and Tool Search run-planning helpers into focused production modules.
Update run/attempt.ts to consume those helpers directly, without compatibility shims.
Move helper coverage out of the broad run/attempt.test.ts surface into focused helper tests.
Restore focused coverage for LLM-boundary metadata/runtime-context sanitization and the Tool Search bad-allowlist/client-tool negative case.

Test Cost / Size Impact

run/attempt.ts: 5339 -> 4755 LOC.
run/attempt.test.ts: 4095 -> 3480 LOC.
Local isolated measurement only: baseline node scripts/run-vitest.mjs src/agents/embedded-agent-runner/run/attempt.test.ts reported Vitest duration 12.72s, max RSS about 1.50 GB.
After extraction, the same local isolated command reported Vitest duration 3.84s on the latest run. A prior post-change measured run showed max RSS about 1.05 GB.
Current CI impact is limited to jobs that run embedded-agent runner tests, primarily checks-node-agentic-agents via test/vitest/vitest.agents-embedded-agent.config.ts. This is not a measured 8-10s reduction across all PR Actions; total PR wall-clock impact depends on whether that shard is on the critical path and will likely be small/noisy.

The net diff is modest because the helper logic still exists; the win is that runner orchestration and helper tests are no longer coupled through the full attempt.ts module.

Verification

node scripts/run-vitest.mjs src/agents/embedded-agent-runner/run/attempt.test.ts
node scripts/run-vitest.mjs src/agents/embedded-agent-runner/run.incomplete-turn.test.ts
node scripts/run-vitest.mjs src/agents/embedded-agent-runner-extraparams.test.ts
node scripts/run-vitest.mjs src/agents/embedded-agent-runner/run/attempt.llm-boundary.test.ts src/agents/embedded-agent-runner/run/attempt.tool-search-run-plan.test.ts src/agents/embedded-agent-runner/model.test.ts
node scripts/run-vitest.mjs src/agents/embedded-agent-runner/run/attempt.test.ts src/agents/embedded-agent-runner/run.incomplete-turn.test.ts src/agents/embedded-agent-runner-extraparams.test.ts src/agents/embedded-agent-runner/run/attempt.run-decisions.test.ts src/agents/embedded-agent-runner/run/attempt.tool-search-run-plan.test.ts src/agents/embedded-agent-runner/run/attempt.bootstrap-context.test.ts src/agents/embedded-agent-runner/run/attempt.llm-boundary.test.ts src/agents/embedded-agent-runner/run/attempt.queue-message.test.ts src/agents/embedded-agent-runner/model.test.ts
node scripts/run-tsgo.mjs -p test/tsconfig/tsconfig.test.src.json --incremental --tsBuildInfoFile .artifacts/tsgo-cache/test-src.tsbuildinfo
node scripts/run-tsgo.mjs -p tsconfig.core.json --incremental --tsBuildInfoFile .artifacts/tsgo-cache/core.tsbuildinfo
node scripts/run-oxlint.mjs --tsconfig config/tsconfig/oxlint.json src/agents/embedded-agent-runner/run/attempt.llm-boundary.ts src/agents/embedded-agent-runner/run/attempt.llm-boundary.test.ts src/agents/embedded-agent-runner/run/attempt.tool-search-run-plan.test.ts src/agents/embedded-agent-runner/model.provider-runtime.test-support.ts
pnpm protocol:gen && pnpm protocol:gen:swift
git diff --check
.agents/skills/autoreview/scripts/autoreview --mode local

Real behavior proof

Behavior addressed: Embedded attempt runner helper extraction and test slimming; no user-visible runtime behavior change intended.
Real environment tested: Local Codex worktree on macOS with repo Node/Vitest wrappers.
Exact steps or command run after this patch: The Verification commands listed above.
Evidence after fix: Grouped Vitest validation passed 14 files and 949 tests; focused ClawSweeper/CI fix validation passed 4 files and 211 tests; tsgo prod/test-source passes exited clean; focused oxlint passed; protocol generation produced the checked-in Swift protocol output; autoreview reported no accepted/actionable findings.
Observed result after fix: Runner helper decisions are covered by focused tests, LLM-boundary metadata/runtime-context sanitization coverage is restored, Tool Search bad-allowlist/client-tool coverage is restored, and run/attempt.test.ts no longer imports run/attempt.ts as a broad helper barrel.
What was not tested: Full pnpm check / full suite and live provider behavior; this is a scoped refactor of existing helper logic.

clawsweeper · 2026-05-27T20:56:23Z

Codex review: needs maintainer review before merge. Reviewed May 27, 2026, 11:03 PM ET / 03:03 UTC.

Summary
The PR extracts embedded attempt-runner helper decisions into focused modules and moves the matching coverage out of the broad runner test file.

PR surface: Source +66, Tests +95. Total +161 across 12 files.

Reproducibility: not applicable. this is a performance/refactor PR rather than a bug report. The PR body provides before/after local wrapper measurements and focused test results for the changed helper surfaces.

Review metrics: 1 noteworthy metric.

Reported focused test cost: 12.72s/1.50GB -> 3.84s/1.05GB. The PR’s stated benefit is lower cost for the embedded attempt runner hotspot, so the reported before/after timing and RSS are useful maintainer context.

Merge readiness
Overall: 🐚 platinum hermit
Proof: 🐚 platinum hermit
Patch quality: 🐚 platinum hermit
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

[P2] none.

Risk before merge

[P1] The PR body explicitly leaves full pnpm check, full suite, and live provider behavior untested; the scoped helper and runner tests may be enough for this internal refactor, but that is still a maintainer gate.

Maintainer options:

Decide the mitigation before merge
Land the extraction if maintainers accept the scoped proof, keeping the production runner wired to the extracted helpers and preserving runtime behavior.
Pause or close
Do not merge this PR until maintainers decide whether the risk is worth taking.

Next step before merge

[P2] No automated repair is needed because there are no actionable findings; maintainer review and normal CI remain the gates.

Security
Cleared: The diff is limited to TypeScript runner/helper/test code and adds LLM-boundary metadata sanitization coverage; no dependency, workflow, secret, or package-supply-chain concern was found.

Review details

Best possible solution:

Land the extraction if maintainers accept the scoped proof, keeping the production runner wired to the extracted helpers and preserving runtime behavior.

Do we have a high-confidence way to reproduce the issue?

Not applicable; this is a performance/refactor PR rather than a bug report. The PR body provides before/after local wrapper measurements and focused test results for the changed helper surfaces.

Is this the best way to solve the issue?

Yes; the scoped runner policy favors moving helper behavior into production helpers with focused tests, and the diff avoids a test-only copy while keeping the runner call sites on the same decisions.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against bf22893cb6e9.

Label changes

Label changes:

add proof: sufficient: Contributor real behavior proof is sufficient. The PR body includes structured after-change terminal proof for the scoped helper extraction and test-slimming behavior.

Label justifications:

P3: This is a low-risk internal performance/refactor PR with focused tests and no user-facing behavior change intended.
rating: 🐚 platinum hermit: Overall readiness is 🐚 platinum hermit; proof is 🐚 platinum hermit and patch quality is 🐚 platinum hermit.
status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Sufficient (terminal): The PR body includes structured after-change terminal proof for the scoped helper extraction and test-slimming behavior.
proof: sufficient: Contributor real behavior proof is sufficient. The PR body includes structured after-change terminal proof for the scoped helper extraction and test-slimming behavior.

Evidence reviewed

PR surface:

Source +66, Tests +95. Total +161 across 12 files.

View PR surface stats

Area	Files	Added	Removed	Net
Source	6	810	744	+66
Tests	6	733	638	+95
Docs	0	0	0	0
Config	0	0	0	0
Generated	0	0	0	0
Other	0	0	0	0
Total	12	1543	1382	+161

What I checked:

Protected item context: The provided GitHub context shows author association MEMBER and labels including maintainer, so this item must stay open for explicit maintainer handling.
Current main is not already refactored: Current main still defines normalizeMessagesForLlmBoundary inside the large runner file, so the central extraction is not implemented on main. (src/agents/embedded-agent-runner/run/attempt.ts:967, bf22893cb6e9)
PR source extraction: The PR head imports extracted helper modules from attempt.bootstrap-context, attempt.llm-boundary, attempt.queue-message, attempt.run-decisions, and attempt.tool-search-run-plan and uses them at the existing runner call sites. (src/agents/embedded-agent-runner/run/attempt.ts:314, e296862940c1)
Focused helper coverage: The PR adds focused LLM-boundary coverage, including the blocked-metadata sanitizer case that keeps safe fields while excluding unsafe reason/prompt data. (src/agents/embedded-agent-runner/run/attempt.llm-boundary.test.ts:296, e296862940c1)
Scoped policy fit: The scoped runner AGENTS.md explicitly prefers focused helper tests for embedded runner proof and full-runner tests only when the behavior needs the full runner. (src/agents/embedded-agent-runner/run/AGENTS.md:7, bf22893cb6e9)
History provenance: git blame ties most current inline helper logic being extracted, including Tool Search planning, queue steering, runtime context, and run-decision helpers, to commit 53ad531 before the PR extraction. (src/agents/embedded-agent-runner/run/attempt.ts:500, 53ad531df918)

Likely related people:

Vincent Koc: Blame on the current inline helper blocks being extracted points heavily to commit 53ad531 in the embedded runner path. (role: recent area contributor; confidence: high; commits: 53ad531df918; files: src/agents/embedded-agent-runner/run/attempt.ts)
Alix-007: Recent prompt-local hook context work owns the model prompt transform helper body that this PR moves into attempt.llm-boundary.ts. (role: adjacent helper contributor; confidence: medium; commits: 8b7a4826a1ff; files: src/agents/embedded-agent-runner/run/attempt.ts)
Chunyue Wang: Recent history on attempt.ts includes session-lock timeout work adjacent to the extracted run-decision/session-lock helper surface. (role: recent adjacent contributor; confidence: medium; commits: 65fb56513fb2; files: src/agents/embedded-agent-runner/run/attempt.ts)

What the crustacean ranks mean

🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works

ClawSweeper keeps one durable marker-backed review comment per issue or PR.
Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
Maintainers can also comment @clawsweeper review to request a fresh review only.
Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

clawsweeper · 2026-05-27T21:00:48Z

ClawSweeper PR egg

✨ Hatched: 🌱 uncommon Mossy Lint Imp

Hatch command

Comment @clawsweeper hatch when this PR is hatchable.

Hatchability rules:

Merged PRs are hatchable.
Open PRs are hatchable when they are status: 👀 ready for maintainer look, status: 🚀 automerge armed, or labeled clawsweeper:automerge.
Closed unmerged PRs are hatchable only when one of those hatchable labels is still present in the durable record.

Rarity: 🌱 uncommon.
Trait: guards the happy path.
Image traits: location diff observatory; accessory rollback rope; palette seafoam, black, and opal; mood proud; pose stepping out of a freshly hatched shell; shell paper lantern shell; lighting warm desk-lamp glow; background tiny artifact crates.
Share on X: post this hatch
Copy: My PR egg hatched a 🌱 uncommon Mossy Lint Imp in ClawSweeper.

What is this egg doing here?

Eggs appear after the PR passes real-behavior proof. It is here for vibes, not verdicts: it does not change labels, ratings, merge decisions, or automation.
The shell reacts to review momentum: open follow-up work warms it up, re-review makes it wobble, and a clean final review lets it hatch.
Hatchability usually comes from sufficient real-behavior proof, no blocking P0/P1/P2 findings, no security attention needed, and clean correctness. A merged PR is already final, so merge makes the egg hatchable independently.
The hatch is seeded from this repository and PR number, so the same PR keeps the same creature; the reviewed head SHA can only change safe visual details.
Rarity is just collectible sparkle: 🥚 common, 🌱 uncommon, 💎 rare, ✨ glimmer, and 🌈 legendary.

RomneyDa · 2026-05-28T02:47:59Z

@clawsweeper re-review

clawsweeper · 2026-05-28T02:48:02Z

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

State: Superseded
Detail: A newer re-review for this item started before this run finished, so GitHub cancelled this older run. Check the latest ClawSweeper run for the current result.
Run: https://github.com/openclaw/clawsweeper/actions/runs/26551562654
Updated: 2026-05-28T02:55:55.244Z

* refactor: extract embedded attempt runner helpers * fix: remove unused attempt queue type import * fix: restore attempt helper coverage * fix: clear attempt cleanup ci * fix: restore model prompt transform extraction

openclaw-barnacle Bot added agents Agent runtime and tooling size: XL maintainer Maintainer-authored PR labels May 27, 2026

RomneyDa force-pushed the codex/cleanup-embedded-attempt-runner branch from 5f85619 to ef00cf4 Compare May 27, 2026 21:45

RomneyDa added 5 commits May 27, 2026 19:04

refactor: extract embedded attempt runner helpers

07b23df

fix: remove unused attempt queue type import

e649a84

fix: restore attempt helper coverage

af86201

fix: clear attempt cleanup ci

d76d23d

fix: restore model prompt transform extraction

e296862

RomneyDa force-pushed the codex/cleanup-embedded-attempt-runner branch from 9702dc0 to e296862 Compare May 28, 2026 02:14

RomneyDa changed the title ~~Refactor embedded attempt runner helpers~~ perf(tests): refactor embedded attempt runner helpers May 28, 2026

clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 28, 2026

RomneyDa merged commit d165100 into main May 28, 2026
129 of 133 checks passed

RomneyDa deleted the codex/cleanup-embedded-attempt-runner branch May 28, 2026 03:04

RomneyDa mentioned this pull request May 28, 2026

test(cron): speed up isolated fallback tests #87520

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

perf(tests): refactor embedded attempt runner helpers#87410

perf(tests): refactor embedded attempt runner helpers#87410
RomneyDa merged 5 commits into
mainfrom
codex/cleanup-embedded-attempt-runner

RomneyDa commented May 27, 2026 •

edited

Loading

Uh oh!

clawsweeper Bot commented May 27, 2026 •

edited

Loading

Uh oh!

clawsweeper Bot commented May 27, 2026 •

edited

Loading

Uh oh!

RomneyDa commented May 28, 2026

Uh oh!

clawsweeper Bot commented May 28, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

RomneyDa commented May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test Cost / Size Impact

Verification

Real behavior proof

Uh oh!

clawsweeper Bot commented May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

clawsweeper Bot commented May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Hatch command

Uh oh!

RomneyDa commented May 28, 2026

Uh oh!

clawsweeper Bot commented May 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

RomneyDa commented May 27, 2026 •

edited

Loading

clawsweeper Bot commented May 27, 2026 •

edited

Loading

clawsweeper Bot commented May 27, 2026 •

edited

Loading

clawsweeper Bot commented May 28, 2026 •

edited

Loading