Skip to content

perf(tests): refactor embedded attempt runner helpers#87410

Merged
RomneyDa merged 5 commits into
mainfrom
codex/cleanup-embedded-attempt-runner
May 28, 2026
Merged

perf(tests): refactor embedded attempt runner helpers#87410
RomneyDa merged 5 commits into
mainfrom
codex/cleanup-embedded-attempt-runner

Conversation

@RomneyDa

@RomneyDa RomneyDa commented May 27, 2026

Copy link
Copy Markdown
Member

Motivation for this is test performance:

  • before: 12.7s and 1.50GB max RSS
  • after: 3.84s, 1.05GB max RSS

Summary

This PR is mainly a structural cleanup for the embedded attempt runner: it extracts pure helper decisions out of the 5k-line run/attempt.ts orchestration hotspot so those decisions can be tested without importing the full runner. It is not intended to delete behavior or change user-visible runtime behavior.

  • Extract bootstrap-context, LLM-boundary message cleanup, queued steering, run-decision, and Tool Search run-planning helpers into focused production modules.
  • Update run/attempt.ts to consume those helpers directly, without compatibility shims.
  • Move helper coverage out of the broad run/attempt.test.ts surface into focused helper tests.
  • Restore focused coverage for LLM-boundary metadata/runtime-context sanitization and the Tool Search bad-allowlist/client-tool negative case.

Test Cost / Size Impact

  • run/attempt.ts: 5339 -> 4755 LOC.
  • run/attempt.test.ts: 4095 -> 3480 LOC.
  • Local isolated measurement only: baseline node scripts/run-vitest.mjs src/agents/embedded-agent-runner/run/attempt.test.ts reported Vitest duration 12.72s, max RSS about 1.50 GB.
  • After extraction, the same local isolated command reported Vitest duration 3.84s on the latest run. A prior post-change measured run showed max RSS about 1.05 GB.
  • Current CI impact is limited to jobs that run embedded-agent runner tests, primarily checks-node-agentic-agents via test/vitest/vitest.agents-embedded-agent.config.ts. This is not a measured 8-10s reduction across all PR Actions; total PR wall-clock impact depends on whether that shard is on the critical path and will likely be small/noisy.

The net diff is modest because the helper logic still exists; the win is that runner orchestration and helper tests are no longer coupled through the full attempt.ts module.

Verification

  • node scripts/run-vitest.mjs src/agents/embedded-agent-runner/run/attempt.test.ts
  • node scripts/run-vitest.mjs src/agents/embedded-agent-runner/run.incomplete-turn.test.ts
  • node scripts/run-vitest.mjs src/agents/embedded-agent-runner-extraparams.test.ts
  • node scripts/run-vitest.mjs src/agents/embedded-agent-runner/run/attempt.llm-boundary.test.ts src/agents/embedded-agent-runner/run/attempt.tool-search-run-plan.test.ts src/agents/embedded-agent-runner/model.test.ts
  • node scripts/run-vitest.mjs src/agents/embedded-agent-runner/run/attempt.test.ts src/agents/embedded-agent-runner/run.incomplete-turn.test.ts src/agents/embedded-agent-runner-extraparams.test.ts src/agents/embedded-agent-runner/run/attempt.run-decisions.test.ts src/agents/embedded-agent-runner/run/attempt.tool-search-run-plan.test.ts src/agents/embedded-agent-runner/run/attempt.bootstrap-context.test.ts src/agents/embedded-agent-runner/run/attempt.llm-boundary.test.ts src/agents/embedded-agent-runner/run/attempt.queue-message.test.ts src/agents/embedded-agent-runner/model.test.ts
  • node scripts/run-tsgo.mjs -p test/tsconfig/tsconfig.test.src.json --incremental --tsBuildInfoFile .artifacts/tsgo-cache/test-src.tsbuildinfo
  • node scripts/run-tsgo.mjs -p tsconfig.core.json --incremental --tsBuildInfoFile .artifacts/tsgo-cache/core.tsbuildinfo
  • node scripts/run-oxlint.mjs --tsconfig config/tsconfig/oxlint.json src/agents/embedded-agent-runner/run/attempt.llm-boundary.ts src/agents/embedded-agent-runner/run/attempt.llm-boundary.test.ts src/agents/embedded-agent-runner/run/attempt.tool-search-run-plan.test.ts src/agents/embedded-agent-runner/model.provider-runtime.test-support.ts
  • pnpm protocol:gen && pnpm protocol:gen:swift
  • git diff --check
  • .agents/skills/autoreview/scripts/autoreview --mode local

Real behavior proof

Behavior addressed: Embedded attempt runner helper extraction and test slimming; no user-visible runtime behavior change intended.
Real environment tested: Local Codex worktree on macOS with repo Node/Vitest wrappers.
Exact steps or command run after this patch: The Verification commands listed above.
Evidence after fix: Grouped Vitest validation passed 14 files and 949 tests; focused ClawSweeper/CI fix validation passed 4 files and 211 tests; tsgo prod/test-source passes exited clean; focused oxlint passed; protocol generation produced the checked-in Swift protocol output; autoreview reported no accepted/actionable findings.
Observed result after fix: Runner helper decisions are covered by focused tests, LLM-boundary metadata/runtime-context sanitization coverage is restored, Tool Search bad-allowlist/client-tool coverage is restored, and run/attempt.test.ts no longer imports run/attempt.ts as a broad helper barrel.
What was not tested: Full pnpm check / full suite and live provider behavior; this is a scoped refactor of existing helper logic.

@openclaw-barnacle openclaw-barnacle Bot added agents Agent runtime and tooling size: XL maintainer Maintainer-authored PR labels May 27, 2026
@clawsweeper

clawsweeper Bot commented May 27, 2026

Copy link
Copy Markdown
Contributor

Codex review: needs maintainer review before merge. Reviewed May 27, 2026, 11:03 PM ET / 03:03 UTC.

Summary
The PR extracts embedded attempt-runner helper decisions into focused modules and moves the matching coverage out of the broad runner test file.

PR surface: Source +66, Tests +95. Total +161 across 12 files.

Reproducibility: not applicable. this is a performance/refactor PR rather than a bug report. The PR body provides before/after local wrapper measurements and focused test results for the changed helper surfaces.

Review metrics: 1 noteworthy metric.

  • Reported focused test cost: 12.72s/1.50GB -> 3.84s/1.05GB. The PR’s stated benefit is lower cost for the embedded attempt runner hotspot, so the reported before/after timing and RSS are useful maintainer context.

Merge readiness
Overall: 🐚 platinum hermit
Proof: 🐚 platinum hermit
Patch quality: 🐚 platinum hermit
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • [P2] none.

Risk before merge

  • [P1] The PR body explicitly leaves full pnpm check, full suite, and live provider behavior untested; the scoped helper and runner tests may be enough for this internal refactor, but that is still a maintainer gate.

Maintainer options:

  1. Decide the mitigation before merge
    Land the extraction if maintainers accept the scoped proof, keeping the production runner wired to the extracted helpers and preserving runtime behavior.
  2. Pause or close
    Do not merge this PR until maintainers decide whether the risk is worth taking.

Next step before merge

  • [P2] No automated repair is needed because there are no actionable findings; maintainer review and normal CI remain the gates.

Security
Cleared: The diff is limited to TypeScript runner/helper/test code and adds LLM-boundary metadata sanitization coverage; no dependency, workflow, secret, or package-supply-chain concern was found.

Review details

Best possible solution:

Land the extraction if maintainers accept the scoped proof, keeping the production runner wired to the extracted helpers and preserving runtime behavior.

Do we have a high-confidence way to reproduce the issue?

Not applicable; this is a performance/refactor PR rather than a bug report. The PR body provides before/after local wrapper measurements and focused test results for the changed helper surfaces.

Is this the best way to solve the issue?

Yes; the scoped runner policy favors moving helper behavior into production helpers with focused tests, and the diff avoids a test-only copy while keeping the runner call sites on the same decisions.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against bf22893cb6e9.

Label changes

Label changes:

  • add proof: sufficient: Contributor real behavior proof is sufficient. The PR body includes structured after-change terminal proof for the scoped helper extraction and test-slimming behavior.

Label justifications:

  • P3: This is a low-risk internal performance/refactor PR with focused tests and no user-facing behavior change intended.
  • rating: 🐚 platinum hermit: Overall readiness is 🐚 platinum hermit; proof is 🐚 platinum hermit and patch quality is 🐚 platinum hermit.
  • status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Sufficient (terminal): The PR body includes structured after-change terminal proof for the scoped helper extraction and test-slimming behavior.
  • proof: sufficient: Contributor real behavior proof is sufficient. The PR body includes structured after-change terminal proof for the scoped helper extraction and test-slimming behavior.
Evidence reviewed

PR surface:

Source +66, Tests +95. Total +161 across 12 files.

View PR surface stats
Area Files Added Removed Net
Source 6 810 744 +66
Tests 6 733 638 +95
Docs 0 0 0 0
Config 0 0 0 0
Generated 0 0 0 0
Other 0 0 0 0
Total 12 1543 1382 +161

What I checked:

Likely related people:

  • Vincent Koc: Blame on the current inline helper blocks being extracted points heavily to commit 53ad531 in the embedded runner path. (role: recent area contributor; confidence: high; commits: 53ad531df918; files: src/agents/embedded-agent-runner/run/attempt.ts)
  • Alix-007: Recent prompt-local hook context work owns the model prompt transform helper body that this PR moves into attempt.llm-boundary.ts. (role: adjacent helper contributor; confidence: medium; commits: 8b7a4826a1ff; files: src/agents/embedded-agent-runner/run/attempt.ts)
  • Chunyue Wang: Recent history on attempt.ts includes session-lock timeout work adjacent to the extracted run-decision/session-lock helper surface. (role: recent adjacent contributor; confidence: medium; commits: 65fb56513fb2; files: src/agents/embedded-agent-runner/run/attempt.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@clawsweeper clawsweeper Bot added rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. P3 Low-priority cleanup, docs, polish, ergonomics, or speculative work. labels May 27, 2026
@clawsweeper

clawsweeper Bot commented May 27, 2026

Copy link
Copy Markdown
Contributor

ClawSweeper PR egg

✨ Hatched: 🌱 uncommon Mossy Lint Imp

Hatch command

Comment @clawsweeper hatch when this PR is hatchable.

Hatchability rules:

  • Merged PRs are hatchable.
  • Open PRs are hatchable when they are status: 👀 ready for maintainer look, status: 🚀 automerge armed, or labeled clawsweeper:automerge.
  • Closed unmerged PRs are hatchable only when one of those hatchable labels is still present in the durable record.

Rarity: 🌱 uncommon.
Trait: guards the happy path.
Image traits: location diff observatory; accessory rollback rope; palette seafoam, black, and opal; mood proud; pose stepping out of a freshly hatched shell; shell paper lantern shell; lighting warm desk-lamp glow; background tiny artifact crates.
Share on X: post this hatch
Copy: My PR egg hatched a 🌱 uncommon Mossy Lint Imp in ClawSweeper.

What is this egg doing here?
  • Eggs appear after the PR passes real-behavior proof. It is here for vibes, not verdicts: it does not change labels, ratings, merge decisions, or automation.
  • The shell reacts to review momentum: open follow-up work warms it up, re-review makes it wobble, and a clean final review lets it hatch.
  • Hatchability usually comes from sufficient real-behavior proof, no blocking P0/P1/P2 findings, no security attention needed, and clean correctness. A merged PR is already final, so merge makes the egg hatchable independently.
  • The hatch is seeded from this repository and PR number, so the same PR keeps the same creature; the reviewed head SHA can only change safe visual details.
  • Rarity is just collectible sparkle: 🥚 common, 🌱 uncommon, 💎 rare, ✨ glimmer, and 🌈 legendary.

@clawsweeper clawsweeper Bot added rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. and removed rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. labels May 27, 2026
@RomneyDa RomneyDa force-pushed the codex/cleanup-embedded-attempt-runner branch from 5f85619 to ef00cf4 Compare May 27, 2026 21:45
@clawsweeper clawsweeper Bot added rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. and removed rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. labels May 27, 2026
@RomneyDa RomneyDa force-pushed the codex/cleanup-embedded-attempt-runner branch from 9702dc0 to e296862 Compare May 28, 2026 02:14
@RomneyDa RomneyDa changed the title Refactor embedded attempt runner helpers perf(tests): refactor embedded attempt runner helpers May 28, 2026
@RomneyDa

Copy link
Copy Markdown
Member Author

@clawsweeper re-review

@clawsweeper

clawsweeper Bot commented May 28, 2026

Copy link
Copy Markdown
Contributor

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 28, 2026
@RomneyDa RomneyDa merged commit d165100 into main May 28, 2026
129 of 133 checks passed
@RomneyDa RomneyDa deleted the codex/cleanup-embedded-attempt-runner branch May 28, 2026 03:04
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request May 28, 2026
* refactor: extract embedded attempt runner helpers

* fix: remove unused attempt queue type import

* fix: restore attempt helper coverage

* fix: clear attempt cleanup ci

* fix: restore model prompt transform extraction
SYU8384 pushed a commit to SYU8384/openclaw that referenced this pull request Jun 3, 2026
* refactor: extract embedded attempt runner helpers

* fix: remove unused attempt queue type import

* fix: restore attempt helper coverage

* fix: clear attempt cleanup ci

* fix: restore model prompt transform extraction
sablehead pushed a commit to sablehead/openclaw that referenced this pull request Jun 10, 2026
* refactor: extract embedded attempt runner helpers

* fix: remove unused attempt queue type import

* fix: restore attempt helper coverage

* fix: clear attempt cleanup ci

* fix: restore model prompt transform extraction
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents Agent runtime and tooling maintainer Maintainer-authored PR P3 Low-priority cleanup, docs, polish, ergonomics, or speculative work. proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. size: XL status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant