
feat(agents): add harness runtime prewarm hook #86930

Closed
vincentkoc wants to merge 1 commit into main from fix/harness-runtime-prewarm

Conversation

@vincentkoc

@vincentkoc vincentkoc commented May 26, 2026

Member

Summary

  • Add an optional generic AgentHarness.prewarm contract plus plugin-sdk exports.
  • Add prewarmAgentHarnessRuntime() to select and warm the active plugin harness without starting a turn.
  • Implement Codex harness prewarm through the shared app-server client, using configured app-server options and session-bound auth profile bindings.
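
The optional-hook shape described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PR's actual code: the `PrewarmParams` fields, the `PrewarmResult` type, the `prewarmAgentHarnessRuntimeSketch` name and signature, and the best-effort failure policy are all hypothetical. The real definitions live in src/agents/harness/types.ts and may differ.

```typescript
// Hypothetical shapes for illustration only; the real types may differ.
interface PrewarmParams {
  sessionId: string; // assumed: session whose auth profile binding applies
  authProfileId?: string; // assumed: resolved session-bound auth profile
}

interface AgentHarness {
  id: string;
  // Optional hook: harnesses with nothing to warm simply omit it.
  prewarm?: (params: PrewarmParams) => Promise<void>;
}

type PrewarmResult =
  | { status: "warmed"; harnessId: string }
  | { status: "skipped"; reason: "no-harness" | "no-prewarm-hook" }
  | { status: "failed"; harnessId: string; error: string };

// Select the active harness and warm it without starting a turn.
async function prewarmAgentHarnessRuntimeSketch(
  harnesses: AgentHarness[],
  activeId: string,
  params: PrewarmParams,
): Promise<PrewarmResult> {
  const harness = harnesses.find((h) => h.id === activeId);
  if (!harness) return { status: "skipped", reason: "no-harness" };
  if (!harness.prewarm) return { status: "skipped", reason: "no-prewarm-hook" };
  try {
    await harness.prewarm(params);
    return { status: "warmed", harnessId: harness.id };
  } catch (err) {
    // Assumed policy: prewarm is best-effort, so failures are reported, not rethrown.
    const message = err instanceof Error ? err.message : String(err);
    return { status: "failed", harnessId: harness.id, error: message };
  }
}
```

Keeping the hook optional means existing harnesses need no changes. Whether a failed prewarm should be swallowed or surfaced is one of the failure semantics the contract would still need to define.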

Verification

  • git diff --check origin/main
  • node scripts/test-projects-serial.mjs src/agents/harness/prewarm.test.ts
  • node scripts/run-vitest.mjs run --config test/vitest/vitest.extension-codex.config.ts extensions/codex/index.test.ts

Scope

This PR is now only the reusable harness prewarm layer. The TUI startup behavior was split into a stacked draft PR so the review surface stays small.

What was deliberately removed from this PR: the chat.prewarm_agent_runtime gateway RPC/protocol surface and gateway client tests.

@openclaw-barnacle openclaw-barnacle Bot added app: web-ui App: web-ui gateway Gateway runtime agents Agent runtime and tooling extensions: codex size: L maintainer Maintainer-authored PR labels May 26, 2026
@clawsweeper

clawsweeper Bot commented May 26, 2026

Contributor

Codex review: found issues before merge. Reviewed May 26, 2026, 1:27 PM ET / 17:27 UTC.

Summary
The PR adds an optional agent-harness runtime prewarm hook, exports its Plugin SDK types, and implements Codex app-server prewarming with focused tests.

PR surface: Source +121, Tests +115. Total +236 across 6 files.

Reproducibility: yes, for the blocking SDK-baseline issue: source inspection shows new public SDK exports while the tracked baseline hash is unchanged. Runtime behavior still lacks real-environment proof.

Review metrics: 1 noteworthy metric.

  • Public harness API surface: 1 optional hook added, 3 exported type aliases added. This changes the Plugin SDK contract, so maintainers need the baseline/docs/proof aligned before merge.

Merge readiness
Overall: 🧂 unranked krab
Proof: 🧂 unranked krab
Patch quality: 🦐 gold shrimp
Result: blocked until real behavior proof from a real setup is added.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • Refresh and commit the Plugin SDK API baseline hash.
  • Document the harness prewarm lifecycle, params, auth/session expectations, and failure behavior.
  • Add redacted real setup proof that Codex prewarm starts or reuses the app-server without starting a turn.

Proof guidance:
Needs real behavior proof before merge: The PR body lists local test commands only; add redacted terminal output, logs, or an artifact showing the after-fix Codex prewarm path, then update the PR body to trigger re-review or ask a maintainer for @clawsweeper re-review.

Risk before merge

  • Merging the new AgentHarness.prewarm public contract without refreshed SDK baseline artifacts and docs would leave plugin authors and CI with a mismatched Plugin SDK surface.
  • Codex prewarm starts or reuses the app-server with a resolved auth profile before a turn, so profile-binding mistakes could affect auth-provider behavior for existing sessions.
  • The user-visible TUI consumer is split into a stacked draft PR, so this base API should be settled before the stack depends on it.

Maintainer options:

  1. Finish the SDK contract before merge (recommended)
    Refresh the Plugin SDK API baseline, document the prewarm lifecycle and parameters, and add the proof needed for reviewers to accept the new public hook.
  2. Accept the hook as maintainer-owned experimental API
    Maintainers can intentionally accept the compatibility risk, but should still record why the undocumented experimental surface and stale API hash are acceptable.
  3. Keep the TUI consumer stacked
    Pause the stacked TUI startup PR until this base harness prewarm contract is approved or replaced.

Next step before merge
This maintainer draft changes a public Plugin SDK/auth runtime seam and needs human API/proof review before any automation should take over.

Security
Cleared: No concrete security or supply-chain regression was found; the auth-sensitive path reuses existing Codex app-server auth/profile helpers and adds no dependencies or workflow execution.

Review findings

  • [P2] Refresh the SDK API baseline — src/plugin-sdk/agent-harness-runtime.ts:32-34
  • [P2] Document the prewarm harness contract — src/agents/harness/types.ts:98
Review details

Best possible solution:

Land the narrowed reusable hook only after the SDK baseline hash, public harness docs, focused tests, and redacted real Codex prewarm proof are in the same stack-ready state.

Do we have a high-confidence way to reproduce the issue?

Yes for the blocking SDK-baseline issue: source inspection shows new public SDK exports while the tracked baseline hash is unchanged. Runtime behavior still lacks real-environment proof.

Is this the best way to solve the issue?

Unclear: the narrowed reusable hook is a plausible owner-boundary shape, but it is not merge-ready until the public SDK contract artifacts/docs and Codex prewarm proof are added.

Full review comments:

  • [P2] Refresh the SDK API baseline — src/plugin-sdk/agent-harness-runtime.ts:32-34
    This adds new public exports from openclaw/plugin-sdk/agent-harness-runtime, but docs/.generated/plugin-sdk-api-baseline.sha256 is unchanged from base. The repository's plugin-sdk:api:check gate compares that hash, so this branch will drift until the generated API baseline hash is refreshed and committed.
    Confidence: 0.93
  • [P2] Document the prewarm harness contract — src/agents/harness/types.ts:98
    AgentHarness.prewarm is a new public trusted-plugin hook, but the agent harness docs still describe only selection/run behavior and do not define when prewarm is called, which params are available, auth/session binding expectations, or failure semantics. Please update the public harness docs alongside the SDK change so plugin authors do not infer this contract from core internals.
    Confidence: 0.86
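
To illustrate why an unchanged baseline hash drifts once exports are added, here is a minimal sketch of a hash-based API gate. It assumes the gate hashes a normalized list of exported names with SHA-256. The actual plugin-sdk:api:check implementation is not shown in this PR, and the function names and export names below are hypothetical.

```typescript
import { createHash } from "node:crypto";

// Hash a normalized (sorted, newline-joined) list of exported names.
// Assumption: the real gate may hash a richer API description.
function computeApiSurfaceHash(exportNames: string[]): string {
  const normalized = [...exportNames].sort().join("\n");
  return createHash("sha256").update(normalized, "utf8").digest("hex");
}

// Returns true when the current surface still matches the committed baseline.
function matchesBaseline(exportNames: string[], baselineHash: string): boolean {
  return computeApiSurfaceHash(exportNames) === baselineHash.trim();
}

// Hypothetical export names, for illustration only.
const baseExports = ["ExistingExportA", "ExistingExportB"];
const committedBaseline = computeApiSurfaceHash(baseExports);

// Adding a public export without refreshing the baseline fails the check.
const branchExports = [...baseExports, "NewPrewarmType"];
const stillMatches = matchesBaseline(branchExports, committedBaseline);
```

Under these assumptions `stillMatches` is false until the baseline file is regenerated and committed, which is the drift the finding describes.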

Overall correctness: patch is incorrect
Overall confidence: 0.88

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against a5eee8f1c678.

Label changes

Label changes:

  • add rating: 🧂 unranked krab: Overall readiness is 🧂 unranked krab; proof is 🧂 unranked krab and patch quality is 🦐 gold shrimp.
  • remove rating: 🦪 silver shellfish: Current PR rating is rating: 🧂 unranked krab, so this older rating label is no longer current.

Label justifications:

  • P2: This is a normal-priority feature/API improvement with bounded blast radius but real SDK and auth-provider review requirements.
  • merge-risk: 🚨 compatibility: The PR expands the public Plugin SDK AgentHarness contract and leaves the generated API baseline hash unchanged.
  • merge-risk: 🚨 auth-provider: The Codex prewarm path resolves session-bound auth profiles and starts/reuses the app-server before the first turn.
  • rating: 🧂 unranked krab: Overall readiness is 🧂 unranked krab; proof is 🧂 unranked krab and patch quality is 🦐 gold shrimp.
  • status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask. Needs real behavior proof before merge: The PR body lists local test commands only; add redacted terminal output, logs, or an artifact showing the after-fix Codex prewarm path, then update the PR body to trigger re-review or ask a maintainer for @clawsweeper re-review.
Evidence reviewed

PR surface:

Source +121, Tests +115. Total +236 across 6 files.

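| Area | Files | Added | Removed | Net |
| --- | --- | --- | --- | --- |
| Source | 4 | 121 | 0 | +121 |
| Tests | 2 | 116 | 1 | +115 |
| Docs | 0 | 0 | 0 | 0 |
| Config | 0 | 0 | 0 | 0 |
| Generated | 0 | 0 | 0 | 0 |
| Other | 0 | 0 | 0 | 0 |
| Total | 6 | 237 | 1 | +236 |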

What I checked:

Likely related people:

  • steipete: Introduced the pluggable agent harness registry, Codex app-server harness, Codex harness selection narrowing, and the Plugin SDK API baseline hash mechanism touched by this PR. (role: feature owner; confidence: high; commits: 44ec4d05de4a, dd26e8c44d4e, 47c0ce5f8531; files: src/agents/harness/types.ts, extensions/codex/harness.ts, src/plugin-sdk/agent-harness-runtime.ts)
  • vincentkoc: Current-main blame for the central harness, Codex harness, and Plugin SDK runtime files points to a recent Vincent Koc commit, and the PR author is also iterating this stack. (role: recent area contributor; confidence: medium; commits: c38b5033e64b, 961a0f6a6e94; files: src/agents/harness/types.ts, extensions/codex/harness.ts, src/plugin-sdk/agent-harness-runtime.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@clawsweeper clawsweeper Bot added rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. P2 Normal backlog priority with limited blast radius. merge-risk: 🚨 compatibility 🚨 May break existing users, config, migrations, defaults, or upgrade paths. merge-risk: 🚨 auth-provider 🚨 May break OAuth, tokens, provider routing, model choice, or credentials. labels May 26, 2026
@clawsweeper

clawsweeper Bot commented May 26, 2026

Contributor

ClawSweeper PR egg

🎁 Pass real behavior proof to wake the egg and unlock a hatchable treat.

Where did the egg go?
  • The egg game starts only after the PR passes the real-behavior proof check.
  • Before that, no creature or rarity is rolled. The treat waits for real proof.
  • This is still just collectible flavor: proof affects review readiness, not creature quality.

@vincentkoc vincentkoc force-pushed the fix/harness-runtime-prewarm branch from 8ca2091 to 1d477df Compare May 26, 2026 14:30
@clawsweeper clawsweeper Bot added rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. and removed rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. labels May 26, 2026
@vincentkoc vincentkoc force-pushed the fix/harness-runtime-prewarm branch from 1d477df to 1036441 Compare May 26, 2026 17:19
@vincentkoc vincentkoc changed the title feat(agents): prewarm selected harness runtimes from TUI startup feat(agents): add harness runtime prewarm hook May 26, 2026
@openclaw-barnacle openclaw-barnacle Bot added size: M and removed app: web-ui App: web-ui gateway Gateway runtime size: L labels May 26, 2026
@vincentkoc vincentkoc force-pushed the fix/harness-runtime-prewarm branch from 1036441 to 961a0f6 Compare May 26, 2026 17:20
@clawsweeper clawsweeper Bot added rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. and removed rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. labels May 26, 2026
@steipete

Contributor

Not a fan of yet another API seam


Labels

agents Agent runtime and tooling extensions: codex maintainer Maintainer-authored PR merge-risk: 🚨 auth-provider 🚨 May break OAuth, tokens, provider routing, model choice, or credentials. merge-risk: 🚨 compatibility 🚨 May break existing users, config, migrations, defaults, or upgrade paths. P2 Normal backlog priority with limited blast radius. rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. size: M status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask.


2 participants