Skip to content

fix(embedded-runner): preserve provider errors on cleanup takeover#84056

Closed
abnershang wants to merge 2 commits into
openclaw:mainfrom
abnershang:fix/embedded-runner-provider-error-preservation
Closed

fix(embedded-runner): preserve provider errors on cleanup takeover#84056
abnershang wants to merge 2 commits into
openclaw:mainfrom
abnershang:fix/embedded-runner-provider-error-preservation

Conversation

@abnershang

@abnershang abnershang commented May 19, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Preserve the original prompt/provider error when embedded-attempt cleanup later reports EmbeddedAttemptSessionTakeoverError.
  • Keep cleanup-only takeover errors fatal when there is no prior prompt error to report.
  • Carry the takeover signal into fallback classification so the user-facing provider message remains visible but model fallback still aborts on local session takeover.
  • Add regressions for prompt-error preservation, cleanup-only takeover, and fallback abort behavior.

Real behavior proof

Behavior addressed: embedded runner cleanup can race with a session takeover after a provider prompt failure; the user-facing failure should remain the provider/prompt error, but fallback must still see the takeover signal and stop instead of dispatching another model attempt into a taken-over session.

Real environment tested: local OpenClaw source checkout in an isolated worktree on this branch, using the patched fallback classifier from the worktree root.

Exact steps or command run after the patch:

node --import tsx -e '<construct a provider-message wrapper with EmbeddedAttemptSessionTakeoverError identity/cause and pass it to isNonProviderRuntimeCoordinationError>'
node scripts/run-vitest.mjs src/agents/pi-embedded-runner/run/attempt.spawn-workspace.context-engine.test.ts src/agents/model-fallback.test.ts
pnpm tsgo:test:src
pnpm check:changed
git diff --check origin/main...HEAD

Evidence after fix: terminal output from the one-shot runtime fallback-classifier probe:

{
  "wrapperName": "EmbeddedAttemptSessionTakeoverError",
  "userFacingMessage": "provider rejected request: HTTP 400",
  "causeName": "EmbeddedAttemptSessionTakeoverError",
  "abortsFallback": true
}

Observed result after fix: the wrapper preserves the provider-facing message while retaining the takeover identity/cause, and the fallback classifier returns abortsFallback: true for that runtime-shaped error.

What was not tested: no live provider outage was forced against a production gateway. The runtime classifier probe exercises the non-mocked fallback boundary; the cleanup race itself is covered by the embedded-runner attempt harness that owns the cleanup/error precedence path.

Supplemental checks:

  • Focused Vitest lane passed: 4 files, 218 tests.
  • pnpm tsgo:test:src passed.
  • pnpm check:changed passed the core/coreTests lane, lint, typechecks, dependency guards, import-cycle guard, and related changed-file checks.
  • git diff --check origin/main...HEAD passed.

Review

Codex review initially flagged that preserving only the provider error could let fallback continue after session takeover. The second commit addresses that by carrying the takeover signal through fallback classification while keeping the provider message user-facing. ClawSweeper re-review now reports patch quality as acceptable; this body update uses the proof-field labels required by the repository gate.

AI-assisted: yes. I understand the change and verified the narrow error-precedence and fallback-abort behavior locally.

@openclaw-barnacle openclaw-barnacle Bot added agents Agent runtime and tooling size: S triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels May 19, 2026
@clawsweeper

clawsweeper Bot commented May 19, 2026

Copy link
Copy Markdown
Contributor

Codex review: needs maintainer review before merge.

Workflow note: Future ClawSweeper reviews update this same comment in place.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

Summary
The PR updates embedded-runner cleanup error precedence and model fallback classification so provider prompt errors can stay user-facing while a cleanup session-takeover signal still aborts fallback.

Reproducibility: yes. at source level. Current main rejects cleanup takeover ahead of a prior prompt error, and the fallback classifier can normalize a provider-looking takeover wrapper unless the direct takeover identity is preserved.

PR rating
Overall: 🐚 platinum hermit
Proof: 🐚 platinum hermit
Patch quality: 🐚 platinum hermit
Summary: Good normal PR quality: the implementation is narrow, the earlier fallback-signal concern is addressed, and the proof is adequate for an internal runtime boundary.

Rank-up moves:

  • none
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

PR egg
✨ Hatched: 🥚 common Frosted Signal Puff

        .--^^^^--.           
     .-'  o    o  '-.        
    /       \__/      \      
   |    /\  ____  /\   |     
   |   /  \/____\/  \  |     
    \  \_.------._/  /       
     '._  `----'  _.'        
        '-.____.-'           
       _/|_|  |_|\_          
      /__|      |__\         
       .-----------.         
      '-------------'        

Rarity: 🥚 common.
Trait: watches the merge queue.
Share on X: post this hatch
Copy: My PR egg hatched a 🥚 common Frosted Signal Puff in ClawSweeper.

What is this egg doing here?
  • Eggs appear after the PR passes real-behavior proof. It is here for vibes, not verdicts: it does not change labels, ratings, merge decisions, or automation.
  • The shell reacts to review momentum: open follow-up work warms it up, re-review makes it wobble, and a clean final review lets it hatch.
  • How to hatch it: reach status: 👀 ready for maintainer look or status: 🚀 automerge armed; that usually means sufficient real-behavior proof, no blocking P0/P1/P2 findings, no security attention needed, and clean correctness.
  • The hatch is seeded from this repository and PR number, so the same PR keeps the same creature; the reviewed head SHA can only change safe visual details.
  • Rarity is just collectible sparkle: 🥚 common, 🌱 uncommon, 💎 rare, ✨ glimmer, and 🌈 legendary.

Real behavior proof
Sufficient (terminal): The PR body includes structured after-fix terminal output from a local runtime classifier probe plus focused test/check commands for the changed internal boundary.

Risk before merge
Why this matters: - Merging changes fallback classification so takeover-named provider-message wrappers abort fallback even when the visible provider message would otherwise be failoverable; that is intentional but affects provider/model routing behavior.

  • The proof is terminal/runtime-boundary evidence plus focused harness tests, not a forced live provider outage with an actual cleanup-race log.

Maintainer options:

  1. Accept boundary proof after CI (recommended)
    Treat the runtime classifier probe and focused embedded-runner/fallback regressions as sufficient for this internal race, then merge after required checks stay green.
  2. Request live race evidence
    Ask for a redacted runtime log or terminal artifact from an actual cleanup takeover after provider failure before merging.

Next step before merge
No automated repair is needed; a maintainer should accept the fallback/session-state tradeoff and verify required checks on the PR head.

Security
Cleared: The diff touches TypeScript agent runtime code and tests only; I found no concrete dependency, CI, secret-handling, package, or supply-chain regression.

Review details

Best possible solution:

Land the narrow error-precedence and fallback-abort fix after required checks if maintainers accept boundary-level proof; ask for live cleanup-race logs only if that proof bar is needed for this session-state path.

Do we have a high-confidence way to reproduce the issue?

Yes, at source level. Current main rejects cleanup takeover ahead of a prior prompt error, and the fallback classifier can normalize a provider-looking takeover wrapper unless the direct takeover identity is preserved.

Is this the best way to solve the issue?

Yes. The proposed fix keeps the provider message user-facing while carrying a takeover identity/cause through fallback, with focused tests for prompt-error preservation, cleanup-only takeover, and fallback abort behavior.

Label justifications:

  • P2: This is a focused bug fix for an agent runtime/fallback edge case with limited blast radius but real session/provider impact.
  • merge-risk: 🚨 auth-provider: The patch changes when provider/model fallback aborts instead of trying configured fallback candidates.
  • merge-risk: 🚨 session-state: The patch preserves and acts on embedded session takeover state across cleanup and fallback boundaries.

Acceptance criteria:

  • node scripts/run-vitest.mjs src/agents/pi-embedded-runner/run/attempt.spawn-workspace.context-engine.test.ts src/agents/model-fallback.test.ts
  • pnpm tsgo:test:src
  • pnpm check:changed
  • git diff --check origin/main...HEAD

What I checked:

  • Current cleanup precedence drops the provider error: Current main emits and rejects cleanupError ?? promptError, so a cleanup-time session takeover can replace an earlier provider prompt failure before the caller sees it. (src/agents/pi-embedded-runner/run/attempt.ts:4839, f07c87405c30)
  • Current fallback classifier can miss takeover wrappers with provider-looking messages: Current main only treats takeover-bearing errors as local coordination failures when failover classification is null, so a takeover wrapper whose visible message looks like a rate limit can still be normalized as provider failover. (src/agents/failover-error.ts:270, f07c87405c30)
  • Existing fallback intent is to abort on local coordination errors: The existing fallback loop already documents and enforces that local runtime coordination errors should throw instead of consuming fallback candidates. (src/agents/model-fallback.ts:1200, f07c87405c30)
  • PR diff adds the narrow preservation and abort path: The diff adds a provider-message wrapper named EmbeddedAttemptSessionTakeoverError, passes the original provider error to diagnostics, throws the wrapper on cleanup takeover after prompt failure, and checks the coordination error before failover normalization. (src/agents/pi-embedded-runner/run/attempt.ts:990, 3240d6764653)
  • PR diff adds focused regressions: The diff adds embedded-runner tests for prompt-error preservation and cleanup-only takeover, plus a model-fallback test proving takeover-carrying provider errors abort after one attempt. (src/agents/model-fallback.test.ts:797, 3240d6764653)
  • Area history and provenance: Local blame attributes the current cleanup precedence and fallback coordination classifier to the imported current tree at 1a242cd4f5011b0364fba3af15701cbe99096e32; recent adjacent work touched the same embedded-runner attempt path in ddeaebfc6807aa18ef90b14851a3646f3f0b3454 and cc835b6d72765ff81aeab7d8ad8628b425c24547. (src/agents/pi-embedded-runner/run/attempt.ts:4804, 1a242cd4f501)

Likely related people:

  • joshavant: Recent merged work touched src/agents/pi-embedded-runner/run/attempt.ts and the same context-engine test file in the embedded-runner area. (role: recent adjacent contributor; confidence: medium; commits: cc835b6d7276; files: src/agents/pi-embedded-runner/run/attempt.ts, src/agents/pi-embedded-runner/run/attempt.spawn-workspace.context-engine.test.ts)
  • Galin Iliev: Recent merged cleanup diagnostics work touched the embedded-runner attempt cleanup path and related cleanup timeout surfaces. (role: recent adjacent contributor; confidence: medium; commits: ddeaebfc6807; files: src/agents/pi-embedded-runner/run/attempt.ts, src/agents/run-cleanup-timeout.ts, src/agents/queued-file-writer.ts)
  • clawsweeper[bot]: Local shallow blame attributes the current fallback coordination classifier and cleanup precedence block to the imported current tree; this is useful provenance but weak for human ownership. (role: current imported implementation carrier; confidence: low; commits: 1a242cd4f501; files: src/agents/failover-error.ts, src/agents/model-fallback.ts, src/agents/pi-embedded-runner/run/attempt.ts)

Codex review notes: model gpt-5.5, reasoning high; reviewed against f07c87405c30.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 82fb23685a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +4870 to +4877
if (shouldPreservePromptError) {
log.warn(
`embedded attempt cleanup detected session takeover after prompt failure; preserving prompt error: ` +
`runId=${params.runId} sessionId=${params.sessionId} ` +
`promptError=${formatErrorMessage(promptError)} cleanupError=${formatErrorMessage(cleanupError)}`,
);
} else {
await Promise.reject(cleanupError);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve takeover signal for fallback

When a provider prompt failure is classified for fallback (for example rate limits or no endpoints) and cleanup also detects EmbeddedAttemptSessionTakeoverError, this branch only logs and lets the attempt surface the provider error. The outer runWithModelFallback path explicitly aborts fallback on session takeover (src/agents/model-fallback.ts:1200-1207); after this change it only sees the provider error and can dispatch another model attempt into a session that has been taken over. Keep the user-facing provider error, but carry/throw the takeover signal so fallback cannot continue in that state.

Useful? React with 👍 / 👎.

@clawsweeper clawsweeper Bot added rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. P2 Normal backlog priority with limited blast radius. merge-risk: 🚨 session-state 🚨 May lose, corrupt, stale, or mis-associate session, agent, or context state. merge-risk: 🚨 auth-provider 🚨 May break OAuth, tokens, provider routing, model choice, or credentials. labels May 19, 2026
@abnershang

Copy link
Copy Markdown
Contributor Author

@clawsweeper re-review

@clawsweeper

clawsweeper Bot commented May 19, 2026

Copy link
Copy Markdown
Contributor

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@abnershang

Copy link
Copy Markdown
Contributor Author

@clawsweeper re-review

@openclaw-barnacle openclaw-barnacle Bot added proof: supplied External PR includes structured after-fix real behavior proof. and removed triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels May 19, 2026
@abnershang

Copy link
Copy Markdown
Contributor Author

@clawsweeper re-review

@clawsweeper clawsweeper Bot added proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. and removed rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. labels May 19, 2026
@Takhoffman

Copy link
Copy Markdown
Contributor

@clawsweeper automerge

@clawsweeper clawsweeper Bot added the clawsweeper:automerge Maintainer opted this PR into bounded ClawSweeper-reviewed automerge label May 19, 2026
@clawsweeper

clawsweeper Bot commented May 19, 2026

Copy link
Copy Markdown
Contributor

🦞🔧
ClawSweeper automerge is enabled.

Draft PRs stay fix-only until GitHub marks them ready for review. Pause with /clawsweeper stop.

Automerge progress:

  • 2026-05-19 18:57:15 UTC review queued 3240d6764653 (queued)

@clawsweeper

clawsweeper Bot commented May 19, 2026

Copy link
Copy Markdown
Contributor

ClawSweeper 🐠 reef update

Thanks for the work here. ClawSweeper could not write to the source branch, so it opened a replacement PR rather than letting the fix drift. attribution still points back here.

Why replacement: ClawSweeper could not update the source PR branch directly; GitHub did not grant sufficient push rights to the bot for that branch.
Replacement PR: #84321
Why close: this run explicitly closes the superseded source PR after the credited replacement PR is open, so review continues in one place.
This closeout is intentional for this run: the replacement PR is now the active review lane.
Attribution stays attached; the replacement just gives the fix a writable branch.
Co-author credit kept:

fish notes: model gpt-5.5, reasoning high; reviewed against e7d9d8c.

@clawsweeper clawsweeper Bot closed this May 19, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents Agent runtime and tooling clawsweeper:automerge Maintainer opted this PR into bounded ClawSweeper-reviewed automerge merge-risk: 🚨 auth-provider 🚨 May break OAuth, tokens, provider routing, model choice, or credentials. merge-risk: 🚨 session-state 🚨 May lose, corrupt, stale, or mis-associate session, agent, or context state. P2 Normal backlog priority with limited blast radius. proof: sufficient ClawSweeper judged the real behavior proof convincing. proof: supplied External PR includes structured after-fix real behavior proof. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. size: S status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants