fix #86077: keep fallback errors candidate scoped by zhangguiping-xydt · Pull Request #86134 · openclaw/openclaw

zhangguiping-xydt · 2026-05-24T17:05:08Z

Summary

Problem: later fallback candidates that timed out without a current assistant could surface a previous provider's raw error text, making provider triage misleading.
Solution: scope reused session assistant errors to the current candidate on timeout and only prefer raw fallback errors when their provider/model attribution matches the candidate.
What changed: updated embedded-runner assistant selection, model-fallback attempt error selection, and added focused regressions for both paths.
What did NOT change: no provider routing, auth refresh, profile rotation, external API payloads, or fallback candidate ordering changed.

Real behavior proof

Behavior or issue addressed: stale prior-provider quota text is no longer surfaced for a later candidate timeout.
Real environment tested: Linux 4.19.112-2.el8.x86_64, Node v24.16.0, OpenClaw v2026.5.24, worktree SHA 1a6d54

Exact steps or command run after this patch:

OPENCLAW_SKIP_CHANNELS=1 pnpm gateway:dev
node --import tsx /media/vdc/code/ai/aispace/openclaw-issue-86077-evidence/proof-after.mts
node scripts/run-vitest.mjs src/agents/pi-embedded-runner/run.cross-provider-fallback-error-context.test.ts src/agents/model-fallback.test.ts
pnpm check:test-types

Evidence after fix:

$ OPENCLAW_SKIP_CHANNELS=1 node scripts/run-node.mjs --dev gateway
[openclaw] Building TypeScript (dist is stale: config_newer - config newer than build stamp).
runtime-postbuild: bundled plugin metadata completed in 352ms
2026-05-25T01:13:33.187+08:00 [gateway] loading configuration…
2026-05-25T01:13:33.356+08:00 [gateway] resolving authentication…
2026-05-25T01:13:33.388+08:00 [gateway] starting...
2026-05-25T01:13:40.050+08:00 [gateway] starting HTTP server...
2026-05-25T01:13:41.305+08:00 [gateway] agent model: openai/gpt-5.5 (thinking=medium, fast=off)
2026-05-25T01:13:41.308+08:00 [gateway] http server listening (8 plugins: acpx, browser, canvas, device-pair, file-transfer, memory-core, phone-control, talk-voice; 7.9s)
2026-05-25T01:13:41.322+08:00 [gateway/channels] skipping channel start (OPENCLAW_SKIP_CHANNELS=1 or OPENCLAW_SKIP_PROVIDERS=1)
2026-05-25T01:13:44.326+08:00 [gateway] ready
2026-05-25T01:13:44.349+08:00 [heartbeat] started

[model-fallback/decision] model fallback decision: decision=candidate_failed requested=anthropic/opus-4-7 candidate=anthropic/opus-4-7 reason=timeout next=google/gemini-3.1-pro-preview detail=LLM request timed out.
[model-fallback/decision] model fallback decision: decision=candidate_failed requested=anthropic/opus-4-7 candidate=google/gemini-3.1-pro-preview reason=timeout next=none detail=LLM request timed out.
{
  "candidate": "anthropic/opus-4-7",
  "reason": "timeout",
  "surfacedError": "LLM request timed out.",
  "staleOpenAITextSuppressed": true
}

Test Files  4 passed (4)
Tests  140 passed (140)

Observed result after fix: OpenClaw started successfully in gateway dev mode, and the candidate failure detail remained the timeout message; the stale OpenAI quota text was suppressed instead of being attributed to the later candidate.
What was not tested: external provider outage calls were not made; the runtime started locally with channels skipped, and the patched fallback attribution logic was exercised through a direct local function proof.

Regression Test Plan

Coverage level: Unit test
Target test file: src/agents/pi-embedded-runner/run.cross-provider-fallback-error-context.test.ts; src/agents/model-fallback.test.ts
Scenario locked in: a later candidate timeout with no current assistant does not reuse the previous provider's session assistant, and model-fallback summaries ignore raw errors whose provider/model does not match the candidate.
Why this is the smallest reliable guardrail: it exercises the exact attribution boundary without needing real provider failures or slow timeout cascades.

Root Cause

Root cause: sessionLastAssistant was reused when currentAttemptAssistant was absent, even when the last assistant belonged to a previous provider/model; model-fallback then preferred that raw error text for the current candidate.
Missing detection / guardrail: no regression covered a later fallback candidate timeout with no current assistant and stale prior-provider session history.

clawsweeper · 2026-05-24T17:07:22Z

Codex review: needs maintainer review before merge. Reviewed May 25, 2026, 7:16 AM ET / 11:16 UTC.

Summary
The PR scopes embedded-runner assistant error reuse and model-fallback raw-error summaries to the active fallback candidate, adds focused regressions, and updates the changelog.

PR surface: Source +39, Tests +161, Docs +1. Total +201 across 6 files.

Reproducibility: yes. Current main shows the source path: embedded runner can still use sessionLastAssistant when currentAttemptAssistant is absent, and model-fallback still prefers rawError over the candidate message.

Review metrics: 1 noteworthy metric.

Fallback Attribution Surface: 2 runtime paths changed, 2 test files updated, 1 changelog entry. Fallback diagnostics are compatibility-sensitive, so maintainers should notice both the embedded-runner and outer model-fallback attribution changes before merge.

Merge readiness
Overall: 🐚 platinum hermit
Proof: 🦞 diamond lobster
Patch quality: 🐚 platinum hermit
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

none

Risk before merge

This intentionally changes fallback diagnostic output for existing multi-provider setups: mismatched prior-provider raw text is suppressed for later candidates, so maintainers should confirm the compatibility behavior before merge.

Maintainer options:

Confirm Scoped Diagnostics And Land (recommended)
Maintainers can accept the intentional diagnostics behavior change because the latest head preserves same-candidate and PI-stamped assistant errors while suppressing mismatched prior-provider raw text.
Ask For Broader Runtime Proof
If maintainers want stronger upgrade confidence, request a live or Crabbox multi-provider fallback run before merge rather than another code repair.

Next step before merge
No automated repair remains; maintainer review should confirm the compatibility-sensitive fallback diagnostics behavior before merge.

Security
Cleared: The diff touches TypeScript runtime fallback diagnostics, focused tests, and changelog only; I found no concrete security or supply-chain concern.

Review details

Best possible solution:

Land the candidate-scoped fallback attribution fix after maintainer confirmation that the diagnostics behavior change is acceptable and required checks remain green.

Do we have a high-confidence way to reproduce the issue?

Yes. Current main shows the source path: embedded runner can still use sessionLastAssistant when currentAttemptAssistant is absent, and model-fallback still prefers rawError over the candidate message.

Is this the best way to solve the issue?

Yes. The latest head is a narrow maintainable fix because it scopes stale assistant/raw-error reuse to the active candidate while preserving same-candidate and PI-stamped compaction diagnostics.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 0bb9b421f3e6.

Label changes

Label justifications:

P2: This is a normal-priority agent fallback diagnostics fix with limited blast radius and focused regression coverage.
merge-risk: 🚨 compatibility: The PR intentionally changes fallback error reporting semantics for existing multi-provider setups, which can affect operator diagnostics during upgrades.
rating: 🐚 platinum hermit: Overall readiness is 🐚 platinum hermit; proof is 🦞 diamond lobster and patch quality is 🐚 platinum hermit.
status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Sufficient (terminal): The PR body includes structured after-fix terminal/log proof from a local gateway dev run plus direct output showing stale prior-provider quota text suppressed.
proof: sufficient: Contributor real behavior proof is sufficient. The PR body includes structured after-fix terminal/log proof from a local gateway dev run plus direct output showing stale prior-provider quota text suppressed.

Evidence reviewed

PR surface:

Source +39, Tests +161, Docs +1. Total +201 across 6 files.

View PR surface stats

Area	Files	Added	Removed	Net
Source	3	48	9	+39
Tests	2	168	7	+161
Docs	1	1	0	+1
Config	0	0	0	0
Generated	0	0	0	0
Other	0	0	0	0
Total	6	217	16	+201

What I checked:

Repository policy applied: Root AGENTS.md and the scoped src/agents guides were read; root policy treats fallback behavior as compatibility-sensitive and the scoped guide supports focused helper/runner coverage for embedded-runner changes. (AGENTS.md:1, 0bb9b421f3e6)
Current main still has the stale assistant path: Current main still resolves error context from currentAttemptAssistant ?? sessionLastAssistant, matching the linked issue's stale shared-session assistant path before this PR. (src/agents/pi-embedded-runner/run.ts:1655, 0bb9b421f3e6)
Current main still prefers raw fallback errors: Current main records candidate failures with described.rawError ?? described.message, so a mismatched raw error can still win over the candidate-attributed message. (src/agents/model-fallback.ts:423, 0bb9b421f3e6)
PR scopes runner assistant reuse: At the PR head, sessionAssistantForCandidate is only reused when it matches the active provider/model and is then used for active error context, formatted assistant errors, assistantErrorText, and assistantForFailover. (src/agents/pi-embedded-runner/run.ts:1653, 36f648dcdc23)
PR preserves PI-stamped model semantics: The new isAssistantForModelRef helper resolves PI-stamped assistants through resolveReportedModelRef before comparing them to the current candidate, preserving the existing PI attribution contract. (src/agents/pi-embedded-runner/run/helpers.ts:104, 36f648dcdc23)
PR scopes outer fallback summaries: The new resolveCandidateAttemptError keeps rawError only when it has no conflicting provider/model attribution for the current candidate; otherwise it uses the candidate message. (src/agents/model-fallback.ts:403, 36f648dcdc23)

Likely related people:

vincentkoc: Current-main blame and GitHub path history show recent changes around the implicated runner and model-fallback lines, including fallback provider resolution work. (role: recent area contributor; confidence: high; commits: 73189e3ecb0c, 3c8d101f5a85, 2389b3bdd377; files: src/agents/pi-embedded-runner/run.ts, src/agents/model-fallback.ts, src/agents/pi-embedded-runner/run.cross-provider-fallback-error-context.test.ts)
steipete: GitHub path history shows recent work on model-fallback and fallback error-context tests that shape the candidate failure summary path touched here. (role: recent model-fallback contributor; confidence: medium; commits: cb5fdcf52cf8, 4c210e22fa97; files: src/agents/model-fallback.ts, src/agents/model-fallback.test.ts, src/agents/pi-embedded-runner/run.cross-provider-fallback-error-context.test.ts)
altaywtf: The PR timeline shows assignment and a maintainer re-review request, and the latest branch commits were authored to preserve same-candidate and PI-stamped fallback context. (role: PR follow-up owner; confidence: high; commits: 1aa308aae53c, b7994f1b88ce, e05db0c8dbfb; files: src/agents/pi-embedded-runner/run.ts, src/agents/pi-embedded-runner/run/helpers.ts, src/agents/pi-embedded-runner/run.cross-provider-fallback-error-context.test.ts)

What the crustacean ranks mean

🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works

ClawSweeper keeps one durable marker-backed review comment per issue or PR.
Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
Maintainers can also comment @clawsweeper review to request a fresh review only.
Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

clawsweeper · 2026-05-24T17:25:29Z

ClawSweeper PR egg

✨ Hatched: 🌱 uncommon Brave Diff Drake

Hatch command

Comment @clawsweeper hatch when this PR is hatchable.

Hatchability rules:

Merged PRs are hatchable.
Open PRs are hatchable when they are status: 👀 ready for maintainer look, status: 🚀 automerge armed, or labeled clawsweeper:automerge.
Closed unmerged PRs are hatchable only when one of those hatchable labels is still present in the durable record.

Rarity: 🌱 uncommon.
Trait: sleeps inside passing CI.
Image traits: location proof lagoon; accessory tiny test log scroll; palette moss green and polished brass; mood focused; pose waving from a small platform; shell frosted glass shell; lighting bright celebratory glints; background smooth stones and checkmarks.
Share on X: post this hatch
Copy: My PR egg hatched a 🌱 uncommon Brave Diff Drake in ClawSweeper.

What is this egg doing here?

Eggs appear after the PR passes real-behavior proof. It is here for vibes, not verdicts: it does not change labels, ratings, merge decisions, or automation.
The shell reacts to review momentum: open follow-up work warms it up, re-review makes it wobble, and a clean final review lets it hatch.
Hatchability usually comes from sufficient real-behavior proof, no blocking P0/P1/P2 findings, no security attention needed, and clean correctness. A merged PR is already final, so merge makes the egg hatchable independently.
The hatch is seeded from this repository and PR number, so the same PR keeps the same creature; the reviewed head SHA can only change safe visual details.
Rarity is just collectible sparkle: 🥚 common, 🌱 uncommon, 💎 rare, ✨ glimmer, and 🌈 legendary.

altaywtf · 2026-05-25T11:11:37Z

@clawsweeper re-review

clawsweeper · 2026-05-25T11:11:40Z

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

State: Complete
Detail: The targeted re-review finished, the durable review comment was updated, and the synced verdict was routed.
Run: https://github.com/openclaw/clawsweeper/actions/runs/26397530278
Updated: 2026-05-25T11:20:41.861Z

altaywtf · 2026-05-25T13:32:48Z

Merged via squash.

Prepared head SHA: 73cc076761967d79e4c066cdd39a9519e74d0bc8
Merge commit: d6b7fe8615be187f0c924028ba2904a959cf99e6

Thanks @zhangguiping-xydt!

openclaw-barnacle Bot added agents Agent runtime and tooling size: S triage: mock-only-proof Candidate: PR proof only shows tests, mocks, snapshots, lint, typecheck, or CI. labels May 24, 2026

openclaw-barnacle Bot added proof: supplied External PR includes structured after-fix real behavior proof. and removed triage: mock-only-proof Candidate: PR proof only shows tests, mocks, snapshots, lint, typecheck, or CI. labels May 24, 2026

zhangguiping-xydt force-pushed the feat/issue-86077 branch from a237104 to 1a6d542 Compare May 24, 2026 17:31

openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 24, 2026

altaywtf self-assigned this May 24, 2026

openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 24, 2026

github-actions Bot added the dependencies-changed PR changes dependency-related files label May 24, 2026

openclaw-barnacle Bot added docs Improvements or additions to documentation channel: imessage Channel integration: imessage app: web-ui App: web-ui gateway Gateway runtime cli CLI command changes scripts Repository scripts docker Docker and sandbox tooling and removed size: S labels May 24, 2026

clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 25, 2026

altaywtf force-pushed the feat/issue-86077 branch from cc3fe4d to 2868713 Compare May 25, 2026 09:33

openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 25, 2026

altaywtf force-pushed the feat/issue-86077 branch 3 times, most recently from d74ce61 to 7f5a821 Compare May 25, 2026 09:46

openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 25, 2026

clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 25, 2026

altaywtf force-pushed the feat/issue-86077 branch from 36675e0 to 36f648d Compare May 25, 2026 10:40

openclaw-barnacle Bot added size: M and removed size: S labels May 25, 2026

zhangguiping-xydt and others added 6 commits May 25, 2026 16:22

fix(agents): keep fallback errors candidate scoped

b273d00

fix: scope fallback assistant errors

687a2ee

fix: document fallback error scoping

6529c40

test: narrow fallback assistant error fixture

b8babbc

fix: preserve pi assistant fallback context

496eee1

fix: keep fallback changelog unreleased

73cc076

github-actions Bot mentioned this pull request May 25, 2026

📡 Upstream Digest — 2026-05-25 16:16 UTC curtismercier/openclaw-mods#944

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix #86077: keep fallback errors candidate scoped#86134

fix #86077: keep fallback errors candidate scoped#86134
altaywtf merged 6 commits into
openclaw:mainfrom
zhangguiping-xydt:feat/issue-86077

zhangguiping-xydt commented May 24, 2026 •

edited

Loading

Uh oh!

clawsweeper Bot commented May 24, 2026 •

edited

Loading

Uh oh!

clawsweeper Bot commented May 24, 2026 •

edited

Loading

Uh oh!

altaywtf commented May 25, 2026

Uh oh!

clawsweeper Bot commented May 25, 2026 •

edited

Loading

Uh oh!

altaywtf commented May 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

zhangguiping-xydt commented May 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Real behavior proof

Regression Test Plan

Root Cause

Uh oh!

clawsweeper Bot commented May 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

clawsweeper Bot commented May 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Hatch command

Uh oh!

altaywtf commented May 25, 2026

Uh oh!

clawsweeper Bot commented May 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

altaywtf commented May 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

zhangguiping-xydt commented May 24, 2026 •

edited

Loading

clawsweeper Bot commented May 24, 2026 •

edited

Loading

clawsweeper Bot commented May 24, 2026 •

edited

Loading

clawsweeper Bot commented May 25, 2026 •

edited

Loading