Skip to content

fix #86077: keep fallback errors candidate scoped#86134

Merged
altaywtf merged 6 commits into
openclaw:mainfrom
zhangguiping-xydt:feat/issue-86077
May 25, 2026
Merged

fix #86077: keep fallback errors candidate scoped#86134
altaywtf merged 6 commits into
openclaw:mainfrom
zhangguiping-xydt:feat/issue-86077

Conversation

@zhangguiping-xydt

@zhangguiping-xydt zhangguiping-xydt commented May 24, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Problem: later fallback candidates that timed out without a current assistant could surface a previous provider's raw error text, making provider triage misleading.
  • Solution: scope reused session assistant errors to the current candidate on timeout and only prefer raw fallback errors when their provider/model attribution matches the candidate.
  • What changed: updated embedded-runner assistant selection, model-fallback attempt error selection, and added focused regressions for both paths.
  • What did NOT change: no provider routing, auth refresh, profile rotation, external API payloads, or fallback candidate ordering changed.

Fixes #86077

Real behavior proof

  • Behavior or issue addressed: stale prior-provider quota text is no longer surfaced for a later candidate timeout.
  • Real environment tested: Linux 4.19.112-2.el8.x86_64, Node v24.16.0, OpenClaw v2026.5.24, worktree SHA 1a6d54
  • Exact steps or command run after this patch:
    OPENCLAW_SKIP_CHANNELS=1 pnpm gateway:dev
    node --import tsx /media/vdc/code/ai/aispace/openclaw-issue-86077-evidence/proof-after.mts
    node scripts/run-vitest.mjs src/agents/pi-embedded-runner/run.cross-provider-fallback-error-context.test.ts src/agents/model-fallback.test.ts
    pnpm check:test-types
  • Evidence after fix:
    $ OPENCLAW_SKIP_CHANNELS=1 node scripts/run-node.mjs --dev gateway
    [openclaw] Building TypeScript (dist is stale: config_newer - config newer than build stamp).
    runtime-postbuild: bundled plugin metadata completed in 352ms
    2026-05-25T01:13:33.187+08:00 [gateway] loading configuration…
    2026-05-25T01:13:33.356+08:00 [gateway] resolving authentication…
    2026-05-25T01:13:33.388+08:00 [gateway] starting...
    2026-05-25T01:13:40.050+08:00 [gateway] starting HTTP server...
    2026-05-25T01:13:41.305+08:00 [gateway] agent model: openai/gpt-5.5 (thinking=medium, fast=off)
    2026-05-25T01:13:41.308+08:00 [gateway] http server listening (8 plugins: acpx, browser, canvas, device-pair, file-transfer, memory-core, phone-control, talk-voice; 7.9s)
    2026-05-25T01:13:41.322+08:00 [gateway/channels] skipping channel start (OPENCLAW_SKIP_CHANNELS=1 or OPENCLAW_SKIP_PROVIDERS=1)
    2026-05-25T01:13:44.326+08:00 [gateway] ready
    2026-05-25T01:13:44.349+08:00 [heartbeat] started
    
    [model-fallback/decision] model fallback decision: decision=candidate_failed requested=anthropic/opus-4-7 candidate=anthropic/opus-4-7 reason=timeout next=google/gemini-3.1-pro-preview detail=LLM request timed out.
    [model-fallback/decision] model fallback decision: decision=candidate_failed requested=anthropic/opus-4-7 candidate=google/gemini-3.1-pro-preview reason=timeout next=none detail=LLM request timed out.
    {
      "candidate": "anthropic/opus-4-7",
      "reason": "timeout",
      "surfacedError": "LLM request timed out.",
      "staleOpenAITextSuppressed": true
    }
    
    Test Files  4 passed (4)
    Tests  140 passed (140)
    
  • Observed result after fix: OpenClaw started successfully in gateway dev mode, and the candidate failure detail remained the timeout message; the stale OpenAI quota text was suppressed instead of being attributed to the later candidate.
  • What was not tested: external provider outage calls were not made; the runtime started locally with channels skipped, and the patched fallback attribution logic was exercised through a direct local function proof.

Regression Test Plan

  • Coverage level: Unit test
  • Target test file: src/agents/pi-embedded-runner/run.cross-provider-fallback-error-context.test.ts; src/agents/model-fallback.test.ts
  • Scenario locked in: a later candidate timeout with no current assistant does not reuse the previous provider's session assistant, and model-fallback summaries ignore raw errors whose provider/model does not match the candidate.
  • Why this is the smallest reliable guardrail: it exercises the exact attribution boundary without needing real provider failures or slow timeout cascades.

Root Cause

  • Root cause: sessionLastAssistant was reused when currentAttemptAssistant was absent, even when the last assistant belonged to a previous provider/model; model-fallback then preferred that raw error text for the current candidate.
  • Missing detection / guardrail: no regression covered a later fallback candidate timeout with no current assistant and stale prior-provider session history.

@openclaw-barnacle openclaw-barnacle Bot added agents Agent runtime and tooling size: S triage: mock-only-proof Candidate: PR proof only shows tests, mocks, snapshots, lint, typecheck, or CI. labels May 24, 2026
@clawsweeper

clawsweeper Bot commented May 24, 2026

Copy link
Copy Markdown
Contributor

Codex review: needs maintainer review before merge. Reviewed May 25, 2026, 7:16 AM ET / 11:16 UTC.

Summary
The PR scopes embedded-runner assistant error reuse and model-fallback raw-error summaries to the active fallback candidate, adds focused regressions, and updates the changelog.

PR surface: Source +39, Tests +161, Docs +1. Total +201 across 6 files.

Reproducibility: yes. Current main shows the source path: embedded runner can still use sessionLastAssistant when currentAttemptAssistant is absent, and model-fallback still prefers rawError over the candidate message.

Review metrics: 1 noteworthy metric.

  • Fallback Attribution Surface: 2 runtime paths changed, 2 test files updated, 1 changelog entry. Fallback diagnostics are compatibility-sensitive, so maintainers should notice both the embedded-runner and outer model-fallback attribution changes before merge.

Merge readiness
Overall: 🐚 platinum hermit
Proof: 🦞 diamond lobster
Patch quality: 🐚 platinum hermit
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • none

Risk before merge

  • This intentionally changes fallback diagnostic output for existing multi-provider setups: mismatched prior-provider raw text is suppressed for later candidates, so maintainers should confirm the compatibility behavior before merge.

Maintainer options:

  1. Confirm Scoped Diagnostics And Land (recommended)
    Maintainers can accept the intentional diagnostics behavior change because the latest head preserves same-candidate and PI-stamped assistant errors while suppressing mismatched prior-provider raw text.
  2. Ask For Broader Runtime Proof
    If maintainers want stronger upgrade confidence, request a live or Crabbox multi-provider fallback run before merge rather than another code repair.

Next step before merge
No automated repair remains; maintainer review should confirm the compatibility-sensitive fallback diagnostics behavior before merge.

Security
Cleared: The diff touches TypeScript runtime fallback diagnostics, focused tests, and changelog only; I found no concrete security or supply-chain concern.

Review details

Best possible solution:

Land the candidate-scoped fallback attribution fix after maintainer confirmation that the diagnostics behavior change is acceptable and required checks remain green.

Do we have a high-confidence way to reproduce the issue?

Yes. Current main shows the source path: embedded runner can still use sessionLastAssistant when currentAttemptAssistant is absent, and model-fallback still prefers rawError over the candidate message.

Is this the best way to solve the issue?

Yes. The latest head is a narrow maintainable fix because it scopes stale assistant/raw-error reuse to the active candidate while preserving same-candidate and PI-stamped compaction diagnostics.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 0bb9b421f3e6.

Label changes

Label justifications:

  • P2: This is a normal-priority agent fallback diagnostics fix with limited blast radius and focused regression coverage.
  • merge-risk: 🚨 compatibility: The PR intentionally changes fallback error reporting semantics for existing multi-provider setups, which can affect operator diagnostics during upgrades.
  • rating: 🐚 platinum hermit: Overall readiness is 🐚 platinum hermit; proof is 🦞 diamond lobster and patch quality is 🐚 platinum hermit.
  • status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Sufficient (terminal): The PR body includes structured after-fix terminal/log proof from a local gateway dev run plus direct output showing stale prior-provider quota text suppressed.
  • proof: sufficient: Contributor real behavior proof is sufficient. The PR body includes structured after-fix terminal/log proof from a local gateway dev run plus direct output showing stale prior-provider quota text suppressed.
Evidence reviewed

PR surface:

Source +39, Tests +161, Docs +1. Total +201 across 6 files.

View PR surface stats
Area Files Added Removed Net
Source 3 48 9 +39
Tests 2 168 7 +161
Docs 1 1 0 +1
Config 0 0 0 0
Generated 0 0 0 0
Other 0 0 0 0
Total 6 217 16 +201

What I checked:

  • Repository policy applied: Root AGENTS.md and the scoped src/agents guides were read; root policy treats fallback behavior as compatibility-sensitive and the scoped guide supports focused helper/runner coverage for embedded-runner changes. (AGENTS.md:1, 0bb9b421f3e6)
  • Current main still has the stale assistant path: Current main still resolves error context from currentAttemptAssistant ?? sessionLastAssistant, matching the linked issue's stale shared-session assistant path before this PR. (src/agents/pi-embedded-runner/run.ts:1655, 0bb9b421f3e6)
  • Current main still prefers raw fallback errors: Current main records candidate failures with described.rawError ?? described.message, so a mismatched raw error can still win over the candidate-attributed message. (src/agents/model-fallback.ts:423, 0bb9b421f3e6)
  • PR scopes runner assistant reuse: At the PR head, sessionAssistantForCandidate is only reused when it matches the active provider/model and is then used for active error context, formatted assistant errors, assistantErrorText, and assistantForFailover. (src/agents/pi-embedded-runner/run.ts:1653, 36f648dcdc23)
  • PR preserves PI-stamped model semantics: The new isAssistantForModelRef helper resolves PI-stamped assistants through resolveReportedModelRef before comparing them to the current candidate, preserving the existing PI attribution contract. (src/agents/pi-embedded-runner/run/helpers.ts:104, 36f648dcdc23)
  • PR scopes outer fallback summaries: The new resolveCandidateAttemptError keeps rawError only when it has no conflicting provider/model attribution for the current candidate; otherwise it uses the candidate message. (src/agents/model-fallback.ts:403, 36f648dcdc23)

Likely related people:

  • vincentkoc: Current-main blame and GitHub path history show recent changes around the implicated runner and model-fallback lines, including fallback provider resolution work. (role: recent area contributor; confidence: high; commits: 73189e3ecb0c, 3c8d101f5a85, 2389b3bdd377; files: src/agents/pi-embedded-runner/run.ts, src/agents/model-fallback.ts, src/agents/pi-embedded-runner/run.cross-provider-fallback-error-context.test.ts)
  • steipete: GitHub path history shows recent work on model-fallback and fallback error-context tests that shape the candidate failure summary path touched here. (role: recent model-fallback contributor; confidence: medium; commits: cb5fdcf52cf8, 4c210e22fa97; files: src/agents/model-fallback.ts, src/agents/model-fallback.test.ts, src/agents/pi-embedded-runner/run.cross-provider-fallback-error-context.test.ts)
  • altaywtf: The PR timeline shows assignment and a maintainer re-review request, and the latest branch commits were authored to preserve same-candidate and PI-stamped fallback context. (role: PR follow-up owner; confidence: high; commits: 1aa308aae53c, b7994f1b88ce, e05db0c8dbfb; files: src/agents/pi-embedded-runner/run.ts, src/agents/pi-embedded-runner/run/helpers.ts, src/agents/pi-embedded-runner/run.cross-provider-fallback-error-context.test.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@openclaw-barnacle openclaw-barnacle Bot added proof: supplied External PR includes structured after-fix real behavior proof. and removed triage: mock-only-proof Candidate: PR proof only shows tests, mocks, snapshots, lint, typecheck, or CI. labels May 24, 2026
@clawsweeper clawsweeper Bot added proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. P2 Normal backlog priority with limited blast radius. labels May 24, 2026
@clawsweeper

clawsweeper Bot commented May 24, 2026

Copy link
Copy Markdown
Contributor

ClawSweeper PR egg

✨ Hatched: 🌱 uncommon Brave Diff Drake

Hatch command

Comment @clawsweeper hatch when this PR is hatchable.

Hatchability rules:

  • Merged PRs are hatchable.
  • Open PRs are hatchable when they are status: 👀 ready for maintainer look, status: 🚀 automerge armed, or labeled clawsweeper:automerge.
  • Closed unmerged PRs are hatchable only when one of those hatchable labels is still present in the durable record.

Rarity: 🌱 uncommon.
Trait: sleeps inside passing CI.
Image traits: location proof lagoon; accessory tiny test log scroll; palette moss green and polished brass; mood focused; pose waving from a small platform; shell frosted glass shell; lighting bright celebratory glints; background smooth stones and checkmarks.
Share on X: post this hatch
Copy: My PR egg hatched a 🌱 uncommon Brave Diff Drake in ClawSweeper.

What is this egg doing here?
  • Eggs appear after the PR passes real-behavior proof. It is here for vibes, not verdicts: it does not change labels, ratings, merge decisions, or automation.
  • The shell reacts to review momentum: open follow-up work warms it up, re-review makes it wobble, and a clean final review lets it hatch.
  • Hatchability usually comes from sufficient real-behavior proof, no blocking P0/P1/P2 findings, no security attention needed, and clean correctness. A merged PR is already final, so merge makes the egg hatchable independently.
  • The hatch is seeded from this repository and PR number, so the same PR keeps the same creature; the reviewed head SHA can only change safe visual details.
  • Rarity is just collectible sparkle: 🥚 common, 🌱 uncommon, 💎 rare, ✨ glimmer, and 🌈 legendary.

@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 24, 2026
@clawsweeper clawsweeper Bot added proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. and removed rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. labels May 24, 2026
@altaywtf altaywtf self-assigned this May 24, 2026
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 24, 2026
@clawsweeper clawsweeper Bot added rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. and removed rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. labels May 24, 2026
@github-actions github-actions Bot added the dependencies-changed PR changes dependency-related files label May 24, 2026
@openclaw-barnacle openclaw-barnacle Bot added docs Improvements or additions to documentation channel: imessage Channel integration: imessage app: web-ui App: web-ui gateway Gateway runtime cli CLI command changes scripts Repository scripts docker Docker and sandbox tooling and removed size: S labels May 24, 2026
@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 25, 2026
@altaywtf altaywtf force-pushed the feat/issue-86077 branch from cc3fe4d to 2868713 Compare May 25, 2026 09:33
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 25, 2026
@altaywtf altaywtf force-pushed the feat/issue-86077 branch 3 times, most recently from d74ce61 to 7f5a821 Compare May 25, 2026 09:46
@clawsweeper clawsweeper Bot added proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. merge-risk: 🚨 compatibility 🚨 May break existing users, config, migrations, defaults, or upgrade paths. and removed rating: 🦞 diamond lobster Very strong PR readiness with only minor maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. labels May 25, 2026
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 25, 2026
@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 25, 2026
@altaywtf altaywtf force-pushed the feat/issue-86077 branch from 36675e0 to 36f648d Compare May 25, 2026 10:40
@clawsweeper clawsweeper Bot added rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. and removed rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. labels May 25, 2026
@altaywtf

Copy link
Copy Markdown
Member

@clawsweeper re-review

@clawsweeper

clawsweeper Bot commented May 25, 2026

Copy link
Copy Markdown
Contributor

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@altaywtf

Copy link
Copy Markdown
Member

Merged via squash.

Thanks @zhangguiping-xydt!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents Agent runtime and tooling merge-risk: 🚨 compatibility 🚨 May break existing users, config, migrations, defaults, or upgrade paths. P2 Normal backlog priority with limited blast radius. proof: supplied External PR includes structured after-fix real behavior proof. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. size: M status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

pi-embedded-runner: stale sessionLastAssistant leaks prior provider's error string into later candidates in model-fallback

2 participants