fix(webchat): finalize provider failure lifecycle #91895

Merged
sallyom merged 2 commits into openclaw:main from TurboTheTurtle:codex/91730-provider-failure on Jun 10, 2026

Conversation

TurboTheTurtle commented Jun 10, 2026

Contributor

Fixes #91730

Summary

  • Emit a marked final lifecycle error after embedded provider fallback is exhausted.
  • Let the gateway finalize that marked error immediately while preserving retry grace for per-attempt fallback errors.
  • Cover both the final-failure marker and the immediate gateway cleanup path with focused regression tests.
  • Stabilize the deadcode unused-files test against CI resolving pnpm as an absolute runner path while preserving exact args/options checks.
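
The first two bullets can be sketched as a small decision function. This is a minimal illustration, not the code in src/gateway/server-chat.ts: the LifecycleEvent shape, the decideErrorHandling name, the "start" phase, and the retryGraceActive parameter are assumptions made for the example. The marker is called finalFailure here, the name used in the review of this PR; the merged code renamed it to fallbackExhaustedFailure.

```typescript
// Hypothetical shapes for illustration; the real event types live in the repo.
type LifecyclePhase = "start" | "fallback_step" | "end" | "error";

interface LifecycleEvent {
  runId: string;
  phase: LifecyclePhase;
  // Marker set by the runner only after provider fallback is exhausted.
  finalFailure?: boolean;
}

type ErrorHandling = "finalize_now" | "defer_for_retry_grace" | "not_an_error";

// Decide how a gateway could treat a lifecycle event (assumed logic, not the repo's).
function decideErrorHandling(event: LifecycleEvent, retryGraceActive: boolean): ErrorHandling {
  if (event.phase !== "error") return "not_an_error";
  // A marked error bypasses retry grace: no further attempt will follow.
  if (event.finalFailure === true) return "finalize_now";
  // Unmarked errors may be per-attempt failures, so keep the grace window.
  return retryGraceActive ? "defer_for_retry_grace" : "finalize_now";
}
```

Keeping the marker check strict (`=== true`) means a missing or falsy field always falls back to the existing retry-grace path.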

Verification

  • node scripts/run-vitest.mjs src/auto-reply/reply/agent-runner-execution.test.ts src/gateway/server-chat.agent-events.test.ts
  • node scripts/run-vitest.mjs src/gateway/server-methods/agent.test.ts ui/src/ui/chat/run-lifecycle.test.ts ui/src/ui/session-run-state.test.ts ui/src/ui/app-chat.test.ts src/gateway/session-lifecycle-state.test.ts
  • node scripts/run-vitest.mjs test/scripts/package-acceptance-workflow.test.ts
  • node scripts/run-vitest.mjs test/scripts/openclaw-e2e-instance.test.ts
  • node scripts/run-vitest.mjs test/scripts/check-deadcode-unused-files.test.ts
  • node scripts/run-vitest.mjs run --config test/vitest/vitest.full-core-support-boundary.config.ts
  • corepack pnpm exec oxfmt --check --threads=1 src/auto-reply/reply/agent-runner-execution.ts src/auto-reply/reply/agent-runner-execution.test.ts src/gateway/server-chat.ts src/gateway/server-chat.agent-events.test.ts
  • node scripts/run-oxlint.mjs src/auto-reply/reply/agent-runner-execution.ts src/auto-reply/reply/agent-runner-execution.test.ts src/gateway/server-chat.ts src/gateway/server-chat.agent-events.test.ts
  • corepack pnpm exec oxfmt --check --threads=1 test/scripts/check-deadcode-unused-files.test.ts
  • node scripts/run-oxlint.mjs test/scripts/check-deadcode-unused-files.test.ts
  • git diff --check
  • corepack pnpm openclaw --version -> OpenClaw 2026.6.2 (c604b58)

Real behavior proof

Behavior addressed: OpenClaw-native provider failures that exhaust fallback now produce a final lifecycle error signal, so the gateway clears the webchat run and persists terminal failed session state immediately instead of leaving the session in progress/running.

Real environment tested: Patched local checkout /Users/andy/openclaw-91730-provider-failure on macOS, head c604b584263bd554d5246dd5d8437b48add0aa4f, build OpenClaw 2026.6.2 (c604b58), isolated temp OPENCLAW_HOME=/tmp/openclaw-91730-proof-cifix-IYxcUk, gateway on loopback port 18796, token auth with a throwaway proof token, default dev agent model openai/gpt-5.5, and deliberately invalid OPENAI_API_KEY=sk-openc*************alid. The provider failure used the real embedded runtime against OpenAI Responses websocket/HTTP endpoints; no provider mock was used for this proof.

Exact steps or command run after this patch:

  1. Started the temp gateway with OPENCLAW_HOME=/tmp/openclaw-91730-proof-cifix-IYxcUk OPENAI_API_KEY=<invalid> corepack pnpm openclaw gateway run --dev --reset --port 18796 --auth token --token <throwaway> --tailscale off --compact --verbose.
  2. Connected a loopback backend WebSocket client with shared-token auth and scopes operator.read,operator.write, then verified health and an empty sessions.list.
  3. Sent chat.send with sessionKey=agent:dev:main, message=PR91895_PROOF_TRIGGER provider failure lifecycle ci-fix, and idempotencyKey=pr91895-proof-cifix-20260610T0231.
  4. After the terminal chat event, queried sessions.list, chat.history, and health, then stopped the temp gateway cleanly.

Evidence after fix: Gateway logs showed the real runtime starting the turn and contacting OpenAI, then failing on invalid auth:

OpenClaw 2026.6.2 (c604b58)
[ws] ⇄ res ✓ chat.send 5ms runId=pr91895-proof-cifix-20260610T0231
[diagnostic] session turn created: runId=pr91895-proof-cifix-20260610T0231 sessionId=354fa727-3b28-46a9-9045-f52efd334c1b sessionKey=agent:dev:main agentId=dev channel=webchat trigger=user
[agent/embedded] ... failed to connect to websocket: HTTP error: 401 Unauthorized, url: wss://api.openai.com/v1/responses
[agent/embedded] ... Incorrect API key provided: sk-openc*************alid ... auth error code: invalid_api_key
[ws] -> event agent ... stream=lifecycle ... phase=error
[diagnostic] session state: sessionId=354fa727-3b28-46a9-9045-f52efd334c1b sessionKey=agent:dev:main prev=processing new=idle reason="run_completed" queueDepth=0
[diagnostic] run cleared: sessionId=354fa727-3b28-46a9-9045-f52efd334c1b totalActive=0
[model-fallback/decision] model fallback decision: decision=candidate_failed requested=openai/gpt-5.5 candidate=openai/gpt-5.5 reason=rate_limit next=none detail=unexpected status 401 Unauthorized ... auth error code: invalid_api_key
[ws] -> event agent ... stream=lifecycle ... phase=fallback_step
[ws] -> event agent ... stream=lifecycle ... phase=error
[diagnostic] message processed: channel=webchat ... sessionKey=agent:dev:main outcome=completed
[shutdown] completed cleanly in 110ms
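
The diagnostic lines above carry their details as key=value pairs, so the session transition can be checked mechanically. The helper below is a sketch for reading such a line; parseLogFields is a hypothetical name and not part of OpenClaw, and the log format is inferred only from the excerpt above.

```typescript
// Parse key=value pairs from a log line; values may be bare tokens or double-quoted.
function parseLogFields(line: string): Record<string, string> {
  const fields: Record<string, string> = {};
  const pattern = /(\w+)=(?:"([^"]*)"|(\S+))/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(line)) !== null) {
    const key = match[1] ?? "";
    const value = match[2] ?? match[3] ?? "";
    fields[key] = value;
  }
  return fields;
}

const stateLine =
  "[diagnostic] session state: sessionId=354fa727-3b28-46a9-9045-f52efd334c1b " +
  'sessionKey=agent:dev:main prev=processing new=idle reason="run_completed" queueDepth=0';

const state = parseLogFields(stateLine);
console.log(state["prev"], "->", state["new"], state["reason"]);
// prints: processing -> idle run_completed
```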

The proof client observed the terminal chat event:

{
  "chatSendAck": { "runId": "pr91895-proof-cifix-20260610T0231", "status": "started" },
  "terminalChatEvent": {
    "runId": "pr91895-proof-cifix-20260610T0231",
    "sessionKey": "agent:dev:main",
    "state": "error",
    "errorMessage": "unexpected status 401 Unauthorized: Incorrect API key provided: sk-openc*************alid..."
  }
}

A settled sessions.list after message_completed showed terminal failed state and no active run:

{
  "key": "agent:dev:main",
  "sessionId": "354fa727-3b28-46a9-9045-f52efd334c1b",
  "status": "failed",
  "startedAt": 1781083947833,
  "endedAt": 1781083966242,
  "runtimeMs": 18409,
  "modelProvider": "openai",
  "model": "gpt-5.5",
  "agentRuntime": { "id": "codex", "source": "implicit" },
  "deliveryContext": { "channel": "webchat" },
  "lastChannel": "webchat",
  "hasActiveRun": false
}
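
A proof client could assert the terminal state shown above instead of reading it by eye. The sketch below is illustrative only: SessionEntry is a hypothetical subset of the entry, and the rule that runtimeMs equals endedAt minus startedAt is an assumption that happens to hold for this observation (1781083966242 - 1781083947833 = 18409), not a documented invariant.

```typescript
// Hypothetical subset of a sessions.list entry, limited to fields shown above.
interface SessionEntry {
  status: string;
  startedAt: number;
  endedAt?: number;
  runtimeMs?: number;
  hasActiveRun: boolean;
}

// Returns a list of problems; an empty list means the entry looks terminal and consistent.
function checkTerminalFailure(entry: SessionEntry): string[] {
  const problems: string[] = [];
  if (entry.status !== "failed") problems.push(`status is ${entry.status}, expected failed`);
  if (entry.hasActiveRun) problems.push("session still reports an active run");
  if (entry.endedAt === undefined) {
    problems.push("endedAt is missing");
  } else if (entry.runtimeMs !== entry.endedAt - entry.startedAt) {
    // Assumed relationship; it holds for the observed values.
    problems.push("runtimeMs does not equal endedAt - startedAt");
  }
  return problems;
}

const observed: SessionEntry = {
  status: "failed",
  startedAt: 1781083947833,
  endedAt: 1781083966242,
  runtimeMs: 18409,
  hasActiveRun: false,
};

console.log(checkTerminalFailure(observed)); // prints: []
```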

The same proof client also saw healthBefore.ok=true, healthAfter.ok=true, healthAfter.sessionCount=1, historySummary.ok=true, and historySummary.messageCount=1.

Observed result after fix: The live temp-gateway turn produced a visible terminal chat error event, cleared the active run (totalActive=0), and persisted the session as status:"failed" with endedAt, runtimeMs, and hasActiveRun:false. Health remained OK after the failure, and the temp proof gateway was stopped cleanly after the proof.

What was not tested: I did not reproduce the exact reporter environment of Linux plus OpenAI OAuth on openai/gpt-5.4-mini; the proof uses the same OpenClaw-native provider failure class with a deliberately invalid OpenAI API key so it can be reproduced safely without live credentials. At the time of this final proof update, GitHub Actions and ClawSweeper re-review still need to run on the amended head.

@openclaw-barnacle openclaw-barnacle Bot added gateway Gateway runtime size: S labels Jun 10, 2026
@TurboTheTurtle

Contributor Author

@clawsweeper re-review

@clawsweeper

clawsweeper Bot commented Jun 10, 2026

Contributor

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.


@openclaw-barnacle openclaw-barnacle Bot added the triage: mock-only-proof Candidate: PR proof only shows tests, mocks, snapshots, lint, typecheck, or CI. label Jun 10, 2026
@clawsweeper

clawsweeper Bot commented Jun 10, 2026

Contributor

Codex review: needs maintainer review before merge. Reviewed June 10, 2026, 5:43 AM ET / 09:43 UTC.

Summary
The branch emits a finalFailure lifecycle marker after embedded provider fallback exhaustion, makes the gateway finalize marked errors immediately, adds focused runner/gateway regression tests, and relaxes a pnpm command assertion in the deadcode unused-files test.
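
The relaxed pnpm assertion mentioned in this summary can be pictured as matching the command by its final path segment while still comparing arguments exactly. This is a sketch of that idea only: the helper names, the runner path, and the argument values are hypothetical and do not come from the actual test file.

```typescript
// Return the final path segment, handling both POSIX and Windows separators.
function commandBasename(command: string): string {
  const segments = command.split(/[\\/]/);
  return segments[segments.length - 1] ?? command;
}

// Accept "pnpm" or an absolute path ending in pnpm, but require exact args.
function isExpectedPnpmCall(command: string, args: string[], expectedArgs: string[]): boolean {
  const sameArgs =
    args.length === expectedArgs.length && args.every((arg, i) => arg === expectedArgs[i]);
  return commandBasename(command) === "pnpm" && sameArgs;
}

// Hypothetical values: a bare command locally, an absolute path on a CI runner.
const expectedArgs = ["exec", "some-tool"];
console.log(isExpectedPnpmCall("pnpm", ["exec", "some-tool"], expectedArgs));
console.log(isExpectedPnpmCall("/opt/runner/bin/pnpm", ["exec", "some-tool"], expectedArgs));
```

Only the command match is loosened; a different binary or a changed argument list still fails the check.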

PR surface: Source +15, Tests +59. Total +74 across 5 files.

Reproducibility: yes. Source inspection gives a high-confidence current-main path: an unmarked lifecycle error is deferred, then a later fallback lifecycle event clears the pending terminal error; I did not run a live current-main repro in this read-only review.

Review metrics: 1 noteworthy metric.

  • Lifecycle terminal marker: 1 internal marker added. The new finalFailure flag is the contract that changes gateway session finalization timing before merge.

Merge readiness
Overall: 🐚 platinum hermit
Proof: 🦞 diamond lobster
Patch quality: 🐚 platinum hermit
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.
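
The "weaker of the two" rule amounts to taking a minimum over an ordered scale. The sketch below assumes the six ranked tiers from the legend in this review are ordered by strength as listed there; the overallRank function and the RANKS constant are illustrative names, not ClawSweeper code.

```typescript
// Ranks ordered from weakest to strongest (assumed from the legend order).
const RANKS = [
  "unranked krab",
  "silver shellfish",
  "gold shrimp",
  "platinum hermit",
  "diamond lobster",
  "challenger crab",
] as const;

type Rank = (typeof RANKS)[number];

// Overall readiness is the weaker of proof and patch quality.
function overallRank(proof: Rank, patchQuality: Rank): Rank {
  return RANKS.indexOf(proof) <= RANKS.indexOf(patchQuality) ? proof : patchQuality;
}

console.log(overallRank("diamond lobster", "platinum hermit")); // prints: platinum hermit
```

With proof at diamond lobster and patch quality at platinum hermit, the result matches the overall rating reported above.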

Rank-up moves:

  • Wait for the relevant checks on head c604b58.
  • Have a maintainer explicitly accept finalFailure as the fallback-exhausted provider failure contract.

Risk before merge

  • [P2] The PR intentionally changes session-state finalization timing: if finalFailure is later emitted for a per-attempt fallback error, webchat could prematurely clear an active run and persist failed session state.
  • [P1] The live proof exercises the same provider-failure class with an invalid OpenAI key, but it does not exactly reproduce the reporter's Linux OAuth openai/gpt-5.4-mini setup.

Maintainer options:

  1. Accept the final-failure contract (recommended)
    Merge after maintainer review and green checks if the team agrees finalFailure is the internal contract for fallback-exhausted provider failures.
  2. Require exact-environment proof
    Ask for Linux OAuth openai/gpt-5.4-mini proof first if maintainers need confidence beyond the same OpenClaw-native provider-failure class.
  3. Pause for a typed lifecycle API
    Hold the PR if maintainers want final terminal provider failure represented by a typed lifecycle event instead of an ad hoc data flag.
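
Option 3 contrasts a typed lifecycle event with an ad hoc data flag. The difference can be sketched as follows; every type and function name here is hypothetical and only illustrates the design trade-off, not an API that exists in the repository.

```typescript
// Ad hoc flag: any error event can carry the marker, so misuse is only caught by review.
interface FlaggedErrorEvent {
  phase: "error";
  message: string;
  finalFailure?: boolean;
}

// Typed alternative: the terminal case is its own variant of a discriminated union.
type TypedLifecycleEvent =
  | { kind: "attempt_error"; message: string; attempt: number }
  | { kind: "fallback_exhausted"; message: string; attemptsTried: number };

// Convert the flagged form into the typed form.
function toTypedEvent(event: FlaggedErrorEvent, attempt: number): TypedLifecycleEvent {
  return event.finalFailure === true
    ? { kind: "fallback_exhausted", message: event.message, attemptsTried: attempt }
    : { kind: "attempt_error", message: event.message, attempt };
}

// The switch covers every variant; adding a new kind forces this function to be revisited.
function shouldFinalizeImmediately(event: TypedLifecycleEvent): boolean {
  switch (event.kind) {
    case "attempt_error":
      return false; // a later attempt may still succeed
    case "fallback_exhausted":
      return true; // no attempts remain
  }
}
```

The typed form costs more surface area, which is consistent with the review treating it as a possible later cleanup rather than a requirement for this patch.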

Next step before merge

  • [P2] No repair lane is needed; the remaining work is maintainer review, relevant checks, and explicit acceptance of the session-state risk.

Security
Cleared: The diff does not change dependencies, workflows, permissions, package resolution, credential handling, or secret-handling code.

Review details

Best possible solution:

Land the narrow final-failure lifecycle contract after maintainers accept the session-state timing change and relevant checks remain green.

Do we have a high-confidence way to reproduce the issue?

Yes. Source inspection gives a high-confidence current-main path: an unmarked lifecycle error is deferred, then a later fallback lifecycle event clears the pending terminal error; I did not run a live current-main repro in this read-only review.

Is this the best way to solve the issue?

Yes. The PR is a narrow fix because the runner marks only fallback-exhausted failures and the gateway preserves retry grace for ordinary per-attempt errors; a typed lifecycle event could be cleaner later but is not required for this patch.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against c84e52192063.

Label changes:

  • add proof: sufficient: Contributor real behavior proof is sufficient. The PR body includes sufficient after-fix live-output proof from a temp gateway using the real embedded runtime, with redacted credentials and observed terminal session cleanup.

Label justifications:

  • P1: The linked bug leaves webchat sessions stuck in running/in-progress after provider failure, blocking clear recovery for affected users.
  • merge-risk: 🚨 session-state: The PR changes when gateway lifecycle errors clear active runs and persist failed session state.
  • rating: 🐚 platinum hermit: Overall readiness is 🐚 platinum hermit; proof is 🦞 diamond lobster and patch quality is 🐚 platinum hermit.
  • status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Sufficient (live_output): The PR body includes sufficient after-fix live-output proof from a temp gateway using the real embedded runtime, with redacted credentials and observed terminal session cleanup.
  • proof: sufficient: Contributor real behavior proof is sufficient. The PR body includes sufficient after-fix live-output proof from a temp gateway using the real embedded runtime, with redacted credentials and observed terminal session cleanup.

Evidence reviewed

PR surface:

Source +15, Tests +59. Total +74 across 5 files.

| Area | Files | Added | Removed | Net |
|---|---|---|---|---|
| Source | 2 | 16 | 1 | +15 |
| Tests | 3 | 62 | 3 | +59 |
| Docs | 0 | 0 | 0 | 0 |
| Config | 0 | 0 | 0 | 0 |
| Generated | 0 | 0 | 0 | 0 |
| Other | 0 | 0 | 0 | 0 |
| Total | 5 | 78 | 4 | +74 |

What I checked:

  • Root policy read: Read the full root AGENTS.md; gateway session state and fallback behavior are compatibility-sensitive review surfaces, so this PR needs explicit session-state risk review rather than diff-only approval. (AGENTS.md:1, c84e52192063)
  • Gateway scoped policy read: Read the gateway scoped guide; it reinforces checking gateway lifecycle tests and hot-path behavior for changes under src/gateway. (src/gateway/AGENTS.md:1, c84e52192063)
  • Current-main deferred lifecycle behavior: Current main clears pending terminal lifecycle errors on any non-error lifecycle phase, and defers lifecycle error finalization while retry grace is active; a fallback lifecycle event can therefore cancel a pending per-attempt error before a terminal failure is persisted. (src/gateway/server-chat.ts:1007, c84e52192063)
  • Session persistence callee: Gateway terminal lifecycle finalization clears run context, deletes run sequence state, clears the active run, persists the lifecycle event, and broadcasts sessions.changed when a session key is available. (src/gateway/server-chat.ts:482, c84e52192063)
  • Runner backstop behavior on current main: The embedded lifecycle backstop marks ordinary lifecycle end/error events as terminal, but those unmarked errors still go through the gateway retry grace path. (src/auto-reply/reply/agent-runner-execution.ts:1307, c84e52192063)
  • PR implementation path: The PR adds a marked finalFailure lifecycle error after fallback exhaustion and has the gateway bypass retry grace only when that marker is true. (src/auto-reply/reply/agent-runner-execution.ts:2932, c604b584263b)
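
The current-main path and the PR path described in this list can be replayed with a toy model. This is a deliberately simplified sketch: ToyGateway, AgentEvent, and replay are hypothetical names, the event sequences are illustrative rather than captured traces, and expiry of the retry grace timer is not modeled.

```typescript
type Phase = "fallback_step" | "error";

interface AgentEvent {
  phase: Phase;
  finalFailure?: boolean;
}

// Toy gateway: tracks whether a run was finalized and whether an error is pending.
class ToyGateway {
  finalized = false;
  pendingError = false;
  private readonly honorMarker: boolean;

  constructor(honorMarker: boolean) {
    this.honorMarker = honorMarker;
  }

  handle(event: AgentEvent): void {
    if (this.finalized) return;
    if (event.phase !== "error") {
      // Any non-error lifecycle phase cancels a deferred error.
      this.pendingError = false;
      return;
    }
    if (this.honorMarker && event.finalFailure === true) {
      // Marked error: finalize immediately, skipping retry grace.
      this.finalized = true;
      this.pendingError = false;
      return;
    }
    // Unmarked error: defer and wait for retry grace.
    this.pendingError = true;
  }
}

function replay(gateway: ToyGateway, events: AgentEvent[]): ToyGateway {
  for (const event of events) gateway.handle(event);
  return gateway;
}

const perAttemptError: AgentEvent = { phase: "error" };
const fallbackStep: AgentEvent = { phase: "fallback_step" };
const markedError: AgentEvent = { phase: "error", finalFailure: true };

const before = replay(new ToyGateway(false), [perAttemptError, fallbackStep]);
const after = replay(new ToyGateway(true), [perAttemptError, fallbackStep, markedError]);
console.log(before.finalized, after.finalized); // prints: false true
```

In the first replay the fallback step cancels the deferred error and nothing finalizes the run, which mirrors the stuck-session symptom; in the second, the marked error ends the run regardless of what was pending.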

Likely related people:

  • TurboTheTurtle: Beyond authoring this PR, public current-main history shows prior merged work touching gateway chat/session code and auto-reply fallback policy, including commits 9833f3e and 57633c4. (role: recent area contributor; confidence: medium; commits: 9833f3ea9bf8, 57633c42b647, c604b584263b; files: src/gateway/server-chat.ts, src/auto-reply/reply/agent-runner-execution.ts)
  • vincentkoc: Public file history shows recent gateway instrumentation and auto-reply delivery work, plus committer history on adjacent fallback behavior. (role: recent area contributor; confidence: medium; commits: c9050c982d95, 280d1cb977c4, 57633c42b647; files: src/gateway/server-chat.ts, src/auto-reply/reply/agent-runner-execution.ts)
  • obviyus: Public current-main history shows recent work in the same auto-reply runner surface around compaction notice behavior, adjacent to the embedded run lifecycle path. (role: adjacent auto-reply contributor; confidence: low; commits: 98d5c465308a; files: src/auto-reply/reply/agent-runner-execution.ts)

What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.
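
The command split described above, fresh review versus repair and merge flows, can be sketched as a lookup. The command words come from the list; the CommandKind grouping, the classifyComment function, and the parsing rules are assumptions for illustration and do not describe how ClawSweeper actually parses comments. Permission checks are left out.

```typescript
type CommandKind = "fresh_review" | "repair_or_merge" | "explain" | "stop" | "unknown";

// Command words taken from the list above; the grouping into kinds is illustrative.
const COMMAND_KINDS = new Map<string, CommandKind>([
  ["re-review", "fresh_review"],
  ["re-run", "fresh_review"],
  ["review", "fresh_review"],
  ["autofix", "repair_or_merge"],
  ["automerge", "repair_or_merge"],
  ["fix ci", "repair_or_merge"],
  ["address review", "repair_or_merge"],
  ["explain", "explain"],
  ["stop", "stop"],
]);

// Classify a comment body that consists of a single @clawsweeper command.
function classifyComment(body: string): CommandKind {
  const match = /^@clawsweeper\s+(.+)$/i.exec(body.trim());
  if (!match) return "unknown";
  const command = (match[1] ?? "").trim().toLowerCase();
  return COMMAND_KINDS.get(command) ?? "unknown";
}

// Fresh-review commands never start repair, autofix, rebase, CI repair, or automerge.
function startsRepair(kind: CommandKind): boolean {
  return kind === "repair_or_merge";
}

console.log(classifyComment("@clawsweeper re-review"), startsRepair("fresh_review"));
// prints: fresh_review false
```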

@clawsweeper clawsweeper Bot added rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. labels Jun 10, 2026
@openclaw-barnacle openclaw-barnacle Bot added proof: supplied External PR includes structured after-fix real behavior proof. and removed triage: mock-only-proof Candidate: PR proof only shows tests, mocks, snapshots, lint, typecheck, or CI. labels Jun 10, 2026
@TurboTheTurtle TurboTheTurtle force-pushed the codex/91730-provider-failure branch from b209862 to 4572bff Compare June 10, 2026 08:47
@TurboTheTurtle

Contributor Author

@clawsweeper re-review

@clawsweeper

clawsweeper Bot commented Jun 10, 2026

Contributor

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.


@clawsweeper clawsweeper Bot added proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. P1 High-priority user-facing bug, regression, or broken workflow. merge-risk: 🚨 session-state 🚨 May lose, corrupt, stale, or mis-associate session, agent, or context state. and removed rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. labels Jun 10, 2026
@TurboTheTurtle TurboTheTurtle force-pushed the codex/91730-provider-failure branch from 4572bff to 6c7b73c Compare June 10, 2026 09:09
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label Jun 10, 2026
@TurboTheTurtle

Contributor Author

@clawsweeper re-review

@clawsweeper

clawsweeper Bot commented Jun 10, 2026

Contributor

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.


@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label Jun 10, 2026
@TurboTheTurtle TurboTheTurtle force-pushed the codex/91730-provider-failure branch from 6c7b73c to 5e07c82 Compare June 10, 2026 09:21
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label Jun 10, 2026
@TurboTheTurtle

Contributor Author

@clawsweeper re-review

@clawsweeper

clawsweeper Bot commented Jun 10, 2026

Contributor

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.


@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label Jun 10, 2026
@TurboTheTurtle TurboTheTurtle force-pushed the codex/91730-provider-failure branch from 5e07c82 to c604b58 Compare June 10, 2026 09:34
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label Jun 10, 2026
@TurboTheTurtle

Contributor Author

@clawsweeper re-review

@clawsweeper

clawsweeper Bot commented Jun 10, 2026

Contributor

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.


@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label Jun 10, 2026
@sallyom sallyom self-assigned this Jun 10, 2026
Signed-off-by: sallyom <somalley@redhat.com>
@sallyom

sallyom commented Jun 10, 2026

Contributor

I added a small rename so the marker is explicitly fallbackExhaustedFailure. A future per-attempt fallback error would now have to misuse a very specific field, which addresses the P2 risk found in review.

The red CI checks are unrelated, and check-test-types succeeded locally.

sallyom commented Jun 10, 2026

Contributor

Pushed a small signed-off maintainer follow-up commit: d868e8618b5.

This renames the internal lifecycle marker from finalFailure to fallbackExhaustedFailure. Behavior is unchanged: the gateway still finalizes immediately only after provider fallback is exhausted. The narrower name addresses the review concern that a broad finalFailure flag could later be reused for a per-attempt fallback error and accidentally clear/persist a still-active webchat run too early.

Focused verification after the rename:

  • node scripts/run-vitest.mjs src/auto-reply/reply/agent-runner-execution.test.ts src/gateway/server-chat.agent-events.test.ts
  • git diff --cached --check before commit

I did not rerun the live Real Behavior Proof because this is a mechanical internal marker rename; the existing proof still covers the unchanged fallback-exhausted failure behavior.

@sallyom sallyom merged commit 33a3e05 into openclaw:main Jun 10, 2026
152 of 157 checks passed
@TurboTheTurtle TurboTheTurtle deleted the codex/91730-provider-failure branch June 10, 2026 18:01
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request Jun 11, 2026
* fix(webchat): finalize provider failure lifecycle

* chore: narrow fallback failure lifecycle marker

Signed-off-by: sallyom <somalley@redhat.com>

---------

Signed-off-by: sallyom <somalley@redhat.com>
Co-authored-by: sallyom <somalley@redhat.com>
eleboucher pushed a commit to eleboucher/homelab that referenced this pull request Jun 12, 2026
…26.6.6) (#1040)

This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [ghcr.io/openclaw/openclaw](https://openclaw.ai) ([source](https://github.com/openclaw/openclaw)) | patch | `2026.6.5` → `2026.6.6` |

---

### Release Notes

<details>
<summary>openclaw/openclaw (ghcr.io/openclaw/openclaw)</summary>

### [`v2026.6.6`](https://github.com/openclaw/openclaw/blob/HEAD/CHANGELOG.md#202666)

[Compare Source](openclaw/openclaw@v2026.6.5...v2026.6.6)

##### Highlights

- Security boundaries are substantially tighter across transcripts, sandbox binds, host environment inheritance, MCP stdio, Codex HTTP access, native search policy, elevated sender checks, deleted-agent ACP bypasses, loopback tools, Discord moderation, and Teams group actions; exec approvals now fail closed on timeout. ([#91529](openclaw/openclaw#91529), [#91618](openclaw/openclaw#91618), [#91615](openclaw/openclaw#91615), [#91619](openclaw/openclaw#91619), [#91741](openclaw/openclaw#91741), [#91745](openclaw/openclaw#91745), [#91746](openclaw/openclaw#91746), [#91748](openclaw/openclaw#91748), [#91749](openclaw/openclaw#91749), [#91750](openclaw/openclaw#91750), [#91751](openclaw/openclaw#91751), [#91752](openclaw/openclaw#91752), [#91763](openclaw/openclaw#91763), [#89938](openclaw/openclaw#89938)) Thanks [@joshavant](https://github.com/joshavant), [@pgondhi987](https://github.com/pgondhi987), [@mmaps](https://github.com/mmaps), [@eleqtrizit](https://github.com/eleqtrizit), [@shakkernerd](https://github.com/shakkernerd), and [@drobison00](https://github.com/drobison00).
- Telegram delivery is safer and more coherent: account-scoped topics route to the right agent, streamed text survives tool calls, `/compact` works on generic ingress, callback handling uses concrete APIs, draft chunking is shared, durable dispatch dedupe moved into the SDK, and unauthorized DM text stays out of cache and prompt context. ([#91189](openclaw/openclaw#91189), [#88682](openclaw/openclaw#88682), [#89588](openclaw/openclaw#89588), [#90212](openclaw/openclaw#90212), [#91876](openclaw/openclaw#91876), [#91874](openclaw/openclaw#91874), [#91904](openclaw/openclaw#91904), [#91478](openclaw/openclaw#91478), [#91915](openclaw/openclaw#91915)) Thanks [@codysai001](https://github.com/codysai001), [@alexzhu0](https://github.com/alexzhu0), [@joelnishanth](https://github.com/joelnishanth), [@snowzlm](https://github.com/snowzlm), [@obviyus](https://github.com/obviyus), and [@sallyom](https://github.com/sallyom).
- iMessage recovery and delivery now cover always-on inbound restart, durable echo markers, block streaming, idle approval discovery, hardened outbound transport, and actionable inbound startup diagnostics. ([#91335](openclaw/openclaw#91335), [#91449](openclaw/openclaw#91449), [#88969](openclaw/openclaw#88969), [#88530](openclaw/openclaw#88530), [#91783](openclaw/openclaw#91783), [#91785](openclaw/openclaw#91785)) Thanks [@omarshahine](https://github.com/omarshahine), [@jmissig](https://github.com/jmissig), and [@colmbrogan](https://github.com/colmbrogan).
- Browser and MCP connectivity gained existing-session CDP support, discovered WebSocket validation, default-profile `cdpUrl` handling, safer browser-output boundaries, Streamable HTTP loopback transport, corrected OAuth/SSE authorization handling, and broader schema compatibility. ([#91422](openclaw/openclaw#91422), [#89851](openclaw/openclaw#89851), [#91736](openclaw/openclaw#91736), [#91747](openclaw/openclaw#91747), [#91451](openclaw/openclaw#91451), [#80143](openclaw/openclaw#80143)) Thanks [@pgondhi987](https://github.com/pgondhi987), [@anagnorisis2peripeteia](https://github.com/anagnorisis2peripeteia), [@lifuyue](https://github.com/lifuyue), [@eleqtrizit](https://github.com/eleqtrizit), [@LiuwqGit](https://github.com/LiuwqGit), and [@HemantSudarshan](https://github.com/HemantSudarshan).
- Control UI startup and first-reply latency are lower through cached model metadata, removal of the startup catalog wait, lazy slash-command loading, and first-event tracing with slow-reply diagnostics. ([#91531](openclaw/openclaw#91531), [#91538](openclaw/openclaw#91538), [#91568](openclaw/openclaw#91568), [#91583](openclaw/openclaw#91583), [#91598](openclaw/openclaw#91598))
- Provider support expands with OpenRouter OAuth onboarding and Claude Fable 5 adaptive thinking, while Codex sessions keep correct compaction ownership, local models skip guardian review, dynamic tool progress normalizes cleanly, and Gemma 4 reasoning replay is preserved. ([#91830](openclaw/openclaw#91830), [#91882](openclaw/openclaw#91882), [#91590](openclaw/openclaw#91590), [#88630](openclaw/openclaw#88630), [#88768](openclaw/openclaw#88768), [#91696](openclaw/openclaw#91696)) Thanks [@Patrick-Erichsen](https://github.com/Patrick-Erichsen), [@joshavant](https://github.com/joshavant), [@bdjben](https://github.com/bdjben), and [@Coder-Wangyankun](https://github.com/Coder-Wangyankun).

##### Changes

- CLI progress: emit Claude CLI commentary progress events and bridge inter-tool commentary into channel progress without exposing internal protocol scaffolding. ([#89834](openclaw/openclaw#89834), [#90883](openclaw/openclaw#90883)) Thanks [@anagnorisis2peripeteia](https://github.com/anagnorisis2peripeteia).
- Observability: allow trusted diagnostics channels to capture tool input/output content, add first-assistant-event traces, and warn on slow initial replies. ([#91256](openclaw/openclaw#91256), [#91568](openclaw/openclaw#91568), [#91583](openclaw/openclaw#91583)) Thanks [@amknight](https://github.com/amknight).
- Plugins/ClawHub: dogfood reusable package publishing, let dry runs skip publish approval, allow declared installed trusted hooks, report managed plugin version drift, and warn instead of failing on retired Skill Workshop configuration. ([#91574](openclaw/openclaw#91574), [#91591](openclaw/openclaw#91591), [#90004](openclaw/openclaw#90004), [#90927](openclaw/openclaw#90927), [#90838](openclaw/openclaw#90838)) Thanks [@Patrick-Erichsen](https://github.com/Patrick-Erichsen), [@brokemac79](https://github.com/brokemac79), and [@lonexreb](https://github.com/lonexreb).
- Memory/providers: move the local llama.cpp runtime into its provider plugin, batch embeddings across files, persist the agent model catalog cache, and keep QMD JSON search one-shot while filtering stale REM recall previews. ([#91324](openclaw/openclaw#91324), [#89138](openclaw/openclaw#89138), [#90457](openclaw/openclaw#90457), [#91837](openclaw/openclaw#91837), [#91851](openclaw/openclaw#91851)) Thanks [@osolmaz](https://github.com/osolmaz), [@mushuiyu886](https://github.com/mushuiyu886), [@ai-hpc](https://github.com/ai-hpc), and [@TurboTheTurtle](https://github.com/TurboTheTurtle).
- Channels/mobile: add the QQBot group mention toggle, improve iPad and iPhone control surfaces, and expose the active connection host in the TUI footer. ([#91423](openclaw/openclaw#91423), [#91557](openclaw/openclaw#91557), [#89909](openclaw/openclaw#89909)) Thanks [@cxyhhhhh](https://github.com/cxyhhhhh), [@Solvely-Colin](https://github.com/Solvely-Colin), and [@baskduf](https://github.com/baskduf).
- Performance: prewarm TUI runtime plugins, deduplicate plugin auto-enable fanout, trim dense text-delta snapshots, and reuse prepared startup model metadata. ([#90782](openclaw/openclaw#90782), [#89978](openclaw/openclaw#89978), [#91580](openclaw/openclaw#91580), [#91531](openclaw/openclaw#91531)) Thanks [@RomneyDa](https://github.com/RomneyDa) and [@ai-hpc](https://github.com/ai-hpc).

##### Fixes

- Agent/session recovery: drop stale approval follow-ups after session rebind, remove drained reply-queue items by identity, recover stale main and visible replies, preserve Codex context-engine compaction ownership, lower the default compaction timeout to 180 seconds while respecting explicit configuration, and keep provider-failure terminal lifecycle state correct. ([#&#8203;85679](openclaw/openclaw#85679), [#&#8203;91450](openclaw/openclaw#91450), [#&#8203;91566](openclaw/openclaw#91566), [#&#8203;91840](openclaw/openclaw#91840), [#&#8203;91590](openclaw/openclaw#91590), [#&#8203;91361](openclaw/openclaw#91361), [#&#8203;91895](openclaw/openclaw#91895)) Thanks [@&#8203;openperf](https://github.com/openperf), [@&#8203;yetval](https://github.com/yetval), [@&#8203;joshavant](https://github.com/joshavant), [@&#8203;wangmiao0668000666](https://github.com/wangmiao0668000666), and [@&#8203;TurboTheTurtle](https://github.com/TurboTheTurtle).
- User-visible content boundaries: suppress Codex/Harmony protocol artifacts, neutralize browser and LanceDB memory media directives, redact transcript images, and preserve native `/compact` replies through source suppression. ([#&#8203;89151](openclaw/openclaw#89151), [#&#8203;91422](openclaw/openclaw#91422), [#&#8203;91425](openclaw/openclaw#91425), [#&#8203;91529](openclaw/openclaw#91529), [#&#8203;90212](openclaw/openclaw#90212)) Thanks [@&#8203;joelnishanth](https://github.com/joelnishanth), [@&#8203;pgondhi987](https://github.com/pgondhi987), [@&#8203;joshavant](https://github.com/joshavant), and [@&#8203;snowzlm](https://github.com/snowzlm).
- Channel delivery: keep WhatsApp captured replies attached to the successor controller after restart, retry Feishu rate limits, preserve Mattermost thread replies, canonicalize LINE webhook paths, restore Discord reply hydration and runtime timeout exports, and show OpenAI Realtime WebRTC assistant transcripts. ([#&#8203;85823](openclaw/openclaw#85823), [#&#8203;89659](openclaw/openclaw#89659), [#&#8203;91684](openclaw/openclaw#91684), [#&#8203;91649](openclaw/openclaw#91649), [#&#8203;90263](openclaw/openclaw#90263), [#&#8203;91686](openclaw/openclaw#91686), [#&#8203;90426](openclaw/openclaw#90426)) Thanks [@&#8203;itsuzef](https://github.com/itsuzef), [@&#8203;ladygege](https://github.com/ladygege), [@&#8203;jacobtomlinson](https://github.com/jacobtomlinson), [@&#8203;fuller-stack-dev](https://github.com/fuller-stack-dev), and [@&#8203;shushushv](https://github.com/shushushv).
- Cron: cancel active task runs cleanly, preserve terminal timeout/cancel state, and recover no-deliver tool warnings instead of silently losing the outcome. ([#&#8203;90666](openclaw/openclaw#90666), [#&#8203;90678](openclaw/openclaw#90678)) Thanks [@&#8203;ai-hpc](https://github.com/ai-hpc).
- Gateway/config/auth: share the approval runtime socket token, replace arrays explicitly in `config.patch`, skip the deleted-agent guard only for valid ACP harness sessions, surface headless LaunchAgent state, verify SQLite auth migration before cleanup, and arm QMD startup maintenance. ([#&#8203;87105](openclaw/openclaw#87105), [#&#8203;91551](openclaw/openclaw#91551), [#&#8203;91219](openclaw/openclaw#91219), [#&#8203;91614](openclaw/openclaw#91614), [#&#8203;91740](openclaw/openclaw#91740), [#&#8203;91978](openclaw/openclaw#91978)) Thanks [@&#8203;fuller-stack-dev](https://github.com/fuller-stack-dev) and [@&#8203;scotthuang](https://github.com/scotthuang).
- Providers/Codex: clarify quota errors, restore the Codex synthetic usage line, canonicalize Codex protocol assets, require API-key auth for realtime voice, normalize ACP model refs, preserve Gemma 4 `reasoning_content`, and avoid guardian review for local models. ([#&#8203;91390](openclaw/openclaw#91390), [#&#8203;91709](openclaw/openclaw#91709), [#&#8203;91507](openclaw/openclaw#91507), [#&#8203;91567](openclaw/openclaw#91567), [#&#8203;88630](openclaw/openclaw#88630), [#&#8203;91696](openclaw/openclaw#91696)) Thanks [@&#8203;hxy91819](https://github.com/hxy91819), [@&#8203;brokemac79](https://github.com/brokemac79), [@&#8203;RomneyDa](https://github.com/RomneyDa), [@&#8203;joshavant](https://github.com/joshavant), and [@&#8203;Coder-Wangyankun](https://github.com/Coder-Wangyankun).
- Updates/builds: recover package Gateway restarts after refresh failure, expose plugin convergence repair, fall back to Corepack in PATH-less pnpm environments, seed the correct Docker store packages, and keep ClawHub dry-run and publish paths reusable. ([#&#8203;91581](openclaw/openclaw#91581), [#&#8203;91599](openclaw/openclaw#91599), [#&#8203;91547](openclaw/openclaw#91547), [#&#8203;91591](openclaw/openclaw#91591)) Thanks [@&#8203;fuller-stack-dev](https://github.com/fuller-stack-dev), [@&#8203;sallyom](https://github.com/sallyom), and [@&#8203;Patrick-Erichsen](https://github.com/Patrick-Erichsen).
- UI: require explicit user intent before opening chat sessions and drain restored chat queues after session switches. ([#&#8203;91480](openclaw/openclaw#91480)) Thanks [@&#8203;TurboTheTurtle](https://github.com/TurboTheTurtle).
- Android: avoid the `dataSync` foreground-service type for persistent nodes. ([#&#8203;80082](openclaw/openclaw#80082)) Thanks [@&#8203;davelutztx](https://github.com/davelutztx).
- Native hooks: bound relay lifetimes so abandoned native hook connections cannot linger indefinitely. ([#&#8203;91550](openclaw/openclaw#91550)) Thanks [@&#8203;joshavant](https://github.com/joshavant).

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about these updates again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box

---

This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).

Reviewed-on: https://git.erwanleboucher.dev/eleboucher/homelab/pulls/1040

Labels

- `gateway`: Gateway runtime
- `merge-risk: 🚨 session-state`: 🚨 May lose, corrupt, stale, or mis-associate session, agent, or context state.
- `P1`: High-priority user-facing bug, regression, or broken workflow.
- `proof: sufficient`: ClawSweeper judged the real behavior proof convincing.
- `proof: supplied`: External PR includes structured after-fix real behavior proof.
- `rating: 🐚 platinum hermit`: Good normal PR readiness with ordinary maintainer review expected.
- `size: S`
- `status: 👀 ready for maintainer look`: ClawSweeper has no concrete contributor-facing blocker left for this PR.

Development

Successfully merging this pull request may close these issues.

[Bug]: OpenClaw-native provider failure leaves web chat session stuck in progress
