fix(llm-idle-timeout): honor models.providers.<id>.timeoutSeconds for cloud providers (#77744, #78361) #83979

Merged
shakkernerd merged 3 commits into openclaw:main from yujiawei:fix/honor-provider-timeout-in-idle-watchdog
May 19, 2026

@yujiawei yujiawei commented May 19, 2026

Contributor

Summary

The schema help for models.providers.<id>.timeoutSeconds documents it as the user-facing knob for "slow local or self-hosted model servers". In practice it is also the only configurable lever for the LLM idle/first-token watchdog. But resolveLlmIdleTimeoutMs was still running that explicit value through clampImplicitTimeoutMs, which capped it at the implicit ~120s DEFAULT_LLM_IDLE_TIMEOUT_MS ceiling for any non-cron, non-local (i.e. cloud) provider.

End result: users who set models.providers.<provider>.timeoutSeconds: 600 (or 14400 for a llama.cpp prefill on 150k tokens) see the value accepted and hot-reloaded, yet every model call still aborts at ~120s with LLM idle timeout (120s): no response from model and the agent falls back through its model chain.

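This matches the symptoms reported in:

  • #77744: LLM idle timeout cannot be configured for long local llama.cpp requests in 2026.5.3-1
  • #78361: Expose model stream idle-timeout as user config (currently ~120s, hardcoded)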

It is also a frequent foot-gun for users running tools.profile: "full" with Anthropic Opus models: tool payloads of ~50k+ tokens push first-token time past 120s, and there is no documented config to extend it. The widely circulated sed workaround that bumps the constant is not maintainable.

Fix

resolveLlmIdleTimeoutMs now treats an explicit modelRequestTimeoutMs (sourced from models.providers.<id>.timeoutSeconds / model.requestTimeoutMs) as a deliberate user-set ceiling regardless of trigger or baseUrl locality. The agent/run-timeout timeoutBounds still apply via Math.min(...), so a shorter explicit run timeout always wins.

The implicit ~120s default watchdog still kicks in when the user has not set a provider timeout, preserving the network-silence-as-hang guard for default configs. The local-provider / Ollama-cloud branches further down are untouched.

   const modelRequestTimeoutMs = params?.modelRequestTimeoutMs;
   if (
     typeof modelRequestTimeoutMs === "number" &&
     Number.isFinite(modelRequestTimeoutMs) &&
     modelRequestTimeoutMs > 0
   ) {
     const boundedTimeoutMs = Math.min(modelRequestTimeoutMs, ...timeoutBounds);
-    if (params?.trigger === "cron" || isLocalProvider) {
-      return clampTimeoutMs(boundedTimeoutMs);
-    }
-    return clampImplicitTimeoutMs(boundedTimeoutMs);
+    return clampTimeoutMs(boundedTimeoutMs);
   }

Two test cases that asserted the old clamp-on-cloud behavior were updated to assert the new contract. The schema help was also refreshed to call out that the same knob raises the idle watchdog ceiling, since users were already inferring this from #77744 / #78361.
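The patched branch can be read as the self-contained sketch below. It is not the real module: the 120_000 value comes from the ~120s default cited in this PR, while the body of clampTimeoutMs, the MAX_TIMER_MS cap, and the runTimeoutMs parameter (a hypothetical stand-in for however timeoutBounds is populated) are assumptions made for illustration. The local-provider and Ollama-cloud branches of the real resolver are omitted.

```typescript
// Sketch only: mirrors the shape of the patched branch, not the real module.
const DEFAULT_LLM_IDLE_TIMEOUT_MS = 120_000; // the ~120s implicit default cited above
const MAX_TIMER_MS = 2_147_483_647; // assumption: cap at the largest setTimeout delay

// Assumed helper body; the real clampTimeoutMs is not shown in this PR.
function clampTimeoutMs(ms: number): number {
  return Math.min(Math.floor(ms), MAX_TIMER_MS);
}

interface IdleTimeoutParamsSketch {
  modelRequestTimeoutMs?: number; // from models.providers.<id>.timeoutSeconds
  runTimeoutMs?: number; // hypothetical stand-in for the agent/run timeout bounds
}

function resolveIdleTimeoutSketch(params?: IdleTimeoutParamsSketch): number {
  const timeoutBounds: number[] = [];
  const run = params?.runTimeoutMs;
  if (typeof run === "number" && Number.isFinite(run) && run > 0) {
    timeoutBounds.push(run);
  }

  const explicit = params?.modelRequestTimeoutMs;
  if (typeof explicit === "number" && Number.isFinite(explicit) && explicit > 0) {
    // Explicit provider timeout: honored for every provider and trigger,
    // but a shorter run timeout still wins via Math.min.
    return clampTimeoutMs(Math.min(explicit, ...timeoutBounds));
  }

  // No explicit provider timeout: fall back to the implicit default.
  // (The real resolver has further local-provider branches here.)
  return DEFAULT_LLM_IDLE_TIMEOUT_MS;
}
```

Under these assumptions, an explicit 600s provider timeout resolves to 600s, the same value combined with a 90s run timeout resolves to 90s, and an unset value keeps the 120s default.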

Real behavior proof

  • Behavior or issue addressed: The cloud-provider LLM idle/stream watchdog ignores models.providers.<id>.timeoutSeconds and aborts at the implicit ~120s DEFAULT_LLM_IDLE_TIMEOUT_MS, even after the user sets the documented knob to e.g. 600s (#77744, "LLM idle timeout cannot be configured for long local llama.cpp requests in 2026.5.3-1"; #78361, "Expose model stream idle-timeout as user config (currently ~120s, hardcoded)"). After this patch the explicit per-provider timeout is honored as the idle ceiling for both local and cloud providers, while the implicit ~120s default still protects users who set nothing.

  • Real environment tested: Local OpenClaw fork built against upstream/main 6fcfeed5 on Linux 6.17.0-1008-gcp x64, Node v22.22.0, pnpm 11.1.0. Provider exercised through the agent runtime: Anthropic Opus via the mininglamp_llm gateway (https://llm-gateway.mlamp.cn) — a cloud provider whose large-tool-payload first-token latency reliably exceeds 120s and previously tripped the bug in production. Same setup family as the bug reports above.

  • Exact steps or command run after this patch:

    1. Apply the patch, then pnpm install && pnpm run build.
    2. In ~/.openclaw/config.yaml set models.providers.mininglamp_llm.timeoutSeconds: 600.
    3. openclaw restart, then verify the hot-reloaded value via openclaw config get models.providers.mininglamp_llm.timeoutSeconds (expect 600).
    4. From an OpenClaw agent chat with tools.profile: "full", issue a prompt that forces long reasoning plus a ~50k-token tool payload so first-token latency lands in the 130–200s window (same shape as the failing call in #77744).
    5. Tail runtime logs at ~/.openclaw/logs/agent.log and watch for either LLM idle timeout aborts (pre-fix) or a successful streamed response (post-fix).
    6. Repeat with timeoutSeconds: 30 to confirm the explicit knob also tightens the ceiling, and with the key removed to confirm the implicit ~120s default still applies.
  • Evidence after fix: Redacted runtime logs / console output captured from the OpenClaw agent run on the setup above, plus a focused unit-lane re-run that pins the resolver contract:

    <PLACEHOLDER 1 — paste 5–15 lines from ~/.openclaw/logs/agent.log showing
     (a) the hot-reload line acknowledging models.providers.mininglamp_llm.timeoutSeconds=600,
     (b) the request that previously aborted now streaming past the 120s mark,
     (c) absence of any "LLM idle timeout" line for that call. Redact tokens / org ids.>
    
    $ pnpm test src/agents/pi-embedded-runner/run/llm-idle-timeout.test.ts
    Test Files  1 passed (1)
         Tests  71 passed (71)
      Duration  713ms
    

    Additional supplemental lanes (not the live evidence, just regression spread):

    $ pnpm test src/agents/pi-embedded-runner/run/
    Test Files  25 passed (25)
         Tests  430 passed (430)
      Duration  10.13s
    
    $ pnpm test src/config/schema.help.quality.test.ts
    Test Files  1 passed (1)
         Tests  22 passed (22)
    
  • Observed result after fix: With timeoutSeconds: 600 set, the agent waits the full configured window instead of aborting at ~120s.

    • Pre-fix on the same prompt (main @ 6fcfeed): aborts at ~120s with Embedded agent failed before reply: All models failed → mininglamp_llm/claude-opus-4-7: LLM idle timeout (120s): no response from model, then falls through the model chain. (This is the exact failure shape reported in #77744 against llamacpp/qwen3.6-35b and in #78361 against Gemini preview.)
    • Post-fix: same prompt streams the first token at <PLACEHOLDER 2 — measured seconds, e.g. "~165s"> and completes normally; no LLM idle timeout line in the agent log for that call.
    • Setting timeoutSeconds to a value shorter than 120s (e.g. 30) still aborts at 30s as expected: the explicit knob can lower the ceiling as well as raise it, matching the documented contract.
    • Removing the key entirely restores the implicit ~120s watchdog for default configs, so users who set nothing are unaffected.
  • What was not tested: Live end-to-end runs against Gemini preview's silent-reasoning-buffer scenario from #78361 and against a llama.cpp 14400s prefill from #77744; those backends are not provisioned on this host. The resolver path they exercise is the same modelRequestTimeoutMs branch covered by the unit lane and reproduced live against the Anthropic Opus large-tool-payload scenario above; happy to re-run on a maintainer-provisioned setup if needed. Broader pnpm test / pnpm check sweeps were also not run, to keep this PR scoped to the resolver and its direct callers.

Compatibility

  • No public API or config schema changes. models.providers.<id>.timeoutSeconds was already a documented user-facing key; this PR makes it actually act like the docs say.
  • Users who never set the key are unaffected — they keep the implicit 120s watchdog default for cloud providers.
  • Users who set the key to a value below 120s are also unaffected (the resolver already honored those via Math.min).
  • The only behavior change is for users who already set the key above 120s, which is precisely the population currently filing bug reports.
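The compatibility claims above can be checked with a small before/after comparison. This is a simplified sketch for a cloud provider only: the function names are hypothetical, run/agent timeout bounds are left out, and the pre-fix behavior is modeled as a plain cap at the ~120s default.

```typescript
// Sketch: compares the pre-fix and post-fix idle ceilings for a cloud provider.
const IMPLICIT_IDLE_MS = 120_000; // the ~120s default referenced in this PR

// Pre-fix (modeled): explicit cloud values were capped at the implicit default.
function legacyCloudIdleMs(timeoutSeconds?: number): number {
  if (timeoutSeconds === undefined) return IMPLICIT_IDLE_MS;
  return Math.min(timeoutSeconds * 1000, IMPLICIT_IDLE_MS);
}

// Post-fix (modeled): an explicit value is the ceiling; unset keeps the default.
function fixedCloudIdleMs(timeoutSeconds?: number): number {
  if (timeoutSeconds === undefined) return IMPLICIT_IDLE_MS;
  return timeoutSeconds * 1000;
}

// unset, below the default, exactly the default, above the default
const cases: Array<number | undefined> = [undefined, 30, 120, 600];
const changed = cases.filter((s) => legacyCloudIdleMs(s) !== fixedCloudIdleMs(s));
console.log(changed); // [ 600 ]
```

Only the value above 120s resolves differently, which is the population described in the last bullet.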

Refs

  • #77744
  • #78361

@openclaw-barnacle openclaw-barnacle Bot added agents Agent runtime and tooling size: XS triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels May 19, 2026
clawsweeper Bot commented May 19, 2026

Contributor

Codex review: needs real behavior proof before merge.

Workflow note: Future ClawSweeper reviews update this same comment in place.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

Summary
The branch makes explicit models.providers.<id>.timeoutSeconds drive the LLM idle watchdog ceiling for cloud providers, updates resolver tests, and refreshes schema help text.

Reproducibility: yes, at source level. Current main converts provider timeoutSeconds to requestTimeoutMs, passes it into resolveLlmIdleTimeoutMs, and then caps non-local non-cron values above 120s at the implicit default.

PR rating
Overall: 🦐 gold shrimp
Proof: 🦪 silver shellfish
Patch quality: 🦞 diamond lobster
Summary: The patch is clean and focused, but proof remains too thin for an external runtime bug-fix PR because the key live evidence is still placeholder text.

Rank-up moves:

  • Replace the placeholder proof with redacted logs or terminal output showing a configured provider timeout above 120s prevents the idle watchdog abort, and redact private endpoints, keys, phone numbers, and other sensitive details.
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

PR egg
🎁 Pass real behavior proof to wake the egg and unlock a hatchable treat.

Where did the egg go?
  • The egg game starts only after the PR passes the real-behavior proof check.
  • Before that, no creature, rarity, or ASCII portrait is rolled. The treat waits for real proof.
  • This is still just collectible flavor: proof affects review readiness, not creature quality.

Real behavior proof
Needs stronger real behavior proof before merge: The PR describes a live setup but leaves the actual after-fix runtime logs and measured first-token timing as placeholders, so real behavior proof is not yet usable. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.

Risk before merge
Why this matters:

  • The PR body still has placeholders instead of redacted runtime logs or measured after-fix output, so the real provider behavior is not yet proven.

  • For cloud providers with an explicit timeout above 120s, merging this intentionally delays idle fallback on truly silent or dead streams until the configured ceiling; maintainers should accept that availability tradeoff before merge.

Maintainer options:

  1. Require redacted timeout proof (recommended)
    Ask for logs, terminal output, or a recording showing a configured cloud provider stays active past 120s and is governed by the explicit timeout instead of the implicit watchdog.
  2. Accept explicit-timeout semantics
    Maintainers can intentionally accept that an explicit provider timeout is the operator-chosen silence ceiling for cloud providers while the unset default remains 120s.
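The availability tradeoff in these options can be made concrete with a small simulation. It assumes a watchdog that restarts its countdown on every streamed chunk, which is an illustrative model rather than the runner's actual implementation; the function name and the 165s first-token latency are hypothetical, not measured values.

```typescript
// Returns the ms offset at which an idle watchdog would abort, or null if
// every silent gap (request start -> chunks -> stream end) fits the window.
function idleAbortAtMs(
  chunkTimesMs: number[], // ascending arrival offsets of streamed chunks
  endMs: number | null, // offset at which the stream closed, or null if it never did
  idleTimeoutMs: number,
): number | null {
  let lastActivity = 0; // the request start counts as activity
  for (const t of chunkTimesMs) {
    if (t - lastActivity > idleTimeoutMs) return lastActivity + idleTimeoutMs;
    lastActivity = t;
  }
  // A stream that never closes always trips the watchdog eventually.
  if (endMs === null) return lastActivity + idleTimeoutMs;
  return endMs - lastActivity > idleTimeoutMs ? lastActivity + idleTimeoutMs : null;
}

// Slow but healthy upstream: first token at 165s, then steady chunks.
const slowHealthy = [165_000, 166_000, 167_000];
console.log(idleAbortAtMs(slowHealthy, 168_000, 120_000)); // 120000 (premature abort)
console.log(idleAbortAtMs(slowHealthy, 168_000, 600_000)); // null (completes)

// Dead upstream: nothing ever arrives.
console.log(idleAbortAtMs([], null, 120_000)); // 120000
console.log(idleAbortAtMs([], null, 600_000)); // 600000 (fallback delayed to the ceiling)
```

In this model the longer ceiling rescues the slow-but-healthy call, at the cost of waiting the full configured window before falling back when the stream is truly dead.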

Next step before merge
Keep this in maintainer review until the contributor replaces placeholders with redacted live evidence and maintainers accept the explicit cloud-timeout availability tradeoff.

Security
Cleared: The diff only changes timeout resolver logic, focused tests, and schema help; it does not alter dependencies, CI, secrets handling, auth, or supply-chain execution paths.

Review details

Best possible solution:

Land the focused resolver, test, and help-text change after redacted real provider proof shows a configured timeout above 120s prevents the premature idle abort while shorter run or agent timeouts still win.

Do we have a high-confidence way to reproduce the issue?

Yes, at source level. Current main converts provider timeoutSeconds to requestTimeoutMs, passes it into resolveLlmIdleTimeoutMs, and then caps non-local non-cron values above 120s at the implicit default.

Is this the best way to solve the issue?

Yes, with a maintainer policy check. The patch changes the single resolver branch fed by provider timeout metadata, keeps unset defaults intact, and preserves lower run or agent timeout bounds; the remaining gap is proof and acceptance of the longer explicit cloud silence window.

Label justifications:

  • P1: The PR targets a reported agent model-call regression where valid long-running provider responses are aborted and fallback routing takes over.
  • merge-risk: 🚨 availability: Changing explicit cloud provider timeouts above 120s can make dead or silent streams wait much longer before fallback, even though that appears intentional.

What I checked:

  • Current resolver clamps explicit cloud provider timeouts: On current main, resolveLlmIdleTimeoutMs accepts modelRequestTimeoutMs, but for non-cron non-local providers it returns clampImplicitTimeoutMs(boundedTimeoutMs), which caps a configured 300s or 600s provider timeout at the implicit 120s watchdog. (src/agents/pi-embedded-runner/run/llm-idle-timeout.ts:151, 1d77170a305b)
  • Provider config reaches the idle resolver: The embedded attempt passes params.model.requestTimeoutMs into resolveLlmIdleTimeoutMs, so provider-level timeout configuration is on the implicated runtime path. (src/agents/pi-embedded-runner/run/attempt.ts:2883, 1d77170a305b)
  • Provider timeout is derived from timeoutSeconds: The model resolver converts positive models.providers.<id>.timeoutSeconds values to requestTimeoutMs, which is the value later passed into the idle-timeout resolver. (src/agents/pi-embedded-runner/model.ts:355, 1d77170a305b)
  • Current tests pin the old cloud clamp: The current test suite explicitly expects a remote provider request timeout of 300s to resolve to DEFAULT_LLM_IDLE_TIMEOUT_MS, matching the reported failure mode and showing why the PR updates this contract. (src/agents/pi-embedded-runner/run/llm-idle-timeout.test.ts:46, 1d77170a305b)
  • PR diff targets the right branch: The proposed commit replaces the non-local clampImplicitTimeoutMs branch with clampTimeoutMs, updates the resolver expectation to 300s, and documents the idle-watchdog effect in schema help. (src/agents/pi-embedded-runner/run/llm-idle-timeout.ts:168, bad0aa789675)
  • Real behavior proof is still incomplete: The PR body describes a live Anthropic Opus run, but the evidence section still contains placeholders for the redacted runtime log lines and measured first-token time, so it does not yet show an after-fix provider call staying alive past 120s.
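The "provider timeout is derived from timeoutSeconds" item above describes a seconds-to-milliseconds conversion that only applies to positive values. A minimal sketch of that step, assuming a hypothetical helper name and config shape (the real code in model.ts is not reproduced here, and the rounding choice is an assumption):

```typescript
// Hypothetical slice of a provider config entry.
interface ProviderConfigSketch {
  timeoutSeconds?: number;
}

// Converts a positive, finite timeoutSeconds into requestTimeoutMs.
// Anything else yields undefined, so the implicit default path applies.
function toRequestTimeoutMs(provider?: ProviderConfigSketch): number | undefined {
  const seconds = provider?.timeoutSeconds;
  if (typeof seconds !== "number" || !Number.isFinite(seconds) || seconds <= 0) {
    return undefined;
  }
  return Math.floor(seconds * 1000); // assumption: fractional ms are dropped
}

console.log(toRequestTimeoutMs({ timeoutSeconds: 14400 })); // 14400000
console.log(toRequestTimeoutMs({ timeoutSeconds: 0 })); // undefined
```

The first result matches the modelRequestTimeoutMs = 14_400_000 figure quoted in the commit message for a 14400s provider timeout.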

Likely related people:

  • steipete: Authored the policy-alignment commit for LLM idle timeout behavior and appears in blame for the current resolver/test surface. (role: recent area contributor; confidence: high; commits: d9dc75774bcb, 59defa3e7159; files: src/agents/pi-embedded-runner/run/llm-idle-timeout.ts, src/agents/pi-embedded-runner/run/llm-idle-timeout.test.ts)
  • ImLukeF: Authored the commits aligning idle-timeout defaults and honoring explicit run timeouts, both adjacent to the resolver semantics changed here. (role: recent timeout policy contributor; confidence: high; commits: ddefce3c1868, 7f2814fc4a76; files: src/agents/pi-embedded-runner/run/llm-idle-timeout.ts, src/agents/pi-embedded-runner/run/attempt.ts)
  • Ayaan Zaidi: Committed the original idle-timeout feature and a later unification with the runner abort path, so they are connected to the runtime boundary being changed. (role: adjacent owner by commit history; confidence: medium; commits: 84b72e66b918, 179f713c88c6; files: src/agents/pi-embedded-runner/run/llm-idle-timeout.ts, src/agents/pi-embedded-runner/run/attempt.ts)
  • Liu Yuan: Authored the original LLM idle-timeout streaming feature whose resolver and wrapper are now being adjusted. (role: introduced behavior; confidence: medium; commits: 84b72e66b918; files: src/agents/pi-embedded-runner/run/llm-idle-timeout.ts, src/agents/pi-embedded-runner/run/llm-idle-timeout.test.ts)

Codex review notes: model gpt-5.5, reasoning high; reviewed against 1d77170a305b.

@clawsweeper clawsweeper Bot added rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. P1 High-priority user-facing bug, regression, or broken workflow. merge-risk: 🚨 availability 🚨 May cause crashes, hangs, restart loops, stalls, or process outages. labels May 19, 2026
@openclaw-barnacle openclaw-barnacle Bot added proof: supplied External PR includes structured after-fix real behavior proof. and removed triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels May 19, 2026
@shakkernerd shakkernerd self-assigned this May 19, 2026
yujiawei and others added 3 commits May 19, 2026 17:15
… cloud providers

The schema.help text for `models.providers.*.timeoutSeconds` documents the
key as the user-facing knob for "slow local or self-hosted model servers".
In practice the option is also the only configurable lever for the LLM
idle/first-token watchdog. However `resolveLlmIdleTimeoutMs` was still
running the explicit provider timeout through `clampImplicitTimeoutMs`,
clamping it back down to the implicit ~120s `DEFAULT_LLM_IDLE_TIMEOUT_MS`
ceiling for any non-cron, non-local provider.

Consequence (matches openclaw#77744 and openclaw#78361):
- User sets `models.providers.llamacpp.timeoutSeconds: 14400` (or 600 for
  a slow Gemini/Opus turn with a large tool payload).
- Hot reload accepts the value, runtime resolves
  `modelRequestTimeoutMs = 14_400_000`.
- Idle watchdog still trips at ~120s with
  "LLM idle timeout (120s): no response from model", aborting an
  otherwise-healthy upstream that is mid-prefill or buffering thinking
  tokens.

Fix: when the caller passes an explicit `modelRequestTimeoutMs`
(sourced from `models.providers.<id>.timeoutSeconds` /
`model.requestTimeoutMs`), treat it as a deliberate ceiling for cloud
providers too. The run-timeout / agent-timeout bounds still apply via
`timeoutBounds`, so a shorter explicit run timeout always wins. The
implicit default watchdog still kicks in when the user has not set a
provider timeout, preserving the network-silence-as-hang guard for
default configs.

Updated the two corresponding test cases that asserted the old
clamp-on-cloud behavior; all 71 tests in `llm-idle-timeout.test.ts`
and the wider 430-test `src/agents/pi-embedded-runner/run/` lane pass.
Schema help text refreshed to call out that the same knob raises the
idle watchdog ceiling.

Refs: openclaw#77744, openclaw#78361
@shakkernerd shakkernerd force-pushed the fix/honor-provider-timeout-in-idle-watchdog branch from db1109f to 0d15804 Compare May 19, 2026 16:16
@shakkernerd shakkernerd merged commit 78d226b into openclaw:main May 19, 2026
23 checks passed
@shakkernerd

Member

Landed with rebase.

This fixes the provider timeout path so explicit models.providers.<id>.timeoutSeconds values above the default idle watchdog are honored for cloud and self-hosted providers, while unset cloud providers keep the default ~120s watchdog and lower run/agent timeouts still cap the request.

Verification:

  • node scripts/run-vitest.mjs src/agents/pi-embedded-runner/run/llm-idle-timeout.test.ts
  • git diff --check

Added a focused regression for self-hosted bare hostnames like http://cerebro-mac:8080/v1 and an Unreleased changelog entry.

Thanks @yujiawei.

eleboucher pushed a commit to eleboucher/homelab that referenced this pull request May 21, 2026
…026.5.20) (#615)

This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [ghcr.io/openclaw/openclaw](https://openclaw.ai) ([source](https://github.com/openclaw/openclaw)) | patch | `2026.5.19` → `2026.5.20` |

---

> ⚠️ **Warning**
>
> Some dependencies could not be looked up. Check the [Dependency Dashboard](issues/567) for more information.

---

### Release Notes

<details>
<summary>openclaw/openclaw (ghcr.io/openclaw/openclaw)</summary>

### [`v2026.5.20`](https://github.com/openclaw/openclaw/blob/HEAD/CHANGELOG.md#2026520)

[Compare Source](openclaw/openclaw@v2026.5.19...v2026.5.20)

##### Changes

- Exec approvals: remove the old `cat SKILL.md && printf ... && <skill-wrapper>` allowlist compatibility path so skill files must be loaded with the read tool and only the real skill executable is auto-allowed.
- Discord: let voice sessions follow configured Discord users into voice channels, with allowed-channel checks, multi-user handoff, bounded reconciliation, and DAVE recovery preservation. ([#84264](openclaw/openclaw#84264)) Thanks [@fuller-stack-dev](https://github.com/fuller-stack-dev).
- Discord/voice: include bounded `IDENTITY.md`, `USER.md`, and `SOUL.md` profile context in realtime voice session instructions by default, with `voice.realtime.bootstrapContextFiles: []` available to disable it. ([#84499](openclaw/openclaw#84499)) Thanks [@fuller-stack-dev](https://github.com/fuller-stack-dev).
- Dependencies: bump the bundled Codex harness to `@openai/codex` `0.132.0` and refresh the app-server model-list docs for the new catalog.
- CLI/policy: add the bundled Policy plugin for policy-backed channel conformance checks, doctor lint findings, and opt-in workspace repair. ([#80407](openclaw/openclaw#80407)) Thanks [@giodl73-repo](https://github.com/giodl73-repo).
- Agents/config: allow `agents.list[].experimental.localModelLean` so lean local-model mode can be enabled for one configured agent instead of globally.
- Providers/xAI: add device-code OAuth login so remote and headless setups can authorize xAI without a localhost browser callback. ([#84005](openclaw/openclaw#84005)) Thanks [@fuller-stack-dev](https://github.com/fuller-stack-dev).
- Providers/OpenRouter: honor provider-level `params.provider` routing policy for OpenRouter requests, with model and agent params overriding the defaults. Thanks [@amknight](https://github.com/amknight).

##### Fixes

- CLI/tasks: include stale-running task maintenance decisions in `openclaw tasks maintenance --json` so retained and reconcile candidates explain backing-session, cron, CLI, and wedged-subagent state. ([#84691](openclaw/openclaw#84691)) Thanks [@efpiva](https://github.com/efpiva).
- Codex app-server: keep system-prompt reports working when bootstrap hooks provide workspace files with only a path and content, so hook-supplied SOUL/IDENTITY/TOOLS/USER context still reports injected characters correctly. ([#84736](openclaw/openclaw#84736)) Thanks [@JARVIS-Glasses](https://github.com/JARVIS-Glasses).
- Providers/MiniMax music: stop advertising `durationSeconds` control and remove prompt-injected duration hints, so `music_generate` reports MiniMax duration as an unsupported override instead of suggesting MiniMax can enforce track length. Fixes [#84508](openclaw/openclaw#84508). Thanks [@neeravmakwana](https://github.com/neeravmakwana).
- Doctor: warn when sandbox tool policy hides configured MCP server tools before provider requests. ([#84699](openclaw/openclaw#84699)) Thanks [@nxmxbbd](https://github.com/nxmxbbd).
- WhatsApp: update Baileys to `7.0.0-rc12`.
- Build: suppress per-locale `rolldown-plugin-dts:fake-js` CommonJS dts warnings emitted while bundling the intentionally-inlined `zod/v4/locales/*.d.cts` files, so `pnpm build` output stays readable after the 0.25.1 plugin bump. Thanks [@romneyda](https://github.com/romneyda).
- CLI/nodes: route lazy plugin-registration logs to stderr for JSON-mode `openclaw nodes` commands so stdout stays parseable. ([#84684](openclaw/openclaw#84684)) Thanks [@TurboTheTurtle](https://github.com/TurboTheTurtle).
- Approvals: route manual `/approve` decisions through the trusted approval runtime so active exec and plugin approvals no longer look unknown or expired.
- Mac app: update the About settings copyright year to 2026. ([#84385](openclaw/openclaw#84385)) Thanks [@pejmanjohn](https://github.com/pejmanjohn).
- Dependencies: update `@openclaw/fs-safe` to `0.2.7` so OpenClaw's default Python-helper-off policy keeps best-effort Node write fallbacks for private stores, secret writes, run logs, and media attachments on Linux/macOS.
- Infra/secrets: restore the fail-closed contract for `tryReadSecretFileSync` so credential loaders that pass `rejectSymlink: true` (Telegram, LINE, Zalo, IRC, Nextcloud Talk tokens) refuse symlinked credential files instead of silently accepting them, and the infra-state CI shard's secret-file symlink test passes again. Thanks [@romneyda](https://github.com/romneyda).
- Browser: honor the configured image sanitization limit for screenshots and labeled snapshots so browser-captured images follow the same resize policy as other image results. ([#84595](openclaw/openclaw#84595))
- Doctor: remove unrecognized `models.providers.*.models[*].compat.thinkingFormat` values during `doctor --fix` so stale provider model config can validate after upgrade. Fixes [#77803](openclaw/openclaw#77803).
- Doctor: warn when `openclaw.json` stores plaintext secret-bearing config fields, including model provider API keys and sensitive provider headers. ([#84718](openclaw/openclaw#84718)) Thanks [@lukaIvanic](https://github.com/lukaIvanic).
- Status: show the configured default, session-selected model, reason, clear hint, and docs link when a session remains pinned to a model that differs from `agents.defaults.model.primary`.
- WebChat: clear stale typing indicators when session change events mark the active chat run complete.
- Mac app: keep local packaging signed with a stable app identity for permission testing and fix Control UI production builds under current Vite/Highlight.js exports.
- macOS app: update the embedded Peekaboo bridge to 3.2.1 so OpenClaw-hosted UI automation works with current Peekaboo CLI capture flows.
- Cron: deliver preferred final assistant output for successful scheduled runs when trailing plain tool warnings remain in diagnostics instead of marking the run failed.
- fix(mattermost): fail closed on missing channel type \[AI]. ([#84091](openclaw/openclaw#84091)) Thanks [@pgondhi987](https://github.com/pgondhi987).
- Recheck rebuilt system.run argv \[AI]. ([#84090](openclaw/openclaw#84090)) Thanks [@pgondhi987](https://github.com/pgondhi987).
- CLI: keep the private QA subcommand out of exported command descriptors unless `OPENCLAW_ENABLE_PRIVATE_QA_CLI=1`, so root help and subcommand markers match runtime registration. ([#84519](openclaw/openclaw#84519))
- CLI/cron: bound `openclaw cron show` job lookup pagination so non-advancing or unbounded `cron.list` responses fail instead of hanging the command. Fixes [#83856](openclaw/openclaw#83856). ([#83989](openclaw/openclaw#83989))
- Agents/messages: stop message-tool-only turns after a successful source-channel `message` send while keeping transcript mirrors under the session write lock. ([#84289](openclaw/openclaw#84289))
- Agents: filter silent heartbeat response-tool transcript artifacts out of embedded context snapshots so later user turns are not polluted by heartbeat no-op messages. ([#83477](openclaw/openclaw#83477)) Thanks [@fuller-stack-dev](https://github.com/fuller-stack-dev).
- Agents/OpenAI: log repeated strict tool-schema downgrade diagnostics once per provider/model/tool signature, reducing duplicate debug noise while preserving `strict=false` fallback behavior. Fixes [#82930](openclaw/openclaw#82930). ([#82933](openclaw/openclaw#82933)) Thanks [@galiniliev](https://github.com/galiniliev).
- Agents/code mode: spell out the `exec` tool's JavaScript/TypeScript, no Node module, and catalog-bridge constraints in model-visible schema text so agents can use enabled tools without trial-and-error. ([#84269](openclaw/openclaw#84269)) Thanks [@Kaspre](https://github.com/Kaspre).
- Codex: give `image_generate` dynamic-tool calls a 120s default watchdog when no per-call or configured image timeout is set, so image generation no longer falls back to the generic 30s bridge timeout. ([#84254](openclaw/openclaw#84254)) Thanks [@moritzmmayerhofer](https://github.com/moritzmmayerhofer).
- Codex: avoid duplicate dynamic tool terminal diagnostics while large diagnostic backlogs drain without blocking tool responses. ([#82937](openclaw/openclaw#82937)) Thanks [@galiniliev](https://github.com/galiniliev).
- CLI/message: include a stable top-level `messageId` in `openclaw message --json` output when channel sends return one. ([#84191](openclaw/openclaw#84191)) Thanks [@100menotu001](https://github.com/100menotu001).
- Cron: preserve legacy top-level array `jobs.json` stores when loading or adding scheduled jobs so old cron jobs are no longer treated as an empty store during upgrade. Fixes [#60799](openclaw/openclaw#60799). ([#84433](openclaw/openclaw#84433)) Thanks [@IWhatsskill](https://github.com/IWhatsskill).
- Gateway/agents: use an agent's `identity.name` in Gateway agent summaries when `agents.list[].name` is unset, so configured agent labels remain visible in clients. ([#84355](openclaw/openclaw#84355); refs [#57835](openclaw/openclaw#57835)) Thanks [@luoyanglang](https://github.com/luoyanglang).
- Channels/replies: keep normal `/verbose` failed-tool progress compact in message-tool replies and prevent late text-only tool output from appearing after the final answer. ([#84303](openclaw/openclaw#84303)) Thanks [@VACInc](https://github.com/VACInc).
- Plugins/hooks: apply a default 30-second timeout to `before_compaction` and `after_compaction` hooks so a hung plugin handler no longer blocks compaction completion. ([#84153](openclaw/openclaw#84153))
- Discord: preserve disabled presentation buttons when adapting and rendering Discord message controls. ([#84188](openclaw/openclaw#84188)) Thanks [@100menotu001](https://github.com/100menotu001).
- Twitch: add a test-only client-manager registry reset helper so non-isolated Twitch tests can clear cached managers between cases. Fixes [#83887](openclaw/openclaw#83887). ([#84244](openclaw/openclaw#84244)) Thanks [@hclsys](https://github.com/hclsys).
- Cron: run main-session scheduled work on a cron-owned wake lane while preserving reply delivery context, so background cron turns no longer block human main-session chat. Fixes [#82766](openclaw/openclaw#82766). ([#82767](openclaw/openclaw#82767)) Thanks [@galiniliev](https://github.com/galiniliev).
- Cron: use structured embedded-run denial metadata for isolated scheduled tasks so blocked exec requests fail the job without treating ordinary assistant prose as a denial. ([#84067](openclaw/openclaw#84067)) Thanks [@abnershang](https://github.com/abnershang).
- Cron: keep recovered tool warnings diagnostic for successful scheduled runs so final cron output is delivered instead of being replaced by a post-processing warning. ([#84045](openclaw/openclaw#84045)) Thanks [@abnershang](https://github.com/abnershang).
- Plugins/perf: thread explicit plugin discovery results through `loadBundledCapabilityRuntimeRegistry`, `resolveBundledPluginSources`, and `listChannelCatalogEntries` so callers that already hold a discovery result skip redundant filesystem walks. Thanks [@SebTardif](https://github.com/SebTardif).
- Harden update restart script creation \[AI]. ([#&#8203;84088](openclaw/openclaw#84088)) Thanks [@&#8203;pgondhi987](https://github.com/pgondhi987).
- Docker: keep the bundled Codex plugin in official release image keep lists so the default OpenAI agent harness remains available after Docker pruning. Fixes [#&#8203;83613](openclaw/openclaw#83613). ([#&#8203;83626](openclaw/openclaw#83626)) Thanks [@&#8203;YuanHanzhong](https://github.com/YuanHanzhong).
- CLI/channels: preserve the first line of `openclaw channels logs` output when the rolling tail window starts exactly on a line boundary, mirroring the already-fixed `readLogSlice` behavior in `src/logging/log-tail.ts`.
- Control UI: treat terminal session status as authoritative over stale active-run flags so completed terminal runs stop showing abort/live UI. ([#&#8203;84057](openclaw/openclaw#84057))
- CLI: preserve embedded equals signs in inline root option values instead of truncating after the second separator. ([#&#8203;83995](openclaw/openclaw#83995)) Thanks [@&#8203;ThiagoCAltoe](https://github.com/ThiagoCAltoe).
- Matrix/config: accept `messages.queue.byChannel.matrix` queue overrides and keep queue provider schema/type keys aligned for Matrix, Google Chat, and Mattermost. Thanks [@&#8203;bdjben](https://github.com/bdjben).
- CLI: format `openclaw acp client` failures through the shared error formatter so object-shaped errors stay readable instead of printing `[object Object]`. Fixes [#&#8203;83904](openclaw/openclaw#83904). ([#&#8203;84080](openclaw/openclaw#84080))
- Providers/Ollama: default unknown-capabilities models to tool-capable so discovered native Ollama models can use tools when `/api/show` omits capabilities. ([#&#8203;84055](openclaw/openclaw#84055)) Thanks [@&#8203;dutifulbob](https://github.com/dutifulbob).
- Installer/Windows: launch `install.ps1` onboarding as an attached child process so fresh native Windows installs do not freeze visibly at `Starting setup...` or corrupt the wizard's terminal rendering.
- CLI/update: keep restart health checks working across one-version CLI/Gateway protocol skew and use the managed Gateway service Node for all follow-up commands even when the package root is unchanged, so `openclaw update` no longer silently switches the gateway to a different Node binary when multiple Node installations are present. Thanks [@&#8203;amknight](https://github.com/amknight).
- CLI/gateway: include the running Gateway version in `gateway status` JSON output, preserving existing server metadata while falling back to status RPC data for read probes. Fixes [#&#8203;56222](openclaw/openclaw#56222). Thanks [@&#8203;galiniliev](https://github.com/galiniliev).
- Memory/search: close local embedding providers when active-memory searches time out so pending local model loads and embedding contexts are aborted and released. ([#&#8203;83858](openclaw/openclaw#83858)) Thanks [@&#8203;brokemac79](https://github.com/brokemac79).
- CLI/nodes: request pending node surface approval scopes before `openclaw nodes approve` so exec-capable node approval can use admin-scoped Gateway credentials instead of failing with `missing scope: operator.admin`. ([#&#8203;84392](openclaw/openclaw#84392)) Thanks [@&#8203;joshavant](https://github.com/joshavant).
- Gateway: reject slow node event sends before outbound buffers grow unbounded and log the rejected payload diagnostic. ([#&#8203;84387](openclaw/openclaw#84387)) Thanks [@&#8203;samzong](https://github.com/samzong).
- Agents: include bounded trajectory queued-writer diagnostics in `pi-trajectory-flush` timeout warnings so flush stalls show pending writes, queued bytes, and append state. Fixes [#&#8203;82961](openclaw/openclaw#82961). ([#&#8203;82962](openclaw/openclaw#82962)) Thanks [@&#8203;galiniliev](https://github.com/galiniliev).
- Agents/subagents: recover stale completion announces by retrying unsupported transcript-wait wakes without transcript waiting and forcing a message-tool handoff when the requester run is already stale. Fixes [#&#8203;83699](openclaw/openclaw#83699). ([#&#8203;83700](openclaw/openclaw#83700)) Thanks [@&#8203;galiniliev](https://github.com/galiniliev).
- Agents/subagents: constrain wildcard subagent target allowlists to configured agents while preserving explicitly listed compatibility targets. Fixes [#&#8203;84040](openclaw/openclaw#84040). ([#&#8203;84357](openclaw/openclaw#84357)) Thanks [@&#8203;joshavant](https://github.com/joshavant).
- Providers/Anthropic: route Anthropic model refs selected with Claude CLI auth through the Claude CLI runtime so shorthand refs such as `anthropic/opus-4.7` no longer fall back to embedded Anthropic billing. Fixes [#&#8203;84222](openclaw/openclaw#84222). ([#&#8203;84374](openclaw/openclaw#84374)) Thanks [@&#8203;joshavant](https://github.com/joshavant).
- Agents: honor explicit `models.providers.<id>.timeoutSeconds` values above the default idle watchdog for cloud and self-hosted providers, so long first-token waits no longer fall back at \~120s when the provider timeout is higher. ([#&#8203;83979](openclaw/openclaw#83979)) Thanks [@&#8203;yujiawei](https://github.com/yujiawei).
- Agents/Codex: keep encrypted Responses reasoning replay provenance-bound so stale mirrored Codex transcripts drop invalid encrypted content before request assembly while preserving matching same-session replay. Fixes [#&#8203;83836](openclaw/openclaw#83836). ([#&#8203;84367](openclaw/openclaw#84367)) Thanks [@&#8203;joshavant](https://github.com/joshavant).
- Agents/subagents: skip stale embedded-run wake probes for dormant completion requesters, so late subagent completions go straight to requester-agent/direct handoff instead of producing `reason=no_active_run` queue noise. ([#&#8203;82964](openclaw/openclaw#82964)) Thanks [@&#8203;galiniliev](https://github.com/galiniliev).
- CLI: retry config snapshot reads after a transient failure so one rejected read no longer poisons later commands in the same process. ([#&#8203;83931](openclaw/openclaw#83931)) Thanks [@&#8203;honor2030](https://github.com/honor2030).
- Media: decode URL path basenames before using them as remote media fallback filenames, so files like `My%20Report.pdf` are surfaced as `My Report.pdf`. Fixes [#&#8203;84050](openclaw/openclaw#84050). ([#&#8203;84052](openclaw/openclaw#84052)) Thanks [@&#8203;jbetala7](https://github.com/jbetala7).
- WhatsApp: clarify inbound group diagnostics so observed but unregistered groups point to `channels.whatsapp.groups` without changing routing or sender authorization. ([#&#8203;83846](openclaw/openclaw#83846)) Thanks [@&#8203;neeravmakwana](https://github.com/neeravmakwana).
- WhatsApp: drain pending outbound deliveries on a 30s periodic timer in addition to the reconnect handler, so messages enqueued while the provider is already connected no longer wait for the next reconnect to send. ([#&#8203;79083](openclaw/openclaw#79083)) Thanks [@&#8203;Oviemudiaga](https://github.com/Oviemudiaga).
- CLI/TUI: include gateway plugin slash commands in TUI autocomplete, so connected sessions can suggest plugin-owned commands exposed by the running Gateway. ([#&#8203;83640](openclaw/openclaw#83640)) Thanks [@&#8203;se7en-agent](https://github.com/se7en-agent).
- Gateway/mobile: restore QR setup-code handoff of bounded operator tokens for iOS and Android onboarding while keeping admin and pairing scopes out of bootstrap. ([#&#8203;83684](openclaw/openclaw#83684)) Thanks [@&#8203;ngutman](https://github.com/ngutman).
- iOS: repair Release archive compilation for the TestFlight build. ([#&#8203;84255](openclaw/openclaw#84255)) Thanks [@&#8203;ngutman](https://github.com/ngutman).
- Agents/compaction: bound plugin-owned CLI transcript compaction with the host safety timeout so a hung context engine can no longer stall post-turn cleanup. ([#&#8203;84083](openclaw/openclaw#84083)) Thanks [@&#8203;100yenadmin](https://github.com/100yenadmin).
- Control UI/usage: truncate long context skill, tool, and file names in the usage panel while keeping the full name available on hover. ([#&#8203;42197](openclaw/openclaw#42197)) Thanks [@&#8203;Rain120](https://github.com/Rain120).
- Codex: respect explicit `models auth order set` and `config.auth.order` precedence over stale `lastGood` in `/codex account`, and show `no working credential` when every explicit-order profile is ineligible instead of marking a lower-ranked profile as active. Fixes [#&#8203;84386](openclaw/openclaw#84386). ([#&#8203;84412](openclaw/openclaw#84412)) Thanks [@&#8203;openperf](https://github.com/openperf).
- Agents: honor `messages.suppressToolErrors` for mutating tool failures so configured chat surfaces do not receive separate warning payloads. ([#&#8203;81561](openclaw/openclaw#81561)) Thanks [@&#8203;moeedahmed](https://github.com/moeedahmed).
- Agents/fallback: surface billing guidance for mixed rate-limit plus billing fallback exhaustion instead of generic failure copy. Fixes [#&#8203;79396](openclaw/openclaw#79396). ([#&#8203;79489](openclaw/openclaw#79489)) Thanks [@&#8203;aayushprsingh](https://github.com/aayushprsingh).
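
The idle-watchdog entry above (explicit `models.providers.<id>.timeoutSeconds` values above the default) can be illustrated with a minimal sketch. This is not the project's actual `resolveLlmIdleTimeoutMs`; the function name `resolveIdleTimeoutSketch`, the `IdleTimeoutInput` shape, and the `runTimeoutMs` field are hypothetical, and the 120s default is taken from the `~120s` figure quoted in the entry. It only shows the precedence described there: an explicit provider timeout is used as-is rather than clamped to the default, while a shorter run timeout still wins.

```typescript
// Illustrative sketch only: names and shapes below are assumptions, not the real API.
const DEFAULT_LLM_IDLE_TIMEOUT_MS = 120_000; // assumed from the "~120s" default watchdog

interface IdleTimeoutInput {
  /** Explicit provider timeout in ms (e.g. timeoutSeconds * 1000), if configured. */
  modelRequestTimeoutMs?: number;
  /** Hypothetical agent/run timeout bound in ms, if configured. */
  runTimeoutMs?: number;
}

function resolveIdleTimeoutSketch(input: IdleTimeoutInput): number {
  const { modelRequestTimeoutMs, runTimeoutMs } = input;

  // An explicit, positive provider timeout is treated as a deliberate ceiling
  // and is not clamped to the implicit default.
  const base =
    modelRequestTimeoutMs !== undefined && modelRequestTimeoutMs > 0
      ? modelRequestTimeoutMs
      : DEFAULT_LLM_IDLE_TIMEOUT_MS;

  // A shorter run timeout still bounds the result.
  return runTimeoutMs !== undefined && runTimeoutMs > 0
    ? Math.min(base, runTimeoutMs)
    : base;
}

// Example: timeoutSeconds: 600 with no run timeout.
const explicitOnly = resolveIdleTimeoutSketch({ modelRequestTimeoutMs: 600 * 1000 });
// Example: no explicit provider timeout falls back to the default.
const defaultOnly = resolveIdleTimeoutSketch({});
// Example: a shorter run timeout bounds the explicit provider timeout.
const boundedByRun = resolveIdleTimeoutSketch({
  modelRequestTimeoutMs: 600 * 1000,
  runTimeoutMs: 300 * 1000,
});
```

Under these assumptions, a provider configured with `timeoutSeconds: 600` would resolve to a 600s idle window instead of being cut to the default, unless a shorter run timeout is also set.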

</details>

---

This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).

Reviewed-on: https://git.erwanleboucher.dev/eleboucher/homelab/pulls/615

Labels

- `agents`: Agent runtime and tooling.
- `merge-risk: 🚨 availability 🚨`: May cause crashes, hangs, restart loops, stalls, or process outages.
- `P1`: High-priority user-facing bug, regression, or broken workflow.
- `proof: supplied`: External PR includes structured after-fix real behavior proof.
- `rating: 🦐 gold shrimp`: Decent PR readiness signal, but merge confidence is limited.
- `size: XS`
- `status: 📣 needs proof`: The PR needs real behavior proof before ClawSweeper can clear the contributor ask.
