Skip to content

fix(code-mode): sharpen exec tool description so models stop wasting turns rediscovering constraints#84269

Closed
Kaspre wants to merge 2 commits into
openclaw:mainfrom
Kaspre:fix/code-mode-exec-tool-description
Closed

fix(code-mode): sharpen exec tool description so models stop wasting turns rediscovering constraints#84269
Kaspre wants to merge 2 commits into
openclaw:mainfrom
Kaspre:fix/code-mode-exec-tool-description

Conversation

@Kaspre

@Kaspre Kaspre commented May 19, 2026

Copy link
Copy Markdown
Contributor

Problem

When OpenClaw code mode is enabled, the exec tool's description tells the model to use tools.search / describe / call to call enabled tools but does not state any of the runtime's actual constraints. In observed agent traces, the model burns turns rediscovering them:

  • It tries language: "bash" because "code mode" reads as "any code"; gets rejected.
  • It writes require('fs') (or another Node built-in) to do I/O; gets "code mode module access is disabled." and loops trying simpler constructs (let x = 1, bare expression) to figure out which patterns are allowed.
  • It rediscovers, turn by turn, that Node modules are unavailable and that the catalog bridge is the only path to I/O.

Change and value

Sharpen the description so the model sees the constraints up front:

  • Node modules and require / import are NOT available.
  • The language field accepts only "javascript" or "typescript".
  • The catalog bridge is the path for shell, file, network, or external action: tools.search(query) finds enabled catalog entries, then code should pass entry.id to tools.describe(entry.id) and tools.call(entry.id, args).
  • Catalog calls still go only through enabled tools allowed by existing policy.

Parameter descriptions also state which symbols are in scope (tools, ALL_TOOLS) versus what is not (Node built-ins).

No runtime behavior changes — only the tool-card text the model sees.

Who's affected

All agents with code mode enabled (globally or per-agent), regardless of provider. Most directly: models that don't already know the catalog-bridge contract from training-set examples — typically smaller open-weight models routed through the Pi-embedded-runner.

Why now

Per-agent code mode shipped in 2026.5.18 (#83473). A controlled re-enable for evaluation surfaced sessions where the agent spent 6+ turns probing the same constraints before stabilizing on the bridge pattern. Description-only fixes are the cheapest path to reduce that floor; the rest of the friction is structural and will need separate work.

Implementation

src/agents/code-mode.tscreateCodeModeTools:

  • exec tool description expanded with the three constraints.
  • Bridge guidance uses the documented id-based flow: search for entries, then call describe / call with entry.id.
  • code parameter description states scope explicitly (tools and ALL_TOOLS in scope; Node built-ins not).
  • language parameter description states the enum.

src/agents/code-mode.test.ts:

  • Adds regression coverage that the model-visible exec description and parameter descriptions keep those runtime constraints visible.

Real behavior proof

  • Behavior or issue addressed: When code mode is enabled, models routed through the Pi-embedded-runner can waste turns rediscovering that language: "bash" is rejected, that require / import is unavailable, and that the tools.search / describe / call bridge is the path to I/O. The current exec description does not state any of those constraints; agents must discover them by trial and error at runtime.

  • Real environment tested: OpenClaw 2026.5.18 stable (commit 50a2481); agent routed to ollama-cloud/kimi-k2.6 via the Pi-embedded-runner; tools.codeMode.enabled: true (per-agent override per fix(code-mode): honor agent scoped code mode #83473 schema). Same agent, same model, same prompt across before/after for the live delivery proof. Final-head schema readback was run from this PR branch at e75ca8e7bb34dfc76a6e6dc23397e571353a09e2. CI-shaped validation was run on the PR merge ref in Crabbox/GCP.

  • Exact steps or command run after this patch:

    # Live delivery proof — same prompt before and after patching the running dist:
    PROBE='Without calling any tools, recite the description text of the `exec` tool exactly as it appears in your tool definitions. Quote it verbatim and reply with just that text.'
    
    openclaw agent --local --agent <pi-runtime-codemode-agent> --timeout 120 \
      --model ollama-cloud/kimi-k2.6 --thinking low \
      --session-id "cm-rbp-<phase>-$(date +%s)" \
      -m "$PROBE" --json
    
    # Final-head schema readback at e75ca8e7bb:
    node --import tsx -e 'import { createCodeModeTools } from "./src/agents/code-mode.ts"; import { createToolSearchCatalogRef } from "./src/agents/tool-search.ts"; const config = { tools: { codeMode: true } }; const [execTool] = createCodeModeTools({ config, runtimeConfig: config, sessionId: "schema-proof", sessionKey: "agent:main:main", runId: "schema-proof", catalogRef: createToolSearchCatalogRef() }); console.log(execTool.description);'
    
    # Remote merge-ref validation in Crabbox/GCP:
    pnpm install --frozen-lockfile --reporter=append-only
    pnpm build
    pnpm tsgo:core:test
    pnpm check:changed
  • Evidence after fix: Live agent dispatches (recitation probe, fresh sessions) at 2026-05-19T20:28Z (before-patch dist) and 2026-05-19T20:42Z (after-patch dist) returned the following final payloads.

    Before (main dist; agent reads OLD description):

    Run JavaScript or TypeScript in OpenClaw code mode. Use ALL_TOOLS and tools.search/describe/call inside the code to discover and call enabled tools.
    

    No statement that require / import is blocked. No statement that language rejects values other than javascript / typescript. The model has to discover these by trying them.

    Live delivery after initial patch (dist patched to this PR's description block; agent reads NEW description):

    Run JavaScript or TypeScript in OpenClaw code mode. Node.js modules and `require`/`import` are NOT available — for any shell, file, network, or external action, use the catalog bridge from inside your code: `tools.search(query)` to find tools, `tools.describe(name)` for the input schema, then `tools.call(name, args)`. The `language` field accepts only "javascript" or "typescript"; do not pass "bash", "shell", or other values.
    

    The live recitation proved the model receives the model-visible description unchanged.

    Final head schema readback (e75ca8e7bb; source code path that builds the model-visible tool schema):

    Run JavaScript or TypeScript in OpenClaw code mode. Node.js modules and `require`/`import` are NOT available — for any shell, file, network, or external action, use enabled catalog tools allowed by policy from inside your code: `tools.search(query)` to find catalog entries, `tools.describe(entry.id)` for the input schema, then `tools.call(entry.id, args)`. The `language` field accepts only "javascript" or "typescript"; do not pass "bash", "shell", or other values.
    

    Final-head source assertions also checked the code parameter mentions tools, ALL_TOOLS, and unavailable Node built-ins, and that language says it must be "javascript" or "typescript".

    Remote merge-ref validation (Crabbox/GCP, us-east1-b, PR merge ref containing e75ca8e7bb): pnpm install --frozen-lockfile --reporter=append-only, pnpm build, pnpm tsgo:core:test, and pnpm check:changed all exited 0. pnpm check:changed selected the core and coreTests lanes, typechecked both, linted 8652 files with 217 rules using 1 thread, and reported 0 warnings / 0 errors. Cleanup check found no remaining crabbox-* GCP instances.

  • Observed result after fix: The live recitation proves the generated exec description reaches the model as its tool-card text. The final PR head now emits the safer id-based bridge wording from the same createCodeModeTools schema path, so agents see the constraints up front without being steered toward ambiguous name-based describe / call handles.

  • What was not tested: A multi-trial controlled A/B measuring the rate-of-rediscovery reduction (turns spent attempting language: "bash", require(), etc., before stabilizing on the bridge pattern). The proof confirms delivery and final schema text; behavioral impact would need many trials on a model that exhibits the rediscovery loop reliably (an earlier observed instance was a real session that looped 6+ turns on code mode module access is disabled after attempting require('fs')).

…turns rediscovering constraints

When tools.codeMode.enabled is true, the exec tool's description tells the
model to use tools.search/describe/call to call enabled tools, but does not
state any of the runtime's actual constraints. Observed traces show models
waste turns rediscovering them: trying language: "bash", writing require()
for I/O, and probing scope by trial and error.

This change tells the model up front:
- Node modules and require/import are NOT available
- language accepts only "javascript" or "typescript"
- the tools.search → describe → call bridge is the path to any shell,
  file, network, or external action
- which symbols are in scope (tools, ALL_TOOLS) versus what is not

No runtime behavior changes — only the tool-card description strings.
Snapshot tests do not reference either old string (verified via grep).
@openclaw-barnacle openclaw-barnacle Bot added agents Agent runtime and tooling size: XS triage: mock-only-proof Candidate: PR proof only shows tests, mocks, snapshots, lint, typecheck, or CI. labels May 19, 2026
@clawsweeper

clawsweeper Bot commented May 19, 2026

Copy link
Copy Markdown
Contributor

Codex review: needs maintainer review before merge.

Workflow note: Future ClawSweeper reviews update this same comment in place.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

Summary
This PR expands the model-visible code-mode exec, code, and language schema descriptions and adds a regression test asserting those runtime constraints stay visible.

Reproducibility: yes. for the source-level tool-card gap: current main's createCodeModeTools description lacks constraints that readCode, prepareSource, the worker bridge, and docs already define. No high-confidence reproduction was established for the broader statistical turn-saving claim.

PR rating
Overall: 🐚 platinum hermit
Proof: 🐚 platinum hermit
Patch quality: 🐚 platinum hermit
Summary: Good normal PR quality: small aligned patch, sufficient live-output proof, focused regression coverage, and no blocking findings.

Rank-up moves:

  • none
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

PR egg
✨ Hatched: 🌱 uncommon Frosted Patch Peep

       /\  .---.  /\         
      /  \/     \/  \        
     /   ( -   - )   \       
    |       ._.       |      
    |   /|  ===  |\   |      
     \  \|______/|/  /       
      '._  `--'  _.'         
         '-.__.-'            
       _/|_|  |_|\_          
      /__|      |__\         
       .-----------.         
      '-------------'        

Rarity: 🌱 uncommon.
Trait: sparkles near resolved comments.
Image traits: location proof lagoon; accessory green check lantern; palette charcoal, cyan, and signal green; mood calm; pose standing beside its cracked shell; shell translucent glimmer shell; lighting bright celebratory glints; background tiny artifact crates.
How to hatch it: once this PR reaches status: 👀 ready for maintainer look or status: 🚀 automerge armed, the PR author or a maintainer can comment @clawsweeper hatch to turn this ASCII egg into its generated creature image.
Share on X: post this hatch
Copy: My PR egg hatched a 🌱 uncommon Frosted Patch Peep in ClawSweeper.

What is this egg doing here?
  • Eggs appear after the PR passes real-behavior proof. It is here for vibes, not verdicts: it does not change labels, ratings, merge decisions, or automation.
  • The shell reacts to review momentum: open follow-up work warms it up, re-review makes it wobble, and a clean final review lets it hatch.
  • Hatchable usually means sufficient real-behavior proof, no blocking P0/P1/P2 findings, no security attention needed, and clean correctness.
  • The hatch is seeded from this repository and PR number, so the same PR keeps the same creature; the reviewed head SHA can only change safe visual details.
  • Rarity is just collectible sparkle: 🥚 common, 🌱 uncommon, 💎 rare, ✨ glimmer, and 🌈 legendary.

Real behavior proof
Sufficient (live_output): The PR body provides before/after live agent recitation output plus final-head schema readback proving the changed description reaches the model-visible tool schema.

Risk before merge
Why this matters: - The proof demonstrates delivery of the new tool-card text, but not a controlled multi-trial reduction in wasted model turns; the efficiency impact remains inferential.

Maintainer options:

  1. Decide the mitigation before merge
    Land the narrow schema-copy and regression-test change if maintainers accept the wording, leaving quantified model-behavior measurement to follow-up work.
  2. Pause or close
    Do not merge this PR until maintainers decide whether the risk is worth taking.

Next step before merge
No repair lane is needed; there are no blocking review findings and the remaining action is ordinary maintainer review and merge workflow.

Security
Cleared: The diff only changes model-facing description strings and a colocated test; it adds no dependencies, workflow changes, secret handling, or new code execution path.

Review details

Best possible solution:

Land the narrow schema-copy and regression-test change if maintainers accept the wording, leaving quantified model-behavior measurement to follow-up work.

Do we have a high-confidence way to reproduce the issue?

Yes for the source-level tool-card gap: current main's createCodeModeTools description lacks constraints that readCode, prepareSource, the worker bridge, and docs already define. No high-confidence reproduction was established for the broader statistical turn-saving claim.

Is this the best way to solve the issue?

Yes; changing the model-visible schema text and adding a focused regression test is the narrowest maintainable solution because the runtime enforcement and documentation already exist.

Label justifications:

  • P3: This is a low-risk code-mode ergonomics and test improvement with no runtime behavior or compatibility change.

What I checked:

  • Current main schema gap: Current main's exec tool description only tells models to use ALL_TOOLS and tools.search/describe/call; it does not mention unavailable Node modules, rejected shell languages, or id-based bridge use. (src/agents/code-mode.ts:829, a059309a9f9a)
  • Runtime language and module constraints: readCode rejects non-javascript/typescript language values and prepareSource rejects import/require patterns with code mode module access is disabled. (src/agents/code-mode.ts:299, a059309a9f9a)
  • Bridge contract matches the proposed wording: The QuickJS worker exposes tools.search, tools.describe, tools.call, and ALL_TOOLS, while docs describe the id-based tools.describe(files[0].id) and tools.call(fileRead.id, ...) flow. (src/agents/code-mode.worker.ts:160, a059309a9f9a)
  • PR head implementation: The PR head updates the schema description to state unavailable Node modules, the allowed language enum, and the tools.search to tools.describe(entry.id) to tools.call(entry.id, args) bridge path. (src/agents/code-mode.ts:829, e75ca8e7bb34)
  • PR head test coverage: The added test asserts the model-visible exec and parameter descriptions continue to mention Node-module unavailability, tools, ALL_TOOLS, id-based describe/call, and the language enum. (src/agents/code-mode.test.ts:248, e75ca8e7bb34)
  • Real behavior proof in PR body: The PR body includes before/after live agent recitation output showing the patched tool description reaches the model, final-head schema readback at e75ca8e7bb34dfc76a6e6dc23397e571353a09e2, and remote Crabbox/GCP validation for install, build, tsgo:core:test, and check:changed. (e75ca8e7bb34)

Likely related people:

  • Galin Iliev: git blame on the current createCodeModeTools schema and nearby tests points to the commit that added the current code-mode files in this checkout. (role: introduced/current implementation history; confidence: medium; commits: 57ec361682e2; files: src/agents/code-mode.ts, src/agents/code-mode.test.ts)
  • Kaspre: The provided GitHub context shows Kaspre authored the recently merged per-agent code-mode PR that this wording change builds on, so they have recent domain context beyond this proposal. (role: recent code-mode contributor; confidence: medium; commits: fd8877b5fde3; files: src/agents/code-mode.ts, src/config/schema.help.ts, docs/reference/code-mode.md)
  • Peter Steinberger: git shortlog over the sampled agent/code-mode/embedded-runner paths shows Peter as the largest adjacent contributor, and the latest stable release commit is also under this area history. (role: adjacent area contributor; confidence: medium; commits: 50a2481652b6; files: src/agents/code-mode.ts, src/agents/tool-search.ts, src/agents/pi-embedded-runner/run/attempt.ts)

Codex review notes: model gpt-5.5, reasoning high; reviewed against a059309a9f9a.

@clawsweeper clawsweeper Bot added rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. P3 Low-priority cleanup, docs, polish, ergonomics, or speculative work. labels May 19, 2026
@openclaw-barnacle openclaw-barnacle Bot added proof: supplied External PR includes structured after-fix real behavior proof. and removed triage: mock-only-proof Candidate: PR proof only shows tests, mocks, snapshots, lint, typecheck, or CI. labels May 19, 2026
@clawsweeper clawsweeper Bot added proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. and removed rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. labels May 19, 2026
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 19, 2026
@Kaspre Kaspre force-pushed the fix/code-mode-exec-tool-description branch from b47ee0e to 66b2dd6 Compare May 19, 2026 21:22
@Kaspre Kaspre marked this pull request as ready for review May 19, 2026 21:31
@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 19, 2026
@Kaspre Kaspre force-pushed the fix/code-mode-exec-tool-description branch from 66b2dd6 to e75ca8e Compare May 19, 2026 22:05
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 19, 2026
@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 19, 2026
@Takhoffman

Copy link
Copy Markdown
Contributor

@clawsweeper automerge

@clawsweeper clawsweeper Bot added the clawsweeper:automerge Maintainer opted this PR into bounded ClawSweeper-reviewed automerge label May 19, 2026
@clawsweeper

clawsweeper Bot commented May 19, 2026

Copy link
Copy Markdown
Contributor

🦞🔧
ClawSweeper automerge is enabled.

Draft PRs stay fix-only until GitHub marks them ready for review. Pause with /clawsweeper stop.

Automerge progress:

  • 2026-05-19 23:51:40 UTC review queued e75ca8e7bb34 (queued)

@Takhoffman

Copy link
Copy Markdown
Contributor

@clawsweeper hatch

@clawsweeper

clawsweeper Bot commented May 20, 2026

Copy link
Copy Markdown
Contributor

🦞👀
ClawSweeper PR egg hatch requested.

I queued a comment sync for this PR. If the egg is hatchable, ClawSweeper will generate the image once and update the existing review comment.
Action: PR egg hatch queued (workflow sweep.yml, event repository_dispatch).
The ASCII egg stays as the fallback.

@clawsweeper

clawsweeper Bot commented May 20, 2026

Copy link
Copy Markdown
Contributor

ClawSweeper PR egg

✨ Hatched: 🌱 uncommon Frosted Patch Peep

Hatched PR egg: 🌱 uncommon Frosted Patch Peep

       _..------.._          
    .-'  .-.  .-.  '-.       
   /    ( * )( * )    \      
  |        .--.        |     
  |   <\   ====   />   |     
   \    '.______.'    /      
    '-._   ____   _.-'       
        `-.____.-'           
       __/|_||_|\__          
      /__.'    '.__\         
       .-----------.         
      '-------------'        

Rarity: 🌱 uncommon.
Trait: sparkles near resolved comments.
Image traits: location proof lagoon; accessory green check lantern; palette charcoal, cyan, and signal green; mood calm; pose standing beside its cracked shell; shell translucent glimmer shell; lighting bright celebratory glints; background tiny artifact crates.
How to hatch it: once this PR reaches status: 👀 ready for maintainer look or status: 🚀 automerge armed, the PR author or a maintainer can comment @clawsweeper hatch to turn this ASCII egg into its generated creature image.
Share on X: post this hatch
Copy: My PR egg hatched a 🌱 uncommon Frosted Patch Peep in ClawSweeper.

What is this egg doing here?
  • Eggs appear after the PR passes real-behavior proof. It is here for vibes, not verdicts: it does not change labels, ratings, merge decisions, or automation.
  • The shell reacts to review momentum: open follow-up work warms it up, re-review makes it wobble, and a clean final review lets it hatch.
  • Hatchable usually means sufficient real-behavior proof, no blocking P0/P1/P2 findings, no security attention needed, and clean correctness.
  • The hatch is seeded from this repository and PR number, so the same PR keeps the same creature; the reviewed head SHA can only change safe visual details.
  • Rarity is just collectible sparkle: 🥚 common, 🌱 uncommon, 💎 rare, ✨ glimmer, and 🌈 legendary.

@clawsweeper

clawsweeper Bot commented May 20, 2026

Copy link
Copy Markdown
Contributor

ClawSweeper 🐠 reef update

Thanks for the work here. ClawSweeper could not write to the source branch, so it opened a replacement PR rather than letting the fix drift. attribution still points back here.

Why replacement: ClawSweeper could not update the source PR branch directly; GitHub did not grant sufficient push rights to the bot for that branch.
Replacement PR: #84368
Why close: this run explicitly closes the superseded source PR after the credited replacement PR is open, so review continues in one place.
This closeout is intentional for this run: the replacement PR is now the active review lane.
The replacement PR carries the original credit trail forward.
Co-author credit kept:

fish notes: model gpt-5.5, reasoning high; reviewed against ae25826.

@clawsweeper clawsweeper Bot closed this May 20, 2026
eleboucher pushed a commit to eleboucher/homelab that referenced this pull request May 21, 2026
…026.5.20) (#615)

This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [ghcr.io/openclaw/openclaw](https://openclaw.ai) ([source](https://github.com/openclaw/openclaw)) | patch | `2026.5.19` → `2026.5.20` |

---

> ⚠️ **Warning**
>
> Some dependencies could not be looked up. Check the [Dependency Dashboard](issues/567) for more information.

---

### Release Notes

<details>
<summary>openclaw/openclaw (ghcr.io/openclaw/openclaw)</summary>

### [`v2026.5.20`](https://github.com/openclaw/openclaw/blob/HEAD/CHANGELOG.md#2026520)

[Compare Source](openclaw/openclaw@v2026.5.19...v2026.5.20)

##### Changes

- Exec approvals: remove the old `cat SKILL.md && printf ... && <skill-wrapper>` allowlist compatibility path so skill files must be loaded with the read tool and only the real skill executable is auto-allowed.
- Discord: let voice sessions follow configured Discord users into voice channels, with allowed-channel checks, multi-user handoff, bounded reconciliation, and DAVE recovery preservation. ([#&#8203;84264](openclaw/openclaw#84264)) Thanks [@&#8203;fuller-stack-dev](https://github.com/fuller-stack-dev).
- Discord/voice: include bounded `IDENTITY.md`, `USER.md`, and `SOUL.md` profile context in realtime voice session instructions by default, with `voice.realtime.bootstrapContextFiles: []` available to disable it. ([#&#8203;84499](openclaw/openclaw#84499)) Thanks [@&#8203;fuller-stack-dev](https://github.com/fuller-stack-dev).
- Dependencies: bump the bundled Codex harness to `@openai/codex` `0.132.0` and refresh the app-server model-list docs for the new catalog.
- CLI/policy: add the bundled Policy plugin for policy-backed channel conformance checks, doctor lint findings, and opt-in workspace repair. ([#&#8203;80407](openclaw/openclaw#80407)) Thanks [@&#8203;giodl73-repo](https://github.com/giodl73-repo).
- Agents/config: allow `agents.list[].experimental.localModelLean` so lean local-model mode can be enabled for one configured agent instead of globally.
- Providers/xAI: add device-code OAuth login so remote and headless setups can authorize xAI without a localhost browser callback. ([#&#8203;84005](openclaw/openclaw#84005)) Thanks [@&#8203;fuller-stack-dev](https://github.com/fuller-stack-dev).
- Providers/OpenRouter: honor provider-level `params.provider` routing policy for OpenRouter requests, with model and agent params overriding the defaults. Thanks [@&#8203;amknight](https://github.com/amknight).

##### Fixes

- CLI/tasks: include stale-running task maintenance decisions in `openclaw tasks maintenance --json` so retained and reconcile candidates explain backing-session, cron, CLI, and wedged-subagent state. ([#&#8203;84691](openclaw/openclaw#84691)) Thanks [@&#8203;efpiva](https://github.com/efpiva).
- Codex app-server: keep system-prompt reports working when bootstrap hooks provide workspace files with only a path and content, so hook-supplied SOUL/IDENTITY/TOOLS/USER context still reports injected characters correctly. ([#&#8203;84736](openclaw/openclaw#84736)) Thanks [@&#8203;JARVIS-Glasses](https://github.com/JARVIS-Glasses).
- Providers/MiniMax music: stop advertising `durationSeconds` control and remove prompt-injected duration hints, so `music_generate` reports MiniMax duration as an unsupported override instead of suggesting MiniMax can enforce track length. Fixes [#&#8203;84508](openclaw/openclaw#84508). Thanks [@&#8203;neeravmakwana](https://github.com/neeravmakwana).
- Doctor: warn when sandbox tool policy hides configured MCP server tools before provider requests. ([#&#8203;84699](openclaw/openclaw#84699)) Thanks [@&#8203;nxmxbbd](https://github.com/nxmxbbd).
- WhatsApp: update Baileys to `7.0.0-rc12`.
- Build: suppress per-locale `rolldown-plugin-dts:fake-js` CommonJS dts warnings emitted while bundling the intentionally-inlined `zod/v4/locales/*.d.cts` files, so `pnpm build` output stays readable after the 0.25.1 plugin bump. Thanks [@&#8203;romneyda](https://github.com/romneyda).
- CLI/nodes: route lazy plugin-registration logs to stderr for JSON-mode `openclaw nodes` commands so stdout stays parseable. ([#&#8203;84684](openclaw/openclaw#84684)) Thanks [@&#8203;TurboTheTurtle](https://github.com/TurboTheTurtle).
- Approvals: route manual `/approve` decisions through the trusted approval runtime so active exec and plugin approvals no longer look unknown or expired.
- Mac app: update the About settings copyright year to 2026. ([#&#8203;84385](openclaw/openclaw#84385)) Thanks [@&#8203;pejmanjohn](https://github.com/pejmanjohn).
- Dependencies: update `@openclaw/fs-safe` to `0.2.7` so OpenClaw's default Python-helper-off policy keeps best-effort Node write fallbacks for private stores, secret writes, run logs, and media attachments on Linux/macOS.
- Infra/secrets: restore the fail-closed contract for `tryReadSecretFileSync` so credential loaders that pass `rejectSymlink: true` (Telegram, LINE, Zalo, IRC, Nextcloud Talk tokens) refuse symlinked credential files instead of silently accepting them, and the infra-state CI shard's secret-file symlink test passes again. Thanks [@&#8203;romneyda](https://github.com/romneyda).
- Browser: honor the configured image sanitization limit for screenshots and labeled snapshots so browser-captured images follow the same resize policy as other image results. ([#&#8203;84595](openclaw/openclaw#84595))
- Doctor: remove unrecognized `models.providers.*.models[*].compat.thinkingFormat` values during `doctor --fix` so stale provider model config can validate after upgrade. Fixes [#&#8203;77803](openclaw/openclaw#77803).
- Doctor: warn when `openclaw.json` stores plaintext secret-bearing config fields, including model provider API keys and sensitive provider headers. ([#&#8203;84718](openclaw/openclaw#84718)) Thanks [@&#8203;lukaIvanic](https://github.com/lukaIvanic).
- Status: show the configured default, session-selected model, reason, clear hint, and docs link when a session remains pinned to a model that differs from `agents.defaults.model.primary`.
- WebChat: clear stale typing indicators when session change events mark the active chat run complete.
- Mac app: keep local packaging signed with a stable app identity for permission testing and fix Control UI production builds under current Vite/Highlight.js exports.
- macOS app: update the embedded Peekaboo bridge to 3.2.1 so OpenClaw-hosted UI automation works with current Peekaboo CLI capture flows.
- Cron: deliver preferred final assistant output for successful scheduled runs when trailing plain tool warnings remain in diagnostics instead of marking the run failed.
- fix(mattermost): fail closed on missing channel type \[AI]. ([#&#8203;84091](openclaw/openclaw#84091)) Thanks [@&#8203;pgondhi987](https://github.com/pgondhi987).
- Recheck rebuilt system.run argv \[AI]. ([#&#8203;84090](openclaw/openclaw#84090)) Thanks [@&#8203;pgondhi987](https://github.com/pgondhi987).
- CLI: keep the private QA subcommand out of exported command descriptors unless `OPENCLAW_ENABLE_PRIVATE_QA_CLI=1`, so root help and subcommand markers match runtime registration. ([#&#8203;84519](openclaw/openclaw#84519))
- CLI/cron: bound `openclaw cron show` job lookup pagination so non-advancing or unbounded `cron.list` responses fail instead of hanging the command. Fixes [#&#8203;83856](openclaw/openclaw#83856). ([#&#8203;83989](openclaw/openclaw#83989))
- Agents/messages: stop message-tool-only turns after a successful source-channel `message` send while keeping transcript mirrors under the session write lock. ([#&#8203;84289](openclaw/openclaw#84289))
- Agents: filter silent heartbeat response-tool transcript artifacts out of embedded context snapshots so later user turns are not polluted by heartbeat no-op messages. ([#&#8203;83477](openclaw/openclaw#83477)) Thanks [@&#8203;fuller-stack-dev](https://github.com/fuller-stack-dev).
- Agents/OpenAI: log repeated strict tool-schema downgrade diagnostics once per provider/model/tool signature, reducing duplicate debug noise while preserving `strict=false` fallback behavior. Fixes [#&#8203;82930](openclaw/openclaw#82930). ([#&#8203;82933](openclaw/openclaw#82933)) Thanks [@&#8203;galiniliev](https://github.com/galiniliev).
- Agents/code mode: spell out the `exec` tool's JavaScript/TypeScript, no Node module, and catalog-bridge constraints in model-visible schema text so agents can use enabled tools without trial-and-error. ([#&#8203;84269](openclaw/openclaw#84269)) Thanks [@&#8203;Kaspre](https://github.com/Kaspre).
- Codex: give `image_generate` dynamic-tool calls a 120s default watchdog when no per-call or configured image timeout is set, so image generation no longer falls back to the generic 30s bridge timeout. ([#&#8203;84254](openclaw/openclaw#84254)) Thanks [@&#8203;moritzmmayerhofer](https://github.com/moritzmmayerhofer).
- Codex: avoid duplicate dynamic tool terminal diagnostics while large diagnostic backlogs drain without blocking tool responses. ([#&#8203;82937](openclaw/openclaw#82937)) Thanks [@&#8203;galiniliev](https://github.com/galiniliev).
- CLI/message: include a stable top-level `messageId` in `openclaw message --json` output when channel sends return one. ([#&#8203;84191](openclaw/openclaw#84191)) Thanks [@&#8203;100menotu001](https://github.com/100menotu001).
- Cron: preserve legacy top-level array `jobs.json` stores when loading or adding scheduled jobs so old cron jobs are no longer treated as an empty store during upgrade. Fixes [#&#8203;60799](openclaw/openclaw#60799). ([#&#8203;84433](openclaw/openclaw#84433)) Thanks [@&#8203;IWhatsskill](https://github.com/IWhatsskill).
- Gateway/agents: use an agent's `identity.name` in Gateway agent summaries when `agents.list[].name` is unset, so configured agent labels remain visible in clients. ([#&#8203;84355](openclaw/openclaw#84355); refs [#&#8203;57835](openclaw/openclaw#57835)) Thanks [@&#8203;luoyanglang](https://github.com/luoyanglang).
- Channels/replies: keep normal `/verbose` failed-tool progress compact in message-tool replies and prevent late text-only tool output from appearing after the final answer. ([#&#8203;84303](openclaw/openclaw#84303)) Thanks [@&#8203;VACInc](https://github.com/VACInc).
- Plugins/hooks: apply a default 30-second timeout to `before_compaction` and `after_compaction` hooks so a hung plugin handler no longer blocks compaction completion. ([#&#8203;84153](openclaw/openclaw#84153))
- Discord: preserve disabled presentation buttons when adapting and rendering Discord message controls. ([#&#8203;84188](openclaw/openclaw#84188)) Thanks [@&#8203;100menotu001](https://github.com/100menotu001).
- Twitch: add a test-only client-manager registry reset helper so non-isolated Twitch tests can clear cached managers between cases. Fixes [#&#8203;83887](openclaw/openclaw#83887). ([#&#8203;84244](openclaw/openclaw#84244)) Thanks [@&#8203;hclsys](https://github.com/hclsys).
- Cron: run main-session scheduled work on a cron-owned wake lane while preserving reply delivery context, so background cron turns no longer block human main-session chat. Fixes [#&#8203;82766](openclaw/openclaw#82766). ([#&#8203;82767](openclaw/openclaw#82767)) Thanks [@&#8203;galiniliev](https://github.com/galiniliev).
- Cron: use structured embedded-run denial metadata for isolated scheduled tasks so blocked exec requests fail the job without treating ordinary assistant prose as a denial. ([#&#8203;84067](openclaw/openclaw#84067)) Thanks [@&#8203;abnershang](https://github.com/abnershang).
- Cron: keep recovered tool warnings diagnostic for successful scheduled runs so final cron output is delivered instead of being replaced by a post-processing warning. ([#&#8203;84045](openclaw/openclaw#84045)) Thanks [@&#8203;abnershang](https://github.com/abnershang).
- Plugins/perf: thread explicit plugin discovery results through `loadBundledCapabilityRuntimeRegistry`, `resolveBundledPluginSources`, and `listChannelCatalogEntries` so callers that already hold a discovery result skip redundant filesystem walks. Thanks [@&#8203;SebTardif](https://github.com/SebTardif).
- harden update restart script creation \[AI]. ([#&#8203;84088](openclaw/openclaw#84088)) Thanks [@&#8203;pgondhi987](https://github.com/pgondhi987).
- Docker: keep the bundled Codex plugin in official release image keep lists so the default OpenAI agent harness remains available after Docker pruning. Fixes [#&#8203;83613](openclaw/openclaw#83613). ([#&#8203;83626](openclaw/openclaw#83626)) Thanks [@&#8203;YuanHanzhong](https://github.com/YuanHanzhong).
- CLI/channels: preserve the first line of `openclaw channels logs` output when the rolling tail window starts exactly on a line boundary, mirroring the already-fixed `readLogSlice` behavior in `src/logging/log-tail.ts`.
- Control UI: treat terminal session status as authoritative over stale active-run flags so completed terminal runs stop showing abort/live UI. ([#&#8203;84057](openclaw/openclaw#84057))
- CLI: preserve embedded equals signs in inline root option values instead of truncating after the second separator. ([#&#8203;83995](openclaw/openclaw#83995)) Thanks [@&#8203;ThiagoCAltoe](https://github.com/ThiagoCAltoe).
- Matrix/config: accept `messages.queue.byChannel.matrix` queue overrides and keep queue provider schema/type keys aligned for Matrix, Google Chat, and Mattermost. Thanks [@&#8203;bdjben](https://github.com/bdjben).
- CLI: format `openclaw acp client` failures through the shared error formatter so object-shaped errors stay readable instead of printing `[object Object]`. Fixes [#&#8203;83904](openclaw/openclaw#83904). ([#&#8203;84080](openclaw/openclaw#84080))
- Providers/Ollama: default unknown-capabilities models to tool-capable so discovered native Ollama models can use tools when `/api/show` omits capabilities. ([#&#8203;84055](openclaw/openclaw#84055)) Thanks [@&#8203;dutifulbob](https://github.com/dutifulbob).
- Installer/Windows: launch `install.ps1` onboarding as an attached child process so fresh native Windows installs do not freeze visibly at `Starting setup...` or corrupt the wizard's terminal rendering.
- CLI/update: keep restart health checks working across one-version CLI/Gateway protocol skew and use the managed Gateway service Node for all follow-up commands even when the package root is unchanged, so `openclaw update` no longer silently switches the gateway to a different Node binary when multiple Node installations are present. Thanks [@&#8203;amknight](https://github.com/amknight).
- CLI/gateway: include the running Gateway version in `gateway status` JSON output, preserving existing server metadata while falling back to status RPC data for read probes. Fixes [#&#8203;56222](openclaw/openclaw#56222). Thanks [@&#8203;galiniliev](https://github.com/galiniliev).
- Memory/search: close local embedding providers when active-memory searches time out so pending local model loads and embedding contexts are aborted and released. ([#&#8203;83858](openclaw/openclaw#83858)) Thanks [@&#8203;brokemac79](https://github.com/brokemac79).
- CLI/nodes: request pending node surface approval scopes before `openclaw nodes approve` so exec-capable node approval can use admin-scoped Gateway credentials instead of failing with `missing scope: operator.admin`. ([#&#8203;84392](openclaw/openclaw#84392)) Thanks [@&#8203;joshavant](https://github.com/joshavant).
- Gateway: reject slow node event sends before outbound buffers grow unbounded and log the rejected payload diagnostic. ([#&#8203;84387](openclaw/openclaw#84387)) Thanks [@&#8203;samzong](https://github.com/samzong).
- Agents: include bounded trajectory queued-writer diagnostics in `pi-trajectory-flush` timeout warnings so flush stalls show pending writes, queued bytes, and append state. Fixes [#&#8203;82961](openclaw/openclaw#82961). ([#&#8203;82962](openclaw/openclaw#82962)) Thanks [@&#8203;galiniliev](https://github.com/galiniliev).
- Agents/subagents: recover stale completion announces by retrying unsupported transcript-wait wakes without transcript waiting and forcing a message-tool handoff when the requester run is already stale. Fixes [#&#8203;83699](openclaw/openclaw#83699). ([#&#8203;83700](openclaw/openclaw#83700)) Thanks [@&#8203;galiniliev](https://github.com/galiniliev).
- Agents/subagents: constrain wildcard subagent target allowlists to configured agents while preserving explicitly listed compatibility targets. Fixes [#&#8203;84040](openclaw/openclaw#84040). ([#&#8203;84357](openclaw/openclaw#84357)) Thanks [@&#8203;joshavant](https://github.com/joshavant).
- Providers/Anthropic: route Anthropic model refs selected with Claude CLI auth through the Claude CLI runtime so shorthand refs such as `anthropic/opus-4.7` no longer fall back to embedded Anthropic billing. Fixes [#&#8203;84222](openclaw/openclaw#84222). ([#&#8203;84374](openclaw/openclaw#84374)) Thanks [@&#8203;joshavant](https://github.com/joshavant).
- Agents: honor explicit `models.providers.<id>.timeoutSeconds` values above the default idle watchdog for cloud and self-hosted providers, so long first-token waits no longer fall back at \~120s when the provider timeout is higher. ([#&#8203;83979](openclaw/openclaw#83979)) Thanks [@&#8203;yujiawei](https://github.com/yujiawei).
- Agents/Codex: keep encrypted Responses reasoning replay provenance-bound so stale mirrored Codex transcripts drop invalid encrypted content before request assembly while preserving matching same-session replay. Fixes [#&#8203;83836](openclaw/openclaw#83836). ([#&#8203;84367](openclaw/openclaw#84367)) Thanks [@&#8203;joshavant](https://github.com/joshavant).
- Agents/subagents: skip stale embedded-run wake probes for dormant completion requesters, so late subagent completions go straight to requester-agent/direct handoff instead of producing `reason=no_active_run` queue noise. ([#&#8203;82964](openclaw/openclaw#82964)) Thanks [@&#8203;galiniliev](https://github.com/galiniliev).
- CLI: retry config snapshot reads after a transient failure so one rejected read no longer poisons later commands in the same process. ([#&#8203;83931](openclaw/openclaw#83931)) Thanks [@&#8203;honor2030](https://github.com/honor2030).
- Media: decode URL path basenames before using them as remote media fallback filenames, so files like `My%20Report.pdf` are surfaced as `My Report.pdf`. Fixes [#&#8203;84050](openclaw/openclaw#84050). ([#&#8203;84052](openclaw/openclaw#84052)) Thanks [@&#8203;jbetala7](https://github.com/jbetala7).
- WhatsApp: clarify inbound group diagnostics so observed but unregistered groups point to `channels.whatsapp.groups` without changing routing or sender authorization. ([#&#8203;83846](openclaw/openclaw#83846)) Thanks [@&#8203;neeravmakwana](https://github.com/neeravmakwana).
- WhatsApp: drain pending outbound deliveries on a 30s periodic timer in addition to the reconnect handler, so messages enqueued while the provider is already connected no longer wait for the next reconnect to send. ([#&#8203;79083](openclaw/openclaw#79083)) Thanks [@&#8203;Oviemudiaga](https://github.com/Oviemudiaga).
- CLI/TUI: include gateway plugin slash commands in TUI autocomplete, so connected sessions can suggest plugin-owned commands exposed by the running Gateway. ([#&#8203;83640](openclaw/openclaw#83640)) Thanks [@&#8203;se7en-agent](https://github.com/se7en-agent).
- Gateway/mobile: restore QR setup-code handoff of bounded operator tokens for iOS and Android onboarding while keeping admin and pairing scopes out of bootstrap. ([#&#8203;83684](openclaw/openclaw#83684)) Thanks [@&#8203;ngutman](https://github.com/ngutman).
- iOS: repair Release archive compilation for the TestFlight build. ([#&#8203;84255](openclaw/openclaw#84255)) Thanks [@&#8203;ngutman](https://github.com/ngutman).
- Agents/compaction: bound plugin-owned CLI transcript compaction with the host safety timeout so a hung context engine can no longer stall post-turn cleanup. ([#&#8203;84083](openclaw/openclaw#84083)) Thanks [@&#8203;100yenadmin](https://github.com/100yenadmin).
- Control UI/usage: truncate long context skill, tool, and file names in the usage panel while keeping the full name available on hover. ([#&#8203;42197](openclaw/openclaw#42197)) Thanks [@&#8203;Rain120](https://github.com/Rain120).
- Codex: respect explicit `models auth order set` and `config.auth.order` precedence over stale `lastGood` in `/codex account`, and show `no working credential` when every explicit-order profile is ineligible instead of marking a lower-ranked profile as active. Fixes [#&#8203;84386](openclaw/openclaw#84386). ([#&#8203;84412](openclaw/openclaw#84412)) Thanks [@&#8203;openperf](https://github.com/openperf).
- Agents: honor `messages.suppressToolErrors` for mutating tool failures so configured chat surfaces do not receive separate warning payloads. ([#&#8203;81561](openclaw/openclaw#81561)) Thanks [@&#8203;moeedahmed](https://github.com/moeedahmed).
- Agents/fallback: surface billing guidance for mixed rate-limit plus billing fallback exhaustion instead of generic failure copy. Fixes [#&#8203;79396](openclaw/openclaw#79396). ([#&#8203;79489](openclaw/openclaw#79489)) Thanks [@&#8203;aayushprsingh](https://github.com/aayushprsingh).

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about these updates again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box

---

This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).
<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiI0My4xMDEuMSIsInVwZGF0ZWRJblZlciI6IjQzLjEwMS4xIiwidGFyZ2V0QnJhbmNoIjoibWFpbiIsImxhYmVscyI6WyJyZW5vdmF0ZS9jb250YWluZXIiLCJ0eXBlL3BhdGNoIl19-->

Reviewed-on: https://git.erwanleboucher.dev/eleboucher/homelab/pulls/615
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents Agent runtime and tooling clawsweeper:automerge Maintainer opted this PR into bounded ClawSweeper-reviewed automerge P3 Low-priority cleanup, docs, polish, ergonomics, or speculative work. proof: sufficient ClawSweeper judged the real behavior proof convincing. proof: supplied External PR includes structured after-fix real behavior proof. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. size: XS status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants