fix(code-mode): sharpen exec tool description so models stop wasting turns rediscovering constraints by Kaspre · Pull Request #84269 · openclaw/openclaw

Kaspre · 2026-05-19T19:13:18Z

Problem

When OpenClaw code mode is enabled, the exec tool's description tells the model to use tools.search / describe / call to call enabled tools but does not state any of the runtime's actual constraints. In observed agent traces, the model burns turns rediscovering them:

It tries language: "bash" because "code mode" reads as "any code"; gets rejected.
It writes require('fs') (or another Node built-in) to do I/O; gets "code mode module access is disabled." and loops trying simpler constructs (let x = 1, bare expression) to figure out which patterns are allowed.
It rediscovers, turn by turn, that Node modules are unavailable and that the catalog bridge is the only path to I/O.
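
The two rejections described above can be pictured with a small sketch. This is an illustration only, not the code in src/agents/code-mode.ts: the function name `checkRequest`, the regular expression, and the unsupported-language wording are assumptions. Only the "code mode module access is disabled" message comes from the traces quoted in this PR.

```typescript
// Hypothetical sketch; the real readCode / prepareSource logic is not shown in this PR.

// Assumed pattern: a naive textual check for require(...) calls and the import keyword.
const MODULE_ACCESS = /\brequire\s*\(|\bimport\b/;

// Returns an error message, or null when the request would be accepted.
function checkRequest(language: string, code: string): string | null {
  if (language !== "javascript" && language !== "typescript") {
    return "unsupported language: " + language; // assumed wording
  }
  if (MODULE_ACCESS.test(code)) {
    return "code mode module access is disabled"; // message quoted in the PR
  }
  return null;
}

// The probing sequence from the observed trace, replayed against the sketch.
console.log(checkRequest("bash", "ls"));
console.log(checkRequest("javascript", "const fs = require('fs')"));
console.log(checkRequest("javascript", "let x = 1"));
```

Each rejected call above corresponds to one turn the model spends learning a rule that the tool description could have stated.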

Change and value

Sharpen the description so the model sees the constraints up front:

Node modules and require / import are NOT available.
The language field accepts only "javascript" or "typescript".
The catalog bridge is the path for shell, file, network, or external action: tools.search(query) finds enabled catalog entries, then code should pass entry.id to tools.describe(entry.id) and tools.call(entry.id, args).
Catalog calls still go only through enabled tools allowed by existing policy.
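
As a rough sketch of the id-based flow listed above, the snippet below runs against an in-memory stand-in for the bridge. The `CatalogEntry` and `ToolsBridge` shapes, the `mock.file_read` id, the async signatures, and the return values are all assumptions made for illustration; inside code mode the real `tools` object is provided by the runtime.

```typescript
// Hypothetical shapes: the real catalog entry and bridge types are not shown in this PR.
interface CatalogEntry {
  id: string;
  description: string;
}

interface ToolsBridge {
  search(query: string): Promise<CatalogEntry[]>;
  describe(id: string): Promise<{ id: string; inputSchema: unknown }>;
  call(id: string, args: Record<string, unknown>): Promise<unknown>;
}

// In-memory stand-in so the sketch runs outside code mode.
function createMockTools(): ToolsBridge {
  const catalog: CatalogEntry[] = [
    { id: "mock.file_read", description: "Read a file from the workspace" },
  ];
  return {
    async search(query) {
      const q = query.toLowerCase();
      return catalog.filter((e) => e.description.toLowerCase().indexOf(q) !== -1);
    },
    async describe(id) {
      return { id, inputSchema: { type: "object", properties: { path: { type: "string" } } } };
    },
    async call(id, args) {
      return { id, args, ok: true };
    },
  };
}

// The id-based flow: search first, then pass entry.id (not a display name) onward.
async function readViaBridge(tools: ToolsBridge, path: string): Promise<unknown> {
  const [entry] = await tools.search("read a file");
  if (!entry) throw new Error("no enabled catalog entry matched the query");
  await tools.describe(entry.id); // inspect the input schema before calling
  return tools.call(entry.id, { path });
}
```

Passing `entry.id` through both calls avoids guessing at a tool name, which is the ambiguity the final wording of the description steers away from.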

Parameter descriptions also state which symbols are in scope (tools, ALL_TOOLS) versus what is not (Node built-ins).

No runtime behavior changes — only the tool-card text the model sees.

Who's affected

All agents with code mode enabled (globally or per-agent), regardless of provider. Most directly: models that don't already know the catalog-bridge contract from training-set examples — typically smaller open-weight models routed through the Pi-embedded-runner.

Why now

Per-agent code mode shipped in 2026.5.18 (#83473). A controlled re-enable for evaluation surfaced sessions where the agent spent 6+ turns probing the same constraints before stabilizing on the bridge pattern. Description-only fixes are the cheapest path to reduce that floor; the rest of the friction is structural and will need separate work.

Implementation

src/agents/code-mode.ts — createCodeModeTools:

exec tool description expanded with the three constraints.
Bridge guidance uses the documented id-based flow: search for entries, then call describe / call with entry.id.
code parameter description states scope explicitly (tools and ALL_TOOLS in scope; Node built-ins not).
language parameter description states the enum.

src/agents/code-mode.test.ts:

Adds regression coverage that the model-visible exec description and parameter descriptions keep those runtime constraints visible.
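
The idea behind that coverage can be sketched as a fragment check. The two description strings are the before and final-head texts quoted in this PR; the helper name `missingFragments` and the exact fragment list are assumptions, since the real test's assertions are not reproduced in this thread.

```typescript
// Final-head description text as quoted in the PR's schema readback.
const NEW_DESCRIPTION =
  'Run JavaScript or TypeScript in OpenClaw code mode. Node.js modules and `require`/`import` are NOT available \u2014 for any shell, file, network, or external action, use enabled catalog tools allowed by policy from inside your code: `tools.search(query)` to find catalog entries, `tools.describe(entry.id)` for the input schema, then `tools.call(entry.id, args)`. The `language` field accepts only "javascript" or "typescript"; do not pass "bash", "shell", or other values.';

// Description on main before this PR, as quoted in the live recitation.
const OLD_DESCRIPTION =
  "Run JavaScript or TypeScript in OpenClaw code mode. Use ALL_TOOLS and tools.search/describe/call inside the code to discover and call enabled tools.";

// Assumed fragment list, chosen to mirror the constraints named in this PR.
const REQUIRED_FRAGMENTS = [
  "`require`/`import` are NOT available",
  "`tools.describe(entry.id)`",
  "`tools.call(entry.id, args)`",
  '"javascript" or "typescript"',
];

// Returns the required fragments that a description fails to mention.
function missingFragments(description: string): string[] {
  return REQUIRED_FRAGMENTS.filter((f) => description.indexOf(f) === -1);
}

console.log(missingFragments(NEW_DESCRIPTION).length);
console.log(missingFragments(OLD_DESCRIPTION).length);
```

A check of this kind fails as soon as a later edit drops one of the constraints from the model-visible text.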

Real behavior proof

Behavior or issue addressed: When code mode is enabled, models routed through the Pi-embedded-runner can waste turns rediscovering that language: "bash" is rejected, that require / import is unavailable, and that the tools.search / describe / call bridge is the path to I/O. The current exec description does not state any of those constraints; agents must discover them by trial and error at runtime.
Real environment tested: OpenClaw 2026.5.18 stable (commit 50a2481); agent routed to ollama-cloud/kimi-k2.6 via the Pi-embedded-runner; tools.codeMode.enabled: true (per-agent override, using the schema from #83473, "fix(code-mode): honor agent scoped code mode"). Same agent, same model, and same prompt across before/after for the live delivery proof. Final-head schema readback was run from this PR branch at e75ca8e7bb34dfc76a6e6dc23397e571353a09e2. CI-shaped validation was run on the PR merge ref in Crabbox/GCP.

Exact steps or command run after this patch:

# Live delivery proof — same prompt before and after patching the running dist:
PROBE='Without calling any tools, recite the description text of the `exec` tool exactly as it appears in your tool definitions. Quote it verbatim and reply with just that text.'

openclaw agent --local --agent <pi-runtime-codemode-agent> --timeout 120 \
  --model ollama-cloud/kimi-k2.6 --thinking low \
  --session-id "cm-rbp-<phase>-$(date +%s)" \
  -m "$PROBE" --json

# Final-head schema readback at e75ca8e7bb:
node --import tsx -e 'import { createCodeModeTools } from "./src/agents/code-mode.ts"; import { createToolSearchCatalogRef } from "./src/agents/tool-search.ts"; const config = { tools: { codeMode: true } }; const [execTool] = createCodeModeTools({ config, runtimeConfig: config, sessionId: "schema-proof", sessionKey: "agent:main:main", runId: "schema-proof", catalogRef: createToolSearchCatalogRef() }); console.log(execTool.description);'

# Remote merge-ref validation in Crabbox/GCP:
pnpm install --frozen-lockfile --reporter=append-only
pnpm build
pnpm tsgo:core:test
pnpm check:changed

Evidence after fix: Live agent dispatches (recitation probe, fresh sessions) at 2026-05-19T20:28Z (before-patch dist) and 2026-05-19T20:42Z (after-patch dist) returned the following final payloads.

Before (main dist; agent reads OLD description):
```
Run JavaScript or TypeScript in OpenClaw code mode. Use ALL_TOOLS and tools.search/describe/call inside the code to discover and call enabled tools.
```
No statement that require / import is blocked. No statement that language rejects values other than javascript / typescript. The model has to discover these by trying them.

Live delivery after initial patch (dist patched to this PR's description block; agent reads NEW description):
```
Run JavaScript or TypeScript in OpenClaw code mode. Node.js modules and `require`/`import` are NOT available — for any shell, file, network, or external action, use the catalog bridge from inside your code: `tools.search(query)` to find tools, `tools.describe(name)` for the input schema, then `tools.call(name, args)`. The `language` field accepts only "javascript" or "typescript"; do not pass "bash", "shell", or other values.
```
The live recitation proved the model receives the model-visible description unchanged.

Final head schema readback (e75ca8e7bb; source code path that builds the model-visible tool schema):
```
Run JavaScript or TypeScript in OpenClaw code mode. Node.js modules and `require`/`import` are NOT available — for any shell, file, network, or external action, use enabled catalog tools allowed by policy from inside your code: `tools.search(query)` to find catalog entries, `tools.describe(entry.id)` for the input schema, then `tools.call(entry.id, args)`. The `language` field accepts only "javascript" or "typescript"; do not pass "bash", "shell", or other values.
```
Final-head source assertions also checked the code parameter mentions tools, ALL_TOOLS, and unavailable Node built-ins, and that language says it must be "javascript" or "typescript".

Remote merge-ref validation (Crabbox/GCP, us-east1-b, PR merge ref containing e75ca8e7bb): pnpm install --frozen-lockfile --reporter=append-only, pnpm build, pnpm tsgo:core:test, and pnpm check:changed all exited 0. pnpm check:changed selected the core and coreTests lanes, typechecked both, linted 8652 files with 217 rules using 1 thread, and reported 0 warnings / 0 errors. Cleanup check found no remaining crabbox-* GCP instances.
Observed result after fix: The live recitation proves the generated exec description reaches the model as its tool-card text. The final PR head now emits the safer id-based bridge wording from the same createCodeModeTools schema path, so agents see the constraints up front without being steered toward ambiguous name-based describe / call handles.
What was not tested: A multi-trial controlled A/B measuring the rate-of-rediscovery reduction (turns spent attempting language: "bash", require(), etc., before stabilizing on the bridge pattern). The proof confirms delivery and final schema text; behavioral impact would need many trials on a model that exhibits the rediscovery loop reliably (an earlier observed instance was a real session that looped 6+ turns on code mode module access is disabled after attempting require('fs')).
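
For the untested A/B, one possible per-session metric is the number of exec turns spent before the first successful bridge call. The sketch below is hypothetical: the `ExecTurn` shape and the `turnsBeforeBridge` helper are assumptions, and the sample trace is invented to mirror the loop described above, not taken from a recorded session.

```typescript
// Hypothetical trace shape; OpenClaw's real session trace format is not shown in this PR.
interface ExecTurn {
  language: string;
  code: string;
  error?: string;
}

// Counts exec turns before the first error-free turn that uses tools.call(...).
// If no turn ever reaches the bridge, every turn counts as spent.
function turnsBeforeBridge(turns: ExecTurn[]): number {
  for (let i = 0; i < turns.length; i++) {
    const t = turns[i];
    if (!t.error && t.code.indexOf("tools.call(") !== -1) return i;
  }
  return turns.length;
}

// Invented sample mirroring the probing pattern described in the PR.
const sampleTrace: ExecTurn[] = [
  { language: "bash", code: "ls", error: "unsupported language" },
  { language: "javascript", code: "require('fs')", error: "code mode module access is disabled" },
  { language: "javascript", code: "let x = 1" },
  { language: "javascript", code: "const [e] = await tools.search('read'); await tools.call(e.id, {});" },
];

console.log(turnsBeforeBridge(sampleTrace));
```

Comparing this count across many sessions with the old and new descriptions would be one way to quantify the behavioral effect that the proof above leaves open.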

…turns rediscovering constraints

When tools.codeMode.enabled is true, the exec tool's description tells the model to use tools.search/describe/call to call enabled tools, but does not state any of the runtime's actual constraints. Observed traces show models waste turns rediscovering them: trying language: "bash", writing require() for I/O, and probing scope by trial and error.

This change tells the model up front:

- Node modules and require/import are NOT available
- language accepts only "javascript" or "typescript"
- the tools.search → describe → call bridge is the path to any shell, file, network, or external action
- which symbols are in scope (tools, ALL_TOOLS) versus what is not

No runtime behavior changes, only the tool-card description strings. Snapshot tests do not reference either old string (verified via grep).

clawsweeper · 2026-05-19T19:14:35Z

Codex review: needs maintainer review before merge.

Workflow note: Future ClawSweeper reviews update this same comment in place.

How this review workflow works

ClawSweeper keeps one durable marker-backed review comment per issue or PR.
Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
Maintainers can also comment @clawsweeper review to request a fresh review only.
Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

Summary
This PR expands the model-visible code-mode exec, code, and language schema descriptions and adds a regression test asserting those runtime constraints stay visible.

Reproducibility: yes, for the source-level tool-card gap: current main's createCodeModeTools description lacks constraints that readCode, prepareSource, the worker bridge, and docs already define. No high-confidence reproduction was established for the broader statistical turn-saving claim.

PR rating
Overall: 🐚 platinum hermit
Proof: 🐚 platinum hermit
Patch quality: 🐚 platinum hermit
Summary: Good normal PR quality: small aligned patch, sufficient live-output proof, focused regression coverage, and no blocking findings.

Rank-up moves:

none

What the crustacean ranks mean

🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

PR egg
✨ Hatched: 🌱 uncommon Frosted Patch Peep

       /\  .---.  /\         
      /  \/     \/  \        
     /   ( -   - )   \       
    |       ._.       |      
    |   /|  ===  |\   |      
     \  \|______/|/  /       
      '._  `--'  _.'         
         '-.__.-'            
       _/|_|  |_|\_          
      /__|      |__\         
       .-----------.         
      '-------------'

Rarity: 🌱 uncommon.
Trait: sparkles near resolved comments.
Image traits: location proof lagoon; accessory green check lantern; palette charcoal, cyan, and signal green; mood calm; pose standing beside its cracked shell; shell translucent glimmer shell; lighting bright celebratory glints; background tiny artifact crates.
How to hatch it: once this PR reaches status: 👀 ready for maintainer look or status: 🚀 automerge armed, the PR author or a maintainer can comment @clawsweeper hatch to turn this ASCII egg into its generated creature image.

What is this egg doing here?

Eggs appear after the PR passes real-behavior proof. It is here for vibes, not verdicts: it does not change labels, ratings, merge decisions, or automation.
The shell reacts to review momentum: open follow-up work warms it up, re-review makes it wobble, and a clean final review lets it hatch.
Hatchable usually means sufficient real-behavior proof, no blocking P0/P1/P2 findings, no security attention needed, and clean correctness.
The hatch is seeded from this repository and PR number, so the same PR keeps the same creature; the reviewed head SHA can only change safe visual details.
Rarity is just collectible sparkle: 🥚 common, 🌱 uncommon, 💎 rare, ✨ glimmer, and 🌈 legendary.

Real behavior proof
Sufficient (live_output): The PR body provides before/after live agent recitation output plus final-head schema readback proving the changed description reaches the model-visible tool schema.

Risk before merge
Why this matters: The proof demonstrates delivery of the new tool-card text, but not a controlled multi-trial reduction in wasted model turns; the efficiency impact remains inferential.

Maintainer options:

Decide the mitigation before merge: Land the narrow schema-copy and regression-test change if maintainers accept the wording, leaving quantified model-behavior measurement to follow-up work.
Pause or close: Do not merge this PR until maintainers decide whether the risk is worth taking.

Next step before merge
No repair lane is needed; there are no blocking review findings and the remaining action is ordinary maintainer review and merge workflow.

Security
Cleared: The diff only changes model-facing description strings and a colocated test; it adds no dependencies, workflow changes, secret handling, or new code execution path.

Review details

Best possible solution:

Land the narrow schema-copy and regression-test change if maintainers accept the wording, leaving quantified model-behavior measurement to follow-up work.

Do we have a high-confidence way to reproduce the issue?

Yes for the source-level tool-card gap: current main's createCodeModeTools description lacks constraints that readCode, prepareSource, the worker bridge, and docs already define. No high-confidence reproduction was established for the broader statistical turn-saving claim.

Is this the best way to solve the issue?

Yes; changing the model-visible schema text and adding a focused regression test is the narrowest maintainable solution because the runtime enforcement and documentation already exist.

Label justifications:

P3: This is a low-risk code-mode ergonomics and test improvement with no runtime behavior or compatibility change.

What I checked:

Current main schema gap: Current main's exec tool description only tells models to use ALL_TOOLS and tools.search/describe/call; it does not mention unavailable Node modules, rejected shell languages, or id-based bridge use. (src/agents/code-mode.ts:829, a059309a9f9a)
Runtime language and module constraints: readCode rejects non-javascript/typescript language values and prepareSource rejects import/require patterns with code mode module access is disabled. (src/agents/code-mode.ts:299, a059309a9f9a)
Bridge contract matches the proposed wording: The QuickJS worker exposes tools.search, tools.describe, tools.call, and ALL_TOOLS, while docs describe the id-based tools.describe(files[0].id) and tools.call(fileRead.id, ...) flow. (src/agents/code-mode.worker.ts:160, a059309a9f9a)
PR head implementation: The PR head updates the schema description to state unavailable Node modules, the allowed language enum, and the tools.search to tools.describe(entry.id) to tools.call(entry.id, args) bridge path. (src/agents/code-mode.ts:829, e75ca8e7bb34)
PR head test coverage: The added test asserts the model-visible exec and parameter descriptions continue to mention Node-module unavailability, tools, ALL_TOOLS, id-based describe/call, and the language enum. (src/agents/code-mode.test.ts:248, e75ca8e7bb34)
Real behavior proof in PR body: The PR body includes before/after live agent recitation output showing the patched tool description reaches the model, final-head schema readback at e75ca8e7bb34dfc76a6e6dc23397e571353a09e2, and remote Crabbox/GCP validation for install, build, tsgo:core:test, and check:changed. (e75ca8e7bb34)

Likely related people:

Galin Iliev: git blame on the current createCodeModeTools schema and nearby tests points to the commit that added the current code-mode files in this checkout. (role: introduced/current implementation history; confidence: medium; commits: 57ec361682e2; files: src/agents/code-mode.ts, src/agents/code-mode.test.ts)
Kaspre: The provided GitHub context shows Kaspre authored the recently merged per-agent code-mode PR that this wording change builds on, so they have recent domain context beyond this proposal. (role: recent code-mode contributor; confidence: medium; commits: fd8877b5fde3; files: src/agents/code-mode.ts, src/config/schema.help.ts, docs/reference/code-mode.md)
Peter Steinberger: git shortlog over the sampled agent/code-mode/embedded-runner paths shows Peter as the largest adjacent contributor, and the latest stable release commit is also under this area history. (role: adjacent area contributor; confidence: medium; commits: 50a2481652b6; files: src/agents/code-mode.ts, src/agents/tool-search.ts, src/agents/pi-embedded-runner/run/attempt.ts)

Codex review notes: model gpt-5.5, reasoning high; reviewed against a059309a9f9a.

Takhoffman · 2026-05-19T23:51:40Z

@clawsweeper automerge

clawsweeper · 2026-05-19T23:53:54Z

🦞🔧
ClawSweeper automerge is enabled.

Head: e75ca8e7bb34
Label: clawsweeper:automerge
Action: repair worker queued. Run: https://github.com/openclaw/clawsweeper/actions/runs/26132498875
Flow: review this head, repair/rebase only if needed, then re-review the exact repaired head before merge.

Draft PRs stay fix-only until GitHub marks them ready for review. Pause with /clawsweeper stop.

Automerge progress:

2026-05-19 23:51:40 UTC review queued e75ca8e7bb34 (queued)

2026-05-19 23:53:53 UTC repair queued e75ca8e7bb34 (autonomous) Run: https://github.com/openclaw/clawsweeper/actions/runs/26132498875

2026-05-19 23:58:31 UTC repair started (running) in 1s Run: https://github.com/openclaw/clawsweeper/actions/runs/26132498875 automerge-openclaw-openclaw-84269

2026-05-19 23:58:47 UTC validation plan (passed) in 17s Run: https://github.com/openclaw/clawsweeper/actions/runs/26132498875 pnpm check:changed; pnpm lint; pnpm check:test-types

2026-05-19 23:58:59 UTC Codex write preflight (passed) in 28s Run: https://github.com/openclaw/clawsweeper/actions/runs/26132498875 danger-full-access

2026-05-20 00:04:57 UTC Codex edit 1 dce1545a5e2f (complete) in 6m 27s Run: https://github.com/openclaw/clawsweeper/actions/runs/26132498875 exit 0

2026-05-20 00:07:55 UTC validation and review 1 ae25826f0f54 (complete) in 9m 25s Run: https://github.com/openclaw/clawsweeper/actions/runs/26132498875 already-current

2026-05-20 00:08:37 UTC repair finished ae25826f0f54 (opened) in 10m 6s Run: https://github.com/openclaw/clawsweeper/actions/runs/26132498875 open_fix_pr

Takhoffman · 2026-05-20T00:02:25Z

@clawsweeper hatch

clawsweeper · 2026-05-20T00:04:16Z

🦞👀
ClawSweeper PR egg hatch requested.

I queued a comment sync for this PR. If the egg is hatchable, ClawSweeper will generate the image once and update the existing review comment.
Action: PR egg hatch queued (workflow sweep.yml, event repository_dispatch).
The ASCII egg stays as the fallback.

clawsweeper · 2026-05-20T00:07:06Z

ClawSweeper PR egg

✨ Hatched: 🌱 uncommon Frosted Patch Peep

       _..------.._          
    .-'  .-.  .-.  '-.       
   /    ( * )( * )    \      
  |        .--.        |     
  |   <\   ====   />   |     
   \    '.______.'    /      
    '-._   ____   _.-'       
        `-.____.-'           
       __/|_||_|\__          
      /__.'    '.__\         
       .-----------.         
      '-------------'


clawsweeper · 2026-05-20T00:08:34Z

ClawSweeper 🐠 reef update

Thanks for the work here. ClawSweeper could not write to the source branch, so it opened a replacement PR rather than letting the fix drift. Attribution still points back here.

Why replacement: ClawSweeper could not update the source PR branch directly; GitHub did not grant sufficient push rights to the bot for that branch.
Replacement PR: #84368
Why close: this run explicitly closes the superseded source PR after the credited replacement PR is open, so review continues in one place.
This closeout is intentional for this run: the replacement PR is now the active review lane.
The replacement PR carries the original credit trail forward.
Co-author credit kept:

@Kaspre: Co-authored-by: Kaspre <36520309+Kaspre@users.noreply.github.com>

fish notes: model gpt-5.5, reasoning high; reviewed against ae25826.

…026.5.20) (#615)

This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [ghcr.io/openclaw/openclaw](https://openclaw.ai) ([source](https://github.com/openclaw/openclaw)) | patch | `2026.5.19` → `2026.5.20` |

---

> ⚠️ **Warning**
>
> Some dependencies could not be looked up. Check the [Dependency Dashboard](issues/567) for more information.

---

### Release Notes

<details>
<summary>openclaw/openclaw (ghcr.io/openclaw/openclaw)</summary>

### [`v2026.5.20`](https://github.com/openclaw/openclaw/blob/HEAD/CHANGELOG.md#2026520)

[Compare Source](openclaw/openclaw@v2026.5.19...v2026.5.20)

##### Changes

- Exec approvals: remove the old `cat SKILL.md && printf ... && <skill-wrapper>` allowlist compatibility path so skill files must be loaded with the read tool and only the real skill executable is auto-allowed.
- Discord: let voice sessions follow configured Discord users into voice channels, with allowed-channel checks, multi-user handoff, bounded reconciliation, and DAVE recovery preservation. ([#84264](openclaw/openclaw#84264)) Thanks [@fuller-stack-dev](https://github.com/fuller-stack-dev).
- Discord/voice: include bounded `IDENTITY.md`, `USER.md`, and `SOUL.md` profile context in realtime voice session instructions by default, with `voice.realtime.bootstrapContextFiles: []` available to disable it. ([#84499](openclaw/openclaw#84499)) Thanks [@fuller-stack-dev](https://github.com/fuller-stack-dev).
- Dependencies: bump the bundled Codex harness to `@openai/codex` `0.132.0` and refresh the app-server model-list docs for the new catalog.
- CLI/policy: add the bundled Policy plugin for policy-backed channel conformance checks, doctor lint findings, and opt-in workspace repair. ([#80407](openclaw/openclaw#80407)) Thanks [@giodl73-repo](https://github.com/giodl73-repo).
- Agents/config: allow `agents.list[].experimental.localModelLean` so lean local-model mode can be enabled for one configured agent instead of globally.
- Providers/xAI: add device-code OAuth login so remote and headless setups can authorize xAI without a localhost browser callback. ([#84005](openclaw/openclaw#84005)) Thanks [@fuller-stack-dev](https://github.com/fuller-stack-dev).
- Providers/OpenRouter: honor provider-level `params.provider` routing policy for OpenRouter requests, with model and agent params overriding the defaults. Thanks [@amknight](https://github.com/amknight).

##### Fixes

- CLI/tasks: include stale-running task maintenance decisions in `openclaw tasks maintenance --json` so retained and reconcile candidates explain backing-session, cron, CLI, and wedged-subagent state. ([#84691](openclaw/openclaw#84691)) Thanks [@efpiva](https://github.com/efpiva).
- Codex app-server: keep system-prompt reports working when bootstrap hooks provide workspace files with only a path and content, so hook-supplied SOUL/IDENTITY/TOOLS/USER context still reports injected characters correctly. ([#84736](openclaw/openclaw#84736)) Thanks [@JARVIS-Glasses](https://github.com/JARVIS-Glasses).
- Providers/MiniMax music: stop advertising `durationSeconds` control and remove prompt-injected duration hints, so `music_generate` reports MiniMax duration as an unsupported override instead of suggesting MiniMax can enforce track length. Fixes [#84508](openclaw/openclaw#84508). Thanks [@neeravmakwana](https://github.com/neeravmakwana).
- Doctor: warn when sandbox tool policy hides configured MCP server tools before provider requests. ([#84699](openclaw/openclaw#84699)) Thanks [@nxmxbbd](https://github.com/nxmxbbd).
- WhatsApp: update Baileys to `7.0.0-rc12`.
- Build: suppress per-locale `rolldown-plugin-dts:fake-js` CommonJS dts warnings emitted while bundling the intentionally-inlined `zod/v4/locales/*.d.cts` files, so `pnpm build` output stays readable after the 0.25.1 plugin bump. Thanks [@romneyda](https://github.com/romneyda).
- CLI/nodes: route lazy plugin-registration logs to stderr for JSON-mode `openclaw nodes` commands so stdout stays parseable. ([#84684](openclaw/openclaw#84684)) Thanks [@TurboTheTurtle](https://github.com/TurboTheTurtle).
- Approvals: route manual `/approve` decisions through the trusted approval runtime so active exec and plugin approvals no longer look unknown or expired.
- Mac app: update the About settings copyright year to 2026. ([#84385](openclaw/openclaw#84385)) Thanks [@pejmanjohn](https://github.com/pejmanjohn).
- Dependencies: update `@openclaw/fs-safe` to `0.2.7` so OpenClaw's default Python-helper-off policy keeps best-effort Node write fallbacks for private stores, secret writes, run logs, and media attachments on Linux/macOS.
- Infra/secrets: restore the fail-closed contract for `tryReadSecretFileSync` so credential loaders that pass `rejectSymlink: true` (Telegram, LINE, Zalo, IRC, Nextcloud Talk tokens) refuse symlinked credential files instead of silently accepting them, and the infra-state CI shard's secret-file symlink test passes again. Thanks [@romneyda](https://github.com/romneyda).
- Browser: honor the configured image sanitization limit for screenshots and labeled snapshots so browser-captured images follow the same resize policy as other image results. ([#84595](openclaw/openclaw#84595))
- Doctor: remove unrecognized `models.providers.*.models[*].compat.thinkingFormat` values during `doctor --fix` so stale provider model config can validate after upgrade. Fixes [#77803](openclaw/openclaw#77803).
- Doctor: warn when `openclaw.json` stores plaintext secret-bearing config fields, including model provider API keys and sensitive provider headers. ([#84718](openclaw/openclaw#84718)) Thanks [@lukaIvanic](https://github.com/lukaIvanic).
- Status: show the configured default, session-selected model, reason, clear hint, and docs link when a session remains pinned to a model that differs from `agents.defaults.model.primary`.
- WebChat: clear stale typing indicators when session change events mark the active chat run complete.
- Mac app: keep local packaging signed with a stable app identity for permission testing and fix Control UI production builds under current Vite/Highlight.js exports.
- macOS app: update the embedded Peekaboo bridge to 3.2.1 so OpenClaw-hosted UI automation works with current Peekaboo CLI capture flows.
- Cron: deliver preferred final assistant output for successful scheduled runs when trailing plain tool warnings remain in diagnostics instead of marking the run failed.
- fix(mattermost): fail closed on missing channel type \[AI]. ([#84091](openclaw/openclaw#84091)) Thanks [@pgondhi987](https://github.com/pgondhi987).
- Recheck rebuilt system.run argv \[AI]. ([#84090](openclaw/openclaw#84090)) Thanks [@pgondhi987](https://github.com/pgondhi987).
- CLI: keep the private QA subcommand out of exported command descriptors unless `OPENCLAW_ENABLE_PRIVATE_QA_CLI=1`, so root help and subcommand markers match runtime registration. ([#84519](openclaw/openclaw#84519))
- CLI/cron: bound `openclaw cron show` job lookup pagination so non-advancing or unbounded `cron.list` responses fail instead of hanging the command. Fixes [#83856](openclaw/openclaw#83856). ([#83989](openclaw/openclaw#83989))
- Agents/messages: stop message-tool-only turns after a successful source-channel `message` send while keeping transcript mirrors under the session write lock. ([#84289](openclaw/openclaw#84289))
- Agents: filter silent heartbeat response-tool transcript artifacts out of embedded context snapshots so later user turns are not polluted by heartbeat no-op messages. ([#83477](openclaw/openclaw#83477)) Thanks [@fuller-stack-dev](https://github.com/fuller-stack-dev).
- Agents/OpenAI: log repeated strict tool-schema downgrade diagnostics once per provider/model/tool signature, reducing duplicate debug noise while preserving `strict=false` fallback behavior. Fixes [#82930](openclaw/openclaw#82930). ([#82933](openclaw/openclaw#82933)) Thanks [@galiniliev](https://github.com/galiniliev).
- Agents/code mode: spell out the `exec` tool's JavaScript/TypeScript, no Node module, and catalog-bridge constraints in model-visible schema text so agents can use enabled tools without trial-and-error. ([#84269](openclaw/openclaw#84269)) Thanks [@Kaspre](https://github.com/Kaspre).
- Codex: give `image_generate` dynamic-tool calls a 120s default watchdog when no per-call or configured image timeout is set, so image generation no longer falls back to the generic 30s bridge timeout. ([#84254](openclaw/openclaw#84254)) Thanks [@moritzmmayerhofer](https://github.com/moritzmmayerhofer).
- Codex: avoid duplicate dynamic tool terminal diagnostics while large diagnostic backlogs drain without blocking tool responses. ([#82937](openclaw/openclaw#82937)) Thanks [@galiniliev](https://github.com/galiniliev).
- CLI/message: include a stable top-level `messageId` in `openclaw message --json` output when channel sends return one. ([#84191](openclaw/openclaw#84191)) Thanks [@100menotu001](https://github.com/100menotu001).
- Cron: preserve legacy top-level array `jobs.json` stores when loading or adding scheduled jobs so old cron jobs are no longer treated as an empty store during upgrade. Fixes [#60799](openclaw/openclaw#60799). ([#84433](openclaw/openclaw#84433)) Thanks [@IWhatsskill](https://github.com/IWhatsskill).
- Gateway/agents: use an agent's `identity.name` in Gateway agent summaries when `agents.list[].name` is unset, so configured agent labels remain visible in clients. ([#84355](openclaw/openclaw#84355); refs [#57835](openclaw/openclaw#57835)) Thanks [@luoyanglang](https://github.com/luoyanglang).
- Channels/replies: keep normal `/verbose` failed-tool progress compact in message-tool replies and prevent late text-only tool output from appearing after the final answer. ([#84303](openclaw/openclaw#84303)) Thanks [@VACInc](https://github.com/VACInc).
- Plugins/hooks: apply a default 30-second timeout to `before_compaction` and `after_compaction` hooks so a hung plugin handler no longer blocks compaction completion. ([#84153](openclaw/openclaw#84153))
- Discord: preserve disabled presentation buttons when adapting and rendering Discord message controls. ([#84188](openclaw/openclaw#84188)) Thanks [@100menotu001](https://github.com/100menotu001).
- Twitch: add a test-only client-manager registry reset helper so non-isolated Twitch tests can clear cached managers between cases. Fixes [#83887](openclaw/openclaw#83887). ([#84244](openclaw/openclaw#84244)) Thanks [@hclsys](https://github.com/hclsys).
- Cron: run main-session scheduled work on a cron-owned wake lane while preserving reply delivery context, so background cron turns no longer block human main-session chat. Fixes [#82766](openclaw/openclaw#82766). ([#82767](openclaw/openclaw#82767)) Thanks [@galiniliev](https://github.com/galiniliev).
- Cron: use structured embedded-run denial metadata for isolated scheduled tasks so blocked exec requests fail the job without treating ordinary assistant prose as a denial. ([#84067](openclaw/openclaw#84067)) Thanks [@abnershang](https://github.com/abnershang).
- Cron: keep recovered tool warnings diagnostic for successful scheduled runs so final cron output is delivered instead of being replaced by a post-processing warning. ([#84045](openclaw/openclaw#84045)) Thanks [@abnershang](https://github.com/abnershang).
- Plugins/perf: thread explicit plugin discovery results through `loadBundledCapabilityRuntimeRegistry`, `resolveBundledPluginSources`, and `listChannelCatalogEntries` so callers that already hold a discovery result skip redundant filesystem walks. Thanks [@SebTardif](https://github.com/SebTardif).
- harden update restart script creation \[AI]. ([#84088](openclaw/openclaw#84088)) Thanks [@pgondhi987](https://github.com/pgondhi987).
- Docker: keep the bundled Codex plugin in official release image keep lists so the default OpenAI agent harness remains available after Docker pruning. Fixes [#83613](openclaw/openclaw#83613). ([#83626](openclaw/openclaw#83626)) Thanks [@YuanHanzhong](https://github.com/YuanHanzhong).
- CLI/channels: preserve the first line of `openclaw channels logs` output when the rolling tail window starts exactly on a line boundary, mirroring the already-fixed `readLogSlice` behavior in `src/logging/log-tail.ts`.
- Control UI: treat terminal session status as authoritative over stale active-run flags so completed terminal runs stop showing abort/live UI. ([#84057](openclaw/openclaw#84057))
- CLI: preserve embedded equals signs in inline root option values instead of truncating after the second separator. ([#83995](openclaw/openclaw#83995)) Thanks [@ThiagoCAltoe](https://github.com/ThiagoCAltoe).
- Matrix/config: accept `messages.queue.byChannel.matrix` queue overrides and keep queue provider schema/type keys aligned for Matrix, Google Chat, and Mattermost. Thanks [@bdjben](https://github.com/bdjben).
- CLI: format `openclaw acp client` failures through the shared error formatter so object-shaped errors stay readable instead of printing `[object Object]`. Fixes [#83904](openclaw/openclaw#83904). ([#84080](openclaw/openclaw#84080))
- Providers/Ollama: default unknown-capabilities models to tool-capable so discovered native Ollama models can use tools when `/api/show` omits capabilities. ([#84055](openclaw/openclaw#84055)) Thanks [@dutifulbob](https://github.com/dutifulbob).
- Installer/Windows: launch `install.ps1` onboarding as an attached child process so fresh native Windows installs do not freeze visibly at `Starting setup...` or corrupt the wizard's terminal rendering.
- CLI/update: keep restart health checks working across one-version CLI/Gateway protocol skew and use the managed Gateway service Node for all follow-up commands even when the package root is unchanged, so `openclaw update` no longer silently switches the gateway to a different Node binary when multiple Node installations are present. Thanks [@amknight](https://github.com/amknight).
- CLI/gateway: include the running Gateway version in `gateway status` JSON output, preserving existing server metadata while falling back to status RPC data for read probes. Fixes [#56222](openclaw/openclaw#56222). Thanks [@galiniliev](https://github.com/galiniliev).
- Memory/search: close local embedding providers when active-memory searches time out so pending local model loads and embedding contexts are aborted and released. ([#83858](openclaw/openclaw#83858)) Thanks [@brokemac79](https://github.com/brokemac79).
- CLI/nodes: request pending node surface approval scopes before `openclaw nodes approve` so exec-capable node approval can use admin-scoped Gateway credentials instead of failing with `missing scope: operator.admin`. ([#84392](openclaw/openclaw#84392)) Thanks [@joshavant](https://github.com/joshavant).
- Gateway: reject slow node event sends before outbound buffers grow unbounded and log the rejected payload diagnostic. ([#84387](openclaw/openclaw#84387)) Thanks [@samzong](https://github.com/samzong).
- Agents: include bounded trajectory queued-writer diagnostics in `pi-trajectory-flush` timeout warnings so flush stalls show pending writes, queued bytes, and append state. Fixes [#82961](openclaw/openclaw#82961). ([#82962](openclaw/openclaw#82962)) Thanks [@galiniliev](https://github.com/galiniliev).
- Agents/subagents: recover stale completion announces by retrying unsupported transcript-wait wakes without transcript waiting and forcing a message-tool handoff when the requester run is already stale. Fixes [#83699](openclaw/openclaw#83699). ([#83700](openclaw/openclaw#83700)) Thanks [@galiniliev](https://github.com/galiniliev).
- Agents/subagents: constrain wildcard subagent target allowlists to configured agents while preserving explicitly listed compatibility targets. Fixes [#84040](openclaw/openclaw#84040). ([#84357](openclaw/openclaw#84357)) Thanks [@joshavant](https://github.com/joshavant).
- Providers/Anthropic: route Anthropic model refs selected with Claude CLI auth through the Claude CLI runtime so shorthand refs such as `anthropic/opus-4.7` no longer fall back to embedded Anthropic billing. Fixes [#84222](openclaw/openclaw#84222). ([#84374](openclaw/openclaw#84374)) Thanks [@joshavant](https://github.com/joshavant).
- Agents: honor explicit `models.providers.<id>.timeoutSeconds` values above the default idle watchdog for cloud and self-hosted providers, so long first-token waits no longer fall back at \~120s when the provider timeout is higher. ([#83979](openclaw/openclaw#83979)) Thanks [@yujiawei](https://github.com/yujiawei).
- Agents/Codex: keep encrypted Responses reasoning replay provenance-bound so stale mirrored Codex transcripts drop invalid encrypted content before request assembly while preserving matching same-session replay. Fixes [#83836](openclaw/openclaw#83836). ([#84367](openclaw/openclaw#84367)) Thanks [@joshavant](https://github.com/joshavant).
- Agents/subagents: skip stale embedded-run wake probes for dormant completion requesters, so late subagent completions go straight to requester-agent/direct handoff instead of producing `reason=no_active_run` queue noise. ([#82964](openclaw/openclaw#82964)) Thanks [@galiniliev](https://github.com/galiniliev).
- CLI: retry config snapshot reads after a transient failure so one rejected read no longer poisons later commands in the same process. ([#83931](openclaw/openclaw#83931)) Thanks [@honor2030](https://github.com/honor2030).
- Media: decode URL path basenames before using them as remote media fallback filenames, so files like `My%20Report.pdf` are surfaced as `My Report.pdf`. Fixes [#84050](openclaw/openclaw#84050). ([#84052](openclaw/openclaw#84052)) Thanks [@jbetala7](https://github.com/jbetala7).
- WhatsApp: clarify inbound group diagnostics so observed but unregistered groups point to `channels.whatsapp.groups` without changing routing or sender authorization. ([#83846](openclaw/openclaw#83846)) Thanks [@neeravmakwana](https://github.com/neeravmakwana).
- WhatsApp: drain pending outbound deliveries on a 30s periodic timer in addition to the reconnect handler, so messages enqueued while the provider is already connected no longer wait for the next reconnect to send. ([#79083](openclaw/openclaw#79083)) Thanks [@Oviemudiaga](https://github.com/Oviemudiaga).
- CLI/TUI: include gateway plugin slash commands in TUI autocomplete, so connected sessions can suggest plugin-owned commands exposed by the running Gateway. ([#83640](openclaw/openclaw#83640)) Thanks [@se7en-agent](https://github.com/se7en-agent).
- Gateway/mobile: restore QR setup-code handoff of bounded operator tokens for iOS and Android onboarding while keeping admin and pairing scopes out of bootstrap. ([#83684](openclaw/openclaw#83684)) Thanks [@ngutman](https://github.com/ngutman).
- iOS: repair Release archive compilation for the TestFlight build. ([#84255](openclaw/openclaw#84255)) Thanks [@ngutman](https://github.com/ngutman).
- Agents/compaction: bound plugin-owned CLI transcript compaction with the host safety timeout so a hung context engine can no longer stall post-turn cleanup. ([#84083](openclaw/openclaw#84083)) Thanks [@100yenadmin](https://github.com/100yenadmin).
- Control UI/usage: truncate long context skill, tool, and file names in the usage panel while keeping the full name available on hover. ([#42197](openclaw/openclaw#42197)) Thanks [@Rain120](https://github.com/Rain120).
- Codex: respect explicit `models auth order set` and `config.auth.order` precedence over stale `lastGood` in `/codex account`, and show `no working credential` when every explicit-order profile is ineligible instead of marking a lower-ranked profile as active. Fixes [#84386](openclaw/openclaw#84386). ([#84412](openclaw/openclaw#84412)) Thanks [@openperf](https://github.com/openperf).
- Agents: honor `messages.suppressToolErrors` for mutating tool failures so configured chat surfaces do not receive separate warning payloads. ([#81561](openclaw/openclaw#81561)) Thanks [@moeedahmed](https://github.com/moeedahmed).
- Agents/fallback: surface billing guidance for mixed rate-limit plus billing fallback exhaustion instead of generic failure copy. Fixes [#79396](openclaw/openclaw#79396). ([#79489](openclaw/openclaw#79489)) Thanks [@aayushprsingh](https://github.com/aayushprsingh).

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about these updates again.

---

- [ ] If you want to rebase/retry this PR, check this box

---

This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).

Reviewed-on: https://git.erwanleboucher.dev/eleboucher/homelab/pulls/615
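
For readers skimming the changelog, the Agents/code mode entry (#84269) refers to the catalog-bridge contract described in this PR: `tools.search(query)` finds enabled entries, and the returned `entry.id` is passed to `tools.describe(entry.id)` and `tools.call(entry.id, args)`. A minimal sketch of that flow follows, under loud assumptions: the `ToolsBridge` and `CatalogEntry` shapes, the `mock.*` tool ids, the `params` field, and the synchronous signatures are all hypothetical stand-ins invented for illustration (the real bridge may well be asynchronous and return different shapes). Only the three method names and the "pass `entry.id`" rule come from the PR text.

```typescript
// Hypothetical shapes: the PR text only names tools.search, tools.describe,
// and tools.call. Every field other than `id` is an assumption.
interface CatalogEntry {
  id: string;
  description: string;
}

interface ToolsBridge {
  search(query: string): CatalogEntry[];
  describe(id: string): { id: string; params: string[] };
  call(id: string, args: Record<string, unknown>): unknown;
}

// In-memory stand-in for the runtime-provided `tools` object (not the real one).
function makeMockTools(): ToolsBridge {
  const catalog: CatalogEntry[] = [
    { id: "mock.read_file", description: "read a file from the workspace" },
    { id: "mock.shell", description: "run a shell command" },
  ];
  const known = (id: string): boolean => catalog.some((e) => e.id === id);
  return {
    search: (query) =>
      catalog.filter((e) => e.description.includes(query.toLowerCase())),
    describe: (id) => {
      if (!known(id)) throw new Error(`unknown tool: ${id}`);
      return { id, params: id === "mock.read_file" ? ["path"] : ["command"] };
    },
    call: (id, args) => {
      if (!known(id)) throw new Error(`unknown tool: ${id}`);
      return `${id} called with ${JSON.stringify(args)}`;
    },
  };
}

// Bridge pattern: search, then describe(entry.id), then call(entry.id, args).
// No require/import of Node built-ins is involved at any step.
function runViaBridge(
  tools: ToolsBridge,
  query: string,
  args: Record<string, unknown>,
): unknown {
  const [entry] = tools.search(query);
  if (!entry) throw new Error(`no enabled tool matches "${query}"`);
  const spec = tools.describe(entry.id);
  const missing = spec.params.filter((p) => !(p in args));
  if (missing.length > 0) throw new Error(`missing args: ${missing.join(", ")}`);
  return tools.call(entry.id, args);
}

const result = runViaBridge(makeMockTools(), "read a file", { path: "notes.txt" });
console.log(result);
```

The sketch checks the described parameters before calling so that a bad call fails in one step rather than through repeated probing, which is the trial-and-error cost the sharpened description is meant to remove.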

openclaw-barnacle Bot added the labels "agents" (Agent runtime and tooling), "size: XS", and "triage: mock-only-proof" (Candidate: PR proof only shows tests, mocks, snapshots, lint, typecheck, or CI.) on May 19, 2026

openclaw-barnacle Bot added the label "proof: supplied" (External PR includes structured after-fix real behavior proof.) and removed the label "triage: mock-only-proof" (Candidate: PR proof only shows tests, mocks, snapshots, lint, typecheck, or CI.) on May 19, 2026

openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 19, 2026

Kaspre force-pushed the fix/code-mode-exec-tool-description branch from b47ee0e to 66b2dd6 on May 19, 2026 21:22

Kaspre marked this pull request as ready for review May 19, 2026 21:31

clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 19, 2026

test(code-mode): cover exec tool guidance

e75ca8e

Kaspre force-pushed the fix/code-mode-exec-tool-description branch from 66b2dd6 to e75ca8e on May 19, 2026 22:05

openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 19, 2026

clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 19, 2026

clawsweeper Bot added the label "clawsweeper:automerge" (Maintainer opted this PR into bounded ClawSweeper-reviewed automerge) on May 19, 2026

clawsweeper Bot mentioned this pull request May 20, 2026

fix(code-mode): sharpen exec tool description so models stop wasting turns rediscovering constraints #84368

Merged

clawsweeper Bot closed this May 20, 2026

fix(code-mode): sharpen exec tool description so models stop wasting turns rediscovering constraints #84269

Kaspre wants to merge 2 commits into openclaw:main from Kaspre:fix/code-mode-exec-tool-description

2 participants
