fix(code-mode): honor agent scoped code mode by Kaspre · Pull Request #83473 · openclaw/openclaw

Kaspre · 2026-05-18T06:32:58Z

Problem

agents.list[].tools.codeMode was documented as a per-agent setting, but main rejected it in the strict per-agent schema. A schema-only fix would be worse than the original rejection because it would accept security/tool-boundary config without making it effective at runtime.

Change and value

Accept agents.list[].tools.codeMode in schema, exported types, labels, and help text.
Resolve code mode from the active agent, with agent settings overriding top-level tools.codeMode defaults.
Wire the active-agent result through catalog activation and model payload enforcement so agent-scoped code mode is not accepted as inert config.
Preserve provider grouped tool payloads, such as Gemini functionDeclarations, while filtering code-mode payloads down to exec/wait.
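
The resolution order described in this list (agent settings overriding the top-level tools.codeMode default) can be sketched as below. This is a minimal illustration, not the PR's resolveCodeModeConfig: the type names, the agent `id` field, and `resolveCodeModeSketch` are assumptions made for the example.

```typescript
// Hypothetical shapes for illustration; the real OpenClaw config types are richer.
interface CodeModeSettings {
  enabled?: boolean;
}

interface AgentEntrySketch {
  id: string; // assumed identifier field for an agents.list[] entry
  tools?: { codeMode?: CodeModeSettings };
}

interface ConfigSketch {
  tools?: { codeMode?: CodeModeSettings };
  agents?: { list?: AgentEntrySketch[] };
}

// Start from the top-level default, then let the active agent's own
// settings override it (a shallow merge, key by key).
function resolveCodeModeSketch(config: ConfigSketch, agentId?: string): { enabled: boolean } {
  const globalSettings = config.tools?.codeMode ?? {};
  const agent = config.agents?.list?.find((entry) => entry.id === agentId);
  const agentSettings = agent?.tools?.codeMode ?? {};
  const merged = { ...globalSettings, ...agentSettings };
  return { enabled: merged.enabled === true };
}

// Usage: "ops" opts in per agent, "chat" inherits the disabled default.
const resolutionExample: ConfigSketch = {
  tools: { codeMode: { enabled: false } },
  agents: {
    list: [{ id: "ops", tools: { codeMode: { enabled: true } } }, { id: "chat" }],
  },
};
const opsResolved = resolveCodeModeSketch(resolutionExample, "ops");
const chatResolved = resolveCodeModeSketch(resolutionExample, "chat");
```

Under these assumptions, `opsResolved.enabled` is true and `chatResolved.enabled` is false, which matches the ops and chat roles in the proof output.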

Who is affected

Operators using per-agent code-mode policy now get the documented behavior. Existing top-level tools.codeMode behavior is preserved. Agents without code mode continue to see the normal tool surface.

Real behavior proof

Behavior or issue addressed: Per-agent agents.list[].tools.codeMode changes the active agent's runtime/model tool surface; it is not merely accepted by schema validation.

Real environment tested: Local source worktree at rebased PR head 22c2376b7260.

Exact steps or command run after this patch: Exercised the embedded-attempt runtime harness that captures the outgoing model request for an active ops agent, then ran a redacted transport harness against PR-head resolveCodeModeConfig and createCodexNativeWebSearchWrapper for ops and chat.

Evidence after fix:

agent=ops resolvedCodeMode=true transportSurface=true payloadTools=exec,wait
agent=chat resolvedCodeMode=false transportSurface=false payloadTools=read,exec,wait
agent=ops resolvedCodeMode=true transportSurface=true payloadTools=exec,wait

Observed result after fix: The enabled ops agent sent only the code-mode payload tools (exec, wait); the disabled chat control kept the normal visible payload; and the grouped declaration case still filtered to exec, wait.
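
The grouped-declaration case can be pictured with a small filter that keeps the provider's wrapper shape while dropping every tool other than exec and wait. This is a sketch under stated assumptions, not the PR's wrapper code: `ToolEntrySketch` and `filterToCodeModeTools` are hypothetical names, and real provider payloads carry more fields.

```typescript
// Hypothetical tool entry shapes covering flat, nested, and grouped payloads.
type ToolEntrySketch = {
  name?: string; // flat function tool
  function?: { name?: string }; // nested function tool
  functionDeclarations?: Array<{ name: string }>; // Gemini-style grouped declarations
  [key: string]: unknown;
};

const CODE_MODE_TOOL_NAMES = new Set(["exec", "wait"]);

function filterToCodeModeTools(tools: ToolEntrySketch[]): ToolEntrySketch[] {
  const kept: ToolEntrySketch[] = [];
  for (const tool of tools) {
    if (Array.isArray(tool.functionDeclarations)) {
      // Grouped entry: filter inside the group and keep the wrapper object.
      const declarations = tool.functionDeclarations.filter((d) =>
        CODE_MODE_TOOL_NAMES.has(d.name),
      );
      if (declarations.length > 0) {
        kept.push({ ...tool, functionDeclarations: declarations });
      }
      continue;
    }
    const name = tool.name ?? tool.function?.name;
    if (name !== undefined && CODE_MODE_TOOL_NAMES.has(name)) {
      kept.push(tool);
    }
  }
  return kept;
}

// Usage: one grouped payload and one flat/nested payload.
const groupedResult = filterToCodeModeTools([
  { functionDeclarations: [{ name: "read" }, { name: "exec" }, { name: "wait" }] },
]);
const flatResult = filterToCodeModeTools([
  { type: "function", name: "read" },
  { type: "function", name: "exec" },
  { type: "function", function: { name: "wait" } },
]);
```

In this sketch the grouped payload keeps a single entry whose functionDeclarations hold only exec and wait, and the flat payload keeps two entries.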

What was not tested: No external provider API request and no gateway process. This proof targets the embedded runner/model-transport boundary where accepted config becomes an outgoing tool payload.

Verification

Current head 22c2376b7260: env OPENCLAW_VITEST_MAX_WORKERS=1 timeout 180s node --no-maglev node_modules/vitest/vitest.mjs run --config test/vitest/vitest.agents-pi-embedded.config.ts src/agents/pi-embedded-runner/run/attempt.spawn-workspace.context-engine.test.ts passed, 41 tests.
Current head 22c2376b7260: redacted model-transport proof harness output shown above.
Current head 22c2376b7260: node --import tsx scripts/generate-config-doc-baseline.ts --check passed.
Current head 22c2376b7260: git diff --check origin/main...HEAD passed.
Earlier focused local Vitest for the unchanged runtime/config code passed: zod-schema.agent-defaults.test.ts, code-mode.test.ts, openai-stream-wrappers.test.ts, and attempt.spawn-workspace.context-engine.test.ts, 116 tests total.
Pre-rebase validation on the same code diff: oxfmt --check --threads=1 <touched files> passed.
Pre-rebase validation on the same code diff: timeout 900s codex review --base origin/main passed with no actionable correctness issues.
External Claude review was triaged: wrapper-stacking concern is rejected because runEmbeddedAttempt rebuilds from the session base stream before wrapping; shallow override and test-support notes are non-blocking for the current config shape/test harness.
Pre-rebase remote GCP Crabbox pnpm check:changed on head 9b9cb2218662 passed, selected core, coreTests, docs, exit 0, lease cbx_b35df657aaf8.
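
The wrapper-stacking triage note above rests on each attempt wrapping the session base stream rather than a previously wrapped stream. A minimal sketch of why that avoids stacking follows; `StreamFnSketch`, `wrapCodeMode`, and `runAttemptSketch` are hypothetical and do not reflect how runEmbeddedAttempt is actually written.

```typescript
// Hypothetical stream function: takes tool names, reports how many wrappers ran.
type StreamFnSketch = (tools: string[]) => { tools: string[]; wrapperDepth: number };

const baseStreamSketch: StreamFnSketch = (tools) => ({ tools, wrapperDepth: 0 });

// Wrap a stream so that only exec/wait reach it, and count the wrapper layer.
function wrapCodeMode(inner: StreamFnSketch): StreamFnSketch {
  return (tools) => {
    const result = inner(tools.filter((name) => name === "exec" || name === "wait"));
    return { ...result, wrapperDepth: result.wrapperDepth + 1 };
  };
}

// Each attempt wraps the untouched base stream, so earlier attempts leave
// no wrapper layers behind.
function runAttemptSketch(session: { baseStream: StreamFnSketch }, tools: string[]) {
  const activeStream = wrapCodeMode(session.baseStream);
  return activeStream(tools);
}

const sessionSketch = { baseStream: baseStreamSketch };
const firstAttempt = runAttemptSketch(sessionSketch, ["read", "exec", "wait"]);
const secondAttempt = runAttemptSketch(sessionSketch, ["read", "exec", "wait"]);
```

Both attempts report a wrapperDepth of 1 here. If the sketch instead reassigned `session.baseStream` to the wrapped stream, the depth would grow with every attempt, which is the stacking behavior the review note was concerned about.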

clawsweeper · 2026-05-18T06:34:01Z

Codex review: needs maintainer review before merge.

Workflow note: Future ClawSweeper reviews update this same comment in place.

How this review workflow works

ClawSweeper keeps one durable marker-backed review comment per issue or PR.
Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
Maintainers can also comment @clawsweeper review to request a fresh review only.
Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

Summary
The PR adds per-agent tools.codeMode schema/type/help support, resolves active-agent code-mode runtime settings, enforces exec/wait model payload filtering for active code-mode runs, and updates focused tests, generated config baselines, prompt snapshots, and changelog.

Reproducibility: yes, at source level. Current main documents agent-level tools.codeMode, but the strict per-agent schema omits codeMode and runtime resolution still reads only top-level tools.codeMode.

PR rating
Overall: 🐚 platinum hermit
Proof: 🐚 platinum hermit
Patch quality: 🐚 platinum hermit
Summary: Good normal PR readiness: the patch is focused, source-reproducible, and backed by targeted transport-boundary proof, with ordinary maintainer review still needed for the security-boundary choice.

Rank-up moves:

none

What the crustacean ranks mean

🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

Real behavior proof
Sufficient (terminal): The PR body supplies redacted terminal-style after-fix output from runtime/model-transport harnesses showing enabled, disabled, and grouped-declaration code-mode payload behavior.

Risk before merge
Why this matters:

Merging intentionally makes per-agent config capable of exposing the code-mode exec/wait execution surface for selected agents, so maintainers should explicitly accept the security-boundary behavior.
The supplied proof targets the embedded-runner/model-transport boundary and does not include a full gateway process or external provider API request.

Maintainer options:

Accept Transport-Boundary Proof (recommended): Maintainers can merge if the embedded-runner/model-transport proof is sufficient for this per-agent code-mode execution-surface change.
Request A Live Runtime Trace: If the security bar needs stronger evidence, ask for a redacted gateway or live-provider run showing the selected agent receives only exec/wait while a control agent does not.

Next step before merge
No repair lane is needed; maintainers need to decide whether the supplied transport-boundary proof is enough for this security-sensitive code-mode change.

Security
Cleared: No concrete supply-chain, secret-handling, or unintended permission-expansion defect was found; the intended code-mode security boundary remains a maintainer merge-risk decision.

Review details

Best possible solution:

Land this PR or an equivalent focused fix after maintainers accept the supplied transport-boundary proof for the code-mode security boundary.

Do we have a high-confidence way to reproduce the issue?

Yes, at source level. Current main documents agent-level tools.codeMode, but the strict per-agent schema omits codeMode and runtime resolution still reads only top-level tools.codeMode.

Is this the best way to solve the issue?

Yes. The PR fixes schema/types/help and carries the active-agent code-mode decision through catalog and model-payload enforcement, which is safer than the schema-only path previously proposed in #83394.

Label justifications:

P2: This is a normal-priority documented config/runtime mismatch with focused scope, but it touches code-mode execution tooling.
merge-risk: 🚨 security-boundary: The diff changes which per-agent config path can expose the model-visible exec/wait execution surface.

Acceptance criteria:

env OPENCLAW_VITEST_MAX_WORKERS=1 timeout 180s node --no-maglev node_modules/vitest/vitest.mjs run --config test/vitest/vitest.agents-pi-embedded.config.ts src/agents/pi-embedded-runner/run/attempt.spawn-workspace.context-engine.test.ts
node --import tsx scripts/generate-config-doc-baseline.ts --check
git diff --check origin/main...HEAD
Optional stronger proof: redacted gateway or live-provider run showing selected agent payload tools are exec,wait while a control agent keeps normal tools

What I checked:

Current docs promise agent-level code mode: Current main says to add tools.codeMode.enabled: true to the agent or runtime config, making per-agent behavior part of the documented contract. Public docs: docs/reference/code-mode.md. (docs/reference/code-mode.md:67, 57c952f67985)
Current agent schema rejects codeMode: On current main, AgentToolsSchema is strict and lists common policy fields plus elevated/exec/fs/loop/message/sandbox without codeMode. (src/config/zod-schema.agent-runtime.ts:703, 57c952f67985)
Current runtime only reads top-level code mode: Current main resolves code mode from config.tools.codeMode only, so an agent-level config would not activate code mode at runtime even if schema accepted it. (src/agents/code-mode.ts:124, 57c952f67985)
PR adds per-agent schema support: The PR head adds codeMode: CodeModeSchema to AgentToolsSchema, preserving strict nested validation while accepting the documented key. (src/config/zod-schema.agent-runtime.ts:706, e7ab68b7d945)
PR resolves active-agent override: The PR head normalizes top-level and agent-level code-mode config and merges the active agent override over the global default. (src/agents/code-mode.ts:136, e7ab68b7d945)
PR threads active agent into embedded runs: The embedded runner now calls resolveCodeModeConfig(params.config, sessionAgentId) and wraps the active stream when code mode is enabled for that run. (src/agents/pi-embedded-runner/run/attempt.ts:1326, e7ab68b7d945)
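
The schema half of these findings comes down to how a strict object schema treats unknown keys. The sketch below imitates that with a plain allow-list check rather than the project's actual AgentToolsSchema. The key names are taken loosely from the review summary and may not match the real field names, and `findUnknownKeys` is a hypothetical helper.

```typescript
// Hypothetical allow-lists standing in for the strict per-agent tools schema.
const AGENT_TOOLS_KEYS_BEFORE: readonly string[] = [
  "elevated",
  "exec",
  "fs",
  "loop",
  "message",
  "sandbox",
];
const AGENT_TOOLS_KEYS_AFTER: readonly string[] = [...AGENT_TOOLS_KEYS_BEFORE, "codeMode"];

// A strict check: any key outside the allow-list is reported as an error.
function findUnknownKeys(value: Record<string, unknown>, allowed: readonly string[]): string[] {
  return Object.keys(value).filter((key) => !allowed.includes(key));
}

const agentToolsSketch = { codeMode: { enabled: true } };
const rejectedBefore = findUnknownKeys(agentToolsSketch, AGENT_TOOLS_KEYS_BEFORE);
const rejectedAfter = findUnknownKeys(agentToolsSketch, AGENT_TOOLS_KEYS_AFTER);
```

With the earlier allow-list, `codeMode` is flagged as unknown; with the extended one, nothing is flagged. Accepting the key is only half of the fix, which is why the remaining findings cover runtime resolution and the embedded runner.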

Likely related people:

steipete: Related merged code-mode work in "[codex] Add generic OpenClaw code mode" (#80600) and local history touch the core code-mode runtime, which is the central surface of this PR. (role: feature owner / recent area contributor; confidence: high; commits: 0db097936568, 46bad8676c00; files: src/agents/code-mode.ts, docs/reference/code-mode.md, src/config/zod-schema.agent-runtime.ts)
vincentkoc: Recent provider stream-family refactor history touches the OpenAI stream wrapper path used for code-mode transport-surface enforcement. (role: adjacent owner; confidence: medium; commits: 8f7b02e5670d; files: src/agents/pi-embedded-runner/openai-stream-wrappers.ts, src/plugin-sdk/provider-stream.ts)
Christof Salis: History for createCodexNativeWebSearchWrapper shows this person introduced the Codex native web-search wrapper that this PR extends for code-mode payload enforcement. (role: introduced adjacent behavior; confidence: medium; commits: 797a70fd95f0; files: src/agents/pi-embedded-runner/openai-stream-wrappers.ts)

Codex review notes: model gpt-5.5, reasoning high; reviewed against 57c952f67985.

steipete · 2026-05-18T11:26:29Z

Maintainer verification for head 0baca59fdcdd7dcea81319cc9b65cbe478903321:

Rebased on current origin/main and force-pushed with lease.
Local focused tests: node scripts/run-vitest.mjs src/agents/code-mode.test.ts src/agents/pi-embedded-runner/openai-stream-wrappers.test.ts src/config/zod-schema.agent-defaults.test.ts src/agents/pi-embedded-runner/run/attempt.spawn-workspace.context-engine.test.ts -> 6 files / 181 tests passed.
Config docs baseline: pnpm config:docs:gen then node --import tsx scripts/generate-config-doc-baseline.ts --check -> OK.
Prompt snapshots: pnpm prompt:snapshots:check -> OK after syncing Codex code-mode fixture drift from the current base.
Media CI unblocker proof: node scripts/run-vitest.mjs src/media/image-ops.tempdir.test.ts src/media/image-ops.input-guard.test.ts -> 2 files / 11 tests passed; pnpm deadcode:dependencies && pnpm deadcode:unused-files -> OK.
Build: pnpm build -> passed after the final fixup.
Autoreview: AUTOREVIEW_OPENCLAW_MAINTAINER_VALIDATION=1 codex review --base origin/main -> no actionable correctness issues found.
GitHub CI for head 0baca59fdcdd7dcea81319cc9b65cbe478903321: green/neutral/skipped, including CI run 26030218924, CodeQL Critical Quality run 26030218835, CodeQL/Security High run 26030218845, Workflow Sanity run 26030218871, Real behavior proof runs 26030217152 and 26030229217, OpenGrep run 26030218980, and dependency awareness run 26030217166.

Known note: while rebasing onto latest main, two unrelated media fallouts from 57c952f679 blocked the PR merge ref; I included the narrow fix here so this PR can land green.

openclaw-barnacle Bot added the labels agents (Agent runtime and tooling), size: M, and triage: mock-only-proof (Candidate: PR proof only shows tests, mocks, snapshots, lint, typecheck, or CI.) May 18, 2026

Kaspre marked this pull request as ready for review May 18, 2026 07:40

Kaspre force-pushed the fix/agent-code-mode-runtime branch from 3a9f308 to 8e4fc04 Compare May 18, 2026 07:44

openclaw-barnacle Bot added the labels docs (Improvements or additions to documentation) and proof: supplied (External PR includes structured after-fix real behavior proof.) and removed the label triage: mock-only-proof (Candidate: PR proof only shows tests, mocks, snapshots, lint, typecheck, or CI.) May 18, 2026

Kaspre force-pushed the fix/agent-code-mode-runtime branch from 9b9cb22 to 22c2376 Compare May 18, 2026 09:01

openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 18, 2026

clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 18, 2026

steipete self-assigned this May 18, 2026

steipete force-pushed the fix/agent-code-mode-runtime branch from 22c2376 to 84d82e7 Compare May 18, 2026 10:50

openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 18, 2026

clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 18, 2026

Kaspre and others added 5 commits May 18, 2026 12:00

fix(code-mode): honor agent code mode config

6985f30

test(code-mode): cover agent runtime payload

6a718b3

fix(code-mode): preserve grouped payload tools

3205f4e

docs(config): refresh config baseline

068ed57

docs(changelog): note agent code mode fix

57ee877

test(codex): sync code mode prompt snapshots

e7ab68b

steipete force-pushed the fix/agent-code-mode-runtime branch from 84d82e7 to e7ab68b Compare May 18, 2026 11:09

openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 18, 2026

clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 18, 2026

fix(media): repair image CI fallout

0baca59

openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 18, 2026

steipete merged commit fd8877b into openclaw:main May 18, 2026
101 checks passed

steipete mentioned this pull request May 18, 2026

code-mode: AgentToolsSchema rejects "codeMode" key but docs say agent-level config works #83388

Closed

Kaspre mentioned this pull request May 19, 2026

fix(code-mode): sharpen exec tool description so models stop wasting turns rediscovering constraints #84269

Closed

steipete merged 7 commits into openclaw:main from Kaspre:fix/agent-code-mode-runtime