
fix(code-mode): honor agent scoped code mode#83473

Merged
steipete merged 7 commits into
openclaw:mainfrom
Kaspre:fix/agent-code-mode-runtime
May 18, 2026

Conversation


@Kaspre Kaspre commented May 18, 2026

Contributor

Fixes #83388.

Problem

agents.list[].tools.codeMode was documented as a per-agent setting, but main rejected it in the strict per-agent schema. A schema-only fix would be worse than the original rejection because it would accept security/tool-boundary config without making it effective at runtime.

Change and value

  • Accept agents.list[].tools.codeMode in schema, exported types, labels, and help text.
  • Resolve code mode from the active agent, with agent settings overriding top-level tools.codeMode defaults.
  • Wire the active-agent result through catalog activation and model payload enforcement so agent-scoped code mode is not accepted as inert config.
  • Preserve provider grouped tool payloads, such as Gemini functionDeclarations, while filtering code-mode payloads down to exec/wait.
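
The resolution rule in the second bullet (agent settings override top-level tools.codeMode defaults) can be sketched roughly as below. This is a minimal illustration, not the PR's resolveCodeModeConfig: the `enabled` field, the config interfaces, and the name `resolveCodeModeSketch` are assumptions made for the example.

```typescript
// Hypothetical config shapes for illustration; the real OpenClaw types may differ.
interface CodeModeConfig {
  enabled?: boolean;
}

interface ToolsConfig {
  codeMode?: CodeModeConfig;
}

interface AgentEntry {
  id: string;
  tools?: ToolsConfig;
}

interface Config {
  tools?: ToolsConfig;
  agents?: { list?: AgentEntry[] };
}

// Resolve the effective code-mode flag for one agent.
// The agent-level value wins; if the agent leaves it unset, the top-level
// default applies; if neither is set, code mode is treated as off.
function resolveCodeModeSketch(config: Config, agentId: string): { enabled: boolean } {
  const agent = config.agents?.list?.find((entry) => entry.id === agentId);
  const agentLevel = agent?.tools?.codeMode?.enabled;
  const topLevel = config.tools?.codeMode?.enabled;
  return { enabled: agentLevel ?? topLevel ?? false };
}

// Example config: code mode off by default, on for the "ops" agent only.
const exampleConfig: Config = {
  tools: { codeMode: { enabled: false } },
  agents: {
    list: [
      { id: "ops", tools: { codeMode: { enabled: true } } },
      { id: "chat" },
    ],
  },
};
```

With this sketch, `resolveCodeModeSketch(exampleConfig, "ops")` resolves to enabled while `"chat"` falls back to the top-level default, which mirrors the ops/chat split in the proof output.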

Who is affected

Operators using per-agent code-mode policy now get the documented behavior. Existing top-level tools.codeMode behavior is preserved. Agents without code mode continue to see the normal tool surface.

Real behavior proof

Behavior or issue addressed: Per-agent agents.list[].tools.codeMode changes the active agent's runtime/model tool surface; it is not merely accepted by schema validation.

Real environment tested: Local source worktree at rebased PR head 22c2376b7260.

Exact steps or command run after this patch: Exercised the embedded-attempt runtime harness that captures the outgoing model request for an active ops agent, then ran a redacted transport harness against PR-head resolveCodeModeConfig and createCodexNativeWebSearchWrapper for ops and chat.

Evidence after fix:

agent=ops resolvedCodeMode=true transportSurface=true payloadTools=exec,wait
agent=chat resolvedCodeMode=false transportSurface=false payloadTools=read,exec,wait
agent=ops resolvedCodeMode=true transportSurface=true payloadTools=exec,wait

Observed result after fix: The enabled ops agent sent only the code-mode payload tools (exec, wait); the disabled chat control kept the normal visible payload; and the grouped declaration case still filtered to exec, wait.
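
The filtering behavior described above can be illustrated with a small sketch. It is not the PR's wrapper code: the two tool shapes (a flat `{ name }` entry and a grouped `{ functionDeclarations }` entry in the style of Gemini payloads) and the function name `filterCodeModeToolsSketch` are assumptions for the example.

```typescript
// Hypothetical payload shapes for illustration only.
type FlatTool = { name: string };
type GroupedTool = { functionDeclarations: { name: string }[] };
type PayloadTool = FlatTool | GroupedTool;

// The only tools a code-mode payload is allowed to expose.
const CODE_MODE_TOOLS = new Set(["exec", "wait"]);

// Keep only exec/wait. Grouped entries stay grouped: their declarations are
// filtered in place, and a group is dropped only if nothing in it survives.
function filterCodeModeToolsSketch(tools: PayloadTool[]): PayloadTool[] {
  const kept: PayloadTool[] = [];
  for (const tool of tools) {
    if ("functionDeclarations" in tool) {
      const declarations = tool.functionDeclarations.filter((d) => CODE_MODE_TOOLS.has(d.name));
      if (declarations.length > 0) {
        kept.push({ ...tool, functionDeclarations: declarations });
      }
    } else if (CODE_MODE_TOOLS.has(tool.name)) {
      kept.push(tool);
    }
  }
  return kept;
}

// Flatten a payload to a comma-separated name list, like payloadTools in the proof output.
function payloadToolNames(tools: PayloadTool[]): string {
  return tools
    .flatMap((tool) =>
      "functionDeclarations" in tool ? tool.functionDeclarations.map((d) => d.name) : [tool.name],
    )
    .join(",");
}
```

Filtering the declarations inside each group, rather than flattening the payload first, keeps the provider's expected structure intact while still narrowing the visible tool surface.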

What was not tested: No external provider API request and no gateway process. This proof targets the embedded runner/model-transport boundary where accepted config becomes an outgoing tool payload.

Verification

  • Current head 22c2376b7260: env OPENCLAW_VITEST_MAX_WORKERS=1 timeout 180s node --no-maglev node_modules/vitest/vitest.mjs run --config test/vitest/vitest.agents-pi-embedded.config.ts src/agents/pi-embedded-runner/run/attempt.spawn-workspace.context-engine.test.ts passed, 41 tests.
  • Current head 22c2376b7260: redacted model-transport proof harness output shown above.
  • Current head 22c2376b7260: node --import tsx scripts/generate-config-doc-baseline.ts --check passed.
  • Current head 22c2376b7260: git diff --check origin/main...HEAD passed.
  • Earlier focused local Vitest for the unchanged runtime/config code passed: zod-schema.agent-defaults.test.ts, code-mode.test.ts, openai-stream-wrappers.test.ts, and attempt.spawn-workspace.context-engine.test.ts, 116 tests total.
  • Pre-rebase validation on the same code diff: oxfmt --check --threads=1 <touched files> passed.
  • Pre-rebase validation on the same code diff: timeout 900s codex review --base origin/main passed with no actionable correctness issues.
  • External Claude review was triaged: wrapper-stacking concern is rejected because runEmbeddedAttempt rebuilds from the session base stream before wrapping; shallow override and test-support notes are non-blocking for the current config shape/test harness.
  • Pre-rebase remote GCP Crabbox pnpm check:changed on head 9b9cb2218662 passed, selected core, coreTests, docs, exit 0, lease cbx_b35df657aaf8.
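
The wrapper-stacking note in the list above turns on where each attempt starts wrapping from. A rough sketch of the difference, using entirely hypothetical names (`SessionSketch`, `wrapWithCodeModeSketch`) and a string-based stand-in for the real stream function:

```typescript
// Stand-in for a model stream function; the real one is not string-based.
type StreamFn = (request: string) => string;

// Hypothetical wrapper that tags the request so the wrapping depth is visible.
function wrapWithCodeModeSketch(inner: StreamFn): StreamFn {
  return (request) => inner(`codeMode(${request})`);
}

class SessionSketch {
  // The unwrapped stream that belongs to the session.
  readonly baseStream: StreamFn = (request) => request;
  // The stream the current attempt actually calls.
  activeStream: StreamFn = this.baseStream;

  // Stacking variant: wraps whatever is currently active, so every
  // attempt adds another layer on top of the previous one.
  startAttemptStacking(): void {
    this.activeStream = wrapWithCodeModeSketch(this.activeStream);
  }

  // Rebuilding variant: always wraps the session base stream, so the
  // depth stays at one no matter how many attempts run.
  startAttemptFromBase(): void {
    this.activeStream = wrapWithCodeModeSketch(this.baseStream);
  }
}
```

Under these assumptions, two attempts through `startAttemptFromBase` still yield a single wrapper layer, whereas two attempts through `startAttemptStacking` yield two. The review note says runEmbeddedAttempt follows the rebuild-from-base pattern, which is why the stacking concern was rejected.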

@openclaw-barnacle openclaw-barnacle Bot added agents Agent runtime and tooling size: M triage: mock-only-proof Candidate: PR proof only shows tests, mocks, snapshots, lint, typecheck, or CI. labels May 18, 2026

clawsweeper Bot commented May 18, 2026

Contributor

Codex review: needs maintainer review before merge.

Workflow note: Future ClawSweeper reviews update this same comment in place.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

Summary
The PR adds per-agent tools.codeMode schema/type/help support, resolves active-agent code-mode runtime settings, enforces exec/wait model payload filtering for active code-mode runs, and updates focused tests, generated config baselines, prompt snapshots, and changelog.

Reproducibility: yes, at source level. Current main documents agent-level tools.codeMode, but the strict per-agent schema omits codeMode and runtime resolution still reads only top-level tools.codeMode.

PR rating
Overall: 🐚 platinum hermit
Proof: 🐚 platinum hermit
Patch quality: 🐚 platinum hermit
Summary: Good normal PR readiness: the patch is focused, source-reproducible, and backed by targeted transport-boundary proof, with ordinary maintainer review still needed for the security-boundary choice.

Rank-up moves:

  • none
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

Real behavior proof
Sufficient (terminal): The PR body supplies redacted terminal-style after-fix output from runtime/model-transport harnesses showing enabled, disabled, and grouped-declaration code-mode payload behavior.

Risk before merge
Why this matters:

  • Merging intentionally makes per-agent config capable of exposing the code-mode exec/wait execution surface for selected agents, so maintainers should explicitly accept the security-boundary behavior.

  • The supplied proof targets the embedded-runner/model-transport boundary and does not include a full gateway process or external provider API request.

Maintainer options:

  1. Accept Transport-Boundary Proof (recommended)
    Maintainers can merge if the embedded-runner/model-transport proof is sufficient for this per-agent code-mode execution-surface change.
  2. Request A Live Runtime Trace
    If the security bar needs stronger evidence, ask for a redacted gateway or live-provider run showing the selected agent receives only exec/wait while a control agent does not.

Next step before merge
No repair lane is needed; maintainers need to decide whether the supplied transport-boundary proof is enough for this security-sensitive code-mode change.

Security
Cleared: No concrete supply-chain, secret-handling, or unintended permission-expansion defect was found; the intended code-mode security boundary remains a maintainer merge-risk decision.

Review details

Best possible solution:

Land this PR or an equivalent focused fix after maintainers accept the supplied transport-boundary proof for the code-mode security boundary.

Do we have a high-confidence way to reproduce the issue?

Yes, at source level. Current main documents agent-level tools.codeMode, but the strict per-agent schema omits codeMode and runtime resolution still reads only top-level tools.codeMode.

Is this the best way to solve the issue?

Yes. The PR fixes schema/types/help and carries the active-agent code-mode decision through catalog and model-payload enforcement, which is safer than the schema-only path previously proposed in #83394.

Label justifications:

  • P2: This is a normal-priority documented config/runtime mismatch with focused scope, but it touches code-mode execution tooling.
  • merge-risk: 🚨 security-boundary: The diff changes which per-agent config path can expose the model-visible exec/wait execution surface.

Acceptance criteria:

  • env OPENCLAW_VITEST_MAX_WORKERS=1 timeout 180s node --no-maglev node_modules/vitest/vitest.mjs run --config test/vitest/vitest.agents-pi-embedded.config.ts src/agents/pi-embedded-runner/run/attempt.spawn-workspace.context-engine.test.ts
  • node --import tsx scripts/generate-config-doc-baseline.ts --check
  • git diff --check origin/main...HEAD
  • Optional stronger proof: redacted gateway or live-provider run showing selected agent payload tools are exec,wait while a control agent keeps normal tools

What I checked:

Likely related people:

  • steipete: Related merged code-mode work in [codex] Add generic OpenClaw code mode #80600 and local history touch the core code-mode runtime, which is the central surface of this PR. (role: feature owner / recent area contributor; confidence: high; commits: 0db097936568, 46bad8676c00; files: src/agents/code-mode.ts, docs/reference/code-mode.md, src/config/zod-schema.agent-runtime.ts)
  • vincentkoc: Recent provider stream-family refactor history touches the OpenAI stream wrapper path used for code-mode transport-surface enforcement. (role: adjacent owner; confidence: medium; commits: 8f7b02e5670d; files: src/agents/pi-embedded-runner/openai-stream-wrappers.ts, src/plugin-sdk/provider-stream.ts)
  • Christof Salis: History for createCodexNativeWebSearchWrapper shows this person introduced the Codex native web-search wrapper that this PR extends for code-mode payload enforcement. (role: introduced adjacent behavior; confidence: medium; commits: 797a70fd95f0; files: src/agents/pi-embedded-runner/openai-stream-wrappers.ts)

Codex review notes: model gpt-5.5, reasoning high; reviewed against 57c952f67985.

@clawsweeper clawsweeper Bot added rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. P2 Normal backlog priority with limited blast radius. merge-risk: 🚨 security-boundary 🚨 May affect sandboxing, authorization, credentials, or sensitive data. labels May 18, 2026
@Kaspre Kaspre marked this pull request as ready for review May 18, 2026 07:40
@Kaspre Kaspre force-pushed the fix/agent-code-mode-runtime branch from 3a9f308 to 8e4fc04 Compare May 18, 2026 07:44
@openclaw-barnacle openclaw-barnacle Bot added docs Improvements or additions to documentation proof: supplied External PR includes structured after-fix real behavior proof. and removed triage: mock-only-proof Candidate: PR proof only shows tests, mocks, snapshots, lint, typecheck, or CI. labels May 18, 2026
@clawsweeper clawsweeper Bot added proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. and removed rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. labels May 18, 2026
@Kaspre Kaspre force-pushed the fix/agent-code-mode-runtime branch from 9b9cb22 to 22c2376 Compare May 18, 2026 09:01
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 18, 2026
@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 18, 2026
@steipete steipete self-assigned this May 18, 2026
@steipete steipete force-pushed the fix/agent-code-mode-runtime branch from 22c2376 to 84d82e7 Compare May 18, 2026 10:50
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 18, 2026
@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 18, 2026
@steipete steipete force-pushed the fix/agent-code-mode-runtime branch from 84d82e7 to e7ab68b Compare May 18, 2026 11:09
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 18, 2026
@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 18, 2026
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 18, 2026
@steipete

Contributor

Maintainer verification for head 0baca59fdcdd7dcea81319cc9b65cbe478903321:

  • Rebased on current origin/main and force-pushed with lease.
  • Local focused tests: node scripts/run-vitest.mjs src/agents/code-mode.test.ts src/agents/pi-embedded-runner/openai-stream-wrappers.test.ts src/config/zod-schema.agent-defaults.test.ts src/agents/pi-embedded-runner/run/attempt.spawn-workspace.context-engine.test.ts -> 6 files / 181 tests passed.
  • Config docs baseline: pnpm config:docs:gen then node --import tsx scripts/generate-config-doc-baseline.ts --check -> OK.
  • Prompt snapshots: pnpm prompt:snapshots:check -> OK after syncing Codex code-mode fixture drift from the current base.
  • Media CI unblocker proof: node scripts/run-vitest.mjs src/media/image-ops.tempdir.test.ts src/media/image-ops.input-guard.test.ts -> 2 files / 11 tests passed; pnpm deadcode:dependencies && pnpm deadcode:unused-files -> OK.
  • Build: pnpm build -> passed after the final fixup.
  • Autoreview: AUTOREVIEW_OPENCLAW_MAINTAINER_VALIDATION=1 codex review --base origin/main -> no actionable correctness issues found.
  • GitHub CI for head 0baca59fdcdd7dcea81319cc9b65cbe478903321: green/neutral/skipped, including CI run 26030218924, CodeQL Critical Quality run 26030218835, CodeQL/Security High run 26030218845, Workflow Sanity run 26030218871, Real behavior proof runs 26030217152 and 26030229217, OpenGrep run 26030218980, and dependency awareness run 26030217166.

Known note: while rebasing onto latest main, two unrelated media fallouts from 57c952f679 blocked the PR merge ref; I included the narrow fix here so this PR can land green.


Labels

  • agents Agent runtime and tooling
  • docs Improvements or additions to documentation
  • merge-risk: 🚨 security-boundary 🚨 May affect sandboxing, authorization, credentials, or sensitive data.
  • P2 Normal backlog priority with limited blast radius.
  • proof: supplied External PR includes structured after-fix real behavior proof.
  • rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected.
  • size: M
  • status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

code-mode: AgentToolsSchema rejects "codeMode" key but docs say agent-level config works

2 participants