Skip to content

fix(codex): quarantine unreadable dynamic tool descriptors#89397

Closed
vincentkoc wants to merge 1 commit into
mainfrom
fuzz-tool-schema-surfaces-next13-20260602
Closed

fix(codex): quarantine unreadable dynamic tool descriptors#89397
vincentkoc wants to merge 1 commit into
mainfrom
fuzz-tool-schema-surfaces-next13-20260602

Conversation

@vincentkoc

@vincentkoc vincentkoc commented Jun 2, 2026

Copy link
Copy Markdown
Member

Summary

  • Quarantines unreadable Codex app-server dynamic tool descriptors before bridge registration builds specs, maps, or wrappers.
  • Snapshots readable descriptor fields once so a plugin getter cannot pass schema projection and crash later during Codex tool spec construction.
  • Preserves healthy sibling tools and existing dynamic-tool quarantine warnings/diagnostics.
  • Out of scope: changing plugin SDK registration contracts or provider-side schema normalization.

Linked context

Related to the active runtime tool-schema fuzzing sweep for unsupported plugin tool schemas crashing turns before assistant content.

Real behavior proof (required for external PRs)

  • Behavior or issue addressed: A malformed Codex dynamic tool descriptor can no longer crash bridge registration before the assistant turn reaches content.
  • Real environment tested: Local OpenClaw Codex extension Vitest harness in a Codex worktree.
  • Exact steps or command run after this patch: node scripts/run-vitest.mjs extensions/codex/src/app-server/dynamic-tools.test.ts -- --reporter=dot; AWS Crabbox fresh-PR pnpm check:changed
  • Evidence after fix (screenshot, recording, terminal capture, console output, redacted runtime log, linked artifact, or copied live output): extensions/codex/src/app-server/dynamic-tools.test.ts passed, 37 tests.
  • Observed result after fix: unreadable dynamic tool name, parameters, and description descriptors are quarantined; a one-shot readable name is captured once and reused without a later getter crash; healthy message remains registered.
  • What was not tested: live Telegram/Codex app-server turn with an installed hostile plugin.
  • Proof limitations or environment constraints: local disk is very low, so broad changed proof ran remotely instead of in the local Codex worktree.
  • Before evidence (optional but encouraged): autoreview confirmed the earlier guarded projection still crashed when downstream registration reread raw descriptors.

Tests and validation

  • ./node_modules/.bin/oxfmt --check --threads=1 extensions/codex/src/app-server/dynamic-tools.ts extensions/codex/src/app-server/dynamic-tools.test.ts
  • git diff --check origin/main...HEAD
  • node scripts/run-vitest.mjs extensions/codex/src/app-server/dynamic-tools.test.ts -- --reporter=dot
  • .agents/skills/autoreview/scripts/autoreview --mode local
  • AWS Crabbox fresh-PR changed gate:
    /opt/homebrew/bin/crabbox run --provider aws --fresh-pr openclaw/openclaw#89397 --idle-timeout 30m --ttl 90m --timing-json --shell -- "set -e; if ! command -v node >/dev/null 2>&1; then curl -fsSL https://deb.nodesource.com/setup_24.x | sudo -E bash -; sudo apt-get install -y nodejs; fi; node -v; npm -v; sudo corepack enable; env OPENCLAW_CHECK_CHANGED_REMOTE_CHILD=1 OPENCLAW_CHANGED_LANES_RAW_SYNC=1 corepack pnpm check:changed" passed. Provider: aws; run: run_b533b3b7e58c; lease: cbx_8180c941abce; slug: brisk-krill; exit 0; lanes: extensions, extensionTests; command 4m26.143s; total 4m42.141s; lease stopped.

Regression coverage added:

  • createCodexDynamicToolBridge(...) quarantines unreadable dynamic tool descriptor fields before bridge registration.
  • The same regression verifies a one-shot dynamic tool name is not reread during spec construction.

Known validation note:

  • Targeted local extension oxlint currently reports pre-existing sparse/type-resolution errors in extensions/codex/src/app-server/dynamic-tools.ts, so lint/typecheck proof is being routed through remote changed validation.

Risk checklist

  • Did user-visible behavior change? (Yes/No) Yes.
  • Did config, environment, or migration behavior change? (Yes/No) No.
  • Did security, auth, secrets, network, or tool execution behavior change? (Yes/No) Yes, tool registration now quarantines malformed dynamic descriptors earlier.
  • What is the highest-risk area? Accidentally changing Codex dynamic tool registration/wrapping semantics.
  • How is that risk mitigated? Existing dynamic-tool tests still pass, including direct/searchable registration and wrapped-tool behavior; the fix snapshots only registration descriptors and keeps execution dispatch tied to the captured name.

Current review state

  • Next action: maintainer review.
  • Still waiting on: nothing known from author.
  • Bot or reviewer comments addressed: local autoreview first found the raw-descriptor reread issue; the patch now snapshots descriptor fields and final autoreview is clean.

@vincentkoc vincentkoc self-assigned this Jun 2, 2026
@clawsweeper

clawsweeper Bot commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

Codex review: needs changes before merge. Reviewed June 2, 2026, 5:29 AM ET / 09:29 UTC.

Summary
The PR snapshots Codex dynamic tool descriptor fields before bridge registration and adds regression coverage for unreadable descriptor getters.

PR surface: Source +121, Tests +84. Total +205 across 2 files.

Reproducibility: yes. from source inspection: current main reads dynamic tool descriptor getters directly during projection/spec construction, so throwing getters can abort bridge registration. I did not run the targeted Vitest in this read-only review.

Review metrics: none identified.

Merge readiness
Overall: 🦐 gold shrimp
Proof: 🌊 off-meta tidepool
Patch quality: 🦐 gold shrimp
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • Preserve original tool metadata through descriptor snapshotting and wrapping.
  • [P2] Add a regression for plugin owner diagnostics or code-mode marker preservation, then rerun the targeted Codex dynamic-tools test.

Risk before merge

  • [P1] Merging as-is can make unwrapped plugin or channel dynamic tools emit tool.execution diagnostics as core tools, and code-mode control tools can lose their WeakSet marker before before_tool_call normalization.

Maintainer options:

  1. Preserve metadata before merge (recommended)
    Keep the descriptor snapshots for Codex specs, but ensure the object passed through before_tool_call wrapping retains plugin/channel/code-mode metadata and add focused tests for that behavior.
  2. Pause until owner review
    If maintainers want a different tool snapshot contract, pause this PR until the Codex/plugin runtime owner confirms the permanent metadata-preservation shape.
Copy recommended automerge instruction
@clawsweeper automerge

Special instructions:
Preserve plugin, channel, and code-mode tool metadata when Codex dynamic tool descriptor snapshots are wrapped for execution; add regression coverage for plugin owner diagnostics and code-mode control marker preservation; rerun the Codex dynamic-tools targeted test.

Next step before merge

  • [P2] There is a narrow, actionable PR-branch repair: preserve object-keyed tool metadata while keeping descriptor snapshots for Codex spec construction.

Security
Needs attention: The patch touches tool execution registration and currently risks dropping owner/kind metadata used by security-relevant diagnostics and hook normalization.

Review findings

  • [P2] Preserve tool metadata when wrapping snapshots — extensions/codex/src/app-server/dynamic-tools.ts:115-117
Review details

Best possible solution:

Keep the early descriptor quarantine, but preserve execution-object metadata and add a regression that proves plugin/channel owner diagnostics and code-mode marker behavior survive descriptor snapshotting.

Do we have a high-confidence way to reproduce the issue?

Yes from source inspection: current main reads dynamic tool descriptor getters directly during projection/spec construction, so throwing getters can abort bridge registration. I did not run the targeted Vitest in this read-only review.

Is this the best way to solve the issue?

No as written: the quarantine belongs at the Codex bridge boundary, but the execution snapshot must preserve metadata that current wrappers derive from the original tool object.

Full review comments:

  • [P2] Preserve tool metadata when wrapping snapshots — extensions/codex/src/app-server/dynamic-tools.ts:115-117
    The new snapshot object is what gets passed into wrapToolWithBeforeToolCallHook, but wrapper behavior depends on metadata stored on the original tool object. Plugin/channel owner metadata is WeakMap-backed and code-mode control markers are WeakSet-backed, so wrapping the fresh snapshot can reclassify plugin tools as core in tool.execution.* diagnostics and skip code-mode exec param normalization compared with current main wrapping the original tool. Preserve or copy that metadata before wrapping, or keep the original execution object and use the snapshot only for Codex spec fields.
    Confidence: 0.87

Overall correctness: patch is incorrect
Overall confidence: 0.88

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 601ab84f35e1.

Label changes

Label changes:

  • add P2: The PR addresses a real Codex dynamic-tool crash path, but the remaining blocker has limited blast radius and is confined to tool registration/wrapping behavior.
  • add merge-risk: 🚨 security-boundary: The diff can drop tool owner and code-mode execution metadata before the before_tool_call/diagnostic wrapper observes the tool.
  • add rating: 🦐 gold shrimp: Overall readiness is 🦐 gold shrimp; proof is 🌊 off-meta tidepool and patch quality is 🦐 gold shrimp.
  • add status: ⏳ waiting on author: ClawSweeper has contributor-facing work open and is waiting for author action. Not applicable: The external-contributor proof gate does not apply to this maintainer PR; the body reports targeted Vitest and Crabbox changed-gate validation as supplemental proof.

Label justifications:

  • P2: The PR addresses a real Codex dynamic-tool crash path, but the remaining blocker has limited blast radius and is confined to tool registration/wrapping behavior.
  • merge-risk: 🚨 security-boundary: The diff can drop tool owner and code-mode execution metadata before the before_tool_call/diagnostic wrapper observes the tool.
  • rating: 🦐 gold shrimp: Overall readiness is 🦐 gold shrimp; proof is 🌊 off-meta tidepool and patch quality is 🦐 gold shrimp.
  • status: ⏳ waiting on author: ClawSweeper has contributor-facing work open and is waiting for author action. Not applicable: The external-contributor proof gate does not apply to this maintainer PR; the body reports targeted Vitest and Crabbox changed-gate validation as supplemental proof.
Evidence reviewed

PR surface:

Source +121, Tests +84. Total +205 across 2 files.

View PR surface stats
Area Files Added Removed Net
Source 1 142 21 +121
Tests 1 84 0 +84
Docs 0 0 0 0
Config 0 0 0 0
Generated 0 0 0 0
Other 0 0 0 0
Total 2 226 21 +205

Security concerns:

  • [medium] Tool identity metadata can be dropped — extensions/codex/src/app-server/dynamic-tools.ts:115
    The execution wrapper is built around a fresh snapshot object, while plugin ownership and code-mode-control state are object-keyed metadata on the original tool; losing them can weaken audit classification and code-mode before_tool_call normalization.
    Confidence: 0.84

Acceptance criteria:

  • [P1] node scripts/run-vitest.mjs extensions/codex/src/app-server/dynamic-tools.test.ts -- --reporter=dot.
  • [P1] node scripts/run-vitest.mjs extensions/codex/src/app-server/openclaw-owned-tool-runtime-contract.test.ts -- --reporter=dot.

What I checked:

Likely related people:

  • WJzz1: Current blame attributes the Codex dynamic tool bridge and wrapper/helper lines in this checkout snapshot to this author, making them a useful routing candidate for the immediate surface. (role: recent area contributor; confidence: medium; commits: 6349af650240; files: extensions/codex/src/app-server/dynamic-tools.ts, src/agents/agent-tools.before-tool-call.ts)
  • steipete: Git history shows the Codex app-server dynamic-tools module was added/moved in a recent Codex app-server commit by Peter Steinberger. (role: Codex app-server feature history; confidence: medium; commits: d5698038d71c; files: extensions/codex/src/app-server/dynamic-tools.ts, extensions/codex/src/app-server/run-attempt.ts)
  • vincentkoc: Prior merged history includes plugin provenance and live hook registry fixes, which are directly adjacent to preserving tool identity metadata across runtime wrapping. (role: adjacent plugin/runtime provenance owner; confidence: medium; commits: 982383338373, d22279d2e873; files: src/plugins/tools.ts, src/plugins/runtime.ts, src/plugins/hooks.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@clawsweeper clawsweeper Bot added rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. labels Jun 2, 2026
@vincentkoc

Copy link
Copy Markdown
Member Author

Closing this as folded into the landed canonical PR: #90022

Reason: this was an intermediate schema-surface draft for the same issue. The merged PR is the cleaner superset: it preserves healthy tool execution, quarantines unreadable/invalid descriptor and wrapper failures, and prevents Codex from receiving registered durable specs for tools that are unavailable in the current turn.

Landed squash commit: 3ffb360

@vincentkoc vincentkoc closed this Jun 8, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

extensions: codex maintainer Maintainer-authored PR rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. size: M status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant