Skip to content

fix(codex): quarantine unreadable dynamic tools#90022

Merged
vincentkoc merged 1 commit into
mainfrom
fuzz-tool-schema-next147-20260603
Jun 8, 2026
Merged

fix(codex): quarantine unreadable dynamic tools#90022
vincentkoc merged 1 commit into
mainfrom
fuzz-tool-schema-next147-20260603

Conversation

@vincentkoc

Copy link
Copy Markdown
Member

Summary

  • Quarantine Codex dynamic tools whose descriptor fields are unreadable or invalid before dynamic tool spec generation.
  • Keep healthy sibling tools registered when one plugin tool has a poisoned name, parameters, or wrapping-time execute getter.
  • Filter durable specs so tools quarantined from the available executable set are not still advertised to Codex.

Verification

  • node scripts/run-vitest.mjs extensions/codex/src/app-server/dynamic-tools.test.ts
  • node scripts/run-vitest.mjs extensions/codex/src/app-server/dynamic-tools.test.ts extensions/codex/src/app-server/dynamic-tool-build.test.ts
  • ./node_modules/.bin/oxfmt --check --threads=1 extensions/codex/src/app-server/dynamic-tools.ts extensions/codex/src/app-server/dynamic-tools.test.ts
  • ./node_modules/.bin/oxlint extensions/codex/src/app-server/dynamic-tools.ts extensions/codex/src/app-server/dynamic-tools.test.ts
  • git diff --check origin/main...HEAD
  • .agents/skills/autoreview/scripts/autoreview --mode local
  • .agents/skills/autoreview/scripts/autoreview --mode branch --base origin/main
  • node scripts/crabbox-wrapper.mjs run --shell -- env OPENCLAW_CHECK_CHANGED_REMOTE_CHILD=1 OPENCLAW_CHANGED_LANES_RAW_SYNC=1 corepack pnpm check:changed blocked before execution: coordinator HTTP 401 unauthorized, provider azure, slug crimson-barnacle.

Real behavior proof

Behavior addressed: A malformed Codex dynamic tool descriptor can no longer crash dynamic tool bridge creation before the assistant turn gets content; the bad tool is quarantined and healthy sibling tools remain available.

Real environment tested: Local linked OpenClaw worktree using the repo Vitest wrapper for the Codex extension app-server tests; broad remote changed gate attempted through the Crabbox wrapper.

Exact steps or command run after this patch: Ran the focused Codex dynamic tool bridge tests, the adjacent dynamic-tool-build tests, formatter check, oxlint, diff check, local autoreview, branch autoreview, and the Crabbox changed-gate command listed above.

Evidence after fix: Focused Vitest passed 2 files / 66 tests; formatter, lint, diff check, local autoreview, and branch autoreview all passed.

Observed result after fix: Poisoned dynamic tool name, parameters, invalid empty name, and wrapping-time execute getters are quarantined; only the healthy message tool remains in availableSpecs and durable specs.

What was not tested: Full pnpm check:changed did not execute because the Crabbox coordinator rejected run/lease creation with HTTP 401. No live Codex assistant turn was run.

@vincentkoc vincentkoc self-assigned this Jun 3, 2026
@clawsweeper

clawsweeper Bot commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

Codex review: needs maintainer review before merge. Reviewed June 7, 2026, 9:22 PM ET / 01:22 UTC.

Summary
The PR caches readable Codex dynamic-tool descriptor fields, quarantines unreadable or invalid tools during projection/wrapping, filters durable specs for unavailable tools, and adds regression coverage.

PR surface: Source +133, Tests +110. Total +243 across 2 files.

Reproducibility: yes. from source: current main directly reads Codex dynamic-tool descriptor fields during projection/spec creation, so poisoned getters can throw before bridge creation completes. I did not execute tests in this read-only review.

Review metrics: none identified.

Merge readiness
Overall: 🐚 platinum hermit
Proof: 🐚 platinum hermit
Patch quality: 🦞 diamond lobster
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • none.

Risk before merge

  • [P1] No broad changed gate or live Codex app-server turn is reported; the focused bridge tests are strong, but maintainers may want runtime proof before undrafting.

Maintainer options:

  1. Decide the mitigation before merge
    Land the bridge-level quarantine after maintainer review and any desired broad/live Codex proof, keeping malformed plugin tools out of Codex specs while healthy sibling tools remain available.
  2. Pause or close
    Do not merge this PR until maintainers decide whether the risk is worth taking.

Next step before merge

  • [P2] This protected maintainer draft has no narrow automated repair target; the next action is maintainer review and any desired broad/live Codex proof before undrafting.

Security
Cleared: The diff only changes Codex dynamic-tool bridge logic and tests, with no dependency, workflow, package, secret, or artifact execution surface changes.

Review details

Best possible solution:

Land the bridge-level quarantine after maintainer review and any desired broad/live Codex proof, keeping malformed plugin tools out of Codex specs while healthy sibling tools remain available.

Do we have a high-confidence way to reproduce the issue?

Yes from source: current main directly reads Codex dynamic-tool descriptor fields during projection/spec creation, so poisoned getters can throw before bridge creation completes. I did not execute tests in this read-only review.

Is this the best way to solve the issue?

Yes. The patch fixes the bridge boundary by caching validated descriptor data, quarantining unreadable/wrapping failures, and preventing Codex from seeing tools that are unavailable to execute; relying only on build-stage filtering would leave direct bridge callers less protected.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 5c5391836b1e.

Label changes

Label justifications:

  • P2: This is a normal-priority Codex dynamic-tool runtime hardening with focused tests and limited blast radius.
  • rating: 🐚 platinum hermit: Overall readiness is 🐚 platinum hermit; proof is 🐚 platinum hermit and patch quality is 🦞 diamond lobster.
  • status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Not applicable: The external-contributor proof gate is not applied because this is a maintainer-member PR; the body reports focused tests/lint/autoreview and explicitly notes no live Codex turn ran.
Evidence reviewed

PR surface:

Source +133, Tests +110. Total +243 across 2 files.

View PR surface stats
Area Files Added Removed Net
Source 1 170 37 +133
Tests 1 110 0 +110
Docs 0 0 0 0
Config 0 0 0 0
Generated 0 0 0 0
Other 0 0 0 0
Total 2 280 37 +243

What I checked:

Likely related people:

  • vincentkoc: Authored the prior Codex dynamic-tool schema quarantine and diagnostics commits and also authored this PR's focused follow-up in the same bridge/tests. (role: feature owner and recent area contributor; confidence: high; commits: d7d037b46f32, da5fe990d8b3, 26135a1bc9d0; files: extensions/codex/src/app-server/dynamic-tools.ts, extensions/codex/src/app-server/dynamic-tools.test.ts, src/agents/tool-schema-projection.ts)
  • steipete: Current local blame points at a recent broad refactor touching the Codex app-server bridge files, and GitHub history shows a recent docs pass for the Codex dynamic-tool bridge. (role: recent area contributor; confidence: medium; commits: ce015cef5775, 7139f473332f; files: extensions/codex/src/app-server/dynamic-tools.ts, extensions/codex/src/app-server/run-attempt.ts, extensions/codex/src/app-server/side-question.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@clawsweeper clawsweeper Bot added rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. P2 Normal backlog priority with limited blast radius. labels Jun 3, 2026
@vincentkoc

Copy link
Copy Markdown
Member Author

@clawsweeper review

@clawsweeper

clawsweeper Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@vincentkoc

Copy link
Copy Markdown
Member Author

keeper decision for this duplicate family: keep and land #90022.

Why #90022 over #89481:

  • keeps the descriptor quarantine behavior from fix(codex): quarantine unreadable dynamic tools #89481 for unreadable/invalid dynamic tool descriptors
  • also quarantines wrapping-time failures from wrapToolWithBeforeToolCallHook
  • filters registered durable specs so Codex is not advertised tools that are unavailable to execute in the current turn

Proof before undraft/merge:

  • checked Codex app-server protocol/source directly: dynamic tool spec shape plus validate_dynamic_tools on thread/start
  • checked OpenClaw runtime caller: Codex run-attempt builds the bridge from current-turn tools plus registered durable tools
  • git diff --check origin/main...HEAD
  • node scripts/run-vitest.mjs extensions/codex/src/app-server/dynamic-tools.test.ts -- --reporter=dot => 37 tests passed
  • Blacksmith Testbox scoped changed gate after repairing shallow merge-base: changed paths were only extensions/codex/src/app-server/dynamic-tools.ts and .test.ts; lanes were extensions, extensionTests; extension typecheck, extension-test typecheck, extension lint, media helper guard, sidecar loader guard, and runtime import-cycle check all passed
  • fresh ClawSweeper re-review completed against current main 5c5391836b1e: ready for maintainer review, no narrow actionable defect

After this lands I am folding #88774, #89063, #89397, and #89481 into this canonical PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

extensions: codex maintainer Maintainer-authored PR P2 Normal backlog priority with limited blast radius. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. size: M status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant