fix(codex): quarantine unreadable dynamic tools by vincentkoc · Pull Request #89481 · openclaw/openclaw

vincentkoc · 2026-06-02T14:11:09Z

Summary

Snapshot Codex dynamic tool names, descriptions, and schemas before wrapping or emitting specs.
Quarantine unreadable dynamic tool entries/descriptors instead of aborting Codex bridge construction.
Keep healthy sibling dynamic tools available when one plugin tool has a hostile descriptor or list entry.

Verification

node scripts/run-vitest.mjs extensions/codex/src/app-server/dynamic-tools.test.ts -- --reporter=dot
./node_modules/.bin/oxfmt --check --threads=1 extensions/codex/src/app-server/dynamic-tools.ts extensions/codex/src/app-server/dynamic-tools.test.ts
./node_modules/.bin/oxlint extensions/codex/src/app-server/dynamic-tools.ts extensions/codex/src/app-server/dynamic-tools.test.ts
git diff --check
.agents/skills/autoreview/scripts/autoreview --mode local --no-web-search
AWS Crabbox fresh-PR changed gate: run_8adb0867537d, lease cbx_5a28a16938b0, provider aws, command corepack pnpm check:changed, exit 0

Real behavior proof

Behavior addressed: Codex app-server dynamic tool bridge no longer crashes before content when a dynamic tool list entry or descriptor field is unreadable.
Real environment tested: local macOS checkout with Node 24 via repo Vitest wrapper; AWS Crabbox fresh PR checkout on Linux Node 24.15.0.
Exact steps or command run after this patch: node scripts/run-vitest.mjs extensions/codex/src/app-server/dynamic-tools.test.ts -- --reporter=dot; AWS Crabbox corepack pnpm check:changed on PR #89481.
Evidence after fix: focused Codex dynamic-tools suite passed 38 tests; AWS Crabbox run run_8adb0867537d passed changed lanes extensions, extensionTests.
Observed result after fix: malformed dynamic tools are quarantined, warnings/diagnostics name the offending tool or index, healthy sibling specs remain available, and healthy tool calls still execute.
What was not tested: full repository test suite.

clawsweeper · 2026-06-02T14:12:50Z

Codex review: needs maintainer review before merge. Reviewed June 2, 2026, 10:27 AM ET / 14:27 UTC.

Summary
This PR changes the Codex app-server dynamic tool bridge to snapshot tool descriptors, quarantine unreadable entries or fields, and add regression tests that keep healthy sibling tools available.

PR surface: Source +140, Tests +154. Total +294 across 2 files.

Reproducibility: yes. Current main directly reads dynamic tool descriptor fields during projection/spec creation, and the PR adds focused tests with throwing getters that exercise that source path.

Review metrics: none identified.

Merge readiness
Overall: 🦞 diamond lobster
Proof: 🦞 diamond lobster
Patch quality: 🦞 diamond lobster
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Next step before merge

No ClawSweeper repair lane is needed because this read-only review found no narrow actionable defect; remaining handling is normal maintainer draft and gate completion.

Security
Cleared: The diff only changes Codex dynamic-tool runtime handling and colocated tests, with no dependency, workflow, secret, or package-resolution surface added.

Review details

Best possible solution:

Keep the bridge-level descriptor quarantine and land it after normal maintainer draft/CI gates if maintainers agree with strict quarantine for unreadable descriptors.

Do we have a high-confidence way to reproduce the issue?

Yes. Current main directly reads dynamic tool descriptor fields during projection/spec creation, and the PR adds focused tests with throwing getters that exercise that source path.

Is this the best way to solve the issue?

Yes. The bridge is the narrow owner boundary because it is where OpenClaw turns runtime tools into Codex specs and wraps executable tools; build-time schema filtering already covers some parameter cases but not unreadable descriptions or final spec emission.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against c8d21fe7f05b.

Label changes

Label changes:

add proof: sufficient: Contributor real behavior proof is sufficient. The PR body records after-fix focused runtime test output and an AWS Crabbox changed gate run for the PR head; no contributor action is needed for this maintainer PR.
add rating: 🦞 diamond lobster: Overall readiness is 🦞 diamond lobster; proof is 🦞 diamond lobster and patch quality is 🦞 diamond lobster.
remove rating: 🐚 platinum hermit: Current PR rating is rating: 🦞 diamond lobster, so this older rating label is no longer current.

Label justifications:

P2: This is a normal-priority Codex runtime hardening fix with limited blast radius around dynamic tool bridge construction.
rating: 🦞 diamond lobster: Overall readiness is 🦞 diamond lobster; proof is 🦞 diamond lobster and patch quality is 🦞 diamond lobster.
status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Sufficient (terminal): The PR body records after-fix focused runtime test output and an AWS Crabbox changed gate run for the PR head; no contributor action is needed for this maintainer PR.
proof: sufficient: Contributor real behavior proof is sufficient. The PR body records after-fix focused runtime test output and an AWS Crabbox changed gate run for the PR head; no contributor action is needed for this maintainer PR.

Evidence reviewed

PR surface:

Source +140, Tests +154. Total +294 across 2 files.

View PR surface stats

Area	Files	Added	Removed	Net
Source	1	173	33	+140
Tests	1	154	0	+154
Docs	0	0	0	0
Config	0	0	0	0
Generated	0	0	0	0
Other	0	0	0	0
Total	2	327	33	+294

What I checked:

Current-main failure path: Current main projects dynamic tools by directly reading tool.parameters and tool.name, then emits specs by directly reading tool.name and tool.description, so a throwing descriptor getter can abort bridge construction before quarantine. (extensions/codex/src/app-server/dynamic-tools.ts:315, c8d21fe7f05b)
PR implementation: The PR head snapshots each entry and descriptor field under try/catch, records unreadable name/description/parameters violations, and only projects readable descriptors into Codex specs. (extensions/codex/src/app-server/dynamic-tools.ts:337, dce044ad049b)
Regression coverage: The new tests cover unreadable descriptor fields and unreadable array entries while asserting the healthy message/web_search tools remain exposed and the broken tools are not executed. (extensions/codex/src/app-server/dynamic-tools.test.ts:393, dce044ad049b)
Codex contract checked: Codex app-server v2 dynamic tool specs require name, description, inputSchema, and optional deferLoading, and thread startup validates identifiers, namespace rules, and input schema before creating core dynamic tools. (../codex/codex-rs/app-server-protocol/src/protocol/v2/thread.rs:42, b9af5d123442)
Runtime caller checked: runCodexAppServerAttempt builds the per-turn and registered dynamic tools, then passes both arrays into createCodexDynamicToolBridge before thread startup uses the resulting specs. (extensions/codex/src/app-server/run-attempt.ts:575, c8d21fe7f05b)
History provenance: Recent history shows the same area already owns dynamic-tool quarantine and reporting work, including unsupported schema quarantine and later pre-normalization quarantine. (extensions/codex/src/app-server/dynamic-tools.ts, d7d037b46f32)

Likely related people:

vincentkoc: Recent GitHub path history shows prior dynamic-tool quarantine/reporting commits and agent tool-schema quarantine work in the same bridge and schema-projection area. (role: recent area contributor; confidence: high; commits: d7d037b46f32, da5fe990d8b3, edc0a22179a7; files: extensions/codex/src/app-server/dynamic-tools.ts, extensions/codex/src/app-server/dynamic-tools.test.ts, src/agents/tool-schema-projection.ts)
steipete: Recent path history shows app-server/runtime refactors and lint/cleanup work around the Codex dynamic-tool files and surrounding runtime boundary. (role: recent adjacent contributor; confidence: medium; commits: deb7bc653935, bb46b79d3c14, a4c2e7f5cf1b; files: extensions/codex/src/app-server/dynamic-tools.ts, extensions/codex/src/app-server/dynamic-tool-build.ts)

What the crustacean ranks mean

🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works

ClawSweeper keeps one durable marker-backed review comment per issue or PR.
Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
Maintainers can also comment @clawsweeper review to request a fresh review only.
Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

vincentkoc · 2026-06-08T01:30:45Z

Closing this as the near-winner folded into the landed canonical PR: #90022

This PR had the stronger earlier proof signal, but #90022 was the current functional superset: it keeps the unreadable/invalid descriptor quarantine and also adds wrapping-time failure quarantine plus registered durable spec filtering so Codex is not advertised tools that cannot execute in the current turn.

Landed squash commit: 3ffb360

vincentkoc self-assigned this Jun 2, 2026

openclaw-barnacle Bot added extensions: codex size: M maintainer Maintainer-authored PR labels Jun 2, 2026

fix(codex): quarantine unreadable dynamic tools

dce044a

vincentkoc force-pushed the fuzz-tool-schema-surfaces-next25-20260602 branch from d6df61b to dce044a Compare June 2, 2026 14:15

clawsweeper Bot mentioned this pull request Jun 8, 2026

fix(codex): quarantine unreadable dynamic tool schemas #91289

Open

vincentkoc mentioned this pull request Jun 8, 2026

fix(codex): quarantine unreadable dynamic tools #90022

Merged

vincentkoc closed this Jun 8, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(codex): quarantine unreadable dynamic tools#89481

fix(codex): quarantine unreadable dynamic tools#89481
vincentkoc wants to merge 1 commit into
mainfrom
fuzz-tool-schema-surfaces-next25-20260602

vincentkoc commented Jun 2, 2026 •

edited

Loading

Uh oh!

clawsweeper Bot commented Jun 2, 2026 •

edited

Loading

Uh oh!

vincentkoc commented Jun 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

vincentkoc commented Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Verification

Real behavior proof

Uh oh!

clawsweeper Bot commented Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

vincentkoc commented Jun 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

vincentkoc commented Jun 2, 2026 •

edited

Loading

clawsweeper Bot commented Jun 2, 2026 •

edited

Loading