Skip to content

fix(codex): quarantine unreadable dynamic tools#89481

Closed
vincentkoc wants to merge 1 commit into
mainfrom
fuzz-tool-schema-surfaces-next25-20260602
Closed

fix(codex): quarantine unreadable dynamic tools#89481
vincentkoc wants to merge 1 commit into
mainfrom
fuzz-tool-schema-surfaces-next25-20260602

Conversation

@vincentkoc

@vincentkoc vincentkoc commented Jun 2, 2026

Copy link
Copy Markdown
Member

Summary

  • Snapshot Codex dynamic tool names, descriptions, and schemas before wrapping or emitting specs.
  • Quarantine unreadable dynamic tool entries/descriptors instead of aborting Codex bridge construction.
  • Keep healthy sibling dynamic tools available when one plugin tool has a hostile descriptor or list entry.

Verification

  • node scripts/run-vitest.mjs extensions/codex/src/app-server/dynamic-tools.test.ts -- --reporter=dot
  • ./node_modules/.bin/oxfmt --check --threads=1 extensions/codex/src/app-server/dynamic-tools.ts extensions/codex/src/app-server/dynamic-tools.test.ts
  • ./node_modules/.bin/oxlint extensions/codex/src/app-server/dynamic-tools.ts extensions/codex/src/app-server/dynamic-tools.test.ts
  • git diff --check
  • .agents/skills/autoreview/scripts/autoreview --mode local --no-web-search
  • AWS Crabbox fresh-PR changed gate: run_8adb0867537d, lease cbx_5a28a16938b0, provider aws, command corepack pnpm check:changed, exit 0

Real behavior proof

Behavior addressed: Codex app-server dynamic tool bridge no longer crashes before content when a dynamic tool list entry or descriptor field is unreadable.
Real environment tested: local macOS checkout with Node 24 via repo Vitest wrapper; AWS Crabbox fresh PR checkout on Linux Node 24.15.0.
Exact steps or command run after this patch: node scripts/run-vitest.mjs extensions/codex/src/app-server/dynamic-tools.test.ts -- --reporter=dot; AWS Crabbox corepack pnpm check:changed on PR #89481.
Evidence after fix: focused Codex dynamic-tools suite passed 38 tests; AWS Crabbox run run_8adb0867537d passed changed lanes extensions, extensionTests.
Observed result after fix: malformed dynamic tools are quarantined, warnings/diagnostics name the offending tool or index, healthy sibling specs remain available, and healthy tool calls still execute.
What was not tested: full repository test suite.

@vincentkoc vincentkoc self-assigned this Jun 2, 2026
@clawsweeper

clawsweeper Bot commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

Codex review: needs maintainer review before merge. Reviewed June 2, 2026, 10:27 AM ET / 14:27 UTC.

Summary
This PR changes the Codex app-server dynamic tool bridge to snapshot tool descriptors, quarantine unreadable entries or fields, and add regression tests that keep healthy sibling tools available.

PR surface: Source +140, Tests +154. Total +294 across 2 files.

Reproducibility: yes. Current main directly reads dynamic tool descriptor fields during projection/spec creation, and the PR adds focused tests with throwing getters that exercise that source path.

Review metrics: none identified.

Merge readiness
Overall: 🦞 diamond lobster
Proof: 🦞 diamond lobster
Patch quality: 🦞 diamond lobster
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Next step before merge

  • No ClawSweeper repair lane is needed because this read-only review found no narrow actionable defect; remaining handling is normal maintainer draft and gate completion.

Security
Cleared: The diff only changes Codex dynamic-tool runtime handling and colocated tests, with no dependency, workflow, secret, or package-resolution surface added.

Review details

Best possible solution:

Keep the bridge-level descriptor quarantine and land it after normal maintainer draft/CI gates if maintainers agree with strict quarantine for unreadable descriptors.

Do we have a high-confidence way to reproduce the issue?

Yes. Current main directly reads dynamic tool descriptor fields during projection/spec creation, and the PR adds focused tests with throwing getters that exercise that source path.

Is this the best way to solve the issue?

Yes. The bridge is the narrow owner boundary because it is where OpenClaw turns runtime tools into Codex specs and wraps executable tools; build-time schema filtering already covers some parameter cases but not unreadable descriptions or final spec emission.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against c8d21fe7f05b.

Label changes

Label changes:

  • add proof: sufficient: Contributor real behavior proof is sufficient. The PR body records after-fix focused runtime test output and an AWS Crabbox changed gate run for the PR head; no contributor action is needed for this maintainer PR.
  • add rating: 🦞 diamond lobster: Overall readiness is 🦞 diamond lobster; proof is 🦞 diamond lobster and patch quality is 🦞 diamond lobster.
  • remove rating: 🐚 platinum hermit: Current PR rating is rating: 🦞 diamond lobster, so this older rating label is no longer current.

Label justifications:

  • P2: This is a normal-priority Codex runtime hardening fix with limited blast radius around dynamic tool bridge construction.
  • rating: 🦞 diamond lobster: Overall readiness is 🦞 diamond lobster; proof is 🦞 diamond lobster and patch quality is 🦞 diamond lobster.
  • status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Sufficient (terminal): The PR body records after-fix focused runtime test output and an AWS Crabbox changed gate run for the PR head; no contributor action is needed for this maintainer PR.
  • proof: sufficient: Contributor real behavior proof is sufficient. The PR body records after-fix focused runtime test output and an AWS Crabbox changed gate run for the PR head; no contributor action is needed for this maintainer PR.
Evidence reviewed

PR surface:

Source +140, Tests +154. Total +294 across 2 files.

View PR surface stats
Area Files Added Removed Net
Source 1 173 33 +140
Tests 1 154 0 +154
Docs 0 0 0 0
Config 0 0 0 0
Generated 0 0 0 0
Other 0 0 0 0
Total 2 327 33 +294

What I checked:

Likely related people:

  • vincentkoc: Recent GitHub path history shows prior dynamic-tool quarantine/reporting commits and agent tool-schema quarantine work in the same bridge and schema-projection area. (role: recent area contributor; confidence: high; commits: d7d037b46f32, da5fe990d8b3, edc0a22179a7; files: extensions/codex/src/app-server/dynamic-tools.ts, extensions/codex/src/app-server/dynamic-tools.test.ts, src/agents/tool-schema-projection.ts)
  • steipete: Recent path history shows app-server/runtime refactors and lint/cleanup work around the Codex dynamic-tool files and surrounding runtime boundary. (role: recent adjacent contributor; confidence: medium; commits: deb7bc653935, bb46b79d3c14, a4c2e7f5cf1b; files: extensions/codex/src/app-server/dynamic-tools.ts, extensions/codex/src/app-server/dynamic-tool-build.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@vincentkoc vincentkoc force-pushed the fuzz-tool-schema-surfaces-next25-20260602 branch from d6df61b to dce044a Compare June 2, 2026 14:15
@clawsweeper clawsweeper Bot added rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. P2 Normal backlog priority with limited blast radius. proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🦞 diamond lobster Very strong PR readiness with only minor maintainer review expected. and removed rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. labels Jun 2, 2026
@vincentkoc

Copy link
Copy Markdown
Member Author

Closing this as the near-winner folded into the landed canonical PR: #90022

This PR had the stronger earlier proof signal, but #90022 was the current functional superset: it keeps the unreadable/invalid descriptor quarantine and also adds wrapping-time failure quarantine plus registered durable spec filtering so Codex is not advertised tools that cannot execute in the current turn.

Landed squash commit: 3ffb360

@vincentkoc vincentkoc closed this Jun 8, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

extensions: codex maintainer Maintainer-authored PR P2 Normal backlog priority with limited blast radius. proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🦞 diamond lobster Very strong PR readiness with only minor maintainer review expected. size: M status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant