Skip to content

fix(codex): quarantine unreadable dynamic tools#89063

Closed
vincentkoc wants to merge 1 commit into
mainfrom
fuzz-codex-dynamic-tool-names-20260601
Closed

fix(codex): quarantine unreadable dynamic tools#89063
vincentkoc wants to merge 1 commit into
mainfrom
fuzz-codex-dynamic-tool-names-20260601

Conversation

@vincentkoc

Copy link
Copy Markdown
Member

Summary

  • guard Codex dynamic-tool bridge startup against tools whose descriptor/schema accessors throw
  • quarantine unreadable dynamic tools with telemetry/diagnostics instead of aborting bridge creation
  • carry projected tool name/description/schema through spec creation and dispatch telemetry so valid tools do not re-read brittle accessors

Verification

  • node scripts/run-vitest.mjs extensions/codex/src/app-server/dynamic-tools.test.ts --reporter=dot
  • node_modules/.bin/oxlint extensions/codex/src/app-server/dynamic-tools.ts extensions/codex/src/app-server/dynamic-tools.test.ts
  • git diff --check HEAD~1..HEAD
  • direct node --import tsx repro: bad dynamic tool parameters getter no longer aborts createCodexDynamicToolBridge; observed available tool list message and quarantine reason bad_dynamic_probe.inputSchema could not be read: parameters getter exploded
  • .agents/skills/autoreview/scripts/autoreview --mode branch --base origin/main

Real behavior proof

Behavior addressed: Codex dynamic-tool startup can continue with healthy tools when a plugin/runtime tool exposes an unreadable schema accessor.

Real environment tested: local OpenClaw Codex extension test lane via repo wrapper; direct tsx bridge-construction repro.

Exact steps or command run after this patch: node scripts/run-vitest.mjs extensions/codex/src/app-server/dynamic-tools.test.ts --reporter=dot; direct node --import tsx repro invoking createCodexDynamicToolBridge; node scripts/crabbox-wrapper.mjs run --provider aws --idle-timeout 90m --ttl 240m --timing-json --shell -- "pnpm check:changed".

Evidence after fix: focused test passed, 1 file / 37 tests; direct repro printed message plus the expected quarantine reason; oxlint and diff check passed; autoreview reported no accepted/actionable findings.

Observed result after fix: the bridge exposes the healthy message tool and records the bad dynamic tool in telemetry.quarantinedTools instead of throwing during startup.

What was not tested: remote pnpm check:changed did not complete. Blacksmith Testbox is not authenticated in this shell. AWS Crabbox did not reach a remote lease because the wrapper needed a temporary full checkout for sync and failed locally with No space left on device.

@vincentkoc vincentkoc self-assigned this Jun 1, 2026
@clawsweeper

clawsweeper Bot commented Jun 1, 2026

Copy link
Copy Markdown
Contributor

Codex review: needs changes before merge. Reviewed June 1, 2026, 7:33 AM ET / 11:33 UTC.

Summary
The PR changes the Codex dynamic-tool bridge to quarantine tools whose name, description, or parameters accessors cannot be read, while preserving projected fields for specs and telemetry.

PR surface: Source +83, Tests +29. Total +112 across 2 files.

Reproducibility: yes. source-reproducible: current main reads tool.parameters and tool.name directly while constructing the bridge, so a throwing accessor can abort startup before any quarantine path runs. I did not run the repro because this review was kept read-only.

Review metrics: none identified.

Merge readiness
Overall: 🦐 gold shrimp
Proof: 🌊 off-meta tidepool
Patch quality: 🦐 gold shrimp
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • [P2] Fix the descriptor re-read path around before-tool-call wrapping and add a regression test for a throwing description or one-shot parameters getter.
  • Complete the broader changed check or record an accepted maintainer reason for landing with focused proof only.

Risk before merge

  • [P1] The patch still lets bridge construction fail when a tool descriptor getter throws during before-tool-call wrapping, so the unreadable dynamic-tool class is not fully quarantined.
  • [P1] The PR body reports focused tests and a direct tsx repro, but the broader remote changed check did not complete before review.

Maintainer options:

  1. Decide the mitigation before merge
    Keep the quarantine design, but make the bridge wrap or execute tools through a stable projected descriptor so name, description, and parameters are not re-read after validation, then add a regression test for a throwing description or one-shot parameters getter.
  2. Pause or close
    Do not merge this PR until maintainers decide whether the risk is worth taking.

Next step before merge

  • [P2] The remaining blocker is a narrow code/test repair in the Codex dynamic-tool bridge and before-tool-call wrapping path.

Security
Cleared: The diff changes Codex dynamic-tool bridge error handling and tests, with no dependency, workflow, secret, permission, or supply-chain surface added.

Review findings

  • [P2] Avoid re-reading projected tool descriptors during wrapping — extensions/codex/src/app-server/dynamic-tools.ts:115
Review details

Best possible solution:

Keep the quarantine design, but make the bridge wrap or execute tools through a stable projected descriptor so name, description, and parameters are not re-read after validation, then add a regression test for a throwing description or one-shot parameters getter.

Do we have a high-confidence way to reproduce the issue?

Yes, source-reproducible: current main reads tool.parameters and tool.name directly while constructing the bridge, so a throwing accessor can abort startup before any quarantine path runs. I did not run the repro because this review was kept read-only.

Is this the best way to solve the issue?

No, not yet: quarantine-before-registration is the right layer, but the proposed implementation still re-reads descriptor accessors through wrapToolWithBeforeToolCallHook. The safer fix is to pass a stable projected tool shape into wrapping/spec creation or teach the wrapper to use already-read descriptor fields.

Full review comments:

  • [P2] Avoid re-reading projected tool descriptors during wrapping — extensions/codex/src/app-server/dynamic-tools.ts:115
    readCodexDynamicToolFields now catches a throwing description getter and keeps the tool with an empty description, but this line still passes the original object to wrapToolWithBeforeToolCallHook. That helper reads tool.name and spreads ...tool, so a descriptor getter that throws during wrapping still aborts createCodexDynamicToolBridge instead of quarantining the bad descriptor. Build the wrapped entry from the already-projected name/description/schema, or make the wrapper accept those projected fields, and cover a throwing description or one-shot parameters getter.
    Confidence: 0.88

Overall correctness: patch is incorrect
Overall confidence: 0.87

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against a595aba60e3b.

Label changes

Label changes:

  • add P2: This is a normal Codex dynamic-tool availability fix with limited blast radius, and the remaining blocker is a focused bridge correctness issue.
  • add rating: 🦐 gold shrimp: Overall readiness is 🦐 gold shrimp; proof is 🌊 off-meta tidepool and patch quality is 🦐 gold shrimp.
  • add status: ⏳ waiting on author: ClawSweeper has contributor-facing work open and is waiting for author action. Not applicable: The external-contributor proof gate is not applicable to this member-authored maintainer PR; the body still includes focused test and direct tsx repro output as useful validation context.

Label justifications:

  • P2: This is a normal Codex dynamic-tool availability fix with limited blast radius, and the remaining blocker is a focused bridge correctness issue.
  • rating: 🦐 gold shrimp: Overall readiness is 🦐 gold shrimp; proof is 🌊 off-meta tidepool and patch quality is 🦐 gold shrimp.
  • status: ⏳ waiting on author: ClawSweeper has contributor-facing work open and is waiting for author action. Not applicable: The external-contributor proof gate is not applicable to this member-authored maintainer PR; the body still includes focused test and direct tsx repro output as useful validation context.
Evidence reviewed

PR surface:

Source +83, Tests +29. Total +112 across 2 files.

View PR surface stats
Area Files Added Removed Net
Source 1 110 27 +83
Tests 1 29 0 +29
Docs 0 0 0 0
Config 0 0 0 0
Generated 0 0 0 0
Other 0 0 0 0
Total 2 139 27 +112

Acceptance criteria:

  • [P1] node scripts/run-vitest.mjs extensions/codex/src/app-server/dynamic-tools.test.ts --reporter=dot.
  • [P1] node_modules/.bin/oxlint extensions/codex/src/app-server/dynamic-tools.ts extensions/codex/src/app-server/dynamic-tools.test.ts.
  • [P1] git diff --check HEAD~1..HEAD.

What I checked:

Likely related people:

  • vincentkoc: Current main blame for createCodexDynamicToolBridge, projectCodexDynamicTools, and wrapToolWithBeforeToolCallHook points to commit 2ea7c51, and this PR is also from the same area contributor. (role: recent area contributor; confidence: high; commits: 2ea7c518a516, 88e0985fcf0d; files: extensions/codex/src/app-server/dynamic-tools.ts, src/agents/agent-tools.before-tool-call.ts)
  • steipete: Recent history on the Codex app-server path includes the app-server relocation and release-adjacent Codex changes, making this a useful routing fallback if the current branch needs broader Codex ownership review. (role: adjacent recent contributor; confidence: medium; commits: d5698038d71c, e93216080aa1; files: extensions/codex/src/app-server/dynamic-tools.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@clawsweeper clawsweeper Bot added rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. labels Jun 1, 2026
@vincentkoc

Copy link
Copy Markdown
Member Author

Closing this as folded into the landed canonical PR: #90022

Reason: this draft was an earlier route for the same Codex dynamic-tool hardening, but the review blocker was that wrapped execution could still re-read unsafe descriptors. The merged PR fixes the bridge boundary by caching readable descriptor data, quarantining wrapping-time failures, and filtering specs for unavailable tools.

Landed squash commit: 3ffb360

@vincentkoc vincentkoc closed this Jun 8, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

extensions: codex maintainer Maintainer-authored PR rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. size: S status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant