Skip to content

fix(codex): quarantine unreadable dynamic tools#88774

Closed
vincentkoc wants to merge 1 commit into
cron-schema-main-repro-20260601from
fix-codex-dynamic-tool-unreadable-pruned-20260531
Closed

fix(codex): quarantine unreadable dynamic tools#88774
vincentkoc wants to merge 1 commit into
cron-schema-main-repro-20260601from
fix-codex-dynamic-tool-unreadable-pruned-20260531

Conversation

@vincentkoc

@vincentkoc vincentkoc commented May 31, 2026

Copy link
Copy Markdown
Member

Summary

  • quarantine unreadable Codex dynamic tool entries before registration so one malformed plugin tool does not poison the whole dynamic tool list
  • preserve healthy dynamic tools when a neighboring entry, name, description, or schema is unreadable
  • keep the provider-schema hardening in the stacked base PR; this PR only contains the Codex app-server quarantine surface

Stacked on: #88880

Verification

Stack base: 8aeaadf963a
Head: 946ea6a7347

  • node scripts/run-vitest.mjs extensions/codex/src/app-server/dynamic-tools.test.ts --reporter=dot - passed 1 shard / 39 tests
  • node --import tsx scripts/generate-prompt-snapshots.ts --check - passed, Prompt snapshots are current (7 files).
  • node_modules/.bin/oxfmt --check --threads=1 extensions/codex/src/app-server/dynamic-tools.ts extensions/codex/src/app-server/dynamic-tools.test.ts - passed
  • node scripts/run-oxlint.mjs --tsconfig config/tsconfig/oxlint.extensions.json extensions/codex/src/app-server/dynamic-tools.ts extensions/codex/src/app-server/dynamic-tools.test.ts - passed
  • git diff --check origin/cron-schema-main-repro-20260601...HEAD - passed
  • private reported-name scrub across spec and changed Codex files - clean
  • .agents/skills/autoreview/scripts/autoreview --mode branch --base origin/cron-schema-main-repro-20260601 --prompt 'Review the pruned Codex dynamic-tool quarantine branch stacked on the provider schema PR. Focus on unreadable or unsupported dynamic tool schemas, keeping healthy tools active, diagnostic clarity, test coverage, and avoiding duplicated provider projection work.' - clean, no accepted/actionable findings, overall: patch is correct (0.78)

Real behavior proof

Behavior addressed: Codex dynamic-tool registration now skips unreadable malformed dynamic tool entries and keeps healthy tools available instead of failing the whole tool registration path.

Real environment tested: Local OpenClaw Codex/gwt worktree on macOS with shared repo node_modules, using the repo Vitest wrapper plus snapshot, format, lint, diff, scrub, and autoreview gates.

Exact steps or command run after this patch: The focused Vitest, prompt snapshot, oxfmt, oxlint, diff-check, private-name scrub, and autoreview commands listed above were run after restacking onto #88880.

Evidence after fix: Codex dynamic-tools tests passed 1 shard / 39 tests; prompt snapshots were current; oxfmt, oxlint, and diff-check passed; autoreview reported no accepted/actionable findings.

Observed result after fix: Unreadable dynamic tool data is quarantined before Codex registration while valid dynamic tools remain registered and usable.

What was not tested: A live Codex provider turn and full pnpm check:changed were not run for this narrow stacked PR; GitHub CI is expected to validate the pushed branch.

@clawsweeper

clawsweeper Bot commented May 31, 2026

Copy link
Copy Markdown
Contributor

Codex review: needs changes before merge. Reviewed June 1, 2026, 1:45 AM ET / 05:45 UTC.

Summary
The PR adds Codex app-server quarantine handling and focused tests for unreadable dynamic tool entries, descriptions, and parameter schemas while preserving healthy sibling tools.

PR surface: Source +98, Tests +78. Total +176 across 2 files.

Reproducibility: yes. Source inspection shows a dynamic tool named lookup.ticket or a duplicate message would pass this PR's OpenClaw projection and then be rejected by Codex validate_dynamic_tools during thread/start.

Review metrics: 1 noteworthy metric.

  • Unmirrored Codex registration checks: 6 constraint categories still outside quarantine. Codex enforces identifier, length, reserved-prefix, reserved-namespace, duplicate-name, and deferred-namespace rules before startup, so maintainers should see that this PR still misses part of the registration contract.

Merge readiness
Overall: 🧂 unranked krab
Proof: 🌊 off-meta tidepool
Patch quality: 🧂 unranked krab
Result: blocked by patch quality or review findings.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

Risk before merge

Maintainer options:

  1. Decide the mitigation before merge
    Keep the quarantine design, but extend this projection to mirror Codex's dynamic-tool identifier, namespace, defer-loading, and duplicate-name checks before registration, then land after the stacked base is resolved.
  2. Pause or close
    Do not merge this PR until maintainers decide whether the risk is worth taking.

Next step before merge

  • [P2] A narrow automated repair can extend the changed projection/quarantine function and tests to cover Codex's documented registration checks.

Security
Cleared: The diff touches Codex runtime projection and tests only; I found no concrete security or supply-chain regression.

Review findings

  • [P1] Validate Codex dynamic tool names before registration — extensions/codex/src/app-server/dynamic-tools.ts:349
Review details

Best possible solution:

Keep the quarantine design, but extend this projection to mirror Codex's dynamic-tool identifier, namespace, defer-loading, and duplicate-name checks before registration, then land after the stacked base is resolved.

Do we have a high-confidence way to reproduce the issue?

Yes. Source inspection shows a dynamic tool named lookup.ticket or a duplicate message would pass this PR's OpenClaw projection and then be rejected by Codex validate_dynamic_tools during thread/start.

Is this the best way to solve the issue?

No. The quarantine layer is the right owner boundary, but the best fix must quarantine Codex-invalid names/namespaces and duplicate emitted specs before registration rather than only unreadable fields and schemas.

Full review comments:

  • [P1] Validate Codex dynamic tool names before registration — extensions/codex/src/app-server/dynamic-tools.ts:349
    This projection now treats any readable name as registrable and only quarantines unreadable fields or schema errors. Codex rejects non-identifier names, reserved prefixes/namespaces, deferred tools without a namespace, and duplicate names during thread/start, so a plugin tool named lookup.ticket or a duplicate message can still reject the whole Codex startup before healthy tools are usable.
    Confidence: 0.91

Overall correctness: patch is incorrect
Overall confidence: 0.89

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 630f0d6938cd.

Label changes

Label justifications:

  • P2: This is a focused Codex dynamic-tool hardening PR with limited blast radius, but the remaining registration gap can still block affected Codex turns.
  • rating: 🧂 unranked krab: Overall readiness is 🧂 unranked krab; proof is 🌊 off-meta tidepool and patch quality is 🧂 unranked krab.
  • status: ⏳ waiting on author: ClawSweeper has contributor-facing work open and is waiting for author action. Not applicable: The external-contributor proof gate does not apply to this maintainer PR; the body lists focused tests but no live Codex turn.
Evidence reviewed

PR surface:

Source +98, Tests +78. Total +176 across 2 files.

View PR surface stats
Area Files Added Removed Net
Source 1 123 25 +98
Tests 1 78 0 +78
Docs 0 0 0 0
Config 0 0 0 0
Generated 0 0 0 0
Other 0 0 0 0
Total 2 201 25 +176

Acceptance criteria:

  • [P1] node scripts/run-vitest.mjs extensions/codex/src/app-server/dynamic-tools.test.ts --reporter=dot.
  • [P1] node --import tsx scripts/generate-prompt-snapshots.ts --check.
  • [P1] node_modules/.bin/oxfmt --check --threads=1 extensions/codex/src/app-server/dynamic-tools.ts extensions/codex/src/app-server/dynamic-tools.test.ts.
  • [P1] node scripts/run-oxlint.mjs --tsconfig config/tsconfig/oxlint.extensions.json extensions/codex/src/app-server/dynamic-tools.ts extensions/codex/src/app-server/dynamic-tools.test.ts.
  • [P1] git diff --check origin/cron-schema-main-repro-20260601...HEAD.

What I checked:

  • Repository policy applied: Root AGENTS.md and extensions/AGENTS.md were read fully; the Codex dependency-inspection gate and extension boundary guidance applied to this Codex app-server review. (AGENTS.md:15, 630f0d6938cd)
  • Changed projection still accepts any readable name: The PR head reads tool.name into diagnosticToolName and then projects only the input schema before emitting the spec; no Codex identifier, namespace, defer-loading, or duplicate-name check runs before registration. (extensions/codex/src/app-server/dynamic-tools.ts:349, 946ea6a7347d)
  • Codex validates registration before accepting thread/start: Upstream Codex validate_dynamic_tools rejects empty or whitespace-padded names, invalid identifiers, reserved mcp prefixes, bad/reserved namespaces, duplicate names, deferred tools without namespaces, and unsupported schemas before thread/start continues. (../codex/codex-rs/app-server/src/request_processors/thread_processor.rs:208, cf0911076f23)
  • OpenClaw sends these specs into Codex startup: runCodexAppServerAttempt passes toolBridge.specs into startCodexAttemptThread, which carries dynamicTools into buildThreadStartParams and then client.request("thread/start", startParams). (extensions/codex/src/app-server/thread-lifecycle.ts:559, 630f0d6938cd)
  • Added tests cover unreadable/schema cases only: The new tests prove unreadable entries, unreadable parameters, and unreadable descriptions are quarantined, but they do not cover invalid names, reserved namespaces, defer-loading namespace requirements, or duplicate emitted specs. (extensions/codex/src/app-server/dynamic-tools.test.ts:374, 946ea6a7347d)
  • Current PR state and stack: Live PR metadata shows this PR is open, authored by a MEMBER, labeled maintainer/P2, based on open stacked branch cron-schema-main-repro-20260601, and has head 946ea6a. (946ea6a7347d)

Likely related people:

  • vincentkoc: Current-main blame/log ties the checked-out Codex dynamic-tool bridge, run-attempt path, and schema projection files to 76fa1b9, and this PR plus its stacked base continue that hardening lane. (role: recent area contributor; confidence: high; commits: 76fa1b99c310, 8aeaadf963a3, 946ea6a7347d; files: extensions/codex/src/app-server/dynamic-tools.ts, extensions/codex/src/app-server/run-attempt.ts, src/agents/tool-schema-projection.ts)
  • steipete: Git history shows repeated Codex app-server lifecycle and control commits in the same extension area, making Peter a likely reviewer for the runtime boundary even though this specific branch is Vincent's lane. (role: adjacent Codex app-server contributor; confidence: medium; commits: 31a0b7bd42a5, 8d72aafdbb8d, 659bcc5e5b59; files: extensions/codex/src/app-server)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@clawsweeper clawsweeper Bot added rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. P2 Normal backlog priority with limited blast radius. labels May 31, 2026
@vincentkoc vincentkoc marked this pull request as ready for review May 31, 2026 20:37
@clawsweeper clawsweeper Bot added rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. and removed rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. labels May 31, 2026
@vincentkoc vincentkoc force-pushed the fix-codex-dynamic-tool-unreadable-pruned-20260531 branch from 51a1c68 to ffa6298 Compare June 1, 2026 01:44
@vincentkoc vincentkoc requested review from a team as code owners June 1, 2026 01:44
@vincentkoc vincentkoc changed the base branch from test-neutral-tool-schema-fixtures-refresh-20260531 to main June 1, 2026 01:44
@clawsweeper clawsweeper Bot added proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. and removed rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. labels Jun 1, 2026
@vincentkoc vincentkoc force-pushed the fix-codex-dynamic-tool-unreadable-pruned-20260531 branch from ffa6298 to 38e523b Compare June 1, 2026 02:13
@clawsweeper clawsweeper Bot added merge-risk: 🚨 compatibility 🚨 May break existing users, config, migrations, defaults, or upgrade paths. and removed proof: sufficient ClawSweeper judged the real behavior proof convincing. labels Jun 1, 2026
@vincentkoc vincentkoc force-pushed the fix-codex-dynamic-tool-unreadable-pruned-20260531 branch from 38e523b to 9c95e00 Compare June 1, 2026 03:16
@clawsweeper clawsweeper Bot removed the merge-risk: 🚨 compatibility 🚨 May break existing users, config, migrations, defaults, or upgrade paths. label Jun 1, 2026
@vincentkoc vincentkoc force-pushed the fix-codex-dynamic-tool-unreadable-pruned-20260531 branch from 9c95e00 to dfe817d Compare June 1, 2026 03:58
@vincentkoc vincentkoc changed the base branch from main to cron-schema-main-repro-20260601 June 1, 2026 03:58
@clawsweeper clawsweeper Bot added rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. and removed rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. labels Jun 1, 2026
@vincentkoc vincentkoc force-pushed the fix-codex-dynamic-tool-unreadable-pruned-20260531 branch from dfe817d to f14ae04 Compare June 1, 2026 04:13
@vincentkoc vincentkoc force-pushed the cron-schema-main-repro-20260601 branch from c6aaf80 to a23ee0b Compare June 1, 2026 04:28
@vincentkoc vincentkoc force-pushed the fix-codex-dynamic-tool-unreadable-pruned-20260531 branch from f14ae04 to fb95836 Compare June 1, 2026 04:33
@vincentkoc vincentkoc force-pushed the cron-schema-main-repro-20260601 branch 2 times, most recently from 4caf7f9 to 8aeaadf Compare June 1, 2026 05:30
@vincentkoc vincentkoc force-pushed the fix-codex-dynamic-tool-unreadable-pruned-20260531 branch from fb95836 to 946ea6a Compare June 1, 2026 05:37
@vincentkoc vincentkoc self-assigned this Jun 1, 2026
@vincentkoc

Copy link
Copy Markdown
Member Author

Closing this as folded into the landed canonical PR: #90022

Reason: this branch was stacked on a non-main base and did not cover the full bridge boundary that landed in #90022. The merged PR keeps unreadable descriptor quarantine and adds wrapping-time failure quarantine plus registered durable spec filtering.

Landed squash commit: 3ffb360

@vincentkoc vincentkoc closed this Jun 8, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

extensions: codex maintainer Maintainer-authored PR P2 Normal backlog priority with limited blast radius. rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. size: M status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant