Skip to content

fix(codex): expose bound conversation dynamic tools#85213

Open
wizdomhall-hash wants to merge 1 commit into
openclaw:mainfrom
wizdomhall-hash:pr/macos-codex-plugin-tool-bridge
Open

fix(codex): expose bound conversation dynamic tools#85213
wizdomhall-hash wants to merge 1 commit into
openclaw:mainfrom
wizdomhall-hash:pr/macos-codex-plugin-tool-bridge

Conversation

@wizdomhall-hash

Copy link
Copy Markdown

Summary

  • expose OpenClaw dynamic plugin tools to native bound Codex conversation threads
  • refresh managed Codex bindings when the dynamic-tool surface changes
  • route Codex app-server dynamic tool calls back through OpenClaw plugin handlers with bounded timeouts
  • keep bundled Codex host manifests from being shadowed by stale official global plugin installs

Motivation

On macOS, the Kynver AgentOS OpenClaw plugin could be loaded and runtime inspection could see its agent_os_* contracts, but native Codex conversation threads did not receive those dynamic tool specs as callable tools. That meant agent_os_get_context and related tools were visible to OpenClaw inspection but missing from the Codex-visible tool surface.

This PR recovers the working local Track 2 patch from a macOS OpenClaw/Codex install where the AgentOS tools became callable after the Codex bridge change.

Before / After

Before:

  • plugin runtime inspection could see imported Kynver AgentOS tool contracts
  • native bound Codex threads did not expose the imported dynamic tools to Codex tool calls

After:

  • managed Codex conversation bindings include the current OpenClaw dynamic-tool surface
  • tool-surface changes trigger binding refresh instead of leaving stale Codex-visible tools
  • Codex dynamic tool calls dispatch through OpenClaw plugin handlers rather than failing as missing tools

Verification

Focused tests run in the prepared recovery checkout:

node scripts/test-projects.mjs extensions/codex/src/conversation-binding.test.ts src/plugins/manifest-registry.test.ts src/plugins/tools.optional.test.ts

Result:

  • plugin shard: 125 passed
  • Codex extension shard: 19 passed

Additional checks:

git diff --cached --check -- .

The same whitespace check passed again after replaying the patch onto a fresh openclaw/openclaw clone. The fresh clone did not have dependencies installed, so the focused test command there stopped at missing vitest/package.json before tests started.

MacOS smoke from the patched local setup:

  • fresh OpenClaw/Codex tool discovery exposed agent_os_get_context
  • calling it reached the tool bridge successfully instead of failing as a missing tool

Scope

The patch is limited to the Codex conversation binding / dynamic tool bridge and supporting plugin registry behavior needed for that path. No secrets or local machine paths are included.

@openclaw-barnacle openclaw-barnacle Bot added extensions: codex size: XL triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels May 22, 2026
@clawsweeper

clawsweeper Bot commented May 22, 2026

Copy link
Copy Markdown
Contributor

Codex review: needs real behavior proof before merge.

Workflow note: Future ClawSweeper reviews update this same comment in place.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

Summary
The PR adds Codex app-server conversation-binding dynamic tool startup, refresh, dispatch, timeout handling, binding-origin metadata, and related plugin registry/tool-selection tests.

Reproducibility: yes. source inspection gives a high-confidence reproduction path: current main handles app-server conversation item/tool/call requests by returning a fixed “does not expose dynamic OpenClaw tools yet” failure. I did not run a live macOS Codex/AgentOS setup in this read-only review.

PR rating
Overall: 🧂 unranked krab
Proof: 🧂 unranked krab
Patch quality: 🧂 unranked krab
Summary: The PR has a useful target, but missing real behavior proof plus a blocking compatibility regression make it not quality-ready yet.

Rank-up moves:

  • Add redacted real behavior proof from a macOS OpenClaw/Codex setup showing the dynamic tool appears and a call reaches the bridge.
  • Fix the shared plugin allowlist regression and keep focused tests for default plugin tools plus optional alsoAllow entries.
  • Add or document upgrade proof for managed bound threads when dynamic-tool fingerprints change.
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

Real behavior proof
Needs stronger real behavior proof before merge: The PR body has test output and a narrative macOS smoke note, but needs a redacted terminal/log/screenshot/video artifact showing the AgentOS tool visible and callable after the patch; updating the PR body should trigger re-review, or a maintainer can comment @clawsweeper re-review. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.

Risk before merge

  • Existing users with tools.alsoAllow can lose unrelated default non-optional plugin tools because the PR changes the default-plugin sentinel behavior in shared plugin discovery.
  • The managed binding refresh path starts a new Codex thread when dynamic-tool fingerprints change, so upgrade proof should show the history/context reset is acceptable.
  • The PR body does not include inspectable real behavior proof for the claimed macOS AgentOS/Codex bridge recovery.

Maintainer options:

  1. Preserve alsoAllow compatibility (recommended)
    Remove or narrow the new early return so DEFAULT_PLUGIN_TOOLS_ALLOWLIST_ENTRY still keeps default non-optional plugin tools while Codex-specific dynamic tool selection remains bounded.
  2. Prove managed binding refresh upgrades
    Before merge, show a redacted fresh-install and upgrade path where a managed bound thread gets dynamic tools without an unexpected loss of important conversation context.
  3. Pause the shared registry change
    If the default-tool narrowing is only needed for this Codex path, leave shared plugin discovery unchanged and move the narrowing to a Codex-local allowlist/profile decision.

Next step before merge
The PR needs contributor real behavior proof and a code fix for the allowlist regression before merge; automation should not take over while the proof gate is unmet.

Security
Cleared: The diff touches dynamic plugin tool dispatch, but the reviewed path scopes calls to the active thread/turn and does not add new dependencies, CI execution, secrets handling, or broader permissions.

Review findings

  • [P1] Preserve default plugin tools for alsoAllow — src/plugins/tools.ts:374-375
Review details

Best possible solution:

Keep the Codex conversation dynamic-tool bridge, but preserve the existing plugin tool allowlist contract and require redacted after-fix proof from a real AgentOS/Codex setup before merge.

Do we have a high-confidence way to reproduce the issue?

Yes, source inspection gives a high-confidence reproduction path: current main handles app-server conversation item/tool/call requests by returning a fixed “does not expose dynamic OpenClaw tools yet” failure. I did not run a live macOS Codex/AgentOS setup in this read-only review.

Is this the best way to solve the issue?

No, not as written: the bridge implementation is plausible, but the shared plugin discovery change breaks the existing alsoAllow compatibility contract. The safer solution is to preserve default plugin tools globally and apply any Codex-specific narrowing locally.

Label changes:

  • add P1: The PR targets a broken Codex/plugin dynamic-tool workflow and also contains a merge-blocking compatibility regression in shared plugin tool discovery.
  • add merge-risk: 🚨 compatibility: The diff can remove default plugin tools from existing tools.alsoAllow setups by changing shared allowlist semantics.
  • add merge-risk: 🚨 session-state: The dynamic-tool refresh path can replace managed Codex conversation threads, which may affect persisted thread context during upgrade.
  • add rating: 🧂 unranked krab: Current PR rating is 🧂 unranked krab because proof is 🧂 unranked krab, patch quality is 🧂 unranked krab, and The PR has a useful target, but missing real behavior proof plus a blocking compatibility regression make it not quality-ready yet.
  • add status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask. Needs stronger real behavior proof before merge: The PR body has test output and a narrative macOS smoke note, but needs a redacted terminal/log/screenshot/video artifact showing the AgentOS tool visible and callable after the patch; updating the PR body should trigger re-review, or a maintainer can comment @clawsweeper re-review. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.

Label justifications:

  • P1: The PR targets a broken Codex/plugin dynamic-tool workflow and also contains a merge-blocking compatibility regression in shared plugin tool discovery.
  • merge-risk: 🚨 compatibility: The diff can remove default plugin tools from existing tools.alsoAllow setups by changing shared allowlist semantics.
  • merge-risk: 🚨 session-state: The dynamic-tool refresh path can replace managed Codex conversation threads, which may affect persisted thread context during upgrade.
  • rating: 🧂 unranked krab: Current PR rating is 🧂 unranked krab because proof is 🧂 unranked krab, patch quality is 🧂 unranked krab, and The PR has a useful target, but missing real behavior proof plus a blocking compatibility regression make it not quality-ready yet.
  • status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask. Needs stronger real behavior proof before merge: The PR body has test output and a narrative macOS smoke note, but needs a redacted terminal/log/screenshot/video artifact showing the AgentOS tool visible and callable after the patch; updating the PR body should trigger re-review, or a maintainer can comment @clawsweeper re-review. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.

Full review comments:

  • [P1] Preserve default plugin tools for alsoAllow — src/plugins/tools.ts:374-375
    This early return changes alsoAllow-only policies from “default plugin tools plus named optional tools” to “only plugins matching explicit entries”. Current main has coverage expecting default non-optional plugin tools to remain available when the default-plugin sentinel is present, so existing configs with tools.alsoAllow can silently lose unrelated default plugin tools after upgrade. Keep the sentinel behavior here and narrow the Codex dynamic-tool surface through a Codex-local allowlist if needed.
    Confidence: 0.89

Overall correctness: patch is incorrect
Overall confidence: 0.86

What I checked:

  • Current main has the reported Codex conversation gap: The current app-server conversation binding returns a fixed failure for item/tool/call, so the central bug is source-reproducible on main. (extensions/codex/src/conversation-binding.ts:412, 229490a48924)
  • Current main preserves default plugin tools for alsoAllow: listManifestToolNamesForAllowlist treats the default-plugin sentinel as preserving non-optional plugin tools and returns those defaults plus explicit matches. (src/plugins/tools.ts:123, 229490a48924)
  • Regression test on main covers the compatibility contract: The existing test expects a non-optional default plugin tool to remain available when alsoAllow opts into an optional tool. (src/plugins/tools.optional.test.ts:994, 229490a48924)
  • PR diff drops unrelated default plugin tools: The PR adds an early return when the allowlist has multiple entries and no tool-name match, which suppresses default non-optional plugin tools for alsoAllow-only configurations. (src/plugins/tools.ts:374, 1e6cbfe0f0e1)
  • History provenance for the affected compatibility code: Blame ties the current plugin allowlist helper and conversation-binding gap in this checkout to commit 98af517. (src/plugins/tools.ts:351, 98af51748d19)
  • Real behavior proof is not inspectable: The PR body includes focused test results and a narrative macOS smoke note, but no terminal output, logs, screenshot, recording, or linked artifact showing agent_os_get_context visible and callable after the patch. (1e6cbfe0f0e1)

Likely related people:

  • BSG2000: Current-main blame attributes the plugin allowlist helper and conversation-binding code in this shallow history to commit 98af517. (role: recent area contributor; confidence: medium; commits: 98af51748d19; files: src/plugins/tools.ts, extensions/codex/src/conversation-binding.ts, src/plugins/manifest-registry.ts)
  • joshavant: Recent local history shows adjacent Codex app-server work on sandbox execution and missing completion handling, which are close to the conversation binding/runtime surface. (role: recent Codex app-server contributor; confidence: medium; commits: ba06376c7955, 7cda26aa6c72; files: extensions/codex/src/app-server/run-attempt.ts, extensions/codex/src/conversation-binding.ts)
  • vincentkoc: Recent history includes adjacent Codex app-server dynamic tool configuration work, relevant to the timeout and tool-surface behavior this PR changes. (role: recent Codex dynamic-tool contributor; confidence: medium; commits: 60d200f79719; files: extensions/codex/src/app-server/run-attempt.ts)

Codex review notes: model gpt-5.5, reasoning high; reviewed against 229490a48924.

@clawsweeper clawsweeper Bot added rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. P1 High-priority user-facing bug, regression, or broken workflow. merge-risk: 🚨 compatibility 🚨 May break existing users, config, migrations, defaults, or upgrade paths. merge-risk: 🚨 session-state 🚨 May lose, corrupt, stale, or mis-associate session, agent, or context state. labels May 22, 2026
@clawsweeper

clawsweeper Bot commented May 22, 2026

Copy link
Copy Markdown
Contributor

ClawSweeper PR egg

🎁 Pass real behavior proof to wake the egg and unlock a hatchable treat.

Where did the egg go?
  • The egg game starts only after the PR passes the real-behavior proof check.
  • Before that, no creature or rarity is rolled. The treat waits for real proof.
  • This is still just collectible flavor: proof affects review readiness, not creature quality.

@BingqingLyu

This comment was marked as spam.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

extensions: codex merge-risk: 🚨 compatibility 🚨 May break existing users, config, migrations, defaults, or upgrade paths. merge-risk: 🚨 session-state 🚨 May lose, corrupt, stale, or mis-associate session, agent, or context state. P1 High-priority user-facing bug, regression, or broken workflow. rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. size: XL status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants