fix(codex): complete dynamic tool diagnostics#83476
Conversation
8e57ea6 to
d5cbb4c
Compare
|
Codex review: needs real behavior proof before merge. Workflow note: Future ClawSweeper reviews update this same comment in place. How this review workflow works
Summary Reproducibility: no. live high-confidence reproduction was established in this review. The current-main source path lacks a terminal dynamic-tool diagnostic at the request boundary, and the linked report supplies redacted production logs for the stuck PR rating Rank-up moves:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics. PR egg Where did the egg go?
Real behavior proof Risk before merge
Maintainer options:
Next step before merge Security Review detailsBest possible solution: Merge the boundary-owned diagnostic cleanup after the remaining latest-head checks pass and maintainers either obtain redacted live Codex dynamic-tool proof or explicitly accept the focused diagnostic proof gap. Do we have a high-confidence way to reproduce the issue? No live high-confidence reproduction was established in this review. The current-main source path lacks a terminal dynamic-tool diagnostic at the request boundary, and the linked report supplies redacted production logs for the stuck Is this the best way to solve the issue? Yes, with a proof caveat. Moving terminal diagnostic ownership to the Codex app-server request boundary is a narrow maintainable fix, but the merge decision still needs live proof or explicit maintainer acceptance of the proof gap. Label justifications:
What I checked:
Likely related people:
Codex review notes: model gpt-5.5, reasoning high; reviewed against f0b43bfd34c4. |
d5cbb4c to
78d4183
Compare
|
Updated the PR body with redacted after-fix diagnostic activity proof and a focused run-attempt Vitest result under Node 24.15.0. @clawsweeper re-review |
|
Updated the PR body to follow the full proof template and keep all sensitive values redacted. The added section includes the pre-fix stalled Important remaining gap is called out explicitly: the patched live Codex transport still did not reach a successful |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7d650cb791
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
P2 follow-up and proof update for PR #83476. I pushed commit What changed:
Proof run: Result: The new P2 fixture is The retained hardening fixture, Live gateway note: I also attempted a fresh live proof on this VM on 2026-05-18 using the patched checkout and the refreshed Codex auth profile. Both source-runtime paths stalled before reaching the actual Codex tool turn:
Those live attempts did not reach model/tool execution, so I am not presenting them as behavioral proof. I stopped them to avoid locking the VM and restored the local OpenClaw config. No secrets or auth tokens are included in this proof. |
|
Current state after
Current CI status still needs maintainer/author triage. The media/image-ops failures appear unrelated to this Codex app-server change because this PR only touches Codex app-server files. I am checking the Codex prompt snapshot drift separately; if the drift is caused by this PR, I will commit the regenerated snapshots. If not, I will leave it for maintainer judgment or rerun. The live Codex gateway proof gap remains honestly documented: patched source-runtime attempts still do not reach dynamic tool execution in my environment, so the PR relies on focused request-boundary coverage plus diagnostic activity proof unless a maintainer can reproduce the live transport path. @codex review |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9a4816064d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
Current state after
Local validation on the latest branch state: @clawsweeper re-review |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
|
Follow-up after ClawSweeper P1 on Addressed the timeout/abort fallback diagnostic gap for wrapped dynamic tools that do not respond to
Validation: No additional live gateway proof is being added from this VM. @clawsweeper re-review |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
|
Latest ClawSweeper re-review on Current state:
Please treat the remaining proof item as a maintainer acceptance decision for the existing redacted production logs, diagnostic activity proof, focused regression tests, and green CI. |
add8073 to
056051a
Compare
|
Rewrote this into a boundary-owned Codex app-server diagnostic fix. Behavior addressed: OpenClaw dynamic tool calls served through Codex |
056051a to
76d1a7e
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 76d1a7eb9f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| deniedReason: "plugin-before-tool-call", | ||
| reason: "Tool call blocked", |
There was a problem hiding this comment.
Preserve blocked denial metadata in terminal diagnostics
When a dynamic tool is blocked, this emitter always reports deniedReason: "plugin-before-tool-call" and reason: "Tool call blocked", but blocked results can carry other denial causes (for example plugin-approval or tool-loop from buildBlockedToolResult in src/agents/pi-tools.before-tool-call.ts). Because this commit disables hook-owned tool diagnostics for Codex dynamic tools, this hardcoded branch now becomes the only blocked terminal event and misclassifies real block reasons in downstream diagnostics/metrics.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 76d1a7eb9f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| deniedReason: "plugin-before-tool-call", | ||
| reason: "Tool call blocked", |
There was a problem hiding this comment.
Preserve blocked denial metadata in terminal diagnostics
When a dynamic tool is blocked, the bridge only preserves a diagnosticTerminalType flag and this emitter hard-codes deniedReason/reason to plugin-before-tool-call and Tool call blocked. That drops the real block metadata (for example approval or loop-denial reasons) produced by the before-tool-call path, so downstream diagnostics and alerting cannot distinguish why the tool was denied and will misclassify non-policy blocks as policy vetoes.
Useful? React with 👍 / 👎.
Co-authored-by: 0x505badc0de <32790662+rozmiarD@users.noreply.github.com> Co-authored-by: Peter Steinberger <steipete@gmail.com>
76d1a7e to
82df3ac
Compare
Fixes #83474.
Summary
This fixes a Codex app-server dynamic tool bookkeeping gap where
item/tool/callrequests can return a dynamic tool response, but OpenClaw does not emit a terminal tool diagnostic or clear the tracked dynamic tool call id at that request boundary.The observed production symptom was a successful no-output
bashdynamic tool result shown in the UI while the gateway kept reporting:The change makes the dynamic tool request path:
tool.execution.startedwhen the dynamic tool request begins;tool.execution.completed/tool.execution.errorwhen the dynamic tool response returns or throws;finally;run-attempt.test.ts.Change Type (select all)
Scope (select all touched areas)
Linked Issue/PR
Real Behavior Proof
bashtool completion was visible to the user, but gateway diagnostics kept the session inblocked_tool_callwithactiveTool=bashand a redacted active tool call id.@openclaw/codex2026.5.12 for the pre-fix failure; local patched checkoutOpenClaw 2026.5.17 (78d4183)for after-fix diagnostic activity and patched-gateway transport attempts on May 18, 2026.The patched boundary now emits the terminal event that clears gateway activity state. I validated the redacted activity transition directly against the same
diagnostic-run-activitytracker used by gateway diagnostics:{ "afterStart": { "activeWorkKind": "tool_call", "activeToolName": "bash", "activeToolCallId": "tool-call-redacted", "activeToolAgeMs": 0, "lastProgressAgeMs": 0, "lastProgressReason": "tool:bash:started" }, "afterCompleted": { "lastProgressAgeMs": 0, "lastProgressReason": "tool:bash:ended" } }After the terminal diagnostic, the gateway activity snapshot no longer contains
activeWorkKind=tool_call,activeToolName=bash, oractiveToolCallId.Patched local gateway transport attempt, redacted:
tool.execution.completed/tool.execution.errorclears the activebashtool state that produced the stalled diagnostics. The local patched gateway starts from commit78d4183, but the live Codex transport did not reach tool execution in the patched runtime attempt.bashtool request completing end-to-end throughgateway call agent. The patched live attempts either disconnected before tool execution or timed out before a terminal tool event, so this PR still relies on the focused request-boundary test plus the gateway activity transition proof unless a maintainer can reproduce the live transport path.No secrets, auth tokens, profile hashes, local usernames, session ids, full local paths, ports, or temp config paths are included in the proof above.
Root Cause
item/tool/callbranch tracked dynamic tool request state, but did not emit a terminal diagnostic event or clear the tracked dynamic tool call id at that request boundary.Regression Test Plan
extensions/codex/src/app-server/run-attempt.test.tsitem/tool/callhandling emits start and terminal diagnostic events and clears tracked dynamic tool ids.User-visible / Behavior Changes
Codex-backed OpenClaw sessions should stop reporting a completed dynamic tool as still active after the app-server returns the dynamic tool response.
Diagram
Security Impact (required)
Yes/No) NoYes/No) NoYes/No) NoYes/No) NoYes/No) NoYes, explain risk + mitigation: N/ARepro + Verification
Environment
openai-codex/codex/gpt-5.4andcodex/gpt-5.5attemptsmode: yolo, temp state/config paths redactedSteps
tool.execution.completed.bashrun; record timeout/disconnect outcome without exposing secrets.Expected
activeTool=bashand its tool call id.Actual
bashtool state after the terminal event.Evidence
Attach at least one:
Human Verification (required)
What I personally verified (not just CI), and how:
bashdiagnostics; patched diagnostic activity transition; focused run-attempt regression test.bashtool completion throughgateway call agent, because live transport timed out or disconnected before tool execution.Review Conversations
Compatibility / Migration
Yes/No) YesYes/No) NoYes/No) NoRisks and Mitigations
Validation
Passed:
Focused Vitest result under Node 24:
Note: the host's system Node 22.22.0 cannot load the repo Vitest config because
path.matchesGlobthrowsMissing internal module 'internal/deps/brace-expansion'. Running the same focused test with temporary Node 24.15.0 avoids that local runtime issue.AI Assistance
This PR was AI-assisted. I reviewed the issue evidence, searched for related upstream reports before filing, made the code changes locally, redacted the proof, and ran the validation commands listed above.