Skip to content

fix(codex): complete dynamic tool diagnostics#83476

Merged
steipete merged 1 commit into
openclaw:mainfrom
rozmiarD:codex-dynamic-tool-release-fix
May 18, 2026
Merged

fix(codex): complete dynamic tool diagnostics#83476
steipete merged 1 commit into
openclaw:mainfrom
rozmiarD:codex-dynamic-tool-release-fix

Conversation

@rozmiarD

@rozmiarD rozmiarD commented May 18, 2026

Copy link
Copy Markdown
Contributor

Fixes #83474.

Summary

This fixes a Codex app-server dynamic tool bookkeeping gap where item/tool/call requests can return a dynamic tool response, but OpenClaw does not emit a terminal tool diagnostic or clear the tracked dynamic tool call id at that request boundary.

The observed production symptom was a successful no-output bash dynamic tool result shown in the UI while the gateway kept reporting:

reason=blocked_tool_call classification=blocked_tool_call activeTool=bash activeToolCallId=... recovery=none

The change makes the dynamic tool request path:

  • emit tool.execution.started when the dynamic tool request begins;
  • emit terminal tool.execution.completed / tool.execution.error when the dynamic tool response returns or throws;
  • delete the tracked dynamic tool call id in finally;
  • cover the diagnostic behavior in run-attempt.test.ts.

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

Real Behavior Proof

  • Behavior or issue addressed: Codex app-server dynamic bash tool completion was visible to the user, but gateway diagnostics kept the session in blocked_tool_call with activeTool=bash and a redacted active tool call id.
  • Real environment tested: Ubuntu/Linux desktop, installed OpenClaw / @openclaw/codex 2026.5.12 for the pre-fix failure; local patched checkout OpenClaw 2026.5.17 (78d4183) for after-fix diagnostic activity and patched-gateway transport attempts on May 18, 2026.
  • Exact steps or command run after this patch:
node --check --experimental-strip-types extensions/codex/src/app-server/run-attempt.ts
pnpm tsgo:test:extensions
pnpm dlx node@24 ./node_modules/vitest/vitest.mjs run --config test/vitest/vitest.extensions.config.ts extensions/codex/src/app-server/run-attempt.test.ts
node --import tsx <redacted-diagnostic-activity-proof-script>
OPENCLAW_PROOF_TIMEOUT_MS=120000 node --import tsx scripts/repro/codex-dynamic-tool-diagnostics-live-proof.mjs
  • Evidence after fix (screenshot, recording, terminal capture, console output, redacted runtime log, linked artifact, or copied live output):

The patched boundary now emits the terminal event that clears gateway activity state. I validated the redacted activity transition directly against the same diagnostic-run-activity tracker used by gateway diagnostics:

{
  "afterStart": {
    "activeWorkKind": "tool_call",
    "activeToolName": "bash",
    "activeToolCallId": "tool-call-redacted",
    "activeToolAgeMs": 0,
    "lastProgressAgeMs": 0,
    "lastProgressReason": "tool:bash:started"
  },
  "afterCompleted": {
    "lastProgressAgeMs": 0,
    "lastProgressReason": "tool:bash:ended"
  }
}

After the terminal diagnostic, the gateway activity snapshot no longer contains activeWorkKind=tool_call, activeToolName=bash, or activeToolCallId.

Patched local gateway transport attempt, redacted:

OpenClaw 2026.5.17 (78d4183)
[proof] setup
[proof] pair-device
[proof] start-gateway
[gateway] agent model: codex/gpt-5.4 (thinking=medium, fast=off)
[gateway] http server listening (1 plugin: codex)
[gateway] ready
[proof] gateway-call-agent
Gateway call failed: GatewayTransportError: gateway timeout after 120000ms
Gateway target: ws://127.0.0.1:<redacted-port>
Config: <redacted-temp-config-path>
  • Observed result after fix: Source-level patched request-boundary coverage and diagnostic activity proof show that terminal tool.execution.completed/tool.execution.error clears the active bash tool state that produced the stalled diagnostics. The local patched gateway starts from commit 78d4183, but the live Codex transport did not reach tool execution in the patched runtime attempt.
  • What was not tested: A successful patched live Codex bash tool request completing end-to-end through gateway call agent. The patched live attempts either disconnected before tool execution or timed out before a terminal tool event, so this PR still relies on the focused request-boundary test plus the gateway activity transition proof unless a maintainer can reproduce the live transport path.
  • Before evidence (optional but encouraged):
2026-05-18T07:52:04.249+02:00 [diagnostic] stalled session: sessionId=<redacted> sessionKey=<redacted> state=processing age=143s queueDepth=0 reason=blocked_tool_call classification=blocked_tool_call activeWorkKind=tool_call lastProgress=codex_app_server:notification:thread/tokenUsage/updated lastProgressAge=141s activeTool=bash activeToolCallId=<redacted> activeToolAge=148s recovery=none
2026-05-18T07:57:04.291+02:00 [diagnostic] stalled session: sessionId=<redacted> sessionKey=<redacted> state=processing age=443s queueDepth=0 reason=blocked_tool_call classification=blocked_tool_call activeWorkKind=tool_call lastProgress=codex_app_server:notification:thread/tokenUsage/updated lastProgressAge=441s activeTool=bash activeToolCallId=<redacted> activeToolAge=448s recovery=none
2026-05-18T07:57:27.172+02:00 [gateway] draining 2 active task(s) and 1 active embedded run(s) before restart with timeout 300000ms
2026-05-18T07:57:57.185+02:00 [gateway] still draining 2 active task(s) and 1 active embedded run(s) before restart
openclaw-gateway.service: State 'stop-sigterm' timed out. Killing.

No secrets, auth tokens, profile hashes, local usernames, session ids, full local paths, ports, or temp config paths are included in the proof above.

Root Cause

  • Root cause: The item/tool/call branch tracked dynamic tool request state, but did not emit a terminal diagnostic event or clear the tracked dynamic tool call id at that request boundary.
  • Missing detection / guardrail: Existing tests covered dynamic tool lookup and error behavior, but not the gateway-facing diagnostic start/end contract for this response-boundary path.
  • Contributing context (if known): Related diagnostic stalls existed in the same broad area, but this report appears to be a distinct response-boundary variant.

Regression Test Plan

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file: extensions/codex/src/app-server/run-attempt.test.ts
  • Scenario the test should lock in: Dynamic item/tool/call handling emits start and terminal diagnostic events and clears tracked dynamic tool ids.
  • Why this is the smallest reliable guardrail: The bug is at a narrow app-server request boundary; the focused test exercises that boundary without depending on live Codex transport reliability.
  • Existing test that already covers this (if any): Added in this PR.
  • If no new test is added, why not: N/A

User-visible / Behavior Changes

Codex-backed OpenClaw sessions should stop reporting a completed dynamic tool as still active after the app-server returns the dynamic tool response.

Diagram

Before:
[item/tool/call] -> dynamic tool response -> no terminal diagnostic -> activeTool remains set

After:
[item/tool/call] -> tool.execution.started -> dynamic tool response -> terminal diagnostic -> activeTool cleared

Security Impact (required)

  • New permissions/capabilities? (Yes/No) No
  • Secrets/tokens handling changed? (Yes/No) No
  • New/changed network calls? (Yes/No) No
  • Command/tool execution surface changed? (Yes/No) No
  • Data access scope changed? (Yes/No) No
  • If any Yes, explain risk + mitigation: N/A

Repro + Verification

Environment

  • OS: Ubuntu/Linux desktop
  • Runtime/container: Node 22.22.0 locally; focused Vitest run under Node 24.15.0 because host Node 22 cannot load the repo Vitest config
  • Model/provider: openai-codex / codex/gpt-5.4 and codex/gpt-5.5 attempts
  • Integration/channel (if any): OpenClaw Gateway with bundled Codex app-server plugin
  • Relevant config (redacted): local token auth, loopback gateway, Codex app-server mode: yolo, temp state/config paths redacted

Steps

  1. Reproduce the stalled active-tool diagnostics on installed OpenClaw 2026.5.12.
  2. Apply the patch and run focused static/type/test validation.
  3. Validate the gateway diagnostic activity transition with terminal tool.execution.completed.
  4. Attempt a patched local gateway live Codex bash run; record timeout/disconnect outcome without exposing secrets.

Expected

  • Terminal dynamic tool diagnostics clear activeTool=bash and its tool call id.

Actual

  • Diagnostic activity proof clears the active bash tool state after the terminal event.
  • Patched live Codex transport did not reach tool execution in this environment.

Evidence

Attach at least one:

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Human Verification (required)

What I personally verified (not just CI), and how:

  • Verified scenarios: Pre-fix stalled active bash diagnostics; patched diagnostic activity transition; focused run-attempt regression test.
  • Edge cases checked: Terminal completed/error event handling and tracked dynamic id cleanup in the patched request-boundary code path.
  • What I did not verify: Successful patched live Codex bash tool completion through gateway call agent, because live transport timed out or disconnected before tool execution.

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

  • Backward compatible? (Yes/No) Yes
  • Config/env changes? (Yes/No) No
  • Migration needed? (Yes/No) No
  • If yes, exact upgrade steps: N/A

Risks and Mitigations

  • Risk: Prematurely clearing active tool diagnostics could hide a still-active tool call.
    • Mitigation: The patch emits terminal cleanup only after the dynamic tool response returns or throws, and the focused test covers the diagnostic event sequence.
  • Risk: Live after-fix transport proof is incomplete.
    • Mitigation: The proof calls out the gap explicitly and keeps the before logs, focused regression, and activity-transition evidence redacted and reproducible.

Validation

Passed:

node --check --experimental-strip-types extensions/codex/src/app-server/run-attempt.ts
pnpm tsgo:test:extensions
pnpm dlx node@24 ./node_modules/vitest/vitest.mjs run --config test/vitest/vitest.extensions.config.ts extensions/codex/src/app-server/run-attempt.test.ts

Focused Vitest result under Node 24:

Test Files  1 passed (1)
Tests  164 passed (164)
Duration  43.22s

Note: the host's system Node 22.22.0 cannot load the repo Vitest config because path.matchesGlob throws Missing internal module 'internal/deps/brace-expansion'. Running the same focused test with temporary Node 24.15.0 avoids that local runtime issue.

AI Assistance

This PR was AI-assisted. I reviewed the issue evidence, searched for related upstream reports before filing, made the code changes locally, redacted the proof, and ran the validation commands listed above.

@rozmiarD rozmiarD force-pushed the codex-dynamic-tool-release-fix branch from 8e57ea6 to d5cbb4c Compare May 18, 2026 06:37
@openclaw-barnacle openclaw-barnacle Bot added the triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. label May 18, 2026
@clawsweeper

clawsweeper Bot commented May 18, 2026

Copy link
Copy Markdown
Contributor

Codex review: needs real behavior proof before merge.

Workflow note: Future ClawSweeper reviews update this same comment in place.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

Summary
The PR makes Codex app-server dynamic tool requests emit boundary-owned start and terminal diagnostics, disables duplicate hook-owned diagnostics for that path, and adds regression coverage for success, error, timeout, abort, blocked, and side-question tool calls.

Reproducibility: no. live high-confidence reproduction was established in this review. The current-main source path lacks a terminal dynamic-tool diagnostic at the request boundary, and the linked report supplies redacted production logs for the stuck blocked_tool_call symptom.

PR rating
Overall: 🦪 silver shellfish
Proof: 🦪 silver shellfish
Patch quality: 🦞 diamond lobster
Summary: Patch quality is strong after the owner rewrite and focused tests, but overall readiness is still capped by incomplete real behavior proof for an availability-sensitive runtime path.

Rank-up moves:

  • Add redacted live Codex app-server dynamic-tool proof showing the active diagnostic state clear after completion, or have a maintainer explicitly accept the proof gap.
  • Wait for the latest in-progress Critical Quality checks to finish.
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

PR egg
🎁 Pass real behavior proof to wake the egg and unlock a hatchable treat.

Where did the egg go?
  • The egg game starts only after the PR passes the real-behavior proof check.
  • Before that, no creature, rarity, or ASCII portrait is rolled. The treat waits for real proof.
  • This is still just collectible flavor: proof affects review readiness, not creature quality.

Real behavior proof
Needs stronger real behavior proof before merge: The PR has redacted logs and diagnostic activity output, but it still does not show a successful patched live Codex dynamic-tool completion through the gateway; add redacted terminal/log/recording proof or get maintainer acceptance. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.

Risk before merge
Why this matters: - The supplied after-fix proof still does not show a successful patched live Codex dynamic bash request completing through the gateway and clearing the stalled diagnostic state.

  • This PR changes process-wide tool diagnostic ownership for Codex dynamic tools, so a missed or premature terminal event can affect stalled-session detection and gateway drain behavior.
  • Two latest-head Critical Quality checks were still in progress when inspected.

Maintainer options:

  1. Get live Codex dynamic-tool proof (recommended)
    Ask for or run redacted terminal/log proof showing a patched Codex dynamic tool request complete through the gateway and clear the active tool diagnostic state.
  2. Accept the focused proof gap
    Maintainers can intentionally merge with the focused request-boundary tests, diagnostic activity output, and green checks if live Codex transport remains unreliable to reproduce.
  3. Pause if checks or proof do not settle
    If latest-head CI or proof acceptance remains unresolved, keep this PR open rather than landing an availability-sensitive diagnostic change on partial evidence.

Next step before merge
No narrow code repair is indicated; the remaining action is maintainer proof acceptance or live validation plus latest-head CI completion.

Security
Cleared: The diff adds no new dependency, workflow, permission, credential, network, or command-execution authority; it changes diagnostic bookkeeping around existing dynamic tool execution.

Review details

Best possible solution:

Merge the boundary-owned diagnostic cleanup after the remaining latest-head checks pass and maintainers either obtain redacted live Codex dynamic-tool proof or explicitly accept the focused diagnostic proof gap.

Do we have a high-confidence way to reproduce the issue?

No live high-confidence reproduction was established in this review. The current-main source path lacks a terminal dynamic-tool diagnostic at the request boundary, and the linked report supplies redacted production logs for the stuck blocked_tool_call symptom.

Is this the best way to solve the issue?

Yes, with a proof caveat. Moving terminal diagnostic ownership to the Codex app-server request boundary is a narrow maintainable fix, but the merge decision still needs live proof or explicit maintainer acceptance of the proof gap.

Label justifications:

  • P1: The PR targets a Codex session availability bug where completed dynamic tools can leave sessions stuck as blocked_tool_call and interfere with gateway drain/restart.
  • merge-risk: 🚨 availability: Merging changes tool diagnostic start/end bookkeeping that stalled-session classification and recovery depend on.

What I checked:

Likely related people:

  • steipete: Current-main blame for the dynamic tool request path and dynamic bridge points to Peter Steinberger's recent Codex app-server work, and the live PR is assigned to steipete with the latest head commit authored by Peter Steinberger. (role: recent area contributor and PR assignee; confidence: high; commits: e3d802a10b15, 384ddae86f75, 76d1a7eb9fae; files: extensions/codex/src/app-server/run-attempt.ts, extensions/codex/src/app-server/dynamic-tools.ts, src/agents/pi-tools.before-tool-call.ts)
  • Eva: A same-day Codex app-server run-attempt refactor modified the surrounding runtime surface, making Eva a plausible adjacent reviewer for interactions with the selected harness flow. (role: recent adjacent contributor; confidence: low; commits: 2a0350b5b490; files: extensions/codex/src/app-server/run-attempt.ts)

Codex review notes: model gpt-5.5, reasoning high; reviewed against f0b43bfd34c4.

@clawsweeper clawsweeper Bot added rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. P1 High-priority user-facing bug, regression, or broken workflow. merge-risk: 🚨 availability 🚨 May cause crashes, hangs, restart loops, stalls, or process outages. labels May 18, 2026
@rozmiarD rozmiarD force-pushed the codex-dynamic-tool-release-fix branch from d5cbb4c to 78d4183 Compare May 18, 2026 07:07
@rozmiarD

Copy link
Copy Markdown
Contributor Author

Updated the PR body with redacted after-fix diagnostic activity proof and a focused run-attempt Vitest result under Node 24.15.0. @clawsweeper re-review

@clawsweeper clawsweeper Bot added rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. and removed rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. labels May 18, 2026

Copy link
Copy Markdown
Contributor Author

Updated the PR body to follow the full proof template and keep all sensitive values redacted. The added section includes the pre-fix stalled activeTool=bash evidence, patched diagnostic activity proof, validation commands, and the latest patched-gateway live attempt outcome.

Important remaining gap is called out explicitly: the patched live Codex transport still did not reach a successful bash tool execution in this environment; it timed out/disconnected before the terminal tool event. @clawsweeper re-review

@openclaw-barnacle openclaw-barnacle Bot added proof: supplied External PR includes structured after-fix real behavior proof. size: M and removed triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. size: S labels May 18, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7d650cb791

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread extensions/codex/src/app-server/run-attempt.ts Outdated
@rozmiarD

Copy link
Copy Markdown
Contributor Author

P2 follow-up and proof update for PR #83476.

I pushed commit 9a4816064d fix(codex): avoid duplicate dynamic tool diagnostics to address the review comment at #discussion_r3257821154.

What changed:

  • CodexDynamicToolBridge now exposes ownsToolExecutionDiagnostics(toolName) separately from the existing diagnostic tool-name set.
  • run-attempt no longer emits an additional terminal tool.execution.error for a normal failed wrapped dynamic tool response when the wrapped tool path already owns the start/error diagnostics.
  • Timeout/abort fallback responses still emit terminal diagnostics from run-attempt, so diagnostic state is not stranded when the wrapped tool path never reaches its own terminal event.
  • The existing remote hardening for clearing active dynamic-tool diagnostics after successful tool responses is preserved.

Proof run:

env PATH=/tmp:$PATH NODE_OPTIONS=--import=/tmp/openclaw-vitest-matchesglob-polyfill.mjs npm exec -- vitest run --config test/vitest/vitest.extensions.config.ts extensions/codex/src/app-server/run-attempt.test.ts --testNamePattern "wrapped dynamic tool errors|clears dynamic tool diagnostics|normalized tool progress"

Result:

Test Files  1 passed (1)
Tests       4 passed | 162 skipped (166)
Duration    17.05s

The new P2 fixture is does not duplicate terminal diagnostics for wrapped dynamic tool errors. It drives an app-server item/tool/call where the wrapped echo dynamic tool throws, verifies the tool response is success: false, and asserts the terminal diagnostic sequence for call-echo-error is exactly one tool.execution.started and one tool.execution.error.

The retained hardening fixture, clears dynamic tool diagnostics after successful app-server tool responses, asserts activeDiagnosticToolKeys(diagnosticEvents) is empty after a successful dynamic tool response, covering the non-stranded active-tool state.

Live gateway note:

I also attempted a fresh live proof on this VM on 2026-05-18 using the patched checkout and the refreshed Codex auth profile. Both source-runtime paths stalled before reaching the actual Codex tool turn:

  • node --import tsx src/index.ts gateway run --port 19076 ... opened the proof port, but health checks timed out while the source gateway stayed in high CPU/RSS source loading.
  • vitest.live.config.ts with the Codex harness wrote its temporary config and opened its loopback port, but also stayed in high CPU/RSS source loading before client-connected.

Those live attempts did not reach model/tool execution, so I am not presenting them as behavioral proof. I stopped them to avoid locking the VM and restored the local OpenClaw config. No secrets or auth tokens are included in this proof.

@clawsweeper clawsweeper Bot added rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. and removed rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. labels May 18, 2026
@rozmiarD

Copy link
Copy Markdown
Contributor Author

Current state after 9a4816064d:

  • Addressed the Codex P2 duplicate terminal diagnostic concern.
  • Added regression coverage for wrapped dynamic tool failures to assert exactly one tool.execution.started and one tool.execution.error for the same toolCallId.
  • The original Codex review thread is now outdated after the fix, though still unresolved in GitHub.

Current CI status still needs maintainer/author triage. The media/image-ops failures appear unrelated to this Codex app-server change because this PR only touches Codex app-server files. I am checking the Codex prompt snapshot drift separately; if the drift is caused by this PR, I will commit the regenerated snapshots. If not, I will leave it for maintainer judgment or rerun.

The live Codex gateway proof gap remains honestly documented: patched source-runtime attempts still do not reach dynamic tool execution in my environment, so the PR relies on focused request-boundary coverage plus diagnostic activity proof unless a maintainer can reproduce the live transport path.

@codex review
@clawsweeper re-review

@clawsweeper

clawsweeper Bot commented May 18, 2026

Copy link
Copy Markdown
Contributor

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9a4816064d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread extensions/codex/src/app-server/run-attempt.ts Outdated
@clawsweeper clawsweeper Bot added rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. and removed rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. labels May 18, 2026
@clawsweeper clawsweeper Bot added rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. merge-risk: 🚨 security-boundary 🚨 May affect sandboxing, authorization, credentials, or sensitive data. and removed rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. labels May 18, 2026
@openclaw-barnacle openclaw-barnacle Bot removed the gateway Gateway runtime label May 18, 2026

Copy link
Copy Markdown
Contributor Author

Current state after b8d60351a1:

  • Merged current upstream/main (b823a5a26626) into this PR branch to restore PR mergeability.
  • Resolved the config.shared-auth.test.ts conflict by keeping upstream's exact GATEWAY_CONFIG_WRITE_OPTIONS assertion, so the shared-auth write-options guard is no longer weakened in the final branch state.
  • Fixed the reproduced source-delivery-plan.test.ts timeout by changing the threaded-delivery fixture away from the Telegram provider to a neutral custom provider. This keeps the test focused on explicit/supported implicit thread evidence without invoking Telegram channel plugin target normalization.
  • No additional live gateway proof is being added from this VM; the proof gap remains the already documented maintainer-acceptance item.

Local validation on the latest branch state:

env PATH=/tmp:$PATH NODE_OPTIONS=--import=/tmp/openclaw-vitest-matchesglob-polyfill.mjs npm exec -- vitest run --config test/vitest/vitest.gateway-methods.config.ts src/gateway/server-methods/config.shared-auth.test.ts

Test Files  1 passed (1)
Tests       9 passed (9)
env PATH=/tmp:$PATH NODE_OPTIONS=--import=/tmp/openclaw-vitest-matchesglob-polyfill.mjs npm exec -- vitest run --config test/vitest/vitest.extensions.config.ts extensions/codex/src/app-server/run-attempt.test.ts --testNamePattern "dynamic tool diagnostics|wrapped dynamic tool|clears dynamic tool diagnostics|normalized tool progress|request-boundary terminal"

Test Files  1 passed (1)
Tests       7 passed | 165 skipped (172)
env PATH=/tmp:$PATH NODE_OPTIONS=--import=/tmp/openclaw-vitest-matchesglob-polyfill.mjs npm exec -- vitest run --config test/vitest/vitest.full-core-unit-fast.config.ts src/infra/outbound/source-delivery-plan.test.ts --testNamePattern "matches threaded delivery only"

Test Files  1 passed (1)
Tests       1 passed | 8 skipped (9)

@clawsweeper re-review

@clawsweeper

clawsweeper Bot commented May 18, 2026

Copy link
Copy Markdown
Contributor

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@clawsweeper clawsweeper Bot removed the merge-risk: 🚨 security-boundary 🚨 May affect sandboxing, authorization, credentials, or sensitive data. label May 18, 2026

Copy link
Copy Markdown
Contributor Author

Follow-up after ClawSweeper P1 on b8d60351a1: pushed add8073420.

Addressed the timeout/abort fallback diagnostic gap for wrapped dynamic tools that do not respond to AbortSignal:

  • request-boundary terminal cleanup now depends on whether a matching terminal diagnostic was actually observed after the drain, instead of suppressing cleanup solely because a fallback response occurred;
  • abort-reactive wrapped tools still avoid duplicate terminal diagnostics because their wrapper terminal event is observed before the request-boundary fallback emits anything;
  • added emits terminal diagnostics for wrapped dynamic tool timeout fallbacks that ignore abort, which uses a wrapped tool that never settles after timeout and asserts the request boundary emits the missing terminal tool.execution.error for the same toolCallId.

Validation:

env PATH=/tmp:$PATH NODE_OPTIONS=--import=/tmp/openclaw-vitest-matchesglob-polyfill.mjs npm exec -- vitest run --config test/vitest/vitest.extensions.config.ts extensions/codex/src/app-server/run-attempt.test.ts --testNamePattern "dynamic tool diagnostics|wrapped dynamic tool"

Test Files  1 passed (1)
Tests       6 passed | 167 skipped (173)
env PATH=/tmp:$PATH NODE_OPTIONS=--import=/tmp/openclaw-vitest-matchesglob-polyfill.mjs npm exec -- vitest run --config test/vitest/vitest.extensions.config.ts extensions/codex/src/app-server/run-attempt.test.ts --testNamePattern "dynamic tool diagnostics|wrapped dynamic tool|clears dynamic tool diagnostics|normalized tool progress|request-boundary terminal"

Test Files  1 passed (1)
Tests       8 passed | 165 skipped (173)
env PATH=/tmp:$PATH NODE_OPTIONS=--import=/tmp/openclaw-vitest-matchesglob-polyfill.mjs node scripts/run-vitest.mjs extensions/codex/src/app-server/run-attempt.test.ts --testNamePattern "dynamic tool diagnostics|wrapped dynamic tool"
# completed with exit code 0
node scripts/run-tsgo.mjs -p test/tsconfig/tsconfig.extensions.test.json --incremental --tsBuildInfoFile .artifacts/tsgo-cache/extensions-test.tsbuildinfo
# completed with exit code 0

No additional live gateway proof is being added from this VM.

@clawsweeper re-review

@clawsweeper

clawsweeper Bot commented May 18, 2026

Copy link
Copy Markdown
Contributor

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@clawsweeper clawsweeper Bot added rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. and removed rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. labels May 18, 2026

Copy link
Copy Markdown
Contributor Author

Latest ClawSweeper re-review on add8073420 no longer identifies a narrow code repair item. The remaining pre-merge decision is proof acceptance: live Codex gateway proof from this VM is not reliable enough to add without risking misleading evidence, so I am not adding another live proof attempt.

Current state:

  • add8073420 addresses the timeout/abort fallback cleanup gap for wrapped dynamic tools that ignore AbortSignal.
  • Focused regression coverage now includes the non-abort-responsive timeout fallback path.
  • GitHub CI for the current head is green: Workflow Sanity, OpenGrep, CI, CodeQL, and CodeQL Critical Quality completed successfully.
  • The PR branch is up to date with the current base commit (b823a5a26626) and is 11 commits ahead / 0 behind by compare.

Please treat the remaining proof item as a maintainer acceptance decision for the existing redacted production logs, diagnostic activity proof, focused regression tests, and green CI.

@steipete steipete force-pushed the codex-dynamic-tool-release-fix branch from add8073 to 056051a Compare May 18, 2026 14:22
@steipete

Copy link
Copy Markdown
Contributor

Rewrote this into a boundary-owned Codex app-server diagnostic fix.

Behavior addressed: OpenClaw dynamic tool calls served through Codex item/tool/call now emit request-boundary tool.execution.started plus exactly one terminal diagnostic for success, error, timeout, abort, and blocked results, so diagnostic run activity does not keep an active tool after Codex has accepted the tool response.
Real environment tested: local OpenClaw checkout on latest origin/main after rebase. Also checked ../codex; Codex dynamic tool statuses are only in-progress/completed/failed, so blocked is OpenClaw diagnostic state, not a Codex protocol terminal state.
Exact steps or command run after this patch: node scripts/run-vitest.mjs extensions/codex/src/app-server/dynamic-tools.test.ts extensions/codex/src/app-server/run-attempt.test.ts extensions/codex/src/app-server/side-question.test.ts; node scripts/run-vitest.mjs extensions/codex/src/app-server/run-attempt.test.ts -t "dynamic tool diagnostics|request-boundary terminal diagnostics|duplicate terminal diagnostics"; node scripts/run-vitest.mjs extensions/codex/src/app-server/run-attempt.test.ts -t "reuses the bound auth profile"; pnpm tsgo:prod; pnpm tsgo:extensions:test; pnpm run test:extensions:package-boundary:compile; AUTOREVIEW_AUTO_TESTS=0 .agents/skills/autoreview/scripts/autoreview --mode branch --base origin/main.
Evidence after fix: focused app-server tests passed 221/221 before the final strict identity fix; the five targeted diagnostic regression tests passed after it; the unrelated auth-profile test that failed once in the full focused run passed when rerun directly; both TS lanes and package-boundary compile passed; final autoreview was clean.
Observed result after fix: successful, failed, timed-out, aborted, blocked, side-question, and pre-wrapped dynamic-tool paths close their OpenClaw active diagnostic state without relying on Codex to emit a matching diagnostic terminal event.
What was not tested: a live Codex app-server session with real ChatGPT auth was not run in this local pass; CI is now validating the rewritten branch.

@openclaw-barnacle openclaw-barnacle Bot added agents Agent runtime and tooling size: XL and removed size: L labels May 18, 2026
@steipete steipete force-pushed the codex-dynamic-tool-release-fix branch from 056051a to 76d1a7e Compare May 18, 2026 14:24

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 76d1a7eb9f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +67 to +68
deniedReason: "plugin-before-tool-call",
reason: "Tool call blocked",

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve blocked denial metadata in terminal diagnostics

When a dynamic tool is blocked, this emitter always reports deniedReason: "plugin-before-tool-call" and reason: "Tool call blocked", but blocked results can carry other denial causes (for example plugin-approval or tool-loop from buildBlockedToolResult in src/agents/pi-tools.before-tool-call.ts). Because this commit disables hook-owned tool diagnostics for Codex dynamic tools, this hardcoded branch now becomes the only blocked terminal event and misclassifies real block reasons in downstream diagnostics/metrics.

Useful? React with 👍 / 👎.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 76d1a7eb9f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +67 to +68
deniedReason: "plugin-before-tool-call",
reason: "Tool call blocked",

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve blocked denial metadata in terminal diagnostics

When a dynamic tool is blocked, the bridge only preserves a diagnosticTerminalType flag and this emitter hard-codes deniedReason/reason to plugin-before-tool-call and Tool call blocked. That drops the real block metadata (for example approval or loop-denial reasons) produced by the before-tool-call path, so downstream diagnostics and alerting cannot distinguish why the tool was denied and will misclassify non-policy blocks as policy vetoes.

Useful? React with 👍 / 👎.

Co-authored-by: 0x505badc0de <32790662+rozmiarD@users.noreply.github.com>

Co-authored-by: Peter Steinberger <steipete@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents Agent runtime and tooling extensions: codex merge-risk: 🚨 availability 🚨 May cause crashes, hangs, restart loops, stalls, or process outages. P1 High-priority user-facing bug, regression, or broken workflow. proof: supplied External PR includes structured after-fix real behavior proof. rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. size: XL status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Codex dynamic tool calls can leave sessions stuck as blocked_tool_call

2 participants