[Bug]: Codex dynamic tool calls can leave sessions stuck as blocked_tool_call #83474

@rozmiarD

Bug type

Crash / hang

Beta release blocker

No

Summary

Codex app-server dynamic tool calls can leave an OpenClaw session stuck in blocked_tool_call even after the tool request has returned a successful result to the UI/transcript.

I checked for existing reports before filing. This looks related to the orphaned tool-call family in #42112 and the failed-tool-call hang family in #8288, but I did not find an exact duplicate for this item/tool/call response path: the dynamic tool request returns, the UI shows the bash result as completed, but the gateway diagnostics keep activeTool=bash with recovery=none.

Steps to reproduce

  1. Run OpenClaw with the bundled Codex harness / @openclaw/codex.
  2. Start a Codex-backed agent turn that executes a dynamic bash tool request.
  3. In the observed case, the tool command was:

     /bin/bash -lc 'PYTHONDONTWRITEBYTECODE=1 .venv/bin/python scripts/validate_public_install.py --dev && PYTHONDONTWRITEBYTECODE=1 timeout 600 .venv/bin/python scripts/run_security_contract_validation.py --include-pytest'

     The command ran in:

     <redacted-local-worktree-path>

  4. The tool UI/result reports: No output — tool completed successfully.
  5. Do not restart the gateway immediately. Watch the active session diagnostics.

Expected behavior

After the dynamic tool response returns to Codex/OpenClaw, OpenClaw should clear the active tool bookkeeping and emit a terminal tool.execution.completed or tool.execution.error diagnostic for that toolCallId. The session should then either continue to the next model event or hit the normal post-tool completion guard, but it should not remain classified as an active blocked tool call.

Actual behavior

The session remains in state=processing and the gateway repeatedly classifies it as blocked_tool_call with activeTool=bash, even though the tool response has already been surfaced as completed.

Relevant gateway log excerpt from May 18, 2026:

2026-05-18T07:52:04.249+02:00 [diagnostic] stalled session: sessionId=<redacted-session-id> sessionKey=<redacted-session-key> state=processing age=143s queueDepth=0 reason=blocked_tool_call classification=blocked_tool_call activeWorkKind=tool_call lastProgress=codex_app_server:notification:thread/tokenUsage/updated lastProgressAge=141s activeTool=bash activeToolCallId=<redacted-tool-call-id> activeToolAge=148s recovery=none
2026-05-18T07:57:04.291+02:00 [diagnostic] stalled session: sessionId=<redacted-session-id> sessionKey=<redacted-session-key> state=processing age=443s queueDepth=0 reason=blocked_tool_call classification=blocked_tool_call activeWorkKind=tool_call lastProgress=codex_app_server:notification:thread/tokenUsage/updated lastProgressAge=441s activeTool=bash activeToolCallId=<redacted-tool-call-id> activeToolAge=448s recovery=none
2026-05-18T07:57:27.172+02:00 [gateway] draining 2 active task(s) and 1 active embedded run(s) before restart with timeout 300000ms
2026-05-18T07:57:57.185+02:00 [gateway] still draining 2 active task(s) and 1 active embedded run(s) before restart
openclaw-gateway.service: State 'stop-sigterm' timed out. Killing.
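
For triage, the stalled-session lines above can be scanned mechanically. The following is a minimal sketch, not gateway code: the function names and the 120-second threshold are assumptions made for illustration, and only the key=value field names are taken from the log excerpt.

```typescript
// Hypothetical helper: collect the key=value fields of one diagnostic line.
// Tokens without "=" (timestamp, "[diagnostic]", "stalled", "session:") are skipped.
function parseDiagnosticFields(line: string): Record<string, string> {
  const fields: Record<string, string> = {};
  for (const token of line.trim().split(/\s+/)) {
    const eq = token.indexOf("=");
    if (eq > 0) {
      fields[token.slice(0, eq)] = token.slice(eq + 1);
    }
  }
  return fields;
}

// Convert an age such as "148s" to seconds; NaN if the format differs.
function ageSeconds(value: string | undefined): number {
  const m = /^(\d+)s$/.exec(value ?? "");
  return m ? Number(m[1]) : NaN;
}

// Heuristic (an assumption, not the gateway's own classification logic):
// a blocked tool call with no recovery whose active tool is older than
// minAgeSeconds is worth flagging.
function looksLikeStuckToolCall(line: string, minAgeSeconds = 120): boolean {
  const f = parseDiagnosticFields(line);
  return (
    f.classification === "blocked_tool_call" &&
    f.recovery === "none" &&
    ageSeconds(f.activeToolAge) >= minAgeSeconds
  );
}
```

Run against the two stalled-session lines in the excerpt, this flags both (activeToolAge of 148s and 448s); the gateway drain lines carry no classification field and are ignored.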

An older affected turn from the same session family eventually hit the long terminal idle timeout instead:

turn.terminal_idle_timeout ... idleMs=1800001 timeoutMs=1800000 lastActivityReason=notification:thread/tokenUsage/updated

OpenClaw version

Observed on installed OpenClaw / @openclaw/codex 2026.5.12. The affected source path still existed on current upstream main when checked on May 18, 2026, before preparing the linked PR.

OS

Linux x64, systemd user service. Host evidence was collected on 6.19.14+kali-amd64.

Install method

npm-managed OpenClaw install with user openclaw-gateway.service.

Model

Codex-backed OpenAI model routing. The affected session was using the OpenAI Codex provider override with gpt-5.4/Codex harness routing.

Provider / routing chain

OpenClaw gateway -> bundled @openclaw/codex -> Codex app-server / ChatGPT Codex transport.

Additional provider/model setup details

No API-key provider path is required for the observed issue; this was the Codex app-server dynamic tool bridge path.

Logs/screenshots/evidence

Evidence above includes:

  • The exact dynamic bash command and redacted working-directory evidence.
  • UI/tool-result state: No output — tool completed successfully.
  • Gateway diagnostics repeatedly reporting blocked_tool_call activeTool=bash recovery=none with session and tool-call identifiers redacted.
  • Gateway restart drain timing out because the embedded run remained active.
  • A prior affected turn reaching turn.terminal_idle_timeout after 30 minutes.

Impact/severity

High for Codex-backed local agents. A successfully completed dynamic shell command can still leave the OpenClaw session unusable until timeout or gateway restart. Restart may also hang during drain and require systemd to kill the service.

Additional information

I prepared a PR that makes the dynamic item/tool/call request boundary emit terminal tool diagnostics and clear active dynamic tool bookkeeping in finally, so the gateway does not keep a completed dynamic tool as active.
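
The shape of that fix can be illustrated with a minimal sketch. It is not the code from the PR: ActiveToolRegistry, handleDynamicToolCall, and the emit callback are hypothetical names introduced here. Only the two diagnostic event names come from the expected behavior described above.

```typescript
// All names below are hypothetical; they are not OpenClaw or @openclaw/codex APIs.
type ToolDiagnostic = {
  type: "tool.execution.completed" | "tool.execution.error";
  toolCallId: string;
  toolName: string;
};

// Stand-in for the gateway's active-tool bookkeeping.
class ActiveToolRegistry {
  private readonly active = new Map<string, string>(); // toolCallId -> toolName

  start(toolCallId: string, toolName: string): void {
    this.active.set(toolCallId, toolName);
  }

  clear(toolCallId: string): void {
    this.active.delete(toolCallId);
  }

  isActive(toolCallId: string): boolean {
    return this.active.has(toolCallId);
  }
}

// Wraps one dynamic tool request so that bookkeeping is always cleared and
// exactly one terminal diagnostic is emitted, whether run() resolves or rejects.
async function handleDynamicToolCall<T>(
  registry: ActiveToolRegistry,
  emit: (diagnostic: ToolDiagnostic) => void,
  toolCallId: string,
  toolName: string,
  run: () => Promise<T>,
): Promise<T> {
  registry.start(toolCallId, toolName);
  let failed = true; // treated as an error until run() resolves
  try {
    const result = await run();
    failed = false;
    return result;
  } finally {
    // Executes on both the success and the error path.
    registry.clear(toolCallId);
    emit({
      type: failed ? "tool.execution.error" : "tool.execution.completed",
      toolCallId,
      toolName,
    });
  }
}
```

Because the cleanup sits in finally, a rejected run() still propagates its error to the caller, but the registry no longer reports the tool as active afterwards.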

Sensitive values in the evidence were redacted after filing while preserving the observed event order, diagnostic classification, active tool name, timings, and restart-drain behavior.

Metadata

Labels

  • P1: High-priority user-facing bug, regression, or broken workflow.
  • clawsweeper:linked-pr-open: ClawSweeper found an open linked pull request for this issue.
  • clawsweeper:no-new-fix-pr: ClawSweeper does not recommend queueing a new automated fix PR for this issue.
  • clawsweeper:source-repro: ClawSweeper found a high-confidence source-level issue reproduction.
  • impact:crash-loop: Crash, hang, restart loop, or process-level availability failure.
  • impact:session-state: Session, memory, transcript, context, or agent state can drift or corrupt.
  • issue-rating: 🦞 diamond lobster: Very strong issue quality with high-confidence source-level or clear reproduction.
