
Agent sessions using Codex app-server backend time out: model completes but response is never delivered to channels #82343

@McoreD


Bug: embedded_run response delivery deadlock in codex-app-server path

Problem Summary

Mikhail (agent ID: mikhail, runtime: acp with codex backend in acpx mode) consistently fails to deliver responses to Discord and Telegram. The model generates output successfully (confirmed by stopReason: "stop" and token usage counts), but the final response assembly and channel delivery never occur. Sessions time out at ~616 seconds with status: "interrupted" or status: "timeout".

Symptoms

  1. Model runs fine - gpt-5.5 via the OpenAI Responses API generates complete output with stopReason: "stop" and valid usage counts
  2. Delivery never happens - No message appears in Discord/Telegram despite model completion
  3. Session times out - status: "timeout" or status: "interrupted" after ~616 seconds
  4. Stalled detection fires - OpenClaw diagnostic logs show stalled_agent_run classification with terminalProgressStale=true recovery=none

Diagnostic Evidence

Discord session:

  • lastProgress=codex_app_server:notification:thread/tokenUsage/updated — tokens counted, model running
  • Later: lastProgress=codex_app_server:notification:rawResponseItem/completed — model finished
  • activeWorkKind=embedded_run
  • recovery=none — OpenClaw knows it cannot self-heal

Telegram session:

  • lastProgress=codex_app_server:notification:rawResponseItem/completed — model finished, response ready
  • terminalProgressStale=true recovery=none
  • Same embedded_run path, same deadlock
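Both sessions show the same signature: the last progress event is the model's completion notification, yet the run is still classified stale with no recovery. A minimal triage sketch, assuming diagnostic lines use the space-separated `key=value` shape quoted above (the helpers `parse_diag` and `completed_but_undelivered` are hypothetical, not OpenClaw APIs, and the sample line is assembled from this report's fields rather than copied from a real log), can flag that signature:

```python
import shlex

# Notification quoted in this report as the last progress before the stall.
COMPLETED = "codex_app_server:notification:rawResponseItem/completed"


def parse_diag(line: str) -> dict[str, str]:
    """Split a diagnostic line into key=value pairs (assumed format)."""
    fields: dict[str, str] = {}
    for token in shlex.split(line):
        if "=" in token:
            # Split on the first '=' only, so values may contain '='.
            key, _, value = token.partition("=")
            fields[key] = value
    return fields


def completed_but_undelivered(fields: dict[str, str]) -> bool:
    """True when the model finished but the run was still judged stale."""
    return (
        fields.get("lastProgress") == COMPLETED
        and fields.get("terminalProgressStale") == "true"
        and fields.get("recovery") == "none"
    )


# Illustrative line built from the fields in this report (not a real log).
line = (
    "stalled_agent_run activeWorkKind=embedded_run "
    "lastProgress=codex_app_server:notification:rawResponseItem/completed "
    "terminalProgressStale=true recovery=none"
)
print(completed_but_undelivered(parse_diag(line)))  # True
```

A line whose lastProgress is still thread/tokenUsage/updated would not match, which separates "model still running" from "model finished, delivery stuck".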

Configuration

Mikhail is correctly configured:

  • Discord binding: routes channel=discord, accountId=mikhail to agent mikhail
  • Discord token: present in secrets via SecretRef ✓
  • ACP runtime: backend: "acpx", mode: "persistent"
  • Plugin codex: enabled: true, appServer.mode: "yolo", appServer.transport: "stdio"
  • Gateway: running on loopback port 18789, Read probe: ok

Root Cause Hypothesis

The Codex app server (codex app-server) successfully processes the request, the model generates a complete response, and rawResponseItem/completed fires correctly. However, the embedded_run handler in OpenClaw never receives or processes the completed response event. The Codex → OpenClaw → Discord/Telegram pipeline therefore breaks after the model finishes: the completed response is never assembled and handed to channel delivery, so nothing reaches Discord or Telegram.
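To make the hypothesized failure shape concrete, here is a minimal asyncio sketch. It is not OpenClaw or Codex code: `Dispatcher`, `embedded_run`, and the method name `hypothetical/completed` are illustrative assumptions, and the short timeouts stand in for the ~616 s observed. A run that waits for a notification the router never forwards to it stalls until its timeout, even though the model's output already exists:

```python
import asyncio


class Dispatcher:
    """Hypothetical router from app-server notifications to waiting runs."""

    def __init__(self) -> None:
        self.waiters: dict[str, asyncio.Future] = {}

    def expect(self, method: str) -> asyncio.Future:
        # A run registers interest in exactly one notification method.
        fut = asyncio.get_running_loop().create_future()
        self.waiters[method] = fut
        return fut

    def notify(self, method: str, payload: str) -> None:
        # Notifications with no registered waiter are silently dropped;
        # this is the failure shape hypothesized in the report.
        fut = self.waiters.pop(method, None)
        if fut is not None and not fut.done():
            fut.set_result(payload)


async def embedded_run(d: Dispatcher, expected: str, timeout: float) -> str:
    done = d.expect(expected)
    # Simulated app server: the model finishes and emits its completion event.
    asyncio.get_running_loop().call_later(
        0.01, d.notify, "rawResponseItem/completed", "final answer"
    )
    try:
        text = await asyncio.wait_for(done, timeout)
    except asyncio.TimeoutError:
        return "timeout"  # the response exists but was never handed over
    return f"delivered: {text}"


async def main() -> None:
    # The run waits on the method the server actually sends: delivery succeeds.
    print(await embedded_run(Dispatcher(), "rawResponseItem/completed", 0.5))
    # The run waits on a different (hypothetical) method: it stalls to timeout.
    print(await embedded_run(Dispatcher(), "hypothetical/completed", 0.1))


asyncio.run(main())
```

The first call prints `delivered: final answer` and the second prints `timeout`. The point of the sketch is only that a dropped or misrouted completion event produces exactly the reported symptom: a finished model, a stale run, and no self-recovery.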

Environment

  • Platform: macOS Darwin 25.5.0 (x64)
  • Node: v22.22.0 (OpenClaw-managed)
  • OpenClaw: 2026.5.12
  • Gateway: running via LaunchAgent
  • Codex app server: running as separate process

Expected Behavior

A simple question such as "@mikhail tell me your model and model version" should complete and be delivered in under 30 seconds, not time out after ~10 minutes with a completed model response sitting undelivered.
