[Bug]: openai-codex/gpt-5.3-codex intermittently returns text-only responses with no tool calls (early termination) #28754

@dhruvkelawala

Summary

openai-codex/gpt-5.3-codex intermittently returns text-only responses with no tool calls, even when the task clearly requires tool use. This is NOT specific to sessions_send — it happens when Codex is the primary agent model receiving messages via any channel (Discord, heartbeat, sessions_send).

The model responds with text like "Acknowledged, I'll do this" and emits end_turn instead of calling tools. The gateway treats this as a successful completion (not an error), so no retry/failover is triggered.

Steps to reproduce

  1. Configure an agent with openai-codex/gpt-5.3-codex as the primary model
  2. Send a task that requires tool calls (e.g. "run this test, read the output, fix the file")
  3. The model responds with text acknowledging the task but makes zero tool calls
  4. Run completes in 1.7-3.5 seconds with reason="run_completed"
  5. This is intermittent — the same session with the same context may work correctly on the next attempt

This reproduces regardless of message source:

  • Direct Discord messages to the agent
  • sessions_send from another agent
  • Heartbeat-triggered runs

Expected behavior

When given a task requiring tool use, Codex should emit tool calls (exec, read, write, etc.) and the agent loop should continue until the task is complete.

Actual behavior

The model returns valid text but ends the turn (end_turn) with zero tool calls. stopReason is undefined rather than toolUse. The gateway treats this as a successful run and goes idle.

The behavior is intermittent but clustered — once it starts, multiple consecutive runs fail the same way, suggesting the model enters a "conversational" mode.
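The clustering described above can be checked against run history by looking for streaks of consecutive runs with no tool calls. A minimal TypeScript sketch, assuming a hypothetical `RunRecord` shape (OpenClaw's real run records may look different) and made-up sample data:

```typescript
// Hypothetical record shape for illustration; not OpenClaw's real schema.
interface RunRecord {
  id: string;
  toolCalls: number;
  durationMs: number;
}

// Returns the length of each maximal streak of consecutive zero-tool-call runs.
function textOnlyStreaks(runs: RunRecord[]): number[] {
  const streaks: number[] = [];
  let current = 0;
  for (const run of runs) {
    if (run.toolCalls === 0) {
      current += 1;
    } else if (current > 0) {
      streaks.push(current);
      current = 0;
    }
  }
  if (current > 0) streaks.push(current);
  return streaks;
}

// Made-up sample history, for demonstration only.
const history: RunRecord[] = [
  { id: "a", toolCalls: 48, durationMs: 292000 },
  { id: "b", toolCalls: 0, durationMs: 3570 },
  { id: "c", toolCalls: 0, durationMs: 1700 },
  { id: "d", toolCalls: 0, durationMs: 2100 },
  { id: "e", toolCalls: 12, durationMs: 60000 },
  { id: "f", toolCalls: 0, durationMs: 1900 },
];

console.log(textOnlyStreaks(history)); // [ 3, 1 ]
```

A zero-tool-call run is not always a failure, since some prompts are purely conversational, so streak lengths are a signal to investigate rather than proof of the bug.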

OpenClaw version

2026.2.26

Operating system

macOS 15.4 (Apple Silicon, M4)

Install method

npm global

Logs, screenshots, and evidence

Failing Codex run — text-only, no tool calls (3.5s):

13:01:12.663 embedded run start: provider=openai-codex model=gpt-5.3-codex thinking=off
13:01:12.700 [context-diag] pre-prompt: messages=114 systemPromptChars=36808 historyChars=76915 promptChars=871
13:01:16.254 embedded run prompt end: durationMs=3570
13:01:16.257 session state: prev=processing new=idle reason="run_completed" queueDepth=0

Model output: "Acknowledged. I'll run this exactly as specified..." — no tool calls.
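The log lines above are mostly `key=value` pairs, so a short script can extract the numbers for comparison across runs. A minimal TypeScript sketch, assuming only the line shapes shown in this issue; `parseKeyValues` is a hypothetical helper, not an OpenClaw API:

```typescript
// Hypothetical helper: parse `key=value` tokens from one log line.
// Quoted values such as reason="run_completed" are returned without the quotes.
function parseKeyValues(line: string): Record<string, string> {
  const out: Record<string, string> = {};
  const re = /(\w+)=("([^"]*)"|\S+)/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(line)) !== null) {
    out[m[1]] = m[3] !== undefined ? m[3] : m[2];
  }
  return out;
}

// Lines copied from the failing run in this issue.
const diag = parseKeyValues(
  "13:01:12.700 [context-diag] pre-prompt: messages=114 systemPromptChars=36808 historyChars=76915 promptChars=871",
);
const end = parseKeyValues("13:01:16.254 embedded run prompt end: durationMs=3570");
const state = parseKeyValues(
  '13:01:16.257 session state: prev=processing new=idle reason="run_completed" queueDepth=0',
);

console.log(Number(diag.messages), Number(end.durationMs), state.reason);
```

Values come back as strings, so numeric fields need an explicit `Number(...)` conversion before comparison.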

Working Codex run — same session, 25 min earlier (5 min, 48 tool calls):

09:51:41 [context-diag] pre-prompt: messages=16 systemPromptChars=37668 historyChars=8791
09:56:33 [context-diag] pre-prompt: messages=110 systemPromptChars=37807 historyChars=76368

5 minutes of active tool use. 94 new messages. Code committed and pushed successfully.

Key data points:

  • Context size is nearly identical between the working run (about 38K system prompt chars + 76K history chars) and the failing run (about 37K system prompt chars + 77K history chars)
  • Working run: 5 minutes, 48 tool calls. Failing run: 3.5 seconds, 0 tool calls
  • Same model, same session, same OAuth auth, same thinking level
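To put a number on "nearly identical", the relative differences can be computed from the `[context-diag]` values in the log excerpts. A small TypeScript sketch using only those logged figures (the working-run values are from its last logged pre-prompt line; `relDiffPct` is an illustrative helper name):

```typescript
// Relative difference between two values, as a percentage of the larger one.
function relDiffPct(a: number, b: number): number {
  return (Math.abs(a - b) / Math.max(a, b)) * 100;
}

// Values taken from the log excerpts in this issue.
const working = { systemPromptChars: 37807, historyChars: 76368, messages: 110 };
const failing = { systemPromptChars: 36808, historyChars: 76915, messages: 114 };

const sysDiff = relDiffPct(working.systemPromptChars, failing.systemPromptChars);
const histDiff = relDiffPct(working.historyChars, failing.historyChars);

console.log(`system prompt: ${sysDiff.toFixed(1)}% apart`); // 2.6% apart
console.log(`history: ${histDiff.toFixed(1)}% apart`); // 0.7% apart
```

Both differences are under 3%, which supports the point that context size alone is unlikely to explain the change in behavior.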

Impact and severity

  • Affected: Any agent using openai-codex/gpt-5.3-codex
  • Severity: High. The agent appears to accept tasks but does nothing, and because no error is raised, no automatic fallback is triggered
  • Frequency: Intermittent, clustered. Multiple times per day. Once it starts, 3-4 consecutive runs fail.
  • Consequence: Autonomous coding agents become unreliable. Text-only "I'll do it" responses waste tokens and give a false impression of progress.

Additional information

  • Related upstream issues: openai/codex#10828, anomalyco/opencode#12570 — both report similar early termination with gpt-5.3-codex
  • Similar OpenClaw issue: #26494 ("[Bug]: Telegram Bot only returns single message, stops typing immediately, no action executed (streaming: partial)") reports identical symptoms (single message, stops immediately, no action) with the same model on Telegram
  • Not a sessions_send issue: The ping-pong handshake can amplify the pattern, but the root cause is the model choosing text over tool calls. This reproduces with direct Discord messages too.
  • Feature request: Could OpenClaw detect text-only responses to tool-requiring tasks and auto-retry or trigger fallback? Currently text-only is treated as success, not as a retry/failover condition.
  • Thinking level: Observed with both off and medium
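One possible shape for the feature request: wrap the model call, flag a response that has no tool calls when tools were expected, retry on the primary model, then fall back to another model. This is a sketch under stated assumptions, not OpenClaw's gateway code. `ModelResponse`, `RunModel`, `RetryOptions`, `expectsTools`, and the model names are all hypothetical.

```typescript
// All types and names here are hypothetical, for illustration only.
interface ModelResponse {
  text: string;
  toolCalls: string[]; // names of the tools the model asked to call
}

type RunModel = (model: string, prompt: string) => Promise<ModelResponse>;

interface RetryOptions {
  primary: string;
  fallback: string;
  maxRetries: number; // extra attempts on the primary before falling back
  expectsTools: boolean; // caller's judgment that the task needs tool use
}

// A response is suspect when tools were expected but none were requested.
function isSuspectTextOnly(res: ModelResponse, expectsTools: boolean): boolean {
  return expectsTools && res.toolCalls.length === 0;
}

async function runWithFallback(
  run: RunModel,
  prompt: string,
  opts: RetryOptions,
): Promise<{ model: string; attempts: number; response: ModelResponse }> {
  let attempts = 0;
  for (let i = 0; i <= opts.maxRetries; i++) {
    attempts += 1;
    const response = await run(opts.primary, prompt);
    if (!isSuspectTextOnly(response, opts.expectsTools)) {
      return { model: opts.primary, attempts, response };
    }
  }
  // Primary kept answering with text only: hand the task to the fallback model.
  attempts += 1;
  const response = await run(opts.fallback, prompt);
  return { model: opts.fallback, attempts, response };
}

// Stub model for demonstration: the primary always answers with text only.
const stub: RunModel = async (model) =>
  model === "primary-model"
    ? { text: "Acknowledged. I'll do this.", toolCalls: [] }
    : { text: "", toolCalls: ["exec"] };

const resultPromise = runWithFallback(stub, "run the tests", {
  primary: "primary-model",
  fallback: "fallback-model",
  maxRetries: 1,
  expectsTools: true,
});

resultPromise.then((r) => console.log(r.model, r.attempts)); // fallback-model 3
```

The hard part in practice is deciding `expectsTools`, since a text-only reply is correct for purely conversational prompts. The sketch leaves that decision to the caller rather than guessing from the prompt.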

Labels: stale (marked as stale due to inactivity)
