[Bug]: GENERIC_EXTERNAL_RUN_FAILURE_TEXT delivered on success-path turn — model.completed.aborted=false + assistantTexts non-empty + messagingToolSentTexts empty (2026.5.26 / openai-codex / WhatsApp) #87359

@aleps001

Summary

On OpenClaw 2026.5.26 (10ad3aa), with openai-codex/gpt-5.5 via the Codex app-server and the pi harness, the user-facing canned text "⚠️ Something went wrong while processing your request. Please try again, or use /new to start a fresh session." was delivered to a WhatsApp channel on a turn that the runtime trajectory marks as a clean success. The model produced an assistant text and the trajectory reports finalStatus=success with no abort/timeout flags, but the channel never received that text; the runtime's fallback failure copy was delivered instead.

This appears to be different from issues like #84076 (Codex app-server stall after item/completed, recovery=none), which are failure-path bugs where turn/completed never arrives. Here model.completed fires cleanly with aborted=false and non-empty assistantTexts, yet the canned text leaks. Opening as a separate issue because the failure shape and likely fix location look distinct.

Environment

  • openclaw --version: OpenClaw 2026.5.26 (10ad3aa)
  • @earendil-works/pi-ai: 0.75.5
  • Provider: openai-codex/gpt-5.5
  • Harness: pi
  • Surface: WhatsApp direct (1:1)
  • Agent: top-level orchestrator (no parent)
  • Config relevant settings:
    • messages.visibleReplies: "automatic"
    • agents.defaults.silentReply: {} (default)
    • No per-agent visibleReplies or silentReply override on the affected agent

Observed behavior

End user received the canned text mid-conversation, between two healthy turns:

⚠️ Something went wrong while processing your request. Please try again, or use /new to start a fresh session.

The user retried with a short follow-up message. The session continued cleanly into a multi-agent spawn (workflow-lead → enterprise-ai-lead → content-lead → editorial-review) over the next ~30 minutes with no further dispatch issues. No session corruption observed; no /new was required to recover.

Expected behavior

With messages.visibleReplies: "automatic" and the trajectory reporting model.completed.aborted=false, finalStatus=success, and non-empty assistantTexts, the auto-reply path should dispatch the assistant text to the WhatsApp channel. The canned failure copy should not appear when the runtime considers the turn a success.
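
The expectation can be written down as a small decision function. This is a minimal sketch under stated assumptions, not OpenClaw's actual code: the names TurnOutcome, Outbound and expectedOutbound are hypothetical, and only the field names (aborted, finalStatus, assistantTexts, messagingToolSentTexts) and the "automatic" value come from the trajectory and config quoted in this report.

```typescript
// Hypothetical model of the expected auto-reply decision; not OpenClaw source.
interface TurnOutcome {
  aborted: boolean;
  finalStatus: string; // "success" on the affected turn
  assistantTexts: string[];
  messagingToolSentTexts: string[];
}

type Outbound =
  | { kind: "assistant-text"; text: string }
  | { kind: "none" }
  | { kind: "failure-copy" };

function expectedOutbound(turn: TurnOutcome, visibleReplies: string): Outbound {
  const succeeded = !turn.aborted && turn.finalStatus === "success";
  // Assumption: canned failure copy belongs to failed runs only.
  if (!succeeded) return { kind: "failure-copy" };

  // The agent already delivered through the messaging tool: nothing to add.
  if (turn.messagingToolSentTexts.length > 0) return { kind: "none" };

  const text = turn.assistantTexts.join("\n").trim();
  if (visibleReplies === "automatic" && text.length > 0) {
    return { kind: "assistant-text", text };
  }
  return { kind: "none" };
}
```

For the affected turn (aborted=false, finalStatus=success, one assistant text, no messaging-tool sends) this model yields the assistant text, never the failure copy.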

Trajectory evidence

From the affected main agent's trajectory file (agents/main/sessions/<sessionId>.trajectory.jsonl), the turn that emitted the canned text:

model.completed
  aborted=false
  externalAbort=false
  timedOut=false
  idleTimedOut=false
  timedOutDuringCompaction=false
  timedOutDuringToolExecution=false
  promptErrorSource=null
  usage={input: 99479, output: 1238, cacheRead: 39424, reasoningTokens: 1042, total: 140141}
  assistantTexts=["Vou montar como deck executivo para a Distrito..."]   ← non-empty plain assistant text

trace.artifacts
  finalStatus=success
  didSendViaMessagingTool=false
  messagingToolSentTexts=[]
  messagingToolSentMediaUrls=[]
  messagingToolSentTargets=[]

session.ended
  status=success

The three events fired within 7 ms of each other. Then a 44-second gap (user retry), then the next session.started. No aborted=true, no timedOut=true, no promptErrorSource, no surrounding error events. By every runtime marker the turn was a clean success. The agent's last tool sequence in the turn was read → read → update_plan followed by the plain assistant text block.
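
Because the issue is intermittent, it may help to list every turn in a trajectory file that relied on auto-dispatch, so each can be cross-checked against what the channel actually received. The sketch below assumes each JSONL line is an object with a `type` (the event name) and a `data` object holding the fields shown above; that record shape is a guess, and findAutoDispatchTurns is a hypothetical helper, so the accessors may need adjusting to the real schema.

```typescript
// Assumed record shape: { type: string, data: {...} } per line. Not confirmed.
interface TrajectoryRecord {
  type?: unknown;
  data?: Record<string, unknown>;
}

interface AutoDispatchTurn {
  line: number; // 1-based line number of the model.completed record
  assistantTexts: string[];
}

function parseLine(raw: string): TrajectoryRecord | null {
  try {
    const value: unknown = JSON.parse(raw);
    if (typeof value !== "object" || value === null) return null;
    return value as TrajectoryRecord;
  } catch {
    return null; // skip malformed lines instead of aborting the scan
  }
}

function findAutoDispatchTurns(jsonl: string): AutoDispatchTurn[] {
  const turns: AutoDispatchTurn[] = [];
  let pending: AutoDispatchTurn | null = null;
  const lines = jsonl.split("\n");

  for (let i = 0; i < lines.length; i++) {
    const rec = parseLine(lines[i]);
    if (rec === null) continue;
    const data = rec.data ?? {};

    if (rec.type === "model.completed") {
      const texts: string[] = Array.isArray(data.assistantTexts)
        ? data.assistantTexts.filter((t: unknown): t is string => typeof t === "string")
        : [];
      const clean = data.aborted === false && data.timedOut === false && texts.length > 0;
      pending = clean ? { line: i + 1, assistantTexts: texts } : null;
    } else if (rec.type === "trace.artifacts" && pending !== null) {
      const sent = Array.isArray(data.messagingToolSentTexts) ? data.messagingToolSentTexts : [];
      // Success with nothing sent via the messaging tool: delivery depended on auto-dispatch.
      if (data.finalStatus === "success" && sent.length === 0) turns.push(pending);
      pending = null;
    }
  }
  return turns;
}
```

Most turns returned this way are presumably healthy, since plain-text replies under visibleReplies=automatic look identical in the trajectory. The list only narrows down which turns to compare against channel delivery logs.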

Code-path notes (from dist/ read)

GENERIC_EXTERNAL_RUN_FAILURE_TEXT is defined in dist/agent-runner-failure-copy-CU6Vmacs.js and emitted from three sites in dist/reply-turn-admission-ClQM84yB.js:

  1. Line ~611, inside formatForwardedExternalRunFailureText: if (!sanitized) return GENERIC_EXTERNAL_RUN_FAILURE_TEXT (the sanitizer reduces the source text to empty).
  2. Line ~655, inside buildExternalRunFailureReply — fallback after the 6-classifier chain (providerRequestError, missingApiKey, oauthRefresh, isHeartbeat, cliBackendTimeout, codexAppServer) all miss.
  3. Line ~1854, inside buildKnownAgentRunFailureReplyPayload — generic fallback after Embedded agent failed before reply: ....

Sites #2 and #3 fire from failure paths (buildKnownAgentRunFailureReplyPayload is reached only after agent run failed upstream). Site #1 fires only when sanitization eats all the text — and the assistant text in this case was plain Portuguese prose, not a structured error payload that the sanitizer would normally clean.

So the canned copy reached the channel on a success-path turn. Either:

  • there is a separate auto-reply admission gate (likely also in reply-turn-admission) that runs after session.ended and emits GENERIC_EXTERNAL_RUN_FAILURE_TEXT when assistantTexts is non-empty but messagingToolSentTexts is empty (a guard against agents that produce text without explicitly dispatching), mis-firing under visibleReplies=automatic where the runtime should auto-dispatch plain assistant text; or
  • the auto-dispatch path silently dropped the text between session.ended and the channel adapter, and a downstream handler rendered the generic fallback.

Either way, the trajectory reports success while the channel surfaces failure copy: the success-path contract and the user-facing contract disagree.
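
The disagreement between the two contracts could be captured as a check usable in a regression test. This is a sketch only: compareContracts and the Verdict labels are hypothetical, the canned string is copied from the message the user received, and the exact-match comparison assumes the channel adapter delivers assistant text unmodified, which may not hold if it chunks or reformats messages.

```typescript
// Canned text as observed on the channel in this report.
const GENERIC_EXTERNAL_RUN_FAILURE_TEXT =
  "⚠️ Something went wrong while processing your request. Please try again, or use /new to start a fresh session.";

type Verdict =
  | "not-applicable" // run did not succeed; failure copy may be legitimate
  | "consistent"
  | "failure-copy-on-success"
  | "text-not-delivered";

// Hypothetical helper: compares what the trajectory claims with what the channel got.
function compareContracts(
  trajectory: { finalStatus: string; assistantTexts: string[] },
  deliveredToChannel: string[],
): Verdict {
  if (trajectory.finalStatus !== "success") return "not-applicable";

  if (deliveredToChannel.includes(GENERIC_EXTERNAL_RUN_FAILURE_TEXT)) {
    return "failure-copy-on-success";
  }

  const expected = trajectory.assistantTexts.filter((t) => t.trim().length > 0);
  const anyDelivered = expected.some((t) => deliveredToChannel.includes(t));
  if (expected.length > 0 && !anyDelivered) return "text-not-delivered";

  return "consistent";
}
```

The turn described here would classify as "failure-copy-on-success". A silent drop without the fallback would show up as "text-not-delivered", which may help separate the two hypotheses above if it ever occurs on its own.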

Reproducibility

Not reliably reproducible from observation; appears intermittent. The affected turn followed a chain of read → read → update_plan tool calls and produced a single plain assistant-text block as final output. No special characters in the text (plain Portuguese prose). Same session continued through ~30 minutes of multi-agent spawn work afterward without recurrence.

If a maintainer wants, I can sanitize and attach the full session jsonl + trajectory file for the affected turn.

Workaround

End-user retry with any short follow-up message; the session continues cleanly. No /new required, no session corruption.

Why this looks orthogonal to the existing app-server stall issues

Issues such as #84076 describe Codex app-server stalls where turn/completed never arrives and recovery=none. Those are failure-path: the model run did not complete. In this case model.completed fires cleanly with aborted=false and a non-empty assistantTexts — the runtime considers the run successful, but the dispatch/admission layer downstream of session.ended emits the canned text anyway. The likely fix surface is the auto-reply admission gate after a successful run, not the Codex app-server signaling path.

Happy to provide further detail or test patches.
