Summary
On OpenClaw 2026.5.26 (10ad3aa), with openai-codex/gpt-5.5 via the Codex app-server and the pi harness, the user-facing canned text ⚠️ Something went wrong while processing your request. Please try again, or use /new to start a fresh session. was delivered to a WhatsApp channel on a turn that the runtime trajectory marks as a clean success. The model produced an assistant text, the trajectory reports finalStatus=success with no abort/timeout flags, but the channel did not receive the text — the runtime's fallback failure copy was delivered instead.
This appears to be different from issues like #84076 (Codex app-server stall after item/completed, recovery=none), which are failure-path bugs where turn/completed never arrives. Here model.completed fires cleanly with aborted=false and non-empty assistantTexts, yet the canned text leaks. Opening as a separate issue because the failure shape and likely fix location look distinct.
Environment
- openclaw --version → OpenClaw 2026.5.26 (10ad3aa)
- @earendil-works/pi-ai → 0.75.5
- Provider: openai-codex/gpt-5.5
- Harness: pi
- Surface: WhatsApp direct (1:1)
- Agent: top-level orchestrator (no parent)
- Config relevant settings:
  - messages.visibleReplies: "automatic"
  - agents.defaults.silentReply: {} (default)
  - No per-agent visibleReplies or silentReply override on the affected agent
Observed behavior
End user received the canned text mid-conversation, between two healthy turns:
⚠️ Something went wrong while processing your request. Please try again, or use /new to start a fresh session.
The user retried with a short follow-up message. The session continued cleanly into a multi-agent spawn (workflow-lead → enterprise-ai-lead → content-lead → editorial-review) over the next ~30 minutes with no further dispatch issues. No session corruption observed; no /new was required to recover.
Expected behavior
With messages.visibleReplies: "automatic", and with the trajectory reporting model.completed.aborted=false, finalStatus=success, and a non-empty assistantTexts, the auto-reply path should dispatch the assistant text to the WhatsApp channel. The canned failure copy should not appear when the runtime considers the turn a success.
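The expectation above can be written down as a small decision function. This is only a sketch of the contract as I understand it, not OpenClaw's actual dispatch code: the TurnOutcome type and the expectedDelivery function are hypothetical, and only the field names are taken from the trajectory.

```typescript
// Hypothetical shape: field names mirror the trajectory fields quoted in this
// issue, but the real OpenClaw types may differ.
interface TurnOutcome {
  aborted: boolean;
  timedOut: boolean;
  promptErrorSource: string | null;
  finalStatus: string;
  assistantTexts: string[];
  didSendViaMessagingTool: boolean;
}

type ExpectedDelivery = "assistant-text" | "failure-copy" | "nothing-further";

// What the channel should receive for a turn, under the assumption that
// visibleReplies="automatic" means plain assistant text is auto-dispatched.
function expectedDelivery(turn: TurnOutcome, visibleReplies: string): ExpectedDelivery {
  const failed =
    turn.aborted ||
    turn.timedOut ||
    turn.promptErrorSource !== null ||
    turn.finalStatus !== "success";
  if (failed) return "failure-copy";

  const hasText = turn.assistantTexts.some((text) => text.trim().length > 0);
  if (visibleReplies === "automatic" && hasText && !turn.didSendViaMessagingTool) {
    return "assistant-text";
  }
  // Either the agent already sent via the messaging tool or there is nothing to send.
  return "nothing-further";
}
```

For the affected turn every failure flag is false and assistantTexts is non-empty, so this function returns "assistant-text", which is not what the channel received.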
Trajectory evidence
From the affected main agent's trajectory file (agents/main/sessions/<sessionId>.trajectory.jsonl), the turn that emitted the canned text:
model.completed
  aborted=false
  externalAbort=false
  timedOut=false
  idleTimedOut=false
  timedOutDuringCompaction=false
  timedOutDuringToolExecution=false
  promptErrorSource=null
  usage={input: 99479, output: 1238, cacheRead: 39424, reasoningTokens: 1042, total: 140141}
  assistantTexts=["Vou montar como deck executivo para a Distrito..."] ← non-empty plain assistant text
trace.artifacts
  finalStatus=success
  didSendViaMessagingTool=false
  messagingToolSentTexts=[]
  messagingToolSentMediaUrls=[]
  messagingToolSentTargets=[]
session.ended
  status=success
The three events fired within 7 ms of each other. Then a 44-second gap (user retry), then the next session.started. No aborted=true, no timedOut=true, no promptErrorSource, no surrounding error events. By every runtime marker the turn was a clean success. The agent's last tool sequence in the turn was read → read → update_plan followed by the plain assistant text block.
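To look for other turns with the same shape, something like the following could scan a trajectory file. It is a rough sketch: I am assuming each JSONL line is an object with a type key and a data key, which may not match the real on-disk schema, and findAutoDispatchTurns is a name I made up. Only the event names and field names come from the trajectory above.

```typescript
// Assumed line shape; the real trajectory schema may use different keys.
interface TrajectoryEvent {
  type?: string;
  data?: Record<string, unknown>;
}

interface CandidateTurn {
  line: number; // 1-based line of the model.completed event
  assistantTexts: string[];
}

// Lists turns that completed cleanly with plain assistant text and did not
// send via the messaging tool, i.e. turns that rely on auto-dispatch.
function findAutoDispatchTurns(jsonl: string): CandidateTurn[] {
  const candidates: CandidateTurn[] = [];
  let pending: CandidateTurn | null = null;
  const lines = jsonl.split("\n");

  for (let i = 0; i < lines.length; i++) {
    const raw = lines[i].trim();
    if (raw === "") continue;

    let parsed: unknown;
    try {
      parsed = JSON.parse(raw);
    } catch {
      continue; // skip lines that are not valid JSON
    }
    if (typeof parsed !== "object" || parsed === null) continue;

    const event = parsed as TrajectoryEvent;
    const data = event.data ?? {};

    if (event.type === "model.completed") {
      const rawTexts = Array.isArray(data.assistantTexts) ? data.assistantTexts : [];
      const texts = rawTexts.filter(
        (text): text is string => typeof text === "string" && text.trim() !== "",
      );
      const clean =
        data.aborted === false &&
        data.timedOut === false &&
        (data.promptErrorSource ?? null) === null;
      pending = clean && texts.length > 0 ? { line: i + 1, assistantTexts: texts } : null;
    } else if (event.type === "trace.artifacts" && pending !== null) {
      if (data.finalStatus === "success" && data.didSendViaMessagingTool === false) {
        candidates.push(pending);
      }
      pending = null;
    }
  }
  return candidates;
}
```

Under visibleReplies=automatic, healthy turns match this filter too, so the output is a list of candidates to cross-check against what the channel actually received, not a list of failures.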
Code-path notes (from dist/ read)
GENERIC_EXTERNAL_RUN_FAILURE_TEXT is defined in dist/agent-runner-failure-copy-CU6Vmacs.js and emitted from three sites in dist/reply-turn-admission-ClQM84yB.js:
- Line ~611, inside formatForwardedExternalRunFailureText: if (!sanitized) return GENERIC_EXTERNAL_RUN_FAILURE_TEXT (the sanitizer reduces the source text to empty).
- Line ~655, inside buildExternalRunFailureReply: fallback after the 6-classifier chain (providerRequestError, missingApiKey, oauthRefresh, isHeartbeat, cliBackendTimeout, codexAppServer) all miss.
- Line ~1854, inside buildKnownAgentRunFailureReplyPayload: generic fallback after Embedded agent failed before reply: ....
Sites #2 and #3 fire from failure paths (buildKnownAgentRunFailureReplyPayload is reached only after agent run failed upstream). Site #1 fires only when sanitization eats all the text — and the assistant text in this case was plain Portuguese prose, not a structured error payload that the sanitizer would normally clean.
So the canned copy reached the channel on a success-path turn. Either:
- there is a separate auto-reply admission gate (likely also in reply-turn-admission) that runs after session.ended and emits GENERIC_EXTERNAL_RUN_FAILURE_TEXT when assistantTexts is non-empty but messagingToolSentTexts is empty (a guard against agents that produce text without explicitly dispatching), mis-firing under visibleReplies=automatic where the runtime should auto-dispatch plain assistant text; or
- the auto-dispatch path silently dropped the text between session.ended and the channel adapter, and a downstream handler rendered the generic fallback.
Either way, the trajectory contract reports success while the channel surfaces a failure-copy — the success-path contract and the user-facing contract are disagreeing.
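The disagreement between the two contracts could be caught by a post-delivery check along these lines. This is a hypothetical sketch for a regression test or debug assertion, not existing OpenClaw code: DeliveryRecord, deliveredTexts, and checkDelivery are names I invented, and the check assumes that on a successful turn the assistant text is meant to be auto-dispatched. The constant's value is the text the user received.

```typescript
// Value copied from the message the user received; the constant name matches
// the one found in dist/.
const GENERIC_EXTERNAL_RUN_FAILURE_TEXT =
  "⚠️ Something went wrong while processing your request. Please try again, or use /new to start a fresh session.";

// Hypothetical record pairing the trajectory outcome with what the channel
// adapter actually sent for the same turn.
interface DeliveryRecord {
  finalStatus: string;
  assistantTexts: string[];
  deliveredTexts: string[];
}

type Verdict = "ok" | "failure-copy-on-success" | "assistant-text-dropped";

function checkDelivery(record: DeliveryRecord): Verdict {
  // On a failed turn the failure copy is legitimate, so there is nothing to flag.
  if (record.finalStatus !== "success") return "ok";

  if (record.deliveredTexts.includes(GENERIC_EXTERNAL_RUN_FAILURE_TEXT)) {
    return "failure-copy-on-success";
  }
  const expected = record.assistantTexts.filter((text) => text.trim() !== "");
  const dropped = expected.some((text) => !record.deliveredTexts.includes(text));
  return dropped ? "assistant-text-dropped" : "ok";
}
```

The turn in this report would come back as "failure-copy-on-success". A turn where the text was dropped and nothing was rendered in its place would come back as "assistant-text-dropped".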
Reproducibility
Not reliably reproducible from observation; appears intermittent. The affected turn followed a chain of read → read → update_plan tool calls and produced a single plain assistant-text block as final output. No special characters in the text (plain Portuguese prose). Same session continued through ~30 minutes of multi-agent spawn work afterward without recurrence.
If a maintainer wants, I can sanitize and attach the full session jsonl + trajectory file for the affected turn.
Workaround
End-user retry with any short follow-up message; the session continues cleanly. No /new required, no session corruption.
Why this looks orthogonal to the existing app-server stall issues
Issues such as #84076 describe Codex app-server stalls where turn/completed never arrives and recovery=none. Those are failure-path: the model run did not complete. In this case model.completed fires cleanly with aborted=false and a non-empty assistantTexts — the runtime considers the run successful, but the dispatch/admission layer downstream of session.ended emits the canned text anyway. The likely fix surface is the auto-reply admission gate after a successful run, not the Codex app-server signaling path.
Happy to provide further detail or test patches.