Codex missing-terminal fallback leaks into Discord channel on 2026.5.27 despite #87079 #87725

@PollyBot13

Description

Summary

On 2026.5.27, a Discord channel session using the bundled Codex runtime posted the internal missing-terminal fallback text directly into the user-visible channel:

Codex stopped before confirming the turn was complete. Some work may already have been performed; verify the current state before retrying.

This is not just a turn failure. The user-facing problem is that an internal lifecycle/recovery message was delivered as a normal channel reply, which reads like an assistant response and creates confusing noise in the conversation.

Environment

Observed behavior

At least two archived Discord channel messages on 2026-05-28 contained the exact fallback text above. The latest local incident was around 2026-05-28T16:46:08Z.

A user complained after the leak, because the same class of fallback had already appeared in the channel repeatedly.

Local follow-up showed this was a real lifecycle failure path, not intentional assistant content:

  • session-lifecycle-audit.py --since-hours 4 initially found one high-severity unrecovered tool-failure lifecycle finding in the implicated session window.
  • The finding pointed at an unrecovered tool/native lifecycle failure before the visible fallback. The concrete local evidence included a shell/tool failure (jq: ... null has no keys) during the affected turn.
  • A later audit window returned clean after the incident, so this report is about the already-observed channel leak rather than an active stuck session.

Why this is probably not fixed by #87079

#87079 appears to be present in the installed 2026.5.27 Codex runtime. Local installed code contains the expected rawResponseItemCompletedWithNoActiveItems path and arms the completion idle watchdog for the narrow "rawResponseItem/completed but no turn/completed" stall.

That means #87079 may explain why the turn eventually produced fallback/timeout feedback instead of hanging indefinitely, but it does not prevent the user-visible Discord leak.
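To make the narrow stall concrete, here is a minimal sketch of a completion idle watchdog. It is not the installed implementation: the class, its methods, and the 1 s limit are hypothetical, and only the two event names and the rawResponseItemCompletedWithNoActiveItems label come from the text above. Timestamps are passed in explicitly so the sketch needs no real timers.

```typescript
// Event names are quoted from the report; everything else is a hypothetical sketch.
type AppServerEvent = "rawResponseItem/completed" | "turn/completed";

class CompletionIdleWatchdog {
  private armedAt: number | null = null;
  private readonly idleLimitMs: number;

  constructor(idleLimitMs: number) {
    this.idleLimitMs = idleLimitMs;
  }

  // Feed each event with the count of still-active items and a timestamp in ms.
  onEvent(event: AppServerEvent, activeItems: number, nowMs: number): void {
    if (event === "turn/completed") {
      this.armedAt = null; // terminal event arrived, nothing left to watch
      return;
    }
    if (activeItems === 0) {
      this.armedAt = nowMs; // the rawResponseItemCompletedWithNoActiveItems case
    }
  }

  // True once the armed watchdog has waited at least idleLimitMs.
  hasTimedOut(nowMs: number): boolean {
    return this.armedAt !== null && nowMs - this.armedAt >= this.idleLimitMs;
  }
}

const watchdog = new CompletionIdleWatchdog(1_000); // 1 s limit is an arbitrary example value
watchdog.onEvent("rawResponseItem/completed", 0, 0);
const stalled = watchdog.hasTimedOut(1_500); // true: no turn/completed within the limit
```

A watchdog of this shape only converts a hang into a timeout outcome. It says nothing about where the resulting text is delivered, which is the gap this report is about.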

The current failure mode seems more like:

  1. Codex/app-server/tool lifecycle gets into a missing-terminal or unrecovered tool-failure state.
  2. buildCodexAppServerPromptTimeoutOutcome() creates a prompt timeout outcome with:
    • replayInvalid: true
    • livenessState: "abandoned"
    • side-effect warning copy when potential side effects are detected.
  3. Reply/admission/channel delivery treats that outcome as text suitable for the source Discord channel.
  4. The Discord channel receives the internal fallback as if it were a normal assistant reply.
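
The four steps above can be sketched as follows. This is an illustration of the suspected leak, not code from the build: the userText field, sketchBuildTimeoutOutcome, and naiveDeliver are hypothetical names, and the real buildCodexAppServerPromptTimeoutOutcome() signature is not assumed. Only replayInvalid, livenessState, and the fallback wording come from the report.

```typescript
// Only replayInvalid, livenessState, and the fallback wording come from the report.
// The userText field and both functions are hypothetical.
interface SketchTimeoutOutcome {
  replayInvalid: boolean;
  livenessState: "abandoned";
  userText: string;
}

const MISSING_TERMINAL_FALLBACK =
  "Codex stopped before confirming the turn was complete. " +
  "Some work may already have been performed; verify the current state before retrying.";

// Step 2: a timeout outcome that carries the side-effect warning copy.
function sketchBuildTimeoutOutcome(): SketchTimeoutOutcome {
  return {
    replayInvalid: true,
    livenessState: "abandoned",
    userText: MISSING_TERMINAL_FALLBACK,
  };
}

// Step 3: a delivery layer that ignores the metadata and forwards any text it is given.
function naiveDeliver(outcome: SketchTimeoutOutcome, channel: string[]): void {
  channel.push(outcome.userText);
}

// Step 4: the channel now holds the internal fallback as if it were a reply.
const leakedChannel: string[] = [];
naiveDeliver(sketchBuildTimeoutOutcome(), leakedChannel);
```

In this sketch the metadata that marks the turn as abandoned is present but never consulted, so nothing distinguishes the fallback from an ordinary answer by the time it reaches the channel.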

Expected behavior

The system should still protect the user from unsafe automatic replay when side effects may have happened, but the channel UX should not leak the internal lifecycle fallback as ordinary assistant content.

Possible acceptable outcomes:

  • surface a channel-safe failure message that clearly says the previous assistant turn failed internally and was not a normal answer;
  • route the detailed side-effect/replay metadata to logs/control UI while keeping source-channel text concise;
  • mark the turn as failed/abandoned without emitting the current canned fallback into the user channel;
  • preserve the "do not auto-replay side-effecting work" behavior.

The important distinction is: keep the safety signal, but do not present Codex lifecycle machinery as chat content.
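
One way to picture that distinction is a routing step that splits the outcome into channel copy, diagnostics, and a replay decision. This is a sketch under assumptions: the types, routeLifecycleOutcome, and the CHANNEL_SAFE_FAILURE wording are all hypothetical placeholders, not proposed final copy or an existing API.

```typescript
// Hypothetical shapes; only replayInvalid and livenessState are named in the report.
interface LifecycleOutcome {
  replayInvalid: boolean;
  livenessState: string;
  internalText: string;
}

interface RoutedDelivery {
  channelText: string;           // what the source channel sees
  diagnostics: LifecycleOutcome; // what logs / control UI keep
  autoReplayAllowed: boolean;    // safety signal, unchanged by routing
}

// Placeholder wording, not proposed final copy.
const CHANNEL_SAFE_FAILURE =
  "The previous assistant turn failed internally and did not produce an answer.";

function routeLifecycleOutcome(outcome: LifecycleOutcome): RoutedDelivery {
  return {
    channelText: CHANNEL_SAFE_FAILURE,
    diagnostics: { ...outcome },
    autoReplayAllowed: !outcome.replayInvalid,
  };
}

const routed = routeLifecycleOutcome({
  replayInvalid: true,
  livenessState: "abandoned",
  internalText: "internal side-effect warning copy",
});
// routed.channelText is the channel-safe copy, routed.autoReplayAllowed is false,
// and the internal text survives only inside routed.diagnostics.
```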

Related issues / PRs

Local code pointers from installed build

These are from the installed 2026.5.27 build rather than a source checkout:

  • Codex run-attempt build contains:
    • buildCodexAppServerPromptTimeoutOutcome(...)
    • replayInvalid: true
    • livenessState: "abandoned"
    • CODEX_APP_SERVER_MISSING_TERMINAL_EVENT_SIDE_EFFECT_USER_MESSAGE
    • rawResponseItemCompletedWithNoActiveItems
  • OpenClaw reply/admission build reads prompt timeout metadata including livenessState and replayInvalid.

Suggested fix direction

Add a regression test at the source-channel/reply-admission boundary for a Codex app-server missing-terminal timeout outcome with side effects:

  • input: prompt timeout outcome with replayInvalid: true, livenessState: "abandoned", and side-effect warning copy;
  • channel: Discord/source channel;
  • expected: the raw internal fallback string is not emitted as normal assistant text;
  • expected: replay remains invalid / not automatically retried;
  • expected: enough diagnostic metadata remains available for logs/control UI.

Then adjust the delivery/admission layer or Codex timeout outcome handling so user channels receive intentional channel-safe copy rather than the current internal fallback.
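
A framework-agnostic sketch of that regression check is below. The report does not name the project's test runner or the real boundary function, so the Admit type, checkNoFallbackLeak, and both toy implementations are hypothetical; the real admission function would be passed in where leakyAdmit and safeAdmit appear.

```typescript
const RAW_INTERNAL_FALLBACK =
  "Codex stopped before confirming the turn was complete. " +
  "Some work may already have been performed; verify the current state before retrying.";

// Hypothetical boundary types; only replayInvalid and livenessState come from the report.
interface AdmissionInput {
  replayInvalid: boolean;
  livenessState: string;
  text: string;
  channel: "discord";
}
interface AdmissionResult {
  channelMessages: string[];
  autoRetried: boolean;
  diagnostics: { replayInvalid: boolean; livenessState: string };
}
type Admit = (input: AdmissionInput) => AdmissionResult;

// Returns a list of violated expectations; an empty list means the check passed.
function checkNoFallbackLeak(admit: Admit): string[] {
  const result = admit({
    replayInvalid: true,
    livenessState: "abandoned",
    text: RAW_INTERNAL_FALLBACK,
    channel: "discord",
  });
  const failures: string[] = [];
  if (result.channelMessages.some((m) => m.includes(RAW_INTERNAL_FALLBACK))) {
    failures.push("raw internal fallback reached the channel");
  }
  if (result.autoRetried) {
    failures.push("turn was automatically retried");
  }
  if (!result.diagnostics.replayInvalid || result.diagnostics.livenessState !== "abandoned") {
    failures.push("diagnostic metadata was lost");
  }
  return failures;
}

// Toy implementations standing in for the real delivery/admission layer.
const leakyAdmit: Admit = (input) => ({
  channelMessages: [input.text],
  autoRetried: false,
  diagnostics: { replayInvalid: input.replayInvalid, livenessState: input.livenessState },
});
const safeAdmit: Admit = (input) => ({
  channelMessages: ["The previous assistant turn failed internally."],
  autoRetried: false,
  diagnostics: { replayInvalid: input.replayInvalid, livenessState: input.livenessState },
});

const leakyFailures = checkNoFallbackLeak(leakyAdmit); // one entry: the leak
const safeFailures = checkNoFallbackLeak(safeAdmit);   // empty
```

Returning a list of violations instead of throwing on the first one keeps all three expectations visible in a single run, which may help when the leak and a lost-metadata bug appear together.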

Metadata

Labels

  • P1: High-priority user-facing bug, regression, or broken workflow.
  • clawsweeper:fix-shape-clear: ClawSweeper found a clear likely implementation shape for this issue.
  • clawsweeper:linked-pr-open: ClawSweeper found an open linked pull request for this issue.
  • clawsweeper:no-new-fix-pr: ClawSweeper does not recommend queueing a new automated fix PR for this issue.
  • clawsweeper:queueable-fix: ClawSweeper marked this issue as an existing queue_fix_pr work candidate.
  • clawsweeper:source-repro: ClawSweeper found a high-confidence source-level issue reproduction.
  • impact:message-loss: Channel message delivery can be lost, duplicated, or misrouted.
  • impact:session-state: Session, memory, transcript, context, or agent state can drift or corrupt.
  • issue-rating: 🦞 diamond lobster: Very strong issue quality with high-confidence source-level or clear reproduction.