Regression: preflight compaction still surfaces missing Codex thread failure after #86602 #87736

@pfrederiksen

Summary

The missing Codex thread preflight-compaction failure from #86211 recurred on OpenClaw 2026.5.27, after #86211 had been closed as fixed by #86602.

An inbound Telegram group message was accepted, but dispatch failed before a normal assistant turn, and Telegram surfaced the generic user-facing fallback:

Something went wrong while processing your request. Please try again.

No private chat ids, message ids, session ids, bot handles, or raw Codex thread ids are included here.

Environment

Expected Behavior

When a Codex app-server preflight compaction thread is stale or missing, OpenClaw should classify the condition as a recoverable missing/stale binding and continue into the existing recovery/fresh-thread path. Telegram users should not see the generic failure for this condition.
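The expected behavior amounts to a small routing decision. The following is a minimal sketch of it. `SessionEntry`, `PreflightResult`, `handlePreflight`, and the `recoverOnFreshThread` callback are hypothetical names used only for illustration and are not OpenClaw's real API. Only failures that are genuinely unrecoverable should reach the throw, because that throw is what ends in the generic Telegram fallback.

```typescript
// Hypothetical types: illustrative only, not OpenClaw's real API.
interface SessionEntry {
  sessionKey: string;
  codexThreadId?: string;
}

type PreflightResult =
  | { ok: true }
  | { ok: false; recoverable: boolean; reason: string };

// Route a recoverable (stale/missing binding) failure into recovery instead
// of letting it escape as a dispatch error.
function handlePreflight(
  entry: SessionEntry,
  result: PreflightResult,
  recoverOnFreshThread: (entry: SessionEntry) => SessionEntry,
): SessionEntry {
  if (result.ok) return entry;
  if (result.recoverable) {
    return recoverOnFreshThread(entry);
  }
  // Only unrecoverable failures should get here; this is the error that
  // ultimately surfaces as the generic user-facing fallback.
  throw new Error(`Preflight compaction required but failed: ${result.reason}`);
}

const entry: SessionEntry = {
  sessionKey: "agent:main:telegram:group:<redacted-chat-id>",
  codexThreadId: "<codex-thread-id>",
};
const continued = handlePreflight(
  entry,
  { ok: false, recoverable: true, reason: "thread not found: <codex-thread-id>" },
  (e) => ({ sessionKey: e.sessionKey }), // hypothetical recovery: drop the stale binding
);
console.log(continued.codexThreadId); // undefined
```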

Actual Behavior

Gateway logs showed dispatch failing before a normal assistant turn. First, after the dispatch had run for about 305 seconds, preflight compaction timed out waiting on the Codex app-server compaction thread. Roughly two seconds later, a subsequent inbound message failed with a raw "thread not found" error for the same redacted Codex thread id.

Redacted Runtime Log Excerpt

This is the relevant sequence from the gateway journal. Private chat id, message ids, host, and Codex thread id are redacted.

2026-05-28T17:30:29.859Z [diagnostic] message dispatch completed:
  channel=telegram
  sessionId=unknown
  sessionKey=agent:main:telegram:group:<redacted-chat-id>
  source=replyResolver
  outcome=error
  duration=305058ms
  error="Error: Preflight compaction required but failed: timed out waiting for codex app-server compaction for <codex-thread-id>"

2026-05-28T17:30:29.863Z [diagnostic] message processed:
  channel=telegram
  chatId=telegram:<redacted-chat-id>
  messageId=<redacted-message-id>
  sessionId=unknown
  sessionKey=agent:main:telegram:group:<redacted-chat-id>
  outcome=error
  duration=305137ms
  error="Error: Preflight compaction required but failed: timed out waiting for codex app-server compaction for <codex-thread-id>"

2026-05-28T17:30:29.868Z [telegram] dispatch failed:
  Error: Preflight compaction required but failed: timed out waiting for codex app-server compaction for <codex-thread-id>

2026-05-28T17:30:32.083Z [diagnostic] message dispatch completed:
  channel=telegram
  sessionId=unknown
  sessionKey=agent:main:telegram:group:<redacted-chat-id>
  source=replyResolver
  outcome=error
  duration=1021ms
  error="Error: Preflight compaction required but failed: thread not found: <codex-thread-id>"

2026-05-28T17:30:32.086Z [diagnostic] message processed:
  channel=telegram
  chatId=telegram:<redacted-chat-id>
  messageId=<redacted-message-id>
  sessionId=unknown
  sessionKey=agent:main:telegram:group:<redacted-chat-id>
  outcome=error
  duration=1092ms
  error="Error: Preflight compaction required but failed: thread not found: <codex-thread-id>"

2026-05-28T17:30:32.090Z [telegram] dispatch failed:
  Error: Preflight compaction required but failed: thread not found: <codex-thread-id>

Why this still looks related to #86211 / #86602

#86211 and #86602 are the direct lineage for missing/stale Codex thread recovery during preflight compaction.

Current source recovers structured missing/stale binding results such as:

  • failure.reason=stale_thread_binding
  • failure.reason=missing_thread_binding

This recurrence suggests there is still a path where a raw "thread not found" compaction result reaches the preflight failure throw without being classified as recoverable, or where recovery happens only after one or more user-visible dispatch failures.
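One way to close the first gap is to make the recoverability check accept both the structured reasons and the raw string. Below is a minimal sketch of such a check, assuming a result shape modeled on the raw result quoted later in this issue. `CompactionFailure` and `isRecoverableBindingFailure` are hypothetical names. The regex match on "thread not found" is also an assumption about the app-server's error text and is not taken from OpenClaw source.

```typescript
// Hypothetical failure shape, modeled on the raw result quoted in this issue:
//   { ok: false, compacted: false, reason: "thread not found: <codex-thread-id>" }
// plus an optional structured `failure.reason`. Not OpenClaw's real type.
interface CompactionFailure {
  ok: false;
  compacted: false;
  reason?: string;
  failure?: { reason?: string };
}

// Structured reasons the current source already recovers.
const STRUCTURED_RECOVERABLE = new Set(["stale_thread_binding", "missing_thread_binding"]);

// Assumption: the raw app-server text contains "thread not found",
// possibly wrapped in a longer error message.
const RAW_THREAD_NOT_FOUND = /\bthread not found\b/i;

function isRecoverableBindingFailure(result: CompactionFailure): boolean {
  const structured = result.failure?.reason;
  if (structured !== undefined && STRUCTURED_RECOVERABLE.has(structured)) {
    return true;
  }
  // Fall back to the raw reason string so unstructured results are not
  // thrown as fatal preflight failures.
  return RAW_THREAD_NOT_FOUND.test(result.reason ?? "");
}

console.log(
  isRecoverableBindingFailure({ ok: false, compacted: false, reason: "thread not found: <codex-thread-id>" }),
); // true
```

A timeout message such as "timed out waiting for codex app-server compaction for <codex-thread-id>" is not matched by this check, so it would still be treated as a failure.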

The raw "thread not found" error appears immediately after a "timed out waiting for codex app-server compaction" failure for the same redacted Codex thread id. That suggests the remaining producer may be the timeout/retry boundary around Codex app-server compaction, rather than the direct Codex app-server "thread not found" path that #86602 already normalized.
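If the timeout/retry boundary is the producer, one option is to normalize errors at that boundary into the structured reasons the recovery path already understands. Below is a minimal sketch under that assumption. `CompactionBoundary` is hypothetical. Treating a "thread not found" that follows a timeout on the same thread as `stale_thread_binding`, and any other "thread not found" as `missing_thread_binding`, is an illustrative heuristic and is not confirmed OpenClaw behavior. The timeout itself is left unclassified.

```typescript
// Hypothetical sketch: CompactionBoundary is illustrative, not OpenClaw's API.
type BindingFailureReason = "stale_thread_binding" | "missing_thread_binding";

interface NormalizedFailure {
  ok: false;
  compacted: false;
  reason: string; // original error text, kept for logs
  failure?: { reason: BindingFailureReason };
}

const TIMEOUT_TEXT = /timed out waiting for codex app-server compaction/i;
const THREAD_NOT_FOUND_TEXT = /\bthread not found\b/i;

class CompactionBoundary {
  // Thread ids whose most recent compaction attempt timed out.
  private readonly timedOut = new Set<string>();

  normalize(threadId: string, err: unknown): NormalizedFailure {
    const message = err instanceof Error ? err.message : String(err);
    if (TIMEOUT_TEXT.test(message)) {
      // Remember the timeout; the binding may be stale on the next attempt.
      this.timedOut.add(threadId);
      return { ok: false, compacted: false, reason: message };
    }
    if (THREAD_NOT_FOUND_TEXT.test(message)) {
      // Set.delete returns true if the thread had timed out before.
      const reason: BindingFailureReason = this.timedOut.delete(threadId)
        ? "stale_thread_binding"
        : "missing_thread_binding";
      return { ok: false, compacted: false, reason: message, failure: { reason } };
    }
    return { ok: false, compacted: false, reason: message };
  }
}

// Replays the sequence from the redacted log excerpt.
const boundary = new CompactionBoundary();
boundary.normalize("<codex-thread-id>", new Error("timed out waiting for codex app-server compaction for <codex-thread-id>"));
const next = boundary.normalize("<codex-thread-id>", new Error("thread not found: <codex-thread-id>"));
console.log(next.failure?.reason); // stale_thread_binding
```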

After-Patch Local Proof for PR #87738

I ran the patched runtime helper locally against a redacted Telegram-group session entry and a raw compaction result shaped as:

{ ok: false, compacted: false, reason: "thread not found: <codex-thread-id>" }

Observed output:

{
  "proof": "raw thread-not-found preflight compaction is recoverable",
  "returnedSameSessionEntry": true,
  "compactCalls": 1,
  "incrementCalls": 0,
  "threw": false
}

What is still not proven: a live Telegram run against a production gateway with PR #87738 installed.
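A harness like the one behind the local proof can be sketched as follows. `preflightCompaction`, `PreflightDeps`, and `runProof` are hypothetical names. The field names in the report mirror the observed output above. This issue does not state what `incrementCalls` counts, so modeling it as a compaction-count hook that runs only after a successful compaction is an assumption.

```typescript
// Hypothetical harness: names and the increment hook's meaning are assumptions.
interface SessionEntry {
  sessionKey: string;
  codexThreadId?: string;
}

interface CompactionResult {
  ok: boolean;
  compacted: boolean;
  reason?: string;
}

interface PreflightDeps {
  compact: (entry: SessionEntry) => CompactionResult;
  // Assumed: only called after a successful compaction.
  incrementCompactionCount: (entry: SessionEntry) => void;
}

function preflightCompaction(entry: SessionEntry, deps: PreflightDeps): SessionEntry {
  const result = deps.compact(entry);
  if (result.ok) {
    if (result.compacted) deps.incrementCompactionCount(entry);
    return entry;
  }
  // Raw "thread not found" is treated as recoverable: hand back the same entry.
  if (/\bthread not found\b/i.test(result.reason ?? "")) return entry;
  throw new Error(`Preflight compaction required but failed: ${result.reason ?? "unknown"}`);
}

interface ProofReport {
  returnedSameSessionEntry: boolean;
  compactCalls: number;
  incrementCalls: number;
  threw: boolean;
}

function runProof(): ProofReport {
  const entry: SessionEntry = {
    sessionKey: "agent:main:telegram:group:<redacted-chat-id>",
    codexThreadId: "<codex-thread-id>",
  };
  let compactCalls = 0;
  let incrementCalls = 0;
  let returned: SessionEntry | undefined = undefined;
  let threw = false;
  try {
    returned = preflightCompaction(entry, {
      compact: () => {
        compactCalls += 1;
        return { ok: false, compacted: false, reason: "thread not found: <codex-thread-id>" };
      },
      incrementCompactionCount: () => {
        incrementCalls += 1;
      },
    });
  } catch {
    threw = true;
  }
  return { returnedSameSessionEntry: returned === entry, compactCalls, incrementCalls, threw };
}

console.log(JSON.stringify(runProof(), null, 2));
```

Like the proof above, this only exercises the helper in isolation. It says nothing about a live gateway run.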

Related

Metadata

Assignees

No one assigned

    Labels

    • P1: High-priority user-facing bug, regression, or broken workflow.
    • clawsweeper:linked-pr-open: ClawSweeper found an open linked pull request for this issue.
    • clawsweeper:no-new-fix-pr: ClawSweeper does not recommend queueing a new automated fix PR for this issue.
    • clawsweeper:source-repro: ClawSweeper found a high-confidence source-level issue reproduction.
    • impact:message-loss: Channel message delivery can be lost, duplicated, or misrouted.
    • impact:session-state: Session, memory, transcript, context, or agent state can drift or corrupt.
    • issue-rating: 🦞 diamond lobster: Very strong issue quality with high-confidence source-level or clear reproduction.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
