Telegram direct session can become unrecoverable after context overflow and auto-compaction hangs #70744

@asleep75

Summary

A long-lived Telegram direct chat became permanently non-responsive after hitting context overflow. The gateway and Telegram provider both restarted normally, but the same Telegram direct session immediately re-entered auto-compaction and never recovered enough to send a reply. Repeated gateway restarts did not fix it. The only effective recovery was to back up and quarantine/reset the session JSONL so a fresh Telegram session could be created.

Environment

  • OpenClaw: 2026.4.14
  • Host OS: Ubuntu 24.04
  • Runtime: Node 22.22.2
  • Model in affected session: openai-codex/gpt-5.4
  • Channel: Telegram direct chat

Affected session

  • sessionKey: agent:main:telegram:direct:8365774449
  • session file: ~/.openclaw/agents/main/sessions/a6154fac-955a-40ca-b04d-90ff98dd9f20.jsonl
  • size at failure: about 5.1 MB
  • message count shown in logs around failure: 239-241

Symptoms

  • Telegram bot stopped replying in that direct chat
  • Gateway restart succeeded
  • Telegram provider startup succeeded
  • On each fresh start, the same session immediately hit context overflow again
  • No reply was ever emitted back to Telegram
  • The typing indicator eventually expired, but the session remained wedged

Relevant log pattern

Observed repeatedly after restart:

[telegram] [default] starting provider (@Cooter_the_bot)
[agent/embedded] [context-overflow-diag] sessionKey=agent:main:telegram:direct:8365774449 provider=openai-codex/gpt-5.4 source=assistantError messages=239/240/241 sessionFile=/home/.../a6154fac-955a-40ca-b04d-90ff98dd9f20.jsonl ... error=Context overflow: estimated context size exceeds safe threshold during tool loop.
[agent/embedded] context overflow detected (attempt 1/3); attempting auto-compaction for openai-codex/gpt-5.4
typing TTL reached (2m); stopping typing indicator

No recovery message (for example, something like "auto-compaction succeeded") ever appeared in the logs after the attempt line, and the chat remained non-responsive.
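
For anyone trying to confirm the same pattern in their own logs, here is a minimal sketch of a scanner that counts overflow attempts that were never followed by a success line. The attempt regex matches the log line quoted above. The success marker string is an assumption, since I never saw the real wording, and `unresolved_overflows` is a hypothetical helper name, not part of OpenClaw.

```python
import re
from typing import Iterable

# Matches the attempt line quoted in this report.
OVERFLOW_RE = re.compile(r"context overflow detected \(attempt (\d+)/(\d+)\)")
# Assumption: the exact success wording is unknown; adjust to the real message.
SUCCESS_MARKER = "auto-compaction succeeded"


def unresolved_overflows(lines: Iterable[str]) -> int:
    """Count overflow attempts seen since the last success marker."""
    pending = 0
    for line in lines:
        if OVERFLOW_RE.search(line):
            pending += 1
        elif SUCCESS_MARKER in line:
            pending = 0  # a success line clears everything before it
    return pending
```

A non-zero result at the end of a log file means the last overflow attempt never reported success, which is what I saw after every restart.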

What did NOT fix it

  • Restarting the gateway
  • Letting the provider reconnect
  • Sending new Telegram messages into the same chat

What DID fix it

  1. Back up the affected session JSONL
  2. Move/quarantine/reset the stuck session file
  3. Restart the gateway
  4. Let Telegram create/use a fresh session

After that, Telegram replies resumed immediately.
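
Steps 1 and 2 can be sketched as a small script. This is only an illustration of what I did by hand: the quarantine directory, the file-name suffixes, and the `quarantine_session` helper are all assumptions, not OpenClaw conventions. Steps 3 and 4 (restarting the gateway and letting Telegram create a new session) are still done manually.

```python
import shutil
import time
from pathlib import Path


def quarantine_session(session_file: Path, quarantine_dir: Path) -> Path:
    """Back up a session JSONL, then move the original out of the way.

    Returns the path of the quarantined file. Directory layout and naming
    are hypothetical and chosen only for this example.
    """
    session_file = session_file.expanduser()
    quarantine_dir = quarantine_dir.expanduser()
    quarantine_dir.mkdir(parents=True, exist_ok=True)

    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup = quarantine_dir / f"{session_file.name}.{stamp}.bak"
    quarantined = quarantine_dir / f"{session_file.name}.{stamp}.quarantined"

    # Step 1: keep an untouched copy for later debugging.
    shutil.copy2(session_file, backup)
    # Step 2: move the stuck file so it is not reloaded on startup.
    shutil.move(str(session_file), str(quarantined))
    return quarantined
```

Run this with the gateway stopped, passing the session file listed under "Affected session", then restart the gateway.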

Expected behavior

If a session overflows and auto-compaction cannot recover it, OpenClaw should fail more gracefully. Examples:

  • automatically fork/reset the session after repeated compaction failure
  • emit a visible error or fallback reply to the user instead of hanging indefinitely
  • avoid reloading the same poisoned session into an endless overflow/compaction loop on startup
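
To make the first two suggestions concrete, here is a rough sketch of the control flow I have in mind. It is written in Python only for brevity and does not reflect OpenClaw's actual internals: `Session`, `ContextOverflow`, `handle_turn`, and the callback parameters are all hypothetical names. The limit of 3 mirrors the "attempt 1/3" in the logs. The sketch does not cover a compaction call that hangs forever; a real fix would presumably also need a timeout around that call.

```python
from dataclasses import dataclass, field
from typing import Callable, List

MAX_COMPACTION_ATTEMPTS = 3  # mirrors "attempt 1/3" from the logs


class ContextOverflow(Exception):
    """Raised by the (hypothetical) model call when the transcript is too large."""


@dataclass
class Session:
    key: str
    messages: List[str] = field(default_factory=list)


def handle_turn(
    session: Session,
    run_model: Callable[[Session], str],
    compact: Callable[[Session], None],
    send_reply: Callable[[str], None],
) -> Session:
    """Run one turn; after repeated overflow, notify the user and fork the session."""
    for _attempt in range(MAX_COMPACTION_ATTEMPTS):
        try:
            send_reply(run_model(session))
            return session  # normal path: same session continues
        except ContextOverflow:
            try:
                compact(session)
            except Exception:
                break  # compaction itself failed; stop retrying
    # Compaction could not recover the session: reply visibly, then start fresh.
    send_reply("This conversation grew too large to continue; starting a fresh session.")
    return Session(key=session.key)
```

The key point is that every path ends with a reply to the user, and the caller gets back either the original session or an empty one under the same key, so the poisoned transcript is not retried indefinitely.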

Additional notes

  • This same chat appears to have overflowed previously and recovered once, so the issue may be tied to transcript growth plus a compaction edge case rather than a one-off Telegram outage.
  • There were also occasional Telegram network fallback warnings (ETIMEDOUT, ENETUNREACH, UND_ERR_SOCKET), but those did not appear to be the root cause here because the provider started successfully and the failure reproduced specifically on the same session transcript.

If helpful, I can provide a sanitized copy of the failing session transcript or more exact timestamps from logs.
