Summary
A long-lived Telegram direct chat became permanently non-responsive after hitting context overflow. The gateway and Telegram provider both restarted normally, but the same Telegram direct session immediately re-entered auto-compaction and never recovered enough to send a reply. Repeated gateway restarts did not fix it. The only effective recovery was to back up and quarantine/reset the session JSONL so a fresh Telegram session could be created.
Environment
- OpenClaw: 2026.4.14
- Host OS: Ubuntu 24.04
- Runtime: Node 22.22.2
- Model in affected session: openai-codex/gpt-5.4
- Channel: Telegram direct chat
Affected session
- sessionKey: agent:main:telegram:direct:8365774449
- session file: ~/.openclaw/agents/main/sessions/a6154fac-955a-40ca-b04d-90ff98dd9f20.jsonl
- size at failure: about 5.1 MB
- message count shown in logs around failure: 239-241
Symptoms
- Telegram bot stopped replying in that direct chat
- Gateway restart succeeded
- Telegram provider startup succeeded
- On each fresh start, the same session immediately hit context overflow again
- No reply was ever emitted back to Telegram
- The typing indicator eventually expired, but the session remained wedged
Relevant log pattern
Observed repeatedly after restart:
[telegram] [default] starting provider (@Cooter_the_bot)
[agent/embedded] [context-overflow-diag] sessionKey=agent:main:telegram:direct:8365774449 provider=openai-codex/gpt-5.4 source=assistantError messages=239/240/241 sessionFile=/home/.../a6154fac-955a-40ca-b04d-90ff98dd9f20.jsonl ... error=Context overflow: estimated context size exceeds safe threshold during tool loop.
[agent/embedded] context overflow detected (attempt 1/3); attempting auto-compaction for openai-codex/gpt-5.4
typing TTL reached (2m); stopping typing indicator
There was no visible recovery message such as "auto-compaction succeeded", and the chat remained non-responsive.
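To confirm the same pattern in a saved log, a small scanner like the one below can count overflow entries per session since the last recovery line. This is a minimal sketch: the function name `find_wedged_sessions` and the threshold are hypothetical, and the exact wording of a success message is an assumption, since none was ever observed in this incident.

```python
import re
from collections import Counter

# Pattern taken from the log excerpt in this report.
OVERFLOW = re.compile(r"\[context-overflow-diag\] sessionKey=(\S+)")
# Assumed wording; the report only notes that no such message appeared.
RECOVERED = re.compile(r"auto-compaction succeeded")


def find_wedged_sessions(lines, min_overflows=3):
    """Return session keys with at least min_overflows overflow entries
    logged since the most recent recovery line (or since the start)."""
    overflows = Counter()
    for line in lines:
        match = OVERFLOW.search(line)
        if match:
            overflows[match.group(1)] += 1
        elif RECOVERED.search(line):
            # A recovery line resets the counts; it is not attributed to a
            # specific session because its real format is unknown.
            overflows.clear()
    return sorted(key for key, count in overflows.items() if count >= min_overflows)
```

Feeding it the lines of a gateway log file (for example `find_wedged_sessions(open(path))`) should list the session key above if the loop is present.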
What did NOT fix it
- Restarting the gateway
- Letting the provider reconnect
- Sending new Telegram messages into the same chat
What DID fix it
- Back up the affected session JSONL
- Move/quarantine/reset the stuck session file
- Restart the gateway
- Let Telegram create/use a fresh session
After that, Telegram replies resumed immediately.
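The back-up and quarantine steps can be sketched as a short script. This is only an illustration of the manual recovery described above: the helper name `quarantine_session` and the quarantine directory are hypothetical, and it assumes that moving the JSONL out of the sessions directory is sufficient, which is what worked in this case. Restarting the gateway is left out because it depends on how the gateway is run.

```python
import shutil
import time
from pathlib import Path


def quarantine_session(session_file, quarantine_dir):
    """Copy the session JSONL to a timestamped backup, then move the
    original out of the sessions directory. Returns both new paths."""
    src = Path(session_file).expanduser()
    dest_dir = Path(quarantine_dir).expanduser()
    dest_dir.mkdir(parents=True, exist_ok=True)

    stamp = time.strftime("%Y%m%dT%H%M%S")
    backup = dest_dir / f"{src.name}.{stamp}.bak"
    quarantined = dest_dir / f"{src.name}.{stamp}.quarantined"

    shutil.copy2(str(src), str(backup))       # keep an untouched copy first
    shutil.move(str(src), str(quarantined))   # then remove it from the live path
    return backup, quarantined
```

For this incident the call would look like `quarantine_session("~/.openclaw/agents/main/sessions/a6154fac-955a-40ca-b04d-90ff98dd9f20.jsonl", "~/openclaw-quarantine")`, where the second path is an arbitrary choice.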
Expected behavior
If a session overflows and auto-compaction cannot recover it, OpenClaw should fail more gracefully. Examples:
- automatically fork/reset the session after repeated compaction failure
- emit a visible error or fallback reply to the user instead of hanging indefinitely
- avoid reloading the same poisoned session into an endless overflow/compaction loop on startup
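One way to picture the requested behavior is a bounded retry loop that falls back to a fresh session and a visible reply. The sketch below is not OpenClaw code: `Session`, `ContextOverflow`, `run_turn_with_fallback`, and the callback parameters are all hypothetical names, and the limit of 3 is taken only from the "attempt 1/3" text in the log.

```python
from dataclasses import dataclass, field

MAX_COMPACTION_ATTEMPTS = 3  # mirrors "attempt 1/3" in the log; otherwise assumed


class ContextOverflow(Exception):
    """Hypothetical error raised when a turn exceeds the context budget."""


@dataclass
class Session:
    key: str
    messages: list = field(default_factory=list)


def run_turn_with_fallback(session, run_turn, compact, send_reply,
                           max_attempts=MAX_COMPACTION_ATTEMPTS):
    """Run one turn; compact and retry on overflow, and if compaction never
    helps, start a fresh session and tell the user instead of hanging."""
    attempts = 0
    while True:
        try:
            send_reply(run_turn(session))
            return session
        except ContextOverflow:
            if attempts >= max_attempts:
                break
            attempts += 1
            compact(session)

    # Compaction did not recover the session: fork to an empty transcript
    # under the same key so the old one is not reloaded on the next start.
    fresh = Session(key=session.key)
    send_reply("This conversation became too large to continue, "
               "so a fresh session was started.")
    return fresh
```

The key design point is that the failure path always ends in a reply and a usable session, so a restart cannot re-enter the same overflow loop.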
Additional notes
- This same chat appears to have overflowed previously and recovered once, so the issue may be tied to transcript growth plus a compaction edge case rather than a one-off Telegram outage.
- There were also occasional Telegram network fallback warnings (ETIMEDOUT, ENETUNREACH, UND_ERR_SOCKET), but those did not appear to be the root cause here because the provider started successfully and the failure reproduced specifically on the same session transcript.
If helpful, I can provide a sanitized copy of the failing session transcript or more exact timestamps from logs.