Summary
In a tool-heavy Telegram direct session, an overflow recovery path waited for the full auto-compaction safety timeout (~900s) before falling back to tool-result truncation. The subsequent truncation completed almost immediately and allowed the prompt to continue, so the user-visible latency appears avoidable.
This looks related to existing compaction/context issues such as #53008, #64962, #78562, and #45503, but the specific recovery-order problem is:
when a context overflow is heavily driven by accumulated tool results, OpenClaw should truncate/prune tool results before waiting the full LLM compaction timeout, or at least race/escalate to truncation much sooner.
What happened
A Telegram direct session was asked to make a small configuration change (switch default model to openai-codex/gpt-5.5). The actual config change/hot reload was not the slow part. The delay came from context overflow recovery in the active main session.
Observed log timeline (Asia/Seoul):
00:06:14 [context-overflow-diag]
sessionKey=agent:main:telegram:direct:<redacted>
provider=openai-codex/gpt-5.4
source=assistantError
messages=336
error=Context overflow: estimated context size exceeds safe threshold during tool loop.
00:06:14 context overflow detected (attempt 1/3); attempting auto-compaction for openai-codex/gpt-5.4
00:21:37 [context-overflow-diag]
provider=openai-codex/gpt-5.4
messages=343
error=Context overflow: estimated context size exceeds safe threshold during tool loop.
00:21:37 context overflow detected (attempt 1/3); attempting auto-compaction for openai-codex/gpt-5.4
00:36:50 [compaction-diag] end
trigger=overflow
provider=openai-codex/gpt-5.4
attempt=1
maxAttempts=3
outcome=failed
reason=timeout
durationMs=912840
00:36:50 auto-compaction failed for openai-codex/gpt-5.4: Compaction timed out
00:36:50 [context-overflow-recovery]
Attempting tool result truncation for openai-codex/gpt-5.4 (contextWindow=272000 tokens)
00:36:50 [tool-result-truncation]
Truncated 281 tool result(s) in session
contextWindow=272000
maxChars=16000
aggregateBudgetChars=16000
oversized=0
aggregate=281
00:36:50 [context-overflow-recovery]
Truncated 281 tool result(s); retrying prompt
The relevant point is that tool-result truncation completed within the same logged second (00:36:50) as the ~900s compaction timeout, and it truncated 281 tool results. That suggests the overflow was at least substantially tool-result-driven.
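As a rough cross-check of the timeline, the gaps between the logged timestamps can be compared with the reported durationMs. The helper below is a hypothetical sketch, not part of OpenClaw, and it assumes HH:MM:SS timestamps that fall on the same day.

```typescript
// Hypothetical helper: seconds between two same-day HH:MM:SS log timestamps.
function secondsBetween(start: string, end: string): number {
  const toSeconds = (t: string): number => {
    const [h, m, s] = t.split(":").map(Number);
    return h * 3600 + m * 60 + s;
  };
  return toSeconds(end) - toSeconds(start);
}

// durationMs taken from the "[compaction-diag] end" entry.
const compactionSeconds = 912840 / 1000;
// Second overflow detection -> compaction end.
const secondWindow = secondsBetween("00:21:37", "00:36:50");
// First overflow detection -> second overflow detection.
const firstWindow = secondsBetween("00:06:14", "00:21:37");

console.log({ compactionSeconds, secondWindow, firstWindow });
```

The 913-second window between 00:21:37 and 00:36:50 lines up with the logged durationMs of 912840 (about 15 minutes 13 seconds). The excerpt contains no [compaction-diag] end entry for the 00:06:14 attempt, so the 923-second gap before the second overflow is only a measured interval here, not a confirmed compaction duration.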
Expected behavior
For overflow recovery in tool-heavy sessions, OpenClaw should avoid waiting the full auto-compaction timeout before applying a cheap deterministic reduction.
Possible behavior:
- Detect a tool-result-heavy context (toolResultChars, tool result count, or similar heuristic).
- Run tool-result truncation/pruning before LLM compaction, or in parallel/race with compaction.
- If truncation reduces context enough, retry immediately.
- Only fall back to LLM compaction when truncation is insufficient or when conversation text, not tool output, is the dominant contributor.
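The detection step in the list above could be as cheap as a single pass over the transcript. The sketch below is illustrative only: the SessionMessage shape, the role names, and the threshold defaults are assumptions, not OpenClaw's actual types or tuned values.

```typescript
// Hypothetical message shape; OpenClaw's real session types may differ.
type Role = "user" | "assistant" | "toolResult";

interface SessionMessage {
  role: Role;
  content: string;
}

interface ToolResultStats {
  toolResultCount: number;
  toolResultChars: number;
  totalChars: number;
  toolResultRatio: number; // share of all characters held by tool results
}

// Single pass over the transcript; no model call is needed.
function summarizeToolResults(messages: SessionMessage[]): ToolResultStats {
  let toolResultCount = 0;
  let toolResultChars = 0;
  let totalChars = 0;
  for (const m of messages) {
    totalChars += m.content.length;
    if (m.role === "toolResult") {
      toolResultCount += 1;
      toolResultChars += m.content.length;
    }
  }
  const toolResultRatio = totalChars === 0 ? 0 : toolResultChars / totalChars;
  return { toolResultCount, toolResultChars, totalChars, toolResultRatio };
}

// Illustrative thresholds (assumed, not tuned): flag the session when either
// the tool result count or the character share is high.
function isToolResultHeavy(
  stats: ToolResultStats,
  minCount = 50,
  minRatio = 0.5,
): boolean {
  return stats.toolResultCount >= minCount || stats.toolResultRatio >= minRatio;
}
```

A count-or-ratio check keeps the decision deterministic and free of model calls, which is the property that makes it suitable to run before any LLM compaction attempt.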
Actual behavior
The recovery path waited for LLM auto-compaction to hit the safety timeout first (durationMs=912840, roughly the 900s default), and only then performed tool-result truncation.
This caused ~15 minutes of avoidable user-visible latency in Telegram.
Impact
- Telegram/direct session appears hung for ~15 minutes.
- Small tasks can be delayed by context recovery even when the final successful recovery operation is deterministic and fast.
- A long-lived assistant session with many tool calls can repeatedly hit this pattern.
- It also consumes model quota/cost on compaction attempts before trying a local truncation strategy.
Environment
- OpenClaw runtime: embedded main agent
- Channel: Telegram direct
- Model at overflow time: openai-codex/gpt-5.4
- Configured/target model after change: openai-codex/gpt-5.5
- Context window logged: 272000 tokens
- Session had 336-343 messages at overflow
- Tool-result truncation affected 281 tool result(s)
Related issues
- #53008
- #64962
- #78562
- #45503
Suggested fix
A conservative fix could be:
- Before starting overflow-triggered LLM compaction, compute a cheap summary of tool result contribution.
- If tool results exceed a threshold (count/chars/ratio), run the existing tool-result truncation first.
- Retry the prompt after truncation.
- If still overflowing, then run LLM compaction.
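The ordering described in the steps above might look roughly like the sketch below. Every name in it (OverflowRecoveryDeps, recoverFromOverflow, and the injected callbacks) is a hypothetical stand-in for OpenClaw internals, so treat it as an outline of the control flow rather than a patch.

```typescript
// All names here are hypothetical stand-ins for OpenClaw internals.
interface OverflowRecoveryDeps {
  isToolResultHeavy(): boolean; // cheap summary of tool result contribution
  truncateToolResults(): number; // local and deterministic; returns how many results were truncated
  fitsContextWindow(): boolean; // cheap re-estimate after a reduction
  compactWithLlm(): Promise<boolean>; // slow path; resolves false on failure or timeout
}

type RecoveryOutcome = "truncation" | "compaction" | "failed";

async function recoverFromOverflow(
  deps: OverflowRecoveryDeps,
): Promise<RecoveryOutcome> {
  let truncationTried = false;

  // Cheap path first when tool output dominates the context.
  if (deps.isToolResultHeavy()) {
    truncationTried = true;
    if (deps.truncateToolResults() > 0 && deps.fitsContextWindow()) {
      return "truncation"; // caller retries the prompt immediately
    }
  }

  // Still overflowing (or not tool-heavy): fall back to LLM compaction.
  if ((await deps.compactWithLlm()) && deps.fitsContextWindow()) {
    return "compaction";
  }

  // Keep the fallback seen in the log: truncate after a failed compaction,
  // unless truncation was already attempted above.
  if (
    !truncationTried &&
    deps.truncateToolResults() > 0 &&
    deps.fitsContextWindow()
  ) {
    return "truncation";
  }
  return "failed";
}
```

Injecting the four operations keeps the ordering testable in isolation. In the tool-heavy case the compaction callback is never invoked, which is where the latency and quota savings would come from.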
Alternatively, start auto-compaction but set a much shorter escalation timer for tool-heavy sessions (for example 30-60s), then truncate tool results if compaction has not completed.
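A minimal sketch of the escalation-timer variant, assuming compaction can be cancelled through an AbortSignal. The function names and the callback signatures are hypothetical; only Promise.race, AbortController, setTimeout, and clearTimeout are standard APIs.

```typescript
// Resolves with "escalate" after ms, and exposes a way to cancel the timer.
function escalationTimer(ms: number): {
  promise: Promise<"escalate">;
  cancel: () => void;
} {
  let cancel: () => void = () => {};
  const promise = new Promise<"escalate">((resolve) => {
    const id = setTimeout(() => resolve("escalate"), ms);
    cancel = () => clearTimeout(id);
  });
  return { promise, cancel };
}

// Hypothetical wrapper: start compaction, but fall back to tool-result
// truncation if compaction fails or has not finished within escalateAfterMs.
async function compactOrEscalate(
  compact: (signal: AbortSignal) => Promise<boolean>,
  truncateToolResults: () => number,
  escalateAfterMs: number,
): Promise<"compaction" | "truncation" | "failed"> {
  const controller = new AbortController();
  const timer = escalationTimer(escalateAfterMs);
  try {
    const first = await Promise.race([
      compact(controller.signal).catch(() => false),
      timer.promise,
    ]);
    if (first === true) return "compaction";
    // Compaction failed or is too slow: cancel it and use the cheap path.
    controller.abort();
    return truncateToolResults() > 0 ? "truncation" : "failed";
  } finally {
    timer.cancel();
  }
}
```

With escalateAfterMs set to something like 30000 to 60000 for tool-heavy sessions, the worst-case wait before truncation would be bounded by the escalation timer rather than by the full compaction safety timeout.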