Overflow recovery should truncate tool results before waiting full auto-compaction timeout #81182

@LLagoon3

Summary

In a tool-heavy Telegram direct session, an overflow recovery path waited for the full auto-compaction safety timeout (~900s) before falling back to tool-result truncation. The subsequent truncation completed almost immediately and allowed the prompt to continue, so the user-visible latency appears avoidable.

This looks related to existing compaction/context issues such as #53008, #64962, #78562, and #45503, but the specific recovery-order problem is:

When a context overflow is driven largely by accumulated tool results, OpenClaw should truncate or prune tool results before waiting out the full LLM compaction timeout, or at least race compaction against truncation or escalate to truncation much sooner.

What happened

A Telegram direct session was asked to make a small configuration change (switch default model to openai-codex/gpt-5.5). The actual config change/hot reload was not the slow part. The delay came from context overflow recovery in the active main session.

Observed log timeline (Asia/Seoul):

00:06:14 [context-overflow-diag]
  sessionKey=agent:main:telegram:direct:<redacted>
  provider=openai-codex/gpt-5.4
  source=assistantError
  messages=336
  error=Context overflow: estimated context size exceeds safe threshold during tool loop.

00:06:14 context overflow detected (attempt 1/3); attempting auto-compaction for openai-codex/gpt-5.4

00:21:37 [context-overflow-diag]
  provider=openai-codex/gpt-5.4
  messages=343
  error=Context overflow: estimated context size exceeds safe threshold during tool loop.

00:21:37 context overflow detected (attempt 1/3); attempting auto-compaction for openai-codex/gpt-5.4

00:36:50 [compaction-diag] end
  trigger=overflow
  provider=openai-codex/gpt-5.4
  attempt=1
  maxAttempts=3
  outcome=failed
  reason=timeout
  durationMs=912840

00:36:50 auto-compaction failed for openai-codex/gpt-5.4: Compaction timed out

00:36:50 [context-overflow-recovery]
  Attempting tool result truncation for openai-codex/gpt-5.4 (contextWindow=272000 tokens)

00:36:50 [tool-result-truncation]
  Truncated 281 tool result(s) in session
  contextWindow=272000
  maxChars=16000
  aggregateBudgetChars=16000
  oversized=0
  aggregate=281

00:36:50 [context-overflow-recovery]
  Truncated 281 tool result(s); retrying prompt

The relevant point is that once the ~900s compaction timeout expired, tool-result truncation completed within the same logged second (00:36:50) and touched 281 tool results. That suggests the overflow was driven at least substantially by tool results.
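As a quick check of the numbers in the log excerpt, the short TypeScript snippet below converts the logged `durationMs` to minutes and seconds and computes the gap between the 00:21:37 overflow detection and the 00:36:50 truncation. The helper name `toSeconds` is made up for this illustration; the input values are copied from the log.

```typescript
// Value copied from the [compaction-diag] log line above.
const durationMs = 912840;

// Convert the compaction duration to whole minutes plus remaining seconds.
const minutes = Math.floor(durationMs / 60000);
const seconds = (durationMs % 60000) / 1000;

// Seconds since midnight for an "HH:MM:SS" log timestamp (illustrative helper).
function toSeconds(hms: string): number {
  const [h, m, s] = hms.split(":").map(Number);
  return h * 3600 + m * 60 + s;
}

// Gap between the 00:21:37 overflow detection and the 00:36:50 truncation.
const waitSeconds = toSeconds("00:36:50") - toSeconds("00:21:37");

console.log(`${minutes} min ${seconds} s`); // 15 min 12.84 s
console.log(waitSeconds); // 913
```

Both figures agree with the roughly 15 minutes of waiting described in this report.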

Expected behavior

For overflow recovery in tool-heavy sessions, OpenClaw should avoid waiting the full auto-compaction timeout before applying a cheap deterministic reduction.

Possible behavior:

  1. Detect a tool-result-heavy context (toolResultChars, tool result count, or similar heuristic).
  2. Run tool-result truncation/pruning before LLM compaction, or in parallel/race with compaction.
  3. If truncation reduces context enough, retry immediately.
  4. Only fall back to LLM compaction when truncation is insufficient or when conversation text, not tool output, is the dominant contributor.
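Step 1 above could be as simple as the following TypeScript sketch. It is not OpenClaw code: the `ChatMessage` shape, the function names, and the default thresholds are all assumptions made for illustration, and character counts stand in for whatever size estimate the runtime actually uses.

```typescript
// Hypothetical message shape; OpenClaw's real session types are not shown in this issue.
interface ChatMessage {
  role: "user" | "assistant" | "toolResult";
  text: string;
}

interface ToolShare {
  toolResultCount: number;
  toolResultChars: number;
  totalChars: number;
  ratio: number; // toolResultChars / totalChars, or 0 for an empty session
}

// One cheap pass over the session: no model call, no tokenizer.
function summarizeToolShare(messages: ChatMessage[]): ToolShare {
  let toolResultCount = 0;
  let toolResultChars = 0;
  let totalChars = 0;
  for (const m of messages) {
    totalChars += m.text.length;
    if (m.role === "toolResult") {
      toolResultCount += 1;
      toolResultChars += m.text.length;
    }
  }
  const ratio = totalChars === 0 ? 0 : toolResultChars / totalChars;
  return { toolResultCount, toolResultChars, totalChars, ratio };
}

// Illustrative thresholds only; they are not values taken from OpenClaw.
function isToolHeavy(share: ToolShare, minCount = 50, minRatio = 0.5): boolean {
  return share.toolResultCount >= minCount || share.ratio >= minRatio;
}
```

With thresholds like these, a session such as the one in this report (281 tool results among 336-343 messages) would be classified as tool-heavy on count alone.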

Actual behavior

The recovery path waited for LLM auto-compaction to hit the safety timeout first (durationMs=912840, roughly the 900s default), and only then performed tool-result truncation.

This caused ~15 minutes of avoidable user-visible latency in Telegram.

Impact

  • Telegram/direct session appears hung for ~15 minutes.
  • Small tasks can be delayed by context recovery even when the final successful recovery operation is deterministic and fast.
  • A long-lived assistant session with many tool calls can repeatedly hit this pattern.
  • It also consumes model quota/cost on compaction attempts before trying a local truncation strategy.

Environment

  • OpenClaw runtime: embedded main agent
  • Channel: Telegram direct
  • Model at overflow time: openai-codex/gpt-5.4
  • Configured/target model after change: openai-codex/gpt-5.5
  • Context window logged: 272000 tokens
  • Session had 336-343 messages at overflow
  • Tool-result truncation affected 281 tool result(s)

Related issues

Suggested fix

A conservative fix could be:

  • Before starting overflow-triggered LLM compaction, compute a cheap summary of tool result contribution.
  • If tool results exceed a threshold (count/chars/ratio), run the existing tool-result truncation first.
  • Retry the prompt after truncation.
  • If still overflowing, then run LLM compaction.
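The truncation-first order in the list above might be sketched as follows. This is a simplified illustration, not the existing truncation routine: the `Turn` shape and function names are hypothetical, character counts stand in for token estimates, and only a per-result cap is applied (the logged `aggregateBudgetChars` behavior is not modeled).

```typescript
// Hypothetical turn shape; character counts stand in for token estimates.
interface Turn {
  kind: "text" | "toolResult";
  content: string;
}

interface Recovery {
  strategy: "truncation" | "compaction-needed";
  turns: Turn[];
}

function totalChars(turns: Turn[]): number {
  return turns.reduce((sum, t) => sum + t.content.length, 0);
}

// Cap every tool result at maxChars; conversation text is left untouched.
function truncateToolResults(turns: Turn[], maxChars: number): Turn[] {
  return turns.map((t) =>
    t.kind === "toolResult" && t.content.length > maxChars
      ? { kind: t.kind, content: t.content.slice(0, maxChars) }
      : t,
  );
}

// Try the cheap deterministic reduction first; report whether it was enough.
function recoverFromOverflow(turns: Turn[], budgetChars: number, maxChars: number): Recovery {
  const truncated = truncateToolResults(turns, maxChars);
  if (totalChars(truncated) <= budgetChars) {
    return { strategy: "truncation", turns: truncated };
  }
  // Still over budget: the caller would now start LLM compaction on the smaller context.
  return { strategy: "compaction-needed", turns: truncated };
}
```

Even when compaction is still needed, it would run on the already-truncated turns, so the summarization request itself would be smaller.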

Alternatively, start auto-compaction but set a much shorter escalation timer for tool-heavy sessions (for example 30-60s), then truncate tool results if compaction has not completed.
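A rough sketch of this escalation-timer variant, using only standard `Promise.race` and `setTimeout`. The function name `compactWithEscalation` and the `Outcome` type are hypothetical, and the sketch assumes compaction is exposed as a promise; it does not cancel the in-flight compaction, which a real implementation would probably want to do.

```typescript
// Result of racing compaction against a short escalation timer (hypothetical type).
type Outcome<T> = { kind: "compacted"; value: T } | { kind: "escalated" };

async function compactWithEscalation<T>(
  compaction: Promise<T>,
  escalateAfterMs: number,
): Promise<Outcome<T>> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  // Resolves with "escalated" if compaction has not finished in time.
  const escalation = new Promise<Outcome<T>>((resolve) => {
    timer = setTimeout(() => resolve({ kind: "escalated" }), escalateAfterMs);
  });
  try {
    return await Promise.race([
      compaction.then((value): Outcome<T> => ({ kind: "compacted", value })),
      escalation,
    ]);
  } finally {
    // Stop the timer so it does not outlive the race.
    clearTimeout(timer);
  }
}
```

On an `escalated` outcome the caller would run tool-result truncation and retry the prompt. With an escalation window of 30-60s, the wait in the timeline above would have been bounded by that window rather than by the ~900s safety timeout.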
