Skip to content

Auto-compaction preemptive check: no debug logging when route='fits'; cache-ttl pruning may mask estimate for large-context models #68609

@matthewcarano

Description

@matthewcarano

Environment

  • OpenClaw version: 2026.4.15 (also reproduced on 2026.4.11)
  • Platform: macOS (Mac Studio, native install)
  • Auth mode: Anthropic OAuth (mode: "token")
  • Main agent model: anthropic/claude-opus-4-6
  • Subagent / heartbeat model: anthropic/claude-sonnet-4-5
  • Compaction model: anthropic/claude-sonnet-4-5
  • Interface: Discord (channels/threads as sessions)

Relevant config

agents.defaults.contextTokens: 150000   (also tested at 170000, 200000)
compaction.mode: "default"
compaction.model: "anthropic/claude-sonnet-4-5"
compaction.reserveTokens: 50000
compaction.reserveTokensFloor: 30000
compaction.memoryFlush.softThresholdTokens: 15000
contextPruning.mode: "cache-ttl"   (aggressive, 30min TTL on tool results)

Summary

The preemptive auto-compaction check (shouldPreemptivelyCompactBeforePrompt) exhibited two issues with main-agent Opus sessions:

  1. At the default 200K budget, it never triggered preemptive compaction. Opus sessions grew to 167K+ tokens with zero compactions, requiring manual intervention or hitting actual context overflow (timeout-compaction).
  2. After lowering contextTokens to 150K, preemptive compaction eventually fired — confirming the mechanism works but the threshold interaction with cache-ttl pruning and large bootstrap loads creates a window where the estimate stays artificially low.

A critical observability gap compounds the issue: shouldPreemptivelyCompactBeforePrompt emits no log when route === "fits", making it impossible to distinguish "precheck ran and decided it fits" from "precheck never ran."

Observed behavior

  • Opus main-agent session climbed to 167K+ tokens (budget 200K) with zero compactions. Manual compaction required.
  • Log audit of gateway.log (full history): prior to the fix, every auto-compaction event was for Sonnet. Every Opus compaction event was timeout-compaction (overflow recovery).
  • On Apr 13, an Opus session hit context overflow and threw 15 consecutive "prompt too large" errors in rapid succession without recovery.
  • After lowering agents.defaults.contextTokens to 150K and starting a fresh session: auto-compaction eventually fired at ~10:33 EDT on Apr 18, confirmed in gateway.log: auto-compaction succeeded for anthropic/claude-opus-4-6; retrying prompt
  • The session /status initially showed "Compactions: 1" which was a carried-forward artifact from a prior session summary, not a new event. Real compaction count reached 2 after the auto-compaction fired.

Expected behavior

Preemptive compaction check should fire reliably regardless of model, and the decision path should be observable in logs.

Source trace / hypothesis

Traced the call site in pi-embedded-runner:

  • Preemptive check invoked at approx line 6654/6655
  • Default context engine is LegacyContextEngine (no ownsCompaction property) → bypass does NOT apply
  • contextTokenBudget defaults to 2e5 (200K) if not explicitly passed
  • shouldPreemptivelyCompactBeforePrompt uses estimatePrePromptTokens (character-based: estimatedChars / 4) compared against contextTokenBudget - reserveTokens

Leading hypothesis: contextPruning.mode: "cache-ttl" aggressively removes tool-result content from messages before estimatePrePromptTokens runs. The character-based estimate sees the pruned message set and stays well below the budget threshold, so the preemptive check returns route: "fits" indefinitely at the default 200K budget. At the reduced 150K budget, the margin is tight enough that the estimate eventually catches up.

This explains why Sonnet sessions compact correctly (different token accumulation pattern, more frequent tool calls, hit threshold between prune cycles) while Opus sessions at 200K do not (larger headroom, pruning consistently masks the estimate).

Pruning-disabled test pending to confirm this hypothesis.

Observability gap

The [context-overflow-precheck] log string is emitted ONLY when shouldCompact === true or when truncation routes fire. There is no log when route === "fits" — meaning the decision to NOT compact is completely silent. This made the bug extremely difficult to diagnose.

Request: Add a debug-level log inside shouldPreemptivelyCompactBeforePrompt (or at the call site) showing estimatedPromptTokens, promptBudgetBeforeReserve, overflowTokens, and route on every evaluation. This would make token budget issues trivially diagnosable.

Additional observations

Minimal reproduction

  1. Main agent model set to anthropic/claude-opus-4-6
  2. contextPruning.mode: "cache-ttl" enabled (30min TTL)
  3. Large bootstrap files (AGENTS.md, MEMORY.md etc.) totaling ~70K tokens at session start
  4. contextTokens at default (200K) or high value
  5. Run a long session with periodic tool use; let context accumulate
  6. Observe: no auto-compaction log entry at default budget; session requires manual compaction or hits overflow

Willing to provide

  • Config (redacted)
  • gateway.log excerpts showing Sonnet auto-compactions and Opus timeout-compactions
  • Source-trace notes

Metadata

Metadata

Assignees

Labels

P2Normal backlog priority with limited blast radius.clawsweeper:fix-shape-clearClawSweeper found a clear likely implementation shape for this issue.clawsweeper:queueable-fixClawSweeper marked this issue as an existing queue_fix_pr work candidate.clawsweeper:source-reproClawSweeper found a high-confidence source-level issue reproduction.impact:session-stateSession, memory, transcript, context, or agent state can drift or corrupt.issue-rating: 🦞 diamond lobsterVery strong issue quality with high-confidence source-level or clear reproduction.

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions