Bug: contextPruning (cache-ttl) masks session size from compaction, causing unbounded growth for non-Anthropic models #11971

@reverendrewind

Summary

When contextPruning (cache-ttl mode, auto-applied for Anthropic API key auth) is active, compaction uses the pruned token count reported by Anthropic's API rather than the actual session store size. As a result, compaction never triggers while Anthropic models are in use, and the session store grows without bound. When a non-Anthropic model (OpenRouter, Synthetic/HuggingFace, DeepSeek, Moonshot, etc.) is then used on the same session, it receives the full unpruned history: 854K tokens in the observed case.

Environment

  • OpenClaw v2026.2.6-3
  • Node.js 22.22.0
  • Ubuntu (Tailscale remote host)

Config

{
  "agents.defaults.contextTokens": 128000,
  "agents.defaults.compaction.mode": "default",
  "agents.defaults.compaction.reserveTokensFloor": 15000
}

Anthropic models have cacheControlTtl: "1h", so contextPruning auto-applies cache-ttl mode to them.

Reproduction

  1. Start a Telegram session with an Anthropic model (e.g., Haiku) as default
  2. Have an extended conversation (76 runs over ~10 hours in our case)
  3. Switch to a non-Anthropic model mid-session (e.g., Synthetic hf:Qwen/Qwen3-235B-A22B-Instruct-2507)
  4. Observe the non-Anthropic model receives the full unpruned session history (854K tokens)

Evidence (session logs, Feb 8 2026)

Session: 993cfc33-4c4a-4b15-99b5-ba82be583fe5 (Telegram DM, main agent)

76 runs total:

| Provider   | Model                                 | Runs |
|------------|---------------------------------------|------|
| Anthropic  | claude-haiku-4-5-20251001             | 54   |
| DeepSeek   | deepseek-chat                         | 10   |
| Anthropic  | claude-sonnet-4-5-20250929            | 5    |
| Synthetic  | hf:moonshotai/Kimi-K2.5               | 4    |
| Synthetic  | hf:Qwen/Qwen3-235B-A22B-Instruct-2507 | 2    |
| OpenRouter | moonshotai/kimi-k2.5                  | 1    |

Compaction events: Exactly 1 — at 07:44 UTC, triggered by a DeepSeek heartbeat run (no pruning → saw real context size → threshold exceeded).

After that single compaction: Session grew from 07:44 to 15:58 (~8 hours, ~60 Anthropic runs) with zero additional compactions. contextPruning kept Anthropic API calls under 128K, but the session store grew to 854K tokens.

Cost impact of the final non-Anthropic runs:

| Model      | Input tokens | Output tokens | Cost  |
|------------|--------------|---------------|-------|
| Qwen3-235B | 854,435      | 613           | $0.19 |
| Kimi-K2.5  | 853,851      | 1,378         | $0.47 |

$0.66 wasted on two API calls that sent the entire session history for minimal output.

Root Cause

_checkCompaction() uses the token usage reported by the last assistant message's API response to determine if the threshold is exceeded (as documented in #9282). When Anthropic models are active:

  1. contextPruning drops old messages before sending to Anthropic API
  2. Anthropic reports low token usage (pruned view)
  3. Compaction sees reported usage < contextTokens (128K) → does not trigger
  4. Session store continues growing unbounded
  5. Non-Anthropic providers receive the full unpruned store
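The five steps above can be illustrated with a small simulation. This is a hypothetical model of the described behaviour, not OpenClaw's code: the `Run` shape, `simulate`, the per-run growth, and the pruned-view cap are all assumptions chosen for illustration. Only the 128K threshold comes from the config in this report.

```typescript
// Hypothetical model of the masking effect; none of these names are OpenClaw APIs.
interface Run {
  provider: string;
  prunes: boolean; // true when cache-ttl contextPruning applies to this provider
}

const CONTEXT_TOKENS = 128_000;  // agents.defaults.contextTokens from the config above
const TOKENS_PER_RUN = 15_000;   // assumed store growth per run (illustrative)
const PRUNED_VIEW_CAP = 100_000; // assumed size of the pruned view sent to the API

function simulate(runs: Run[]): { storeTokens: number; compactions: number } {
  let storeTokens = 0;
  let compactions = 0;
  for (const run of runs) {
    storeTokens += TOKENS_PER_RUN;
    // The provider reports usage only for what it was actually sent.
    const reportedUsage = run.prunes
      ? Math.min(storeTokens, PRUNED_VIEW_CAP)
      : storeTokens;
    // Compaction check keyed off the reported usage, as described in this issue.
    if (reportedUsage > CONTEXT_TOKENS) {
      storeTokens = TOKENS_PER_RUN; // assume compaction collapses history to a short summary
      compactions++;
    }
  }
  return { storeTokens, compactions };
}

const anthropicOnly: Run[] = Array.from({ length: 20 }, () => ({
  provider: "anthropic",
  prunes: true,
}));
const deepseekOnly: Run[] = Array.from({ length: 20 }, () => ({
  provider: "deepseek",
  prunes: false,
}));

console.log(simulate(anthropicOnly)); // store keeps growing, compaction never fires
console.log(simulate(deepseekOnly));  // compaction fires whenever the real size crosses the threshold
```

Under these assumptions the pruned runs never cross the threshold because the reported usage is capped by the pruned view, while the unpruned runs compact periodically, which matches the single DeepSeek-triggered compaction in the logs.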

The single compaction that DID fire was on a DeepSeek run — DeepSeek has no cache-ttl pruning, so it reported the real context size.

Expected Behavior

Compaction should trigger based on the actual session store size (unpruned), not the pruned token count reported by Anthropic. All providers on the same session should see a context capped at contextTokens.

Suggested Fix

In _checkCompaction(), use the raw session entry count or a provider-agnostic token estimate (e.g., from the session JSONL) rather than the last API response's usage.input_tokens. This ensures compaction triggers regardless of which provider's pruning was last applied.
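A minimal sketch of what a provider-agnostic estimate could look like, assuming each session JSONL line is an object with a `content` field. The entry shape, the 4-characters-per-token heuristic, and the function names `estimateStoreTokens` and `shouldCompact` are assumptions for illustration; the real session schema and the internals of `_checkCompaction()` may differ.

```typescript
// Assumed entry shape; the actual session JSONL schema may differ.
interface SessionEntry {
  role: string;
  content: unknown;
}

const CHARS_PER_TOKEN = 4; // rough heuristic, not a real tokenizer

// Estimate the size of the whole stored session, independent of any provider's usage report.
function estimateStoreTokens(jsonl: string): number {
  let chars = 0;
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue; // skip blank lines
    const entry = JSON.parse(line) as SessionEntry;
    const text =
      typeof entry.content === "string"
        ? entry.content
        : JSON.stringify(entry.content ?? "");
    chars += text.length;
  }
  return Math.ceil(chars / CHARS_PER_TOKEN);
}

// Compare the unpruned estimate against the configured limit.
function shouldCompact(jsonl: string, contextTokens: number): boolean {
  return estimateStoreTokens(jsonl) > contextTokens;
}
```

Because the estimate is computed from the store itself, the result does not change when the last run happened to go through a provider with pruning enabled.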

Alternatively, apply the same pruning/truncation to non-Anthropic providers before sending, so all providers see a consistent context window.
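As a rough illustration of this alternative, the sketch below caps any outgoing history to a token budget by keeping only the newest messages that fit. It is a simple truncation stand-in, not the cache-ttl pruning logic itself; the `Message` shape, `estimateTokens`, and `capToBudget` are hypothetical names, and the token estimate is a character-count heuristic.

```typescript
// Hypothetical message shape and helpers; not part of OpenClaw.
interface Message {
  role: string;
  content: string;
}

// Rough heuristic: about 4 characters per token.
function estimateTokens(message: Message): number {
  return Math.ceil(message.content.length / 4);
}

// Keep the most recent messages whose combined estimate fits within the budget.
function capToBudget(history: Message[], budget: number): Message[] {
  const kept: Message[] = [];
  let used = 0;
  for (let i = history.length - 1; i >= 0; i--) {
    const message = history[i];
    if (message === undefined) continue;
    const cost = estimateTokens(message);
    if (used + cost > budget) break; // stop at the first message that does not fit
    kept.push(message);
    used += cost;
  }
  return kept.reverse(); // restore chronological order
}
```

Applying a cap like this before every provider call would bound the request size, but it would not stop the store itself from growing, so it complements rather than replaces a store-based compaction check.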

Labels: bug, stale
