Summary
When contextPruning (cache-ttl mode, auto-applied for Anthropic API key auth) is active, compaction uses the pruned token count reported by Anthropic's API rather than the actual session store size. This means compaction never triggers while Anthropic models are in use, and the session store grows unboundedly. When a non-Anthropic model (OpenRouter, Synthetic/HuggingFace, DeepSeek, Moonshot, etc.) is used on the same session, it receives the full unpruned history — 854K tokens in the observed case.
Environment
- OpenClaw v2026.2.6-3
- Node.js 22.22.0
- Ubuntu (Tailscale remote host)
Config
{
"agents.defaults.contextTokens": 128000,
"agents.defaults.compaction.mode": "default",
"agents.defaults.compaction.reserveTokensFloor": 15000
}
Anthropic models have cacheControlTtl: "1h", and contextPruning auto-applies cache-ttl mode for them.
Reproduction
- Start a Telegram session with an Anthropic model (e.g., Haiku) as default
- Have an extended conversation (76 runs over ~10 hours in our case)
- Switch to a non-Anthropic model mid-session (e.g., Synthetic hf:Qwen/Qwen3-235B-A22B-Instruct-2507)
- Observe the non-Anthropic model receives the full unpruned session history (854K tokens)
Evidence (session logs, Feb 8 2026)
Session: 993cfc33-4c4a-4b15-99b5-ba82be583fe5 (Telegram DM, main agent)
76 runs total:
| Provider | Model | Runs |
|---|---|---|
| Anthropic | claude-haiku-4-5-20251001 | 54 |
| DeepSeek | deepseek-chat | 10 |
| Anthropic | claude-sonnet-4-5-20250929 | 5 |
| Synthetic | hf:moonshotai/Kimi-K2.5 | 4 |
| Synthetic | hf:Qwen/Qwen3-235B-A22B-Instruct-2507 | 2 |
| OpenRouter | moonshotai/kimi-k2.5 | 1 |
Compaction events: Exactly 1 — at 07:44 UTC, triggered by a DeepSeek heartbeat run (no pruning → saw real context size → threshold exceeded).
After that single compaction: Session grew from 07:44 to 15:58 (~8 hours, ~60 Anthropic runs) with zero additional compactions. contextPruning kept Anthropic API calls under 128K, but the session store grew to 854K tokens.
Cost impact of the final non-Anthropic runs:
| Model | Input tokens | Output tokens | Cost |
|---|---|---|---|
| Qwen3-235B | 854,435 | 613 | $0.19 |
| Kimi-K2.5 | 853,851 | 1,378 | $0.47 |
$0.66 wasted on two API calls that sent the entire session history for minimal output.
Root Cause
_checkCompaction() uses the token usage reported by the last assistant message's API response to determine if the threshold is exceeded (as documented in #9282). When Anthropic models are active:
- contextPruning drops old messages before sending to Anthropic API
- Anthropic reports low token usage (pruned view)
- Compaction sees reported usage < contextTokens (128K) → does not trigger
- Session store continues growing unbounded
- Non-Anthropic providers receive the full unpruned store
The single compaction that DID fire was on a DeepSeek run — DeepSeek has no cache-ttl pruning, so it reported the real context size.
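To make the failure mode concrete, here is a hypothetical TypeScript model of a usage-based check. The names `RunUsage` and `shouldCompactFromUsage` are illustrative, not OpenClaw internals. The sketch assumes the decision compares the last response's reported input tokens against contextTokens, as described above, and it ignores reserveTokensFloor. The 110,000 value is an illustrative figure under 128K, not a logged number.

```typescript
// Hypothetical model of the check described above; not OpenClaw's actual code.
interface RunUsage {
  provider: string;
  reportedInputTokens: number; // usage.input_tokens from the provider's response
}

const CONTEXT_TOKENS = 128_000;

// Mirrors the reported behavior: the decision uses only what the last
// response says it consumed, not how large the stored session is.
function shouldCompactFromUsage(last: RunUsage, contextTokens: number): boolean {
  return last.reportedInputTokens >= contextTokens;
}

// Anthropic call after cache-ttl pruning: the provider saw a trimmed view
// (illustrative value below 128K).
const anthropicRun: RunUsage = { provider: "anthropic", reportedInputTokens: 110_000 };
// Same session on a provider without pruning: it reports the full store.
const unprunedRun: RunUsage = { provider: "synthetic", reportedInputTokens: 854_435 };

console.log(shouldCompactFromUsage(anthropicRun, CONTEXT_TOKENS)); // false
console.log(shouldCompactFromUsage(unprunedRun, CONTEXT_TOKENS)); // true
```

Under this model the store size never enters the decision, so only a run on an unpruned provider (like the DeepSeek heartbeat at 07:44) can cross the threshold.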
Expected Behavior
Compaction should trigger based on the actual session store size (unpruned), not the pruned token count reported by Anthropic. All providers on the same session should see a context capped at contextTokens.
Suggested Fix
In _checkCompaction(), use the raw session entry count or a provider-agnostic token estimate (e.g., from the session JSONL) rather than the last API response's usage.input_tokens. This ensures compaction triggers regardless of which provider's pruning was last applied.
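A minimal sketch of the first option in TypeScript, assuming each JSONL line is an object with a string `content` field (real OpenClaw session records may use a different shape, such as arrays of content blocks). `parseSessionJsonl`, `estimateStoreTokens`, and `shouldCompact` are hypothetical helpers, and the 4-characters-per-token ratio is a rough heuristic rather than any provider's tokenizer.

```typescript
// Hypothetical session-entry shape; real OpenClaw JSONL records may differ.
interface SessionEntry {
  role: "user" | "assistant" | "system" | "tool";
  content: string;
}

// Rough heuristic: ~4 characters per token for English text. It only needs
// to be provider-agnostic and grow with the store, not be exact.
const CHARS_PER_TOKEN = 4;

// One JSON object per non-empty line.
function parseSessionJsonl(jsonl: string): SessionEntry[] {
  return jsonl
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as SessionEntry);
}

// Estimate tokens for the whole unpruned store.
function estimateStoreTokens(entries: SessionEntry[]): number {
  let chars = 0;
  for (const e of entries) chars += e.content.length;
  return Math.ceil(chars / CHARS_PER_TOKEN);
}

// Decide on the larger of the store estimate and the last reported usage,
// so a pruned provider view can no longer suppress compaction.
function shouldCompact(
  entries: SessionEntry[],
  lastReportedInputTokens: number,
  contextTokens: number,
): boolean {
  const estimate = estimateStoreTokens(entries);
  return Math.max(estimate, lastReportedInputTokens) >= contextTokens;
}

// Usage: a store of 600,000 chars (~150K estimated tokens) while the last
// (pruned) response reported only 90K input tokens.
const jsonl = [
  JSON.stringify({ role: "user", content: "a".repeat(400_000) }),
  JSON.stringify({ role: "assistant", content: "b".repeat(200_000) }),
].join("\n");
const entries = parseSessionJsonl(jsonl);
console.log(estimateStoreTokens(entries)); // 150000
console.log(shouldCompact(entries, 90_000, 128_000)); // true
```

Taking the maximum of the store estimate and the reported usage is a design choice: a pruned response can no longer hide store growth, while an unpruned provider's exact count still wins when it is larger.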
Alternatively, apply the same pruning/truncation to non-Anthropic providers before sending, so all providers see a consistent context window.
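The alternative can be sketched as a budget-based truncation applied before any provider call. This is an illustration under assumptions: `truncateToBudget` is hypothetical, it reuses the same rough character heuristic, and it treats `contextTokens - reserveTokensFloor` as the send budget, which may not match how OpenClaw interprets those settings.

```typescript
// Hypothetical message shape and helper; not OpenClaw's actual pruning code.
interface Message {
  role: "user" | "assistant" | "system" | "tool";
  content: string;
}

// Rough heuristic (~4 chars per token), not a real tokenizer.
const CHARS_PER_TOKEN = 4;

function estimateTokens(m: Message): number {
  return Math.ceil(m.content.length / CHARS_PER_TOKEN);
}

// Keep the newest messages whose estimated total fits in `budget`,
// walking backwards from the end and restoring chronological order.
// Ignores system-prompt pinning and tool-call/result pairing.
function truncateToBudget(history: Message[], budget: number): Message[] {
  const kept: Message[] = [];
  let used = 0;
  for (let i = history.length - 1; i >= 0; i--) {
    const cost = estimateTokens(history[i]);
    if (used + cost > budget) break;
    kept.push(history[i]);
    used += cost;
  }
  return kept.reverse();
}

// Usage: 300 messages of ~1,000 estimated tokens each against the issue's
// config values (assumed budget = contextTokens - reserveTokensFloor).
const contextTokens = 128_000;
const reserveTokensFloor = 15_000;
const history: Message[] = Array.from({ length: 300 }, (_, i): Message => ({
  role: i % 2 === 0 ? "user" : "assistant",
  content: "x".repeat(4_000),
}));
const sent = truncateToBudget(history, contextTokens - reserveTokensFloor);
console.log(sent.length); // 113
```

A production version would also need to keep the system prompt and avoid splitting tool-call/tool-result pairs; this sketch only keeps the newest messages that fit.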
Related Issues
- cache-ttl custom entries bypass compaction guard (same interaction surface)