[Feature]: Runtime Token Budget Awareness and Proactive Context Compaction
Summary
OpenClaw's current context management relies on passive overflow-based compaction — it only triggers when the context window is exceeded, which interrupts the running task and restarts it from scratch. This creates a poor user experience and can cause infinite loops when compaction repeatedly interrupts the same task.
This feature request proposes adding runtime token budget awareness inspired by Hermes Agent's ContextEngine, enabling proactive, threshold-based context compaction before overflow occurs.
Current Behavior
- Token tracking depends on the provider response: if the provider (e.g., omlx local MLX server) doesn't return usage fields, OpenClaw shows ?/131k (?%) with zero compaction tracking.
- Compaction triggers on overflow: it only fires when the context window is already full, interrupting the running agent task.
- No budget awareness: the agent has no visibility into how much context budget remains.
- No proactive compression: the framework cannot compress before hitting the limit.
Proposed Behavior
1. Runtime Token Budget Tracking
A lightweight context budget tracker that runs at the framework level:
class ContextBudget {
  // Estimated tokens used in current session
  estimatedTokens: number = 0;
  // Configurable budget threshold (default: 80% of contextWindow)
  thresholdTokens: number = 0;
  // Context window size from provider config
  contextWindow: number = 0;

  // Update from API response usage (when available)
  updateFromResponse(promptTokens: number, completionTokens: number): void;
  // Estimate tokens when provider doesn't return usage
  estimateFromMessages(messages: Message[]): number;
  // Check if proactive compression should trigger
  shouldCompress(): boolean;
  // Get budget status for diagnostics
  getStatus(): { used: number; budget: number; remaining: number; pct: number };
}
Key design:
- Uses provider usage when available (same as current)
- Falls back to character-to-token estimation (roughly chars / 4) when the provider doesn't return usage
- Tracks both prompt tokens (system + history) and completion tokens
- Configurable threshold (default 80% of contextWindow)
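The declarations above leave the method bodies open. A minimal sketch of how they might be filled in follows. It assumes a hypothetical ChatMessage shape, a constructor that takes the context window and a threshold percentage, and that "used" after a response is prompt tokens plus completion tokens. None of these are confirmed OpenClaw APIs.

```typescript
// Hypothetical message shape for illustration; the real session type may differ.
interface ChatMessage {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
}

interface BudgetStatus {
  used: number;
  budget: number;
  remaining: number;
  pct: number;
}

// Rough heuristic from the proposal: about 4 characters per token.
const CHARS_PER_TOKEN = 4;

class ContextBudget {
  estimatedTokens = 0;
  thresholdTokens: number;
  contextWindow: number;

  // thresholdPct defaults to 80, the default named in the proposal.
  constructor(contextWindow: number, thresholdPct = 80) {
    this.contextWindow = contextWindow;
    this.thresholdTokens = Math.floor((contextWindow * thresholdPct) / 100);
  }

  // Provider-reported usage: the next prompt will contain roughly the
  // current prompt plus the completion that was just generated.
  updateFromResponse(promptTokens: number, completionTokens: number): void {
    this.estimatedTokens = promptTokens + completionTokens;
  }

  // Fallback when usage is missing: estimate from message text length.
  estimateFromMessages(messages: ChatMessage[]): number {
    const total = messages.reduce(
      (sum, m) => sum + Math.ceil(m.content.length / CHARS_PER_TOKEN),
      0,
    );
    this.estimatedTokens = total;
    return total;
  }

  // True once usage reaches the configured threshold.
  shouldCompress(): boolean {
    return this.estimatedTokens >= this.thresholdTokens;
  }

  getStatus(): BudgetStatus {
    const used = this.estimatedTokens;
    const pct =
      this.contextWindow > 0 ? Math.floor((used * 100) / this.contextWindow) : 0;
    return {
      used,
      budget: this.contextWindow,
      remaining: Math.max(0, this.contextWindow - used),
      pct,
    };
  }
}
```

With a 1,000-token window and the default threshold, shouldCompress() flips to true once usage reaches 800 tokens.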
2. Proactive Compaction
Trigger compaction before overflow, not after:
Context usage: 0% ────────── 80% ────────── 85-95% ────────── 100%
                              ↑               ↑                ↑
                          threshold     shouldCompress      overflow
                           (warn)        (proactive)  (current behavior)
- At 80%: Log warning, optionally notify agent
- At 85-95%: Trigger proactive compaction (LLM summarization)
- At 100%: Force compaction (current fallback, but now as last resort)
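The three tiers above could be mapped to actions with a small classifier. This is a sketch only: the names BudgetAction, Tiers, and budgetAction are hypothetical, and 85% is an assumed pick from the 85-95% range given above.

```typescript
// Hypothetical action names for the tiers described above.
type BudgetAction = "ok" | "warn" | "compact" | "force";

interface Tiers {
  warnPct: number;    // log a warning, optionally notify the agent
  compactPct: number; // trigger proactive compaction
}

// Assumption: 85 is the low end of the 85-95% range from the proposal.
const DEFAULT_TIERS: Tiers = { warnPct: 80, compactPct: 85 };

function budgetAction(
  usedTokens: number,
  contextWindow: number,
  tiers: Tiers = DEFAULT_TIERS,
): BudgetAction {
  // Without a known window size there is nothing to compare against.
  if (contextWindow <= 0) return "ok";
  const pct = (usedTokens * 100) / contextWindow;
  if (pct >= 100) return "force";            // last-resort fallback
  if (pct >= tiers.compactPct) return "compact";
  if (pct >= tiers.warnPct) return "warn";
  return "ok";
}
```

Checking the highest tier first keeps each threshold independent, so a user can tune warnPct and compactPct without reordering any logic.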
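3. Configurable Compaction Strategy
{
  "agents": {
    "defaults": {
      "contextTokens": 131072,
      "compaction": {
        "enabled": true,
        "strategy": "summarize",   // "summarize" | "truncate" | "discard"
        "thresholdPct": 80,        // trigger at 80% of contextWindow
        "headProtection": 15,      // always keep last N messages
        "tailProtection": 5,       // always keep first N messages
        "summaryPrefix": "[CONTEXT SUMMARY — reference only]"
      }
    }
  }
}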
4. Agent Visibility
The agent should have access to budget status:
- /status output: 📚 Context: 45,231/131,072 (34%), shown even when the provider doesn't return usage (falling back to estimation)
- System hint: When approaching threshold, optionally inject a subtle hint into the system prompt
- Agent API: Expose contextBudget status through session status
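The /status line shown above could be produced by a small formatter. This is a sketch: formatContextStatus is a hypothetical name, and the "~" marker for estimated values is an assumption added here to distinguish estimates from provider-reported usage.

```typescript
function formatContextStatus(
  used: number,
  contextWindow: number,
  estimated: boolean,
): string {
  // Floor the percentage so the display never reads 100% before the window is full.
  const pct = contextWindow > 0 ? Math.floor((used * 100) / contextWindow) : 0;
  const fmt = (n: number): string => n.toLocaleString("en-US");
  // Assumption: prefix estimated counts with "~" so users can tell them apart.
  const marker = estimated ? "~" : "";
  return `📚 Context: ${marker}${fmt(used)}/${fmt(contextWindow)} (${pct}%)`;
}
```

Flooring rather than rounding reproduces the 34% in the example above, since 45,231 of 131,072 is about 34.5%.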
5. Token Estimation Fallback
For providers that don't return usage (like omlx local MLX):
// Rough estimation: ~4 chars per token is a common rule of thumb for English.
// CJK text usually needs more tokens per character, so this tends to
// underestimate Chinese-heavy content.
const CHARS_PER_TOKEN = 4;

function estimateTokens(content: string): number {
  return Math.ceil(content.length / CHARS_PER_TOKEN);
}

// More accurate: count actual tokens via a lightweight tokenizer
// (optional, for providers that support it)
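For mixed English/Chinese sessions, a script-aware variant of the fallback may be safer than a flat chars / 4. The sketch below assumes roughly one token per CJK character, which is a deliberately conservative guess and not a measured figure for any specific tokenizer. The name estimateTokensMixed and the chosen Unicode ranges are illustrative.

```typescript
// Kana, CJK Unified Ideographs (plus Extension A), and Hangul syllables.
const CJK_CHAR = /[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uac00-\ud7af]/g;

function estimateTokensMixed(content: string): number {
  // Assumption: each CJK character costs about one token.
  const cjkCount = (content.match(CJK_CHAR) ?? []).length;
  // Everything else uses the ~4 chars per token heuristic.
  const otherCount = content.length - cjkCount;
  return cjkCount + Math.ceil(otherCount / 4);
}
```

Overestimating slightly is the cheaper mistake here: it triggers compaction a little early, while underestimating risks the overflow the feature is meant to avoid.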
Implementation Approach
Phase 1: Budget Tracker (Core)
- Add ContextBudget class to the gateway
- Integrate with existing contextTokens config
- Use provider usage when available, fall back to estimation
- Update /status display to show actual/estimated tokens
Phase 2: Proactive Compaction
- Add compaction trigger in the message processing pipeline
- Implement LLM-based summarization compaction (similar to Hermes)
- Head/tail protection to preserve important context
- Configurable thresholds
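Head/tail protection can be sketched as a split of the history into a protected head, a compressible middle, and a protected tail. The names planCompaction, applySummary, keepFirst, and keepLast are hypothetical, and the summary text is passed in as a plain string because the LLM summarization call itself is out of scope here.

```typescript
// Hypothetical message shape for illustration.
interface ChatMessage {
  role: string;
  content: string;
}

interface CompactionPlan {
  head: ChatMessage[];   // first messages, always kept
  middle: ChatMessage[]; // candidates for summarization
  tail: ChatMessage[];   // most recent messages, always kept
}

function planCompaction(
  messages: ChatMessage[],
  keepFirst: number,
  keepLast: number,
): CompactionPlan {
  // Nothing to compress if the protected regions cover the whole history.
  if (messages.length <= keepFirst + keepLast) {
    return { head: messages.slice(), middle: [], tail: [] };
  }
  const tailStart = messages.length - keepLast;
  return {
    head: messages.slice(0, keepFirst),
    middle: messages.slice(keepFirst, tailStart),
    tail: messages.slice(tailStart),
  };
}

// Replace the middle region with a single summary message.
function applySummary(
  plan: CompactionPlan,
  summary: string,
  prefix: string,
): ChatMessage[] {
  if (plan.middle.length === 0) return [...plan.head, ...plan.tail];
  const note: ChatMessage = { role: "system", content: `${prefix} ${summary}` };
  return [...plan.head, note, ...plan.tail];
}
```

Keeping the split separate from the summarization step means the same plan could feed any strategy: summarize the middle, truncate it, or discard it.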
Phase 3: Agent Integration
- Expose budget status to the agent
- Optional system prompt hints when approaching threshold
- Agent can trigger manual compaction via tool call
Benefits
- No more interrupting running tasks: compaction happens before overflow
- Better UX: the dashboard shows real token usage, not ?/131k
- Works with any provider: estimation fallback for providers without usage reporting
- Configurable: users can tune thresholds and strategies
- Prevents infinite loops: proactive compression avoids the overflow → interrupt → restart → overflow loop
Related Issues
Alternatives Considered
- Agent-level estimation only — Weaker because the agent can't trigger framework-level compaction
- Rely on provider usage only — Doesn't work for local/self-hosted providers
- Fixed message count compaction — Less precise than token-based
Notes
This is inspired by Hermes Agent's ContextEngine, which has been proven in production. The key insight is that token budget awareness should be a framework-level concern, not an agent-level concern. The agent should be able to see the budget status, but the framework should manage it.