[Feature]: Runtime Token Budget Awareness and Proactive Context Compaction #80594

@lizhi145738-eng

Summary

OpenClaw's current context management relies on passive overflow-based compaction — it only triggers when the context window is exceeded, which interrupts the running task and restarts it from scratch. This creates a poor user experience and can cause infinite loops when compaction repeatedly interrupts the same task.

This feature request proposes adding runtime token budget awareness inspired by Hermes Agent's ContextEngine, enabling proactive, threshold-based context compaction before overflow occurs.

Current Behavior

  1. Token tracking depends on provider response — If the provider (e.g., omlx local MLX server) doesn't return usage fields, OpenClaw shows ?/131k (?%) with zero compaction tracking.
  2. Compaction triggers on overflow — Only fires when the context window is already full, interrupting the running agent task.
  3. No budget awareness — The agent has no visibility into how much context budget remains.
  4. No proactive compression — Cannot compress before hitting the limit.

Proposed Behavior

1. Runtime Token Budget Tracking

A lightweight context budget tracker that runs at the framework level:

// Placeholder message shape; the real type would be OpenClaw's existing message type.
interface Message {
  role: string;
  content: string;
}

// API shape of the proposed ContextBudget class (signatures only, no implementation).
interface ContextBudget {
  // Estimated tokens used in current session
  estimatedTokens: number;

  // Configurable budget threshold (default: 80% of contextWindow)
  thresholdTokens: number;

  // Context window size from provider config
  contextWindow: number;

  // Update from API response usage (when available)
  updateFromResponse(promptTokens: number, completionTokens: number): void;

  // Estimate tokens when provider doesn't return usage
  estimateFromMessages(messages: Message[]): number;

  // Check if proactive compression should trigger
  shouldCompress(): boolean;

  // Get budget status for diagnostics
  getStatus(): { used: number; budget: number; remaining: number; pct: number };
}

Key design:

  • Uses provider-reported usage when available (same as current behavior)
  • Falls back to character-based token estimation (roughly chars / 4) when the provider doesn't return usage
  • Tracks both prompt tokens (system + history) and completion tokens
  • Configurable threshold (default 80% of contextWindow)
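
To make the key design points concrete, here is a minimal sketch of how such a tracker could behave. It is not OpenClaw or Hermes code. The class name ContextBudgetSketch, the ChatMessage shape, and the choice to report the full context window as "budget" are assumptions made for illustration.

```typescript
// Hypothetical message shape; OpenClaw's real message type may differ.
interface ChatMessage {
  role: string;
  content: string;
}

interface BudgetStatus {
  used: number;
  budget: number;
  remaining: number;
  pct: number;
}

class ContextBudgetSketch {
  private usedTokens = 0;
  private readonly contextWindow: number;
  private readonly thresholdTokens: number;
  private readonly charsPerToken: number;

  constructor(contextWindow: number, thresholdPct = 80, charsPerToken = 4) {
    if (contextWindow <= 0) {
      throw new RangeError("contextWindow must be positive");
    }
    this.contextWindow = contextWindow;
    // Multiply before dividing so integer inputs stay exact.
    this.thresholdTokens = Math.floor((contextWindow * thresholdPct) / 100);
    this.charsPerToken = charsPerToken;
  }

  // Preferred path: the provider reported usage for the latest request.
  // Assumption: prompt tokens already cover the system prompt and history.
  updateFromResponse(promptTokens: number, completionTokens: number): void {
    this.usedTokens = promptTokens + completionTokens;
  }

  // Fallback path: estimate from character counts and store the result.
  estimateFromMessages(messages: ChatMessage[]): number {
    const chars = messages.reduce((sum, m) => sum + m.content.length, 0);
    this.usedTokens = Math.ceil(chars / this.charsPerToken);
    return this.usedTokens;
  }

  shouldCompress(): boolean {
    return this.usedTokens >= this.thresholdTokens;
  }

  // Assumption: "budget" reports the full context window, not the threshold.
  getStatus(): BudgetStatus {
    return {
      used: this.usedTokens,
      budget: this.contextWindow,
      remaining: Math.max(0, this.contextWindow - this.usedTokens),
      pct: Math.floor((this.usedTokens * 100) / this.contextWindow),
    };
  }
}
```

With a 1,000-token window and the default 80% threshold, shouldCompress() stays false at 750 used tokens and turns true at 820.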

2. Proactive Compaction

Trigger compaction before overflow, not after:

Context usage:  0% ─────────── 80% ─────── 85-95% ─────── 100%
                                ↑            ↑             ↑
                            threshold shouldCompress   overflow
                             (warn)     (proactive) (current behavior)

  • At 80%: Log warning, optionally notify agent
  • At 85-95%: Trigger proactive compaction (LLM summarization)
  • At 100%: Force compaction (current fallback, but now as last resort)
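
The three tiers above could be expressed as a small pure function. This is a sketch under assumptions: the function name classifyUsage, the action labels, and the choice of 85% as the lower bound of the proactive band are illustrative and not part of OpenClaw.

```typescript
type BudgetAction = "none" | "warn" | "compact" | "force";

interface TierThresholds {
  warnPct: number;    // log a warning, optionally notify the agent
  compactPct: number; // start proactive compaction
}

// Assumed defaults: warn at 80%, compact at 85% (low end of the 85-95% band).
const DEFAULT_TIERS: TierThresholds = { warnPct: 80, compactPct: 85 };

function classifyUsage(
  usedTokens: number,
  contextWindow: number,
  tiers: TierThresholds = DEFAULT_TIERS,
): BudgetAction {
  if (contextWindow <= 0) {
    throw new RangeError("contextWindow must be positive");
  }
  const pct = (usedTokens * 100) / contextWindow;
  if (pct >= 100) return "force";               // last-resort fallback
  if (pct >= tiers.compactPct) return "compact"; // proactive summarization
  if (pct >= tiers.warnPct) return "warn";
  return "none";
}
```

Checking the tiers from highest to lowest keeps the function correct even if a caller configures warnPct and compactPct close together.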

3. Configurable Compaction Strategy

{
  "agents": {
    "defaults": {
      "contextTokens": 131072,
      "compaction": {
        "enabled": true,
        "strategy": "summarize", // "summarize" | "truncate" | "discard"
        "thresholdPct": 80,      // trigger at 80% of contextWindow
        "headProtection": 15,    // always keep first N messages
        "tailProtection": 5,     // always keep last N messages
        "summaryPrefix": "[CONTEXT SUMMARY — reference only]"
      }
    }
  }
}
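
As an illustration of how head and tail protection might interact with the "summarize" strategy, here is a sketch that assumes headProtection counts messages from the start of the conversation and tailProtection counts from the end. The function compactMessages, the injected summarize callback, and the use of a "system" role for the summary message are hypothetical. A real implementation would call an LLM to produce the summary.

```typescript
// Hypothetical message shape; OpenClaw's real message type may differ.
interface ChatMessage {
  role: string;
  content: string;
}

interface CompactionOptions {
  headProtection: number; // number of leading messages to keep verbatim
  tailProtection: number; // number of trailing messages to keep verbatim
  summaryPrefix: string;
}

function compactMessages(
  messages: ChatMessage[],
  opts: CompactionOptions,
  summarize: (middle: ChatMessage[]) => string,
): ChatMessage[] {
  const { headProtection, tailProtection, summaryPrefix } = opts;

  // Nothing to compact if every message is protected.
  if (messages.length <= headProtection + tailProtection) {
    return messages.slice();
  }

  const tailStart = messages.length - tailProtection;
  const head = messages.slice(0, headProtection);
  const middle = messages.slice(headProtection, tailStart);
  const tail = messages.slice(tailStart);

  // Replace the unprotected middle with one summary message.
  const summary: ChatMessage = {
    role: "system", // assumption: the summary is injected as a system message
    content: `${summaryPrefix} ${summarize(middle)}`,
  };
  return [...head, summary, ...tail];
}
```

Passing the summarizer as a callback keeps the splitting logic testable without a model call, and a "truncate" or "discard" strategy could reuse the same split.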

4. Agent Visibility

The agent should have access to budget status:

  • /status output: 📚 Context: 45,231/131,072 (34%) — even when provider doesn't return usage, use estimation
  • System hint: When approaching threshold, optionally inject a subtle hint into the system prompt
  • Agent API: Expose contextBudget status through session status
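
The /status line could be produced by a formatter along these lines. This is a sketch: formatContextStatus is a hypothetical helper, and the "~" prefix for estimated values is an assumption added here to distinguish estimates from provider-reported counts. The percentage is floored, which matches the 34% in the example above.

```typescript
function formatContextStatus(
  usedTokens: number,
  contextWindow: number,
  isEstimated: boolean,
): string {
  // Floor the percentage; guard against a missing or zero context window.
  const pct =
    contextWindow > 0 ? Math.floor((usedTokens * 100) / contextWindow) : 0;
  const withCommas = (n: number): string => n.toLocaleString("en-US");
  // Assumption: mark estimated counts with "~" so users can tell them apart.
  const marker = isEstimated ? "~" : "";
  return `📚 Context: ${marker}${withCommas(usedTokens)}/${withCommas(contextWindow)} (${pct}%)`;
}

console.log(formatContextStatus(45231, 131072, false));
// prints "📚 Context: 45,231/131,072 (34%)"
```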

5. Token Estimation Fallback

For providers that don't return usage (like omlx local MLX):

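// Rough estimation: ~4 chars per token is a common heuristic for English text.
// CJK text usually needs more tokens per character, so this undercounts Chinese-heavy content.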
const CHARS_PER_TOKEN = 4;

function estimateTokens(content: string): number {
  return Math.ceil(content.length / CHARS_PER_TOKEN);
}

// More accurate: count actual tokens via a lightweight tokenizer
// (optional, for providers that support it)

Implementation Approach

Phase 1: Budget Tracker (Core)

  • Add ContextBudget class to the gateway
  • Integrate with existing contextTokens config
  • Use provider usage when available, fall back to estimation
  • Update /status display to show actual/estimated tokens

Phase 2: Proactive Compaction

  • Add compaction trigger in the message processing pipeline
  • Implement LLM-based summarization compaction (similar to Hermes)
  • Head/tail protection to preserve important context
  • Configurable thresholds

Phase 3: Agent Integration

  • Expose budget status to the agent
  • Optional system prompt hints when approaching threshold
  • Agent can trigger manual compaction via tool call

Benefits

  1. No more interrupting running tasks — Compaction happens before overflow
  2. Better UX — Dashboard shows real token usage, not ?/131k
  3. Works with any provider — Estimation fallback for providers without usage reporting
  4. Configurable — Users can tune thresholds and strategies
  5. Prevents infinite loops — Proactive compression avoids the overflow → interrupt → restart → overflow loop

Related Issues

Alternatives Considered

  1. Agent-level estimation only — Weaker because the agent can't trigger framework-level compaction
  2. Rely on provider usage only — Doesn't work for local/self-hosted providers
  3. Fixed message count compaction — Less precise than token-based

Notes

This is inspired by Hermes Agent's ContextEngine, which has been proven in production. The key insight is that token budget awareness should be a framework-level concern, not an agent-level concern: the agent should be able to see the budget status, but the framework should manage it.
