# [Feature]: Runtime Token Budget Awareness and Proactive Context Compaction

## Summary

OpenClaw's current context management relies on **passive overflow-based compaction** — it only triggers when the context window is exceeded, which interrupts the running task and restarts it from scratch. This creates a poor user experience and can cause infinite loops when compaction repeatedly interrupts the same task.

This feature request proposes adding **runtime token budget awareness** inspired by [Hermes Agent](https://github.com/hermes-agent)'s `ContextEngine`, enabling proactive, threshold-based context compaction before overflow occurs.

## Current Behavior

1. **Token tracking depends on provider response** — If the provider (e.g., omlx local MLX server) doesn't return `usage` fields, OpenClaw shows `?/131k (?%)` with zero compaction tracking.
2. **Compaction triggers on overflow** — Only fires when the context window is already full, interrupting the running agent task.
3. **No budget awareness** — The agent has no visibility into how much context budget remains.
4. **No proactive compression** — Cannot compress before hitting the limit.

## Proposed Behavior

### 1. Runtime Token Budget Tracking

A lightweight context budget tracker that runs at the framework level:

```typescript
// Minimal message shape assumed for this sketch; the real type may differ.
type Message = { role: string; content: string };

// API sketch only: `declare` marks the method bodies as intentionally omitted.
declare class ContextBudget {
  // Estimated tokens used in current session
  estimatedTokens: number;

  // Configurable budget threshold (default: 80% of contextWindow)
  thresholdTokens: number;

  // Context window size from provider config
  contextWindow: number;

  // Update from API response usage (when available)
  updateFromResponse(promptTokens: number, completionTokens: number): void;

  // Estimate tokens when provider doesn't return usage
  estimateFromMessages(messages: Message[]): number;

  // Check if proactive compression should trigger
  shouldCompress(): boolean;

  // Get budget status for diagnostics
  getStatus(): { used: number; budget: number; remaining: number; pct: number };
}
```

**Key design:**
- Uses provider `usage` when available (same as the current behavior)
- Falls back to **character-to-token estimation** (roughly `chars / 4`) when the provider doesn't return usage
- Tracks both prompt tokens (system + history) and completion tokens
- Configurable threshold (default 80% of `contextWindow`)
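
To make the design points above concrete, here is a minimal sketch of how such a tracker could be implemented. It is not OpenClaw code: the `Message` shape, the constructor signature, and the assumption that provider `usage` is reported per request (so prompt plus completion approximates the next prompt size) are all illustrative.

```typescript
// Hypothetical message shape; OpenClaw's real type may differ.
interface Message {
  role: string;
  content: string;
}

// Rough heuristic from this proposal, not a measured value.
const CHARS_PER_TOKEN = 4;

class ContextBudget {
  estimatedTokens = 0;
  readonly contextWindow: number;
  readonly thresholdTokens: number;

  // thresholdPct is a percentage of contextWindow (assumed default: 80).
  constructor(contextWindow: number, thresholdPct = 80) {
    this.contextWindow = contextWindow;
    this.thresholdTokens = Math.floor((contextWindow * thresholdPct) / 100);
  }

  // Preferred path: the provider reported usage for the last request.
  // Assumes per-request (not cumulative) usage numbers.
  updateFromResponse(promptTokens: number, completionTokens: number): void {
    this.estimatedTokens = promptTokens + completionTokens;
  }

  // Fallback path: estimate from message text and store the result.
  estimateFromMessages(messages: Message[]): number {
    const chars = messages.reduce((sum, m) => sum + m.content.length, 0);
    this.estimatedTokens = Math.ceil(chars / CHARS_PER_TOKEN);
    return this.estimatedTokens;
  }

  shouldCompress(): boolean {
    return this.estimatedTokens >= this.thresholdTokens;
  }

  getStatus(): { used: number; budget: number; remaining: number; pct: number } {
    const used = this.estimatedTokens;
    const budget = this.contextWindow;
    return {
      used,
      budget,
      remaining: Math.max(0, budget - used),
      pct: budget > 0 ? Math.floor((used * 100) / budget) : 0,
    };
  }
}

// Usage: a 131,072-token window with the default 80% threshold.
const budget = new ContextBudget(131072);
budget.updateFromResponse(100000, 5000);
const overThreshold = budget.shouldCompress(); // true: 105,000 >= 104,857
```

Keeping both update paths on one object means callers never need to know whether the current number is reported or estimated.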

### 2. Proactive Compaction

Trigger compaction **before** overflow, not after:

```
Context usage:  0% ──────────── 80% ─────── 85-95% ─────── 100%
                                 ↑            ↑             ↑
                             threshold  shouldCompress  overflow
                              (warn)     (proactive)    (current behavior)
```

- **At 80%**: Log warning, optionally notify agent
- **At 85-95%**: Trigger proactive compaction (LLM summarization)
- **At 100%**: Force compaction (current fallback, but now as last resort)
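
The three tiers above could be expressed as a small pure function. This is a sketch under assumptions: the names `BudgetAction`, `Tiers`, and `nextAction` are hypothetical, and `compactAt: 0.85` is simply the lower end of the 85-95% range, not a value the proposal fixes.

```typescript
type BudgetAction = "none" | "warn" | "compact" | "force";

// Hypothetical tier boundaries, expressed as fractions of the context window.
interface Tiers {
  warnAt: number;
  compactAt: number;
}

const DEFAULT_TIERS: Tiers = { warnAt: 0.8, compactAt: 0.85 };

// Map current usage to the action the framework should take next.
function nextAction(
  usedTokens: number,
  contextWindow: number,
  tiers: Tiers = DEFAULT_TIERS,
): BudgetAction {
  if (contextWindow <= 0) return "none"; // unknown window: nothing to compare against
  const ratio = usedTokens / contextWindow;
  if (ratio >= 1) return "force"; // overflow: last-resort compaction
  if (ratio >= tiers.compactAt) return "compact"; // proactive summarization
  if (ratio >= tiers.warnAt) return "warn"; // log and optionally notify the agent
  return "none";
}
```

Because the function has no side effects, the same check can run after every response without touching session state.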

### 3. Configurable Compaction Strategy

```jsonc
{
  "agents": {
    "defaults": {
      "contextTokens": 131072,
      "compaction": {
        "enabled": true,
        "strategy": "summarize", // "summarize" | "truncate" | "discard"
        "thresholdPct": 80,      // trigger at 80% of contextWindow
        "headProtection": 15,    // always keep first N messages
        "tailProtection": 5,     // always keep last N messages
        "summaryPrefix": "[CONTEXT SUMMARY — reference only]"
      }
    }
  }
}
```
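
As an illustration of how head/tail protection and `summaryPrefix` might interact, the sketch below splits a history into protected and compactable parts and swaps the middle for a summary message. It assumes `headProtection` counts from the start of the history and `tailProtection` from the end; the `Message` shape, the function names, and the `system` role for the summary are hypothetical.

```typescript
// Hypothetical message shape; the real type may differ.
interface Message {
  role: string;
  content: string;
}

interface CompactionConfig {
  headProtection: number; // first N messages are never compacted
  tailProtection: number; // last N messages are never compacted
  summaryPrefix: string;
}

interface Partition {
  head: Message[];
  middle: Message[];
  tail: Message[];
}

// Split the history so that only `middle` is eligible for compaction.
function partition(messages: Message[], cfg: CompactionConfig): Partition {
  const head = messages.slice(0, cfg.headProtection);
  const tailStart = Math.max(head.length, messages.length - cfg.tailProtection);
  return {
    head,
    middle: messages.slice(head.length, tailStart),
    tail: messages.slice(tailStart),
  };
}

// Replace the compactable middle with one prefixed summary message.
function applySummary(
  messages: Message[],
  summary: string,
  cfg: CompactionConfig,
): Message[] {
  const { head, middle, tail } = partition(messages, cfg);
  if (middle.length === 0) return messages; // nothing eligible for compaction
  const note: Message = {
    role: "system", // assumption: summaries are injected as system messages
    content: `${cfg.summaryPrefix}\n${summary}`,
  };
  return [...head, note, ...tail];
}
```

Short histories pass through unchanged, because the protected head and tail already cover every message.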

### 4. Agent Visibility

The agent should have access to budget status:

- **`/status` output**: `📚 Context: 45,231/131,072 (34%)`, using estimation when the provider doesn't return usage
- **System hint**: When approaching threshold, optionally inject a subtle hint into the system prompt
- **Agent API**: Expose `contextBudget` status through session status
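
A possible formatter for the `/status` line is sketched below. The `formatStatus` name and the `~` marker for estimated values are assumptions for illustration; the proposal only specifies the overall shape of the line.

```typescript
// Render the context line for /status. `estimated` flags values that came
// from the fallback estimator rather than provider-reported usage.
function formatStatus(used: number, budget: number, estimated: boolean): string {
  const pct = budget > 0 ? Math.floor((used * 100) / budget) : 0;
  const fmt = (n: number): string => n.toLocaleString("en-US");
  const mark = estimated ? "~" : ""; // hypothetical marker for estimates
  return `📚 Context: ${mark}${fmt(used)}/${fmt(budget)} (${pct}%)`;
}

// Reproduces the example line from this proposal.
const statusLine = formatStatus(45231, 131072, false);
```

Marking estimated values keeps the display honest without falling back to `?/131k`.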

### 5. Token Estimation Fallback

For providers that don't return `usage` (like omlx local MLX):

```typescript
// Rough estimation: ~4 chars per token is a common heuristic for English text.
// CJK text usually needs more tokens per character, so this undercounts Chinese.
const CHARS_PER_TOKEN = 4;

function estimateTokens(content: string): number {
  return Math.ceil(content.length / CHARS_PER_TOKEN);
}

// More accurate: count actual tokens via a lightweight tokenizer
// (optional, for providers that support it)
```
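
Since a flat `chars / 4` ratio is tuned for English, a mixed English/Chinese history may be undercounted. One possible refinement, sketched below, counts CJK characters separately. The ratio of one token per CJK character is a deliberately conservative assumption, not a measured value, and `estimateTokensMixed` is a hypothetical name.

```typescript
const CHARS_PER_TOKEN_LATIN = 4; // heuristic for English-like text
const TOKENS_PER_CJK_CHAR = 1; // assumption: conservative guess, tokenizer-dependent

// Hiragana/Katakana, CJK Extension A, CJK Unified Ideographs, Hangul syllables.
const CJK_CHARS = /[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uac00-\ud7af]/g;

function estimateTokensMixed(content: string): number {
  const cjkCount = (content.match(CJK_CHARS) || []).length;
  const otherCount = content.length - cjkCount;
  return (
    cjkCount * TOKENS_PER_CJK_CHAR +
    Math.ceil(otherCount / CHARS_PER_TOKEN_LATIN)
  );
}
```

Overestimating slightly is the safer failure mode here, since it triggers compaction early rather than late.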

## Implementation Approach

### Phase 1: Budget Tracker (Core)
- Add `ContextBudget` class to the gateway
- Integrate with existing `contextTokens` config
- Use provider `usage` when available, fall back to estimation
- Update `/status` display to show actual/estimated tokens

### Phase 2: Proactive Compaction
- Add compaction trigger in the message processing pipeline
- Implement LLM-based summarization compaction (similar to Hermes)
- Head/tail protection to preserve important context
- Configurable thresholds
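
The trigger in the message processing pipeline could look like the sketch below: a check that runs before each model call, so compaction happens between turns instead of interrupting a running task. `CompactionDeps` and `beforeModelCall` are hypothetical names, and the summarizer is injected rather than tied to any real OpenClaw or Hermes API.

```typescript
// Hypothetical dependencies supplied by the pipeline.
interface CompactionDeps<M> {
  thresholdTokens: number;
  estimate(messages: M[]): number;
  compact(messages: M[]): Promise<M[]>; // e.g. LLM-based summarization
}

// Run before each model call; returns the history to actually send.
async function beforeModelCall<M>(
  messages: M[],
  deps: CompactionDeps<M>,
): Promise<M[]> {
  const before = deps.estimate(messages);
  if (before < deps.thresholdTokens) return messages; // under budget: no-op

  const compacted = await deps.compact(messages);

  // Keep the original history if compaction did not shrink it, so a
  // misbehaving compactor cannot make the context larger.
  return deps.estimate(compacted) < before ? compacted : messages;
}
```

The existing overflow path stays in place as the fallback when this check returns the history unchanged.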

### Phase 3: Agent Integration
- Expose budget status to the agent
- Optional system prompt hints when approaching threshold
- Agent can trigger manual compaction via tool call

## Benefits

1. **No more interrupting running tasks** — Compaction happens before overflow
2. **Better UX** — Dashboard shows real token usage, not `?/131k`
3. **Works with any provider** — Estimation fallback for providers without usage reporting
4. **Configurable** — Users can tune thresholds and strategies
5. **Prevents infinite loops** — Proactive compression avoids the overflow → interrupt → restart → overflow loop

## Related Issues

- #79359 — omlx Provider Fails to Report Token Usage (dashboard shows ?/131k)
- #80548 — contextInjection "always" mode bootstrapFiles accumulation
- #79703 — (context compression related)

## Alternatives Considered

1. **Agent-level estimation only** — Weaker because the agent can't trigger framework-level compaction
2. **Rely on provider usage only** — Doesn't work for local/self-hosted providers
3. **Fixed message count compaction** — Less precise than token-based

## Notes

This is inspired by Hermes Agent's `ContextEngine`, which has been proven in production. The key insight is that token budget awareness should be a **framework-level concern**, not an agent-level concern. The agent should be able to *see* the budget status, but the framework should *manage* it.
