Bug: Sudden Context Overflow from Large Tool Results (no preflight guard) #10694

@mkoslacz

Summary

OpenClaw can exceed the model's context window in a single step when a tool returns a very large payload.
After that, requests fail repeatedly with:

  • prompt is too long: ... > 200000
  • input length and max_tokens exceed context limit: ... + 32000 > 200000

In several cases, no automatic compaction happened before the next request, so sessions got stuck in a failure loop.

Environment

  • OpenClaw: 2026.1.30
  • Model: anthropic/claude-sonnet-4-20250514 and anthropic/claude-opus-4-5
  • Context window: 200000

What happened

Large single toolResult messages were appended directly to the session transcript and immediately pushed the next prompt over the limit.

Concrete examples (from local transcripts)

  1. gateway config.schema produced a huge payload:
     • transcript: sessions/d3a0a84a-ab0f-4597-9de4-8353a67acfec.jsonl
     • tool call line: 1402 (gateway {"action":"config.schema"})
     • tool result line: 1403 (~560k chars JSON line)
     • immediate error: line 1404 (prompt is too long: 205607 tokens > 200000)
  2. Grepping session logs with tail ... ~/.openclaw/agents/main/sessions/*.jsonl produced huge outputs:
     • transcript: sessions/d3a0a84a-ab0f-4597-9de4-8353a67acfec.jsonl
     • tool calls: lines 1857, 1869
     • tool results: lines 1858 (~400k chars), 1870 (~409k chars)
     • immediate failures: lines 1859, 1871, 1874, 1877
  3. Same pattern in another session:
     • transcript: sessions/276d664b-06c9-4a1a-80ad-562972d25beb.jsonl
     • huge tool results on lines 32 and 34 (~416k chars each)
     • repeated context-limit errors after that
  4. Repeated failures at high context without successful recovery:
     • transcript: sessions/7bff962a-ac31-44f3-9c17-300507453484.jsonl
     • repeated errors on lines 955, 958, 961, 964, 967, 970
     • message: input length and max_tokens exceed context limit: 173629 + 32000 > 200000
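
To make the last error concrete, here is the arithmetic behind it as a small Python sketch. The three numbers come straight from the error message and the Environment section; the variable names are only illustrative.

```python
CONTEXT_LIMIT = 200_000   # context window from the Environment section
INPUT_TOKENS = 173_629    # input length reported in the error
MAX_TOKENS = 32_000       # requested output budget reported in the error

total = INPUT_TOKENS + MAX_TOKENS
overshoot = total - CONTEXT_LIMIT
largest_safe_max_tokens = CONTEXT_LIMIT - INPUT_TOKENS

print(total)                    # 205629
print(overshoot)                # 5629
print(largest_safe_max_tokens)  # 26371
```

In this case the input alone still fit; lowering max_tokens to 26371 or less would have been enough for the request to pass the limit check.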

Expected behavior

Before every model request, OpenClaw should guarantee:

  • estimated_input_tokens + requested_max_tokens <= model_context_limit
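
A minimal sketch of that invariant as a preflight check, in Python for brevity. Everything here is hypothetical and not OpenClaw's actual API: the names `estimate_tokens`, `BudgetCheck`, and `preflight` are made up, and the 4-characters-per-token ratio is a crude assumption standing in for a real tokenizer.

```python
from dataclasses import dataclass

CHARS_PER_TOKEN = 4  # assumption: crude heuristic, not a real tokenizer


def estimate_tokens(text: str) -> int:
    """Rough token estimate; rounds up so non-empty strings never count as zero."""
    return -(-len(text) // CHARS_PER_TOKEN)


@dataclass
class BudgetCheck:
    estimated_input_tokens: int
    requested_max_tokens: int
    context_limit: int

    @property
    def fits(self) -> bool:
        # The invariant from the issue text.
        total = self.estimated_input_tokens + self.requested_max_tokens
        return total <= self.context_limit

    @property
    def overshoot(self) -> int:
        # How many tokens must be freed (0 when the request already fits).
        total = self.estimated_input_tokens + self.requested_max_tokens
        return max(0, total - self.context_limit)


def preflight(messages: list[str], requested_max_tokens: int, context_limit: int) -> BudgetCheck:
    """Estimate the whole transcript and compare it against the budget."""
    estimated = sum(estimate_tokens(m) for m in messages)
    return BudgetCheck(estimated, requested_max_tokens, context_limit)


check = preflight(["x" * 700_000], requested_max_tokens=32_000, context_limit=200_000)
print(check.fits, check.overshoot)  # False 7000
```

Because a character-based estimate can undercount, a real check would probably want a safety margin below the hard limit.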

If not true, OpenClaw should automatically do one or more of:

  1. trigger compaction preflight,
  2. reduce max_tokens dynamically,
  3. truncate/summarize newest oversized tool results,
  4. retry automatically with safe budget.

The session should not enter a repeated hard-failure loop.
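
One possible shape for this recovery ladder, sketched in Python under stated assumptions: `plan_request`, `MIN_MAX_TOKENS`, and the `compact` callback are hypothetical names, and compaction is modelled only as a function from the current input token count to a smaller one. The sketch tries the cheapest remedy first (a smaller max_tokens), compacts at most once, and then fails with a single error rather than retrying indefinitely.

```python
from typing import Callable, List

MIN_MAX_TOKENS = 1_024  # assumption: smallest output budget worth requesting


def plan_request(
    input_tokens: int,
    max_tokens: int,
    context_limit: int,
    compact: Callable[[int], int],
) -> dict:
    """Return a request plan that fits the context window, or raise once.

    `compact` is a hypothetical hook: it receives the current input token
    count and returns the count after compaction.
    """
    steps: List[str] = []

    # Step 1: shrink the output budget if that alone makes the request fit.
    room = context_limit - input_tokens
    if MIN_MAX_TOKENS <= room < max_tokens:
        max_tokens = room
        steps.append("reduced_max_tokens")

    # Step 2: compact exactly once if the request still does not fit.
    if input_tokens + max_tokens > context_limit:
        input_tokens = compact(input_tokens)
        steps.append("compacted")
        room = context_limit - input_tokens
        if room >= MIN_MAX_TOKENS:
            max_tokens = min(max_tokens, room)

    # Step 3: surface one clear error instead of looping on the same failure.
    if input_tokens + max_tokens > context_limit:
        raise RuntimeError(
            f"cannot fit request: {input_tokens} + {max_tokens} > {context_limit}"
        )

    return {"input_tokens": input_tokens, "max_tokens": max_tokens, "steps": steps}


# The 173629 + 32000 case from the transcripts: a smaller max_tokens is enough.
plan = plan_request(173_629, 32_000, 200_000, compact=lambda n: n // 2)
print(plan["max_tokens"], plan["steps"])
```

With the 205607-token prompt from the first example, step 1 cannot help because the input alone exceeds the limit, so the plan would go straight to compaction.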

Suggested fixes

  1. Preflight context budget check before each LLM call, based on current transcript + planned max_tokens.
  2. Hard output cap per tool result inserted into transcript (character and token budget).
  3. Auto-summarize oversized tool outputs (store full output externally/log file, keep summary in context).
  4. Guardrails for high-risk commands that can return giant transcript chunks (e.g. tail ~/.openclaw/agents/main/sessions/*.jsonl).
  5. Fallback retry policy:
     • on input + max_tokens error: reduce max_tokens and retry,
     • if still too large: force compaction and retry once.
  6. Optional warning mode when any single toolResult exceeds configured threshold (e.g. 10k/20k tokens).
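
Fixes 2, 3, and 6 could be combined roughly as in the Python sketch below. It is an illustration only: `cap_tool_result`, both thresholds, and the spill directory are hypothetical, and it uses plain head-and-tail truncation where the issue suggests summarization. The full output is written to a file so nothing is lost, and only a bounded excerpt would enter the transcript.

```python
import tempfile
from pathlib import Path

MAX_RESULT_CHARS = 40_000   # assumption: hard cap, about 10k tokens at 4 chars/token
WARN_RESULT_CHARS = 20_000  # hypothetical warning threshold


def cap_tool_result(output: str, spill_dir: Path) -> tuple[str, list[str]]:
    """Return (text to put in the transcript, warnings).

    Oversized output is saved in full under `spill_dir`; the transcript gets
    the first and last halves of the cap plus a short marker in between.
    """
    warnings: list[str] = []
    if len(output) > WARN_RESULT_CHARS:
        warnings.append(f"tool result is {len(output)} chars")
    if len(output) <= MAX_RESULT_CHARS:
        return output, warnings

    # Keep the full payload outside the context window.
    spill_dir.mkdir(parents=True, exist_ok=True)
    with tempfile.NamedTemporaryFile(
        "w", dir=spill_dir, suffix=".txt", delete=False, encoding="utf-8"
    ) as f:
        f.write(output)
        spill_path = f.name

    half = MAX_RESULT_CHARS // 2
    omitted = len(output) - 2 * half
    marker = f"\n[... {omitted} chars omitted; full output saved to {spill_path} ...]\n"
    return output[:half] + marker + output[-half:], warnings


# A payload about the size of the config.schema result from the first example.
capped, warnings = cap_tool_result("a" * 560_000, Path(tempfile.mkdtemp()))
print(len(capped) < 41_000, warnings)
```

Keeping both the head and the tail is a design choice: command output often has the relevant error at the end, while JSON payloads are easier to recognise from the start.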

Why this matters

This causes:

  • session stalls,
  • 5x+ token usage spikes (many failed retries),
  • broken automation loops until manual intervention.
