Switching to a smaller-context model mid-session causes hard crash instead of graceful truncation #44303

@Adam-Researchh

Bug Description

When a user switches from a large-context model (e.g., GPT 5.4 at 1.05M tokens or Claude Opus at 200K) to a smaller-context model (e.g., Ollama qwen3.5:122b-a10b at 128K) mid-session, the gateway crashes/locks up if the accumulated session context exceeds the new model's context window.

The user is forced to manually restart the gateway and reset the session, losing all in-flight context.

Steps to Reproduce

  1. Start a session with a large-context model (e.g., openai-codex/gpt-5.4 with 1.05M context)
  2. Accumulate significant context (tool calls, file reads, conversation history — typical working session of 30+ minutes)
  3. Switch to a smaller-context model via /model qwen (128K context) or similar
  4. Observe that the gateway becomes unresponsive: no error message is returned to the user, and there is no graceful degradation

Expected Behavior

When switching to a model with a smaller context window, OpenClaw should:

  1. Detect that current session context exceeds the target model's context window before sending the request
  2. Automatically truncate/compact the context to fit the new model's window (emergency compaction)
  3. Warn the user that context was truncated to accommodate the smaller model (e.g., "⚠️ Switching to qwen (128K context). Session context was 180K tokens — older messages have been compacted to fit.")
  4. Continue the session without requiring a manual restart or reset

If truncation is not possible or would lose too much context, the switch should fail gracefully with a clear error message and keep the current model active.
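
A minimal sketch of the decision described above, in Python for illustration only. Every name here (`plan_model_switch`, `SwitchPlan`, `fixed_tokens`, `reserve_tokens`) is hypothetical and not taken from the OpenClaw codebase; the token counts are assumed to be supplied by the caller, and the 4096-token reply reserve is an arbitrary placeholder.

```python
from dataclasses import dataclass
from enum import Enum


class SwitchAction(Enum):
    SWITCH = "switch"    # session already fits the target model
    COMPACT = "compact"  # session fits only after compacting older turns
    REJECT = "reject"    # non-compactable context alone is too large


@dataclass(frozen=True)
class SwitchPlan:
    action: SwitchAction
    message: str


def _k(tokens: int) -> str:
    """Format a token count as whole thousands, e.g. 180000 -> '180K'."""
    return f"{tokens // 1000}K"


def plan_model_switch(model: str, context_window: int, session_tokens: int,
                      fixed_tokens: int, reserve_tokens: int = 4096) -> SwitchPlan:
    """Decide what to do BEFORE any request is sent to the target model.

    fixed_tokens is the part that cannot be compacted (system prompt,
    workspace files); reserve_tokens is headroom kept for the reply.
    Both are assumptions of this sketch.
    """
    budget = context_window - reserve_tokens
    if session_tokens <= budget:
        return SwitchPlan(SwitchAction.SWITCH,
                          f"Switched to {model} ({_k(context_window)} context).")
    if fixed_tokens > budget:
        return SwitchPlan(
            SwitchAction.REJECT,
            f"Cannot switch to {model}: system prompt and workspace context "
            f"alone require {_k(fixed_tokens)} tokens, which does not fit in "
            f"{model}'s {_k(context_window)} context window. "
            f"Keeping the current model.")
    return SwitchPlan(
        SwitchAction.COMPACT,
        f"Switching to {model} ({_k(context_window)} context). Session context "
        f"was {_k(session_tokens)} tokens; older messages will be compacted to fit.")
```

With the numbers from this report (roughly 180K tokens of session context going into a 128K window), `plan_model_switch("qwen", 128_000, 180_000, 20_000)` returns a `COMPACT` plan, so the user gets a warning instead of a stalled gateway.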

Actual Behavior

  • Gateway locks up entirely (no response to user, Telegram polling stalls)
  • Gateway log shows "Unsubscribed during compaction" and "Gateway is draining for restart; new tasks are not accepted"
  • User must manually restart the gateway and reset the session
  • All in-flight context is lost
  • Any background processes (bots, cron jobs) may be disrupted by the forced restart

Evidence from Logs

2026-03-12T12:28:41.474-04:00 [telegram] Restarting polling after unhandled network error: Unsubscribed during compaction
2026-03-12T12:28:41.539-04:00 Followup agent failed before reply: Gateway is draining for restart; new tasks are not accepted

A prior occurrence (Mar 10) showed a cleaner error but still required manual intervention:

2026-03-10T14:19:50.377-04:00 [agent/embedded] embedded run agent end: isError=true error=Context overflow: prompt too large for the model.

Suggested Implementation

  1. Pre-flight check on /model switch: Before applying a model change, compare currentContextTokens against newModel.contextWindow. If it exceeds the new window, trigger an emergency compaction pass first.
  2. Emergency compaction for model switch: A targeted compaction that aggressively summarizes older turns to fit within the new model's budget, preserving the most recent context and system prompt.
  3. Graceful fallback: If emergency compaction still can't fit (e.g., system prompt + workspace files alone exceed the new model's window), reject the switch with a clear error: "Cannot switch to [model] — system prompt and workspace context alone require [X]K tokens, which exceeds [model]'s [Y]K context window."
  4. User warning on success: When context is successfully truncated, inform the user what was lost so they can re-establish context if needed.
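
To make items 2 and 3 concrete, here is a rough Python sketch of an emergency compaction pass. It is an illustration under stated assumptions, not OpenClaw's compaction code: `estimate_tokens` and `compact_for_budget` are hypothetical names, the 4-characters-per-token estimate is a crude stand-in for a real tokenizer, and the sketch simply drops the oldest turns where the proposal above calls for summarizing them.

```python
from typing import List, Tuple


def estimate_tokens(text: str) -> int:
    """Crude estimate (assumption): about 4 characters per token."""
    return max(1, len(text) // 4)


def compact_for_budget(system_prompt: str, turns: List[str],
                       budget: int) -> Tuple[List[str], int]:
    """Keep the system prompt plus the newest turns that fit in `budget`.

    Returns (kept_turns, dropped_count). Raises ValueError when the system
    prompt alone exceeds the budget, so the caller can reject the model
    switch and keep the current model active.
    """
    remaining = budget - estimate_tokens(system_prompt)
    if remaining < 0:
        raise ValueError("system prompt alone exceeds the target budget")

    kept: List[str] = []
    # Walk from newest to oldest so the most recent context survives.
    for turn in reversed(turns):
        cost = estimate_tokens(turn)
        if cost > remaining:
            break  # everything older than this turn is dropped as well
        kept.append(turn)
        remaining -= cost

    kept.reverse()  # restore chronological order
    return kept, len(turns) - len(kept)
```

The returned `dropped_count` is what a success warning (item 4) could report to the user. A summarizing variant would replace the dropped span with a single summary turn and charge its cost against the same budget.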

Environment

  • OpenClaw v2026.3.11 (29dc654)
  • Source model: openai-codex/gpt-5.4 (1.05M context)
  • Target model: ollama/qwen3.5:122b-a10b (128K context)
  • Channel: Telegram
  • Host: macOS (Darwin arm64)
