feat: compaction model fallbacks when primary model is unreachable #52011

@scottgl9

Problem

When compaction.model is set to a local or self-hosted model (e.g. vllm/qwen3-coder-next), compaction fails immediately if that model server is unreachable. There is no fallback mechanism: compaction returns ok: false, and the session keeps growing until it hits overflow recovery.

This is especially common when:

  • A local model (vLLM, llama.cpp, Ollama) is the primary compaction model for cost savings
  • The GPU server is on a separate machine that may go down
  • Hosted models are configured as fallbacks for regular agent turns, but compaction has no equivalent

Proposed Solution

Add a modelFallbacks array to AgentCompactionConfig. When resolveModelAsync fails for the primary compaction model, iterate through modelFallbacks in order until one resolves successfully.

Config example:
compaction.model: vllm/qwen3-coder-next
compaction.modelFallbacks: [anthropic/claude-haiku-4-5]
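
A minimal sketch of the proposed resolution order, for discussion only. The shapes below are assumptions: the issue does not show the actual fields of AgentCompactionConfig or the signature of resolveModelAsync, so the resolver is passed in as a parameter, and "fails" is assumed to mean that it rejects. The helper name resolveCompactionModel is hypothetical.

```typescript
// Hypothetical shape: only the two fields discussed in this issue.
interface AgentCompactionConfig {
  model?: string;
  modelFallbacks?: string[];
}

// Assumed resolver signature: rejects when the model cannot be resolved
// (for example, when the model server is unreachable).
type ResolveModel<T> = (modelId: string) => Promise<T>;

interface ResolvedCompactionModel<T> {
  modelId: string;
  model: T;
}

// Try the primary compaction model first, then each fallback in order.
// Returns undefined when every candidate fails, so the caller can keep
// its existing failure path (compaction returning ok: false).
async function resolveCompactionModel<T>(
  config: AgentCompactionConfig,
  resolveModelAsync: ResolveModel<T>,
): Promise<ResolvedCompactionModel<T> | undefined> {
  const candidates = [config.model, ...(config.modelFallbacks ?? [])].filter(
    (id): id is string => typeof id === "string" && id.length > 0,
  );

  for (const modelId of candidates) {
    try {
      const model = await resolveModelAsync(modelId);
      return { modelId, model };
    } catch {
      // This candidate could not be resolved; move on to the next one.
    }
  }
  return undefined;
}
```

With modelFallbacks unset, the candidate list contains only compaction.model, so the loop makes a single attempt and behaves like the current code path.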

Impact

  • Prevents compaction failures when local model servers are temporarily unavailable
  • Maintains cost optimization (try the free local model first, then fall back to a cheap hosted one)
  • No behavior change when modelFallbacks is unset (backward compatible)
