Problem
When compaction.model points to a local or self-hosted model (e.g. vllm/qwen3-coder-next), compaction fails immediately if that model server is unreachable. There is no fallback mechanism: compaction returns ok: false, and the session continues to grow until it hits overflow recovery.
This is especially common when:
- A local model (vLLM, llamacpp, Ollama) is the primary compaction model for cost savings
- The GPU server is on a separate machine that may go down
- Hosted models are configured as fallbacks for regular agent turns but compaction has no equivalent
Proposed Solution
Add a modelFallbacks array to AgentCompactionConfig. When resolveModelAsync fails for the primary compaction model, iterate through modelFallbacks in order until one resolves successfully.
Config example:
compaction.model: vllm/qwen3-coder-next
compaction.modelFallbacks: [anthropic/claude-haiku-4-5]
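A minimal sketch of the proposed resolution order, assuming resolveModelAsync rejects when a model server is unreachable. The AgentCompactionConfig shape and the resolveCompactionModel helper name are illustrative, not existing API:

```typescript
// Illustrative shape; real AgentCompactionConfig may have more fields.
interface AgentCompactionConfig {
  model: string;
  modelFallbacks?: string[];
}

type ResolveFn = (model: string) => Promise<string>;

// Try the primary compaction model, then each fallback in order.
// Rethrows the last error only if every candidate fails to resolve.
async function resolveCompactionModel(
  config: AgentCompactionConfig,
  resolveModelAsync: ResolveFn,
): Promise<string> {
  const candidates = [config.model, ...(config.modelFallbacks ?? [])];
  let lastError: unknown;
  for (const model of candidates) {
    try {
      return await resolveModelAsync(model);
    } catch (err) {
      lastError = err; // server unreachable or model unknown; try next
    }
  }
  throw lastError;
}
```

When modelFallbacks is unset, candidates contains only the primary model, so existing behavior is unchanged.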
Impact
- Prevents compaction failures when local model servers are temporarily unavailable
- Maintains cost optimization (try free local model first, fall back to cheap hosted)
- No behavior change when modelFallbacks is unset (backward compatible)