feat(compaction): add fallbackModel to retry compaction on quota/rate-limit errors #34569

Closed
iamcobolt wants to merge 1 commit into openclaw:main from iamcobolt:feat/compaction-fallback

Conversation

@iamcobolt

AI-assisted — drafted and iterated with Claude Code (claude-sonnet-4-6). Lightly tested: unit tests pass, pnpm tsgo clean, pnpm check passes; fallback retry path validated by code review; not exercised against live quota exhaustion. I have personally reviewed all code changes prior to this commit and take responsibility for their correctness.

Summary

  • Problem: when compaction hits a quota or rate-limit error on the primary model, the error surfaces directly to the session with no recovery path.
  • Why it matters: users with aggressive compaction thresholds on paid tiers hit quota limits mid-session; without a fallback, the session stalls until quota resets.
  • What changed: new agents.defaults.compaction.fallbackModel key. Set to "fallback" to retry compaction using the agents.defaults.model.fallbacks chain in order, skipping the current model. Defaults to "off".
  • What did NOT change: auth errors and timeouts keep their own retry mechanisms; the fallback activates only on HTTP 429 (rate-limit) and HTTP 402 (billing) errors.
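The trigger condition above can be sketched as a small predicate. This is only an illustration: the `ProviderError` shape and the name `isQuotaOrRateLimitError` are assumptions, since the PR description does not show how provider errors are represented or classified in compact.ts.

```typescript
// Hypothetical error shape; the real provider error type is not shown in the PR.
interface ProviderError {
  status?: number;
  message?: string;
}

// HTTP 429 (rate limit) and 402 (billing) are the only statuses the
// PR description names as fallback triggers.
const FALLBACK_STATUSES: ReadonlySet<number> = new Set([429, 402]);

// Returns true only for errors that should trigger a model fallback.
// Auth errors, timeouts, and anything without a numeric status return false.
function isQuotaOrRateLimitError(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const status = (err as ProviderError).status;
  return typeof status === "number" && FALLBACK_STATUSES.has(status);
}
```

Keeping the check this narrow means an error without a recognizable status is treated as non-retryable, which matches the stated intent that other failure classes are unaffected.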

Schema

agents.defaults.compaction.fallbackModel: "off" (default) | "fallback"

  • "off": quota/rate-limit errors surface as-is.
  • "fallback": retries each model in agents.defaults.model.fallbacks in order, skipping the current model. Re-throws after exhausting all candidates.
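As a rough illustration of the two-value schema, the union and its default could look like the following. The PR implements this in zod-schema.agent-defaults.ts; the dependency-free `parseCompactionFallbackModel` helper below is a hypothetical stand-in, not the PR's code.

```typescript
// The two values named in the schema section.
type CompactionFallbackModel = "off" | "fallback";

// Hypothetical helper: normalizes a raw config value.
// An absent key is treated as "off", matching the documented default.
function parseCompactionFallbackModel(raw: unknown): CompactionFallbackModel {
  if (raw === undefined) return "off";
  if (raw === "off" || raw === "fallback") return raw;
  throw new Error(`Invalid compaction.fallbackModel: ${JSON.stringify(raw)}`);
}
```

Rejecting unknown strings rather than coercing them to "off" is a design choice made for this sketch; it surfaces typos in the config instead of silently disabling the feature.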

Renders as a two-button segmented control in the config UI, with a help tooltip.

UI — Before / After

Before (upstream main — no Compaction Fallback Model field):

[Screenshot: before]

After — new Compaction Fallback Model field with off / fallback segmented control:

[Screenshot: after]

Change Type

  • Feature
  • Docs

Scope

  • Gateway / orchestration
  • API / contracts
  • UI / DX

Linked Issue/PR

Key Changes

  • compaction-fallback.ts (new) — resolveCompactionFallbackCandidates() resolves the ordered candidate list from config; filters out the current model
  • compaction-fallback.test.ts (new) — 6 unit tests covering all states
  • compact.ts — outer model-fallback loop wrapping the thinking-retry inner loop; transcriptPolicy and sanitizeSessionHistory re-resolved per candidate; before_compaction hook guarded by hookFired (fires exactly once per compaction event)
  • compaction-overrides.ts — also ships resolveCompactionThinkLevel (companion to feat(compaction): disable thinking by default; add compaction.thinking override #34554); both features share this helper
  • types.agent-defaults.ts / zod-schema.agent-defaults.ts — both fallbackModel and thinking fields narrowed to their 2-option unions
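A minimal sketch of the candidate resolution described for compaction-fallback.ts, under stated assumptions: the `AgentDefaultsLike` shape, the function name `resolveCandidatesSketch`, and the model identifiers are illustrative only, and the real resolveCompactionFallbackCandidates() signature is not shown in the PR description.

```typescript
// Assumed subset of the agents.defaults config; field names follow the
// keys quoted in the PR description.
interface AgentDefaultsLike {
  model?: { fallbacks?: string[] };
  compaction?: { fallbackModel?: "off" | "fallback" };
}

// Returns the ordered fallback candidates for compaction.
// Yields an empty list unless fallbackModel is "fallback", and never
// includes the model that is currently in use.
function resolveCandidatesSketch(
  defaults: AgentDefaultsLike | undefined,
  currentModel: string,
): string[] {
  if (defaults?.compaction?.fallbackModel !== "fallback") return [];
  const fallbacks = defaults?.model?.fallbacks ?? [];
  return fallbacks.filter((model) => model !== currentModel);
}
```

For example, with fallbacks of `["provider-a/x", "provider-b/y"]` and a current model of `"provider-a/x"`, the sketch returns only `"provider-b/y"`.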

Notes for Reviewers

User-visible / Behavior Changes

  • New config key agents.defaults.compaction.fallbackModel (opt-in; defaults to "off", no change in behaviour).
  • When set to "fallback" and compaction hits a quota or rate-limit error (HTTP 429 / 402), it retries silently with the next model in the fallbacks chain, skipping the current model.
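The retry behaviour can be sketched as an outer loop over models with a hookFired guard. This is a simplified illustration, not the code in compact.ts: `compactWithFallback`, `runCompaction`, and `beforeCompaction` are hypothetical names, and the inner thinking-retry loop and the per-candidate re-resolution of transcriptPolicy and sanitizeSessionHistory are left out.

```typescript
interface HttpLikeError {
  status?: number;
}

// Only HTTP 429 and 402 are treated as fallback triggers.
function isQuotaError(err: unknown): boolean {
  const status = (err as HttpLikeError | null)?.status;
  return status === 429 || status === 402;
}

// Tries the primary model first, then each candidate in order.
// The before-compaction hook fires exactly once, no matter how many
// models are attempted. Non-quota errors surface immediately; after the
// last candidate fails, the last quota error is re-thrown.
async function compactWithFallback(
  primary: string,
  candidates: string[],
  runCompaction: (model: string) => Promise<string>,
  beforeCompaction: () => void,
): Promise<string> {
  let hookFired = false;
  let lastError: unknown;
  for (const model of [primary, ...candidates]) {
    if (!hookFired) {
      beforeCompaction();
      hookFired = true;
    }
    try {
      return await runCompaction(model);
    } catch (err) {
      if (!isQuotaError(err)) throw err;
      lastError = err;
    }
  }
  throw lastError;
}
```

With fallbackModel set to "off", the candidate list would be empty, so the loop runs once and the primary model's error surfaces unchanged.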

Security Impact

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No — uses existing auth resolution per candidate
  • New/changed network calls? Only when a fallback is triggered (same API surface as primary)
  • Command/tool execution surface changed? No
  • Data access scope changed? No

Repro + Verification

  1. Set agents.defaults.compaction.fallbackModel: "fallback" with a model.fallbacks chain
  2. Exhaust primary model quota or simulate a 429 response
  3. Trigger compaction → expect silent retry on next fallback model

Evidence

  • 6 unit tests covering all resolveCompactionFallbackCandidates cases
  • Live quota-exhaustion trace (not yet available)

Human Verification

  • Verified: unit tests pass, pnpm tsgo clean, pnpm check passes
  • Edge cases: absent config, "off", "fallback" with empty fallbacks, "fallback" filtering current model, hookFired guard across model and thinking retry loops
  • Not verified: live quota exhaustion

Compatibility / Migration

  • Backward compatible: fallbackModel is optional; default is "off" (no behaviour change)
  • To enable: agents.defaults.compaction.fallbackModel: "fallback" with agents.defaults.model.fallbacks populated

Failure Recovery

  • To disable: remove or set agents.defaults.compaction.fallbackModel: "off"
  • Bad symptom: compaction retries when it shouldn't — check fallbackModel config value

Risks and Mitigations

  • Risk: fallback model may produce a lower-quality compaction summary.
    • Mitigation: "fallback" is opt-in; users configure their own fallbacks chain.
  • Risk: cross-provider fallback (e.g. anthropic → openai) could fail on provider-specific message formats.
    • Mitigation: transcriptPolicy and sanitizeSessionHistory are re-resolved per candidate.

…-limit errors

Adds agents.defaults.compaction.fallbackModel — an opt-in key that retries
compaction with a different model when the primary fails due to quota or
rate-limit errors (HTTP 429 / 402 billing). Auth errors and timeouts use
their own retry mechanisms and are unaffected.

"off" (default): no change in behaviour
"fallback": tries each model in agents.defaults.model.fallbacks in order,
            skipping the current model; re-throws after exhausting candidates

Also ships the companion thinking-override change (compaction-overrides.ts)
needed for compact.ts's inner thinking-retry loop. See companion PR openclaw#33296.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@openclaw-barnacle

This pull request has been automatically marked as stale due to inactivity.
Please add updates or it will be closed.

openclaw-barnacle (bot) added the stale label (Marked as stale due to inactivity) on Apr 14, 2026
@openclaw-barnacle

Closing due to inactivity.
If you believe this PR should be revived, post in #pr-thunderdome-dangerzone on Discord to talk to a maintainer.
That channel is the escape hatch for high-quality PRs that get auto-closed.


Labels

  • agents: Agent runtime and tooling
  • docs: Improvements or additions to documentation
  • gateway: Gateway runtime
  • size: L
  • stale: Marked as stale due to inactivity
