
fix(context-compression): fallback to main model when summary_model_override is None and provider returns 413#18603

Open
vominh1919 wants to merge 1 commit into NousResearch:main from vominh1919:fix/context-compression-fallback-empty-model

@vominh1919

Contributor

Problem

When summary_model_override is not configured (the default, None), self.summary_model is set to an empty string ("") which is falsy. The fallback conditions in _generate_summary() both check self.summary_model:

# Line 906 (model-not-found fast path)
if (
    _is_model_not_found
    and self.summary_model          # ← falsy when ""
    and self.summary_model != self.model
    ...
):

# Line 939 (unknown-error best-effort retry)
if (
    self.summary_model              # ← falsy when ""
    and self.summary_model != self.model
    ...
):

When self.summary_model is empty, both conditions short-circuit, so no fallback to the main model ever happens. Users with no explicit summary_model_override therefore get zero compression once the default provider hits its rate limit (413, TPM exhausted), because nothing retries on the main model.
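
The short-circuit can be reproduced in isolation. The sketch below is a minimal stand-in rather than the project's code: `old_should_fall_back` is a hypothetical helper that mirrors only the two `self.summary_model` checks quoted above, the elided `...` conditions are left out, and the model names are placeholders.

```python
def old_should_fall_back(summary_model: str, model: str) -> bool:
    """Mirror the pre-fix guard: a truthy summary_model that differs from the main model."""
    # An empty string is falsy, so the `and` chain stops at the first operand.
    return bool(summary_model and summary_model != model)


# Default configuration: summary_model_override=None ends up as "".
default_case = old_should_fall_back("", "main-model")
# Explicit override that points at a different model.
override_case = old_should_fall_back("other-model", "main-model")

print(default_case)   # False: the retry branch is never entered
print(override_case)  # True: fallback to the main model is allowed
```

With the default configuration the guard evaluates to False regardless of which error the provider returned, which is why the error type never gets a chance to matter.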

Fix

  1. Add an _is_rate_limited check: detects a 413 status and the error strings "rate limit", "TPM", and "tokens per minute"
  2. Broaden the fallback conditions: allow fallback when summary_model is empty AND the error is rate-limit or model-not-found (generic errors are excluded, to avoid pointless retries when the default provider IS the main model)
  3. Use an explicit model on retry: when falling back from an empty summary_model, set it to self.model so the retry explicitly uses the main model rather than a default provider that may differ
  4. Add a regression test: test_empty_summary_model_413_falls_back_to_main verifies the 413 → fallback → success flow
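
Step 1 could look roughly like the following. This is a sketch under assumptions, not the code in the PR: the function name `is_rate_limited`, its signature, and the `_RATE_LIMIT_MARKERS` tuple are hypothetical, and only the 413 status and the three marker strings come from the description above.

```python
from typing import Optional

# Hypothetical marker list, built from the strings named in the fix description.
_RATE_LIMIT_MARKERS = ("rate limit", "tpm", "tokens per minute")


def is_rate_limited(status_code: Optional[int], message: str) -> bool:
    """Return True for a 413 status or a message containing a rate-limit marker."""
    if status_code == 413:
        return True
    lowered = message.lower()  # compare case-insensitively
    return any(marker in lowered for marker in _RATE_LIMIT_MARKERS)


print(is_rate_limited(413, ""))
print(is_rate_limited(None, "Rate limit reached: TPM exceeded"))
print(is_rate_limited(500, "internal server error"))
```

Plain substring matching is a simplification. A short marker such as "tpm" can match unrelated text, so a real implementation may want stricter matching.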

Before vs After

| Scenario | Before | After |
|---|---|---|
| summary_model_override=None, provider returns 413 | ❌ No retry, returns None | ✅ Retries on main model |
| summary_model_override=None, provider returns 404 | ❌ No retry, returns None | ✅ Retries on main model |
| summary_model_override=None, generic error | ❌ No retry, returns None | ❌ No retry (correct: same provider) |
| summary_model_override="other", any error | ✅ Retries on main | ✅ Retries on main (unchanged) |
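
The "After" column can be read as a small decision function. The sketch below is an illustration only: `should_retry_on_main` and its keyword arguments are hypothetical names, the 404 row is treated as model-not-found, and the additional conditions elided with `...` in the real guards are not modeled.

```python
def should_retry_on_main(
    summary_model: str,
    model: str,
    *,
    rate_limited: bool = False,
    model_not_found: bool = False,
) -> bool:
    """Decide whether a failed summary call is retried on the main model."""
    if summary_model:
        # Explicit override: retry whenever it differs from the main model.
        return summary_model != model
    # Empty summary_model: retry only for rate-limit or model-not-found errors.
    return rate_limited or model_not_found


# The four scenarios, in table order.
row_413 = should_retry_on_main("", "main", rate_limited=True)
row_404 = should_retry_on_main("", "main", model_not_found=True)
row_generic = should_retry_on_main("", "main")
row_override = should_retry_on_main("other", "main")
print(row_413, row_404, row_generic, row_override)
```

When the empty case does retry, step 3 of the fix applies: the retry targets the main model explicitly instead of leaving the model name empty.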

Tests

All 67 existing tests pass, and 1 new regression test is added.
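
For orientation, the 413 → fallback → success flow that the regression test covers can be sketched with a self-contained fake. Nothing here is taken from the repository's test suite: `FakeRateLimitError`, `FakeCompressor`, `_call_provider`, and the test name are all hypothetical, and an empty model name stands in for the default summary provider.

```python
class FakeRateLimitError(Exception):
    """Hypothetical error type carrying an HTTP status code."""

    def __init__(self, status_code: int, message: str) -> None:
        super().__init__(message)
        self.status_code = status_code


class FakeCompressor:
    """Hypothetical stand-in that models only the fallback path."""

    def __init__(self, model: str, summary_model: str = "") -> None:
        self.model = model
        self.summary_model = summary_model
        self.calls = []  # model names passed to the fake provider, in order

    def _call_provider(self, model_name: str) -> str:
        self.calls.append(model_name)
        if model_name == "":
            # The default provider is out of tokens-per-minute budget.
            raise FakeRateLimitError(413, "tokens per minute limit exceeded")
        return f"summary from {model_name}"

    def generate_summary(self):
        try:
            return self._call_provider(self.summary_model)
        except FakeRateLimitError as exc:
            if exc.status_code == 413 and self.summary_model != self.model:
                # Retry with the main model named explicitly.
                self.summary_model = self.model
                return self._call_provider(self.summary_model)
            return None


def test_413_falls_back_to_main_sketch() -> None:
    compressor = FakeCompressor(model="main-model")
    assert compressor.generate_summary() == "summary from main-model"
    assert compressor.calls == ["", "main-model"]


test_413_falls_back_to_main_sketch()
```

The second assertion checks the order of calls, which confirms that the retry went to the main model rather than to the default provider again.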

Fixes #18588

…rovider returns 413

When summary_model_override is not configured (None), self.summary_model
is set to empty string which is falsy. The fallback conditions at lines
906-911 and 939-943 both check self.summary_model, so when it is empty,
no fallback happens — even for rate-limit (413) or model-not-found errors.

This means users with no explicit summary_model_override configured get
zero compression once the default provider hits rate limits, because
there is no retry on the main model.

Fix:
- Add _is_rate_limited check (413, rate limit, TPM, tokens per minute)
- Allow fallback when summary_model is empty AND error is rate-limit or
  model-not-found (not generic errors, to avoid pointless retries)
- When falling back from empty summary_model, explicitly set it to
  self.model so the retry uses the main model instead of the default
  provider that may differ
- Add test_empty_summary_model_413_falls_back_to_main regression test

Fixes NousResearch#18588
@alt-glitch added the type/bug (Something isn't working), P2 (Medium: degraded but workaround exists), and comp/agent (Core agent loop, run_agent.py, prompt builder) labels on May 2, 2026
Cyrene963 pushed a commit to Cyrene963/hermes-agent that referenced this pull request May 3, 2026
Community PRs applied:
- NousResearch#18596: Enable secret redaction by default (SECURITY)
- NousResearch#18650: Sanitize malformed tool messages + auto-recover on API 400
- NousResearch#18607: Emergency compression before max_iterations exhaustion
- NousResearch#18603: Compression fallback to main model on 413 rate limit
- NousResearch#18638: Pass threshold_percent on model switch
- NousResearch#18663: Strip extra_content from tool_calls for strict APIs
- NousResearch#18618: Forward explicit_api_key to OpenRouter
- NousResearch#18632: Show cache tokens in /insights breakdown
- NousResearch#18614: Add idempotency guard for patch duplicate loops
- NousResearch#18600: Raise ValueError when HERMES_HOME unset in profile mode
- NousResearch#18616: Allow ZWJ emoji in context files
- NousResearch#18582: Reload .env on /restart
- NousResearch#18547: Stabilize system prompt prefix for KV cache reuse
- NousResearch#18692: Strip FTS5 operators from session search truncation terms

Fix: Add order_by_last_active=True to list_sessions_rich call
(pre-existing commit 142b4bf code sync)


Development

Successfully merging this pull request may close these issues.

fix(context-compression): no fallback to main model when summary_model_override is None and Groq returns 413

2 participants