fix(context-compression): fallback to main model when summary_model_override is None and provider returns 413#18603
Open
vominh1919 wants to merge 1 commit into
Conversation
…rovider returns 413 When summary_model_override is not configured (None), self.summary_model is set to empty string which is falsy. The fallback conditions at lines 906-911 and 939-943 both check self.summary_model, so when it is empty, no fallback happens — even for rate-limit (413) or model-not-found errors. This means users with no explicit summary_model_override configured get zero compression once the default provider hits rate limits, because there is no retry on the main model. Fix: - Add _is_rate_limited check (413, rate limit, TPM, tokens per minute) - Allow fallback when summary_model is empty AND error is rate-limit or model-not-found (not generic errors, to avoid pointless retries) - When falling back from empty summary_model, explicitly set it to self.model so the retry uses the main model instead of the default provider that may differ - Add test_empty_summary_model_413_falls_back_to_main regression test Fixes NousResearch#18588
Cyrene963
pushed a commit
to Cyrene963/hermes-agent
that referenced
this pull request
May 3, 2026
Community PRs applied: - NousResearch#18596: Enable secret redaction by default (SECURITY) - NousResearch#18650: Sanitize malformed tool messages + auto-recover on API 400 - NousResearch#18607: Emergency compression before max_iterations exhaustion - NousResearch#18603: Compression fallback to main model on 413 rate limit - NousResearch#18638: Pass threshold_percent on model switch - NousResearch#18663: Strip extra_content from tool_calls for strict APIs - NousResearch#18618: Forward explicit_api_key to OpenRouter - NousResearch#18632: Show cache tokens in /insights breakdown - NousResearch#18614: Add idempotency guard for patch duplicate loops - NousResearch#18600: Raise ValueError when HERMES_HOME unset in profile mode - NousResearch#18616: Allow ZWJ emoji in context files - NousResearch#18582: Reload .env on /restart - NousResearch#18547: Stabilize system prompt prefix for KV cache reuse - NousResearch#18692: Strip FTS5 operators from session search truncation terms Fix: Add order_by_last_active=True to list_sessions_rich call (pre-existing commit 142b4bf code sync)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
When
summary_model_overrideis not configured (the default,None),self.summary_modelis set to an empty string ("") which is falsy. The fallback conditions in_generate_summary()both checkself.summary_model:When
self.summary_modelis empty, both conditions short-circuit — no fallback to the main model ever happens. This means users with no explicitsummary_model_overrideget zero compression once the default provider hits rate limits (413 TPM exhausted), because there is no retry on the main model.Fix
_is_rate_limitedcheck — detects 413 status, "rate limit", "TPM", "tokens per minute" error stringssummary_modelis empty AND the error is rate-limit or model-not-found (not generic errors, to avoid pointless retries when the default provider IS the main model)summary_model, set it toself.modelso the retry explicitly uses the main model instead of the default provider that may differtest_empty_summary_model_413_falls_back_to_mainverifies the 413 → fallback → success flowBefore vs After
summary_model_override=None, provider returns 413summary_model_override=None, provider returns 404summary_model_override=None, generic errorsummary_model_override="other", any errorTests
All 67 existing tests pass + 1 new regression test added.
Fixes #18588