fix(model_metadata): add gpt-5.x context lengths + guard against poisoned cache #5179
Open
Prithvi1994 wants to merge 1 commit into
This was referenced Apr 24, 2026
Fixes #5173: `gpt-5.4` shows 32k context in Hermes instead of 1,050,000

Root Cause
Two independent bugs combine to produce the wrong context window:

- Missing `DEFAULT_CONTEXT_LENGTHS` entries: `gpt-5.4` (and other `gpt-5.x` variants) were absent from the fallback dict. Lookups fell through to the generic `"gpt-5": 128000` catch-all, returning 128k instead of 1,050,000.
- Cache poisoning: when connecting via the Codex endpoint, Hermes probes the API and may receive `max_output_tokens` (32k) where it expects `context_length`. That value gets written to `context_length_cache.yaml`. Since the persistent cache is checked ahead of `DEFAULT_CONTEXT_LENGTHS` in the resolution order, the bad 32k value keeps overriding the correct default.

Fix (two-part)
1. Add specific gpt-5.x entries to `DEFAULT_CONTEXT_LENGTHS`

New entries added before the generic `"gpt-5": 128000` catch-all in `agent/model_metadata.py`:

- `gpt-5.4`
- `gpt-5.4-mini`
- `gpt-5.4-pro`
- `gpt-5.4-nano`
- `gpt-5.3-codex`
- `gpt-5.2-codex`
- `gpt-5.1-codex-max`
- `gpt-5.1-codex-mini`

The existing `sorted()` in `get_model_context_length` ensures longest-key-first matching, so specific variants correctly shadow the catch-all.

2. Sanity guard in `save_context_length()`

Added a pre-write check: if the model name contains `"gpt-5"` and the value being cached is <= 128,000, the write is rejected and a warning is logged. This stops `max_output_tokens` (32k) from ever being written into `context_length_cache.yaml` for gpt-5 family models.

The guard does not affect non-gpt-5 models; e.g. `llama-3` can still be cached at 32k normally.

Users who need to force a specific value can always set `model.context_length` in `config.yaml`, which is checked before the cache.

Tests
Added `TestGpt5ContextLengths` in `tests/agent/test_model_metadata.py`:

- `gpt-5.4` -> 1,050,000 via `DEFAULT_CONTEXT_LENGTHS`
- `gpt-5.4-mini` -> 1,050,000 via `DEFAULT_CONTEXT_LENGTHS`
- `save_context_length("gpt-5.4", ..., 32000)` -> rejected (nothing written to the cache)
- `save_context_length("gpt-5.4", ..., 1_050_000)` -> cached successfully
- `llama-3` at 32k -> cached normally (the guard does not apply)

All 80 tests pass.
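
For reviewers who want the two parts of the fix in one place, here is a minimal standalone sketch of the behaviour described above. It is not the code in `agent/model_metadata.py`: the helper names `lookup_default` and `guarded_save` are hypothetical, substring matching is an assumption about how keys are compared, and an in-memory dict stands in for `context_length_cache.yaml`. Only the 1,050,000 and 128,000 values and the `<= 128,000` threshold come from this PR.

```python
import logging
from typing import Dict, Optional

logger = logging.getLogger(__name__)

# Illustrative subset of the fallback dict (values taken from the PR text).
DEFAULT_CONTEXT_LENGTHS: Dict[str, int] = {
    "gpt-5": 128_000,          # generic catch-all
    "gpt-5.4": 1_050_000,
    "gpt-5.4-mini": 1_050_000,
}

# Threshold from the PR: gpt-5 values at or below this are not cached.
GPT5_GUARD_THRESHOLD = 128_000


def lookup_default(model: str) -> Optional[int]:
    """Hypothetical helper: longest-key-first match against the fallback dict.

    Assumes substring matching; the real lookup may compare keys differently.
    """
    for key in sorted(DEFAULT_CONTEXT_LENGTHS, key=len, reverse=True):
        if key in model:
            return DEFAULT_CONTEXT_LENGTHS[key]
    return None


def guarded_save(cache: Dict[str, int], model: str, length: int) -> bool:
    """Hypothetical helper: cache `length` unless it looks poisoned.

    Returns True if the value was stored, False if the guard rejected it.
    """
    if "gpt-5" in model and length <= GPT5_GUARD_THRESHOLD:
        logger.warning(
            "Refusing to cache context length %d for %s (likely max_output_tokens)",
            length,
            model,
        )
        return False
    cache[model] = length
    return True


if __name__ == "__main__":
    cache: Dict[str, int] = {}
    print(lookup_default("gpt-5.4-mini"))              # specific key wins over "gpt-5"
    print(guarded_save(cache, "gpt-5.4", 32_000))      # rejected by the guard
    print(guarded_save(cache, "gpt-5.4", 1_050_000))   # stored
    print(guarded_save(cache, "llama-3", 32_000))      # guard does not apply
```

Sorting keys by length in descending order is what lets `gpt-5.4-mini` and `gpt-5.4` shadow the shorter `gpt-5` key. In this sketch the guard also rejects a value of exactly 128,000 for any model whose name contains `gpt-5`, since the check in the PR is `<=` rather than `<`.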