fix(model_metadata): add gpt-5.x context lengths + guard against poisoned cache #5179

Open
Prithvi1994 wants to merge 1 commit into NousResearch:main from Prithvi1994:fix/gpt-5-context-length

@Prithvi1994

Fixes #5173: gpt-5.4 shows 32k context in Hermes instead of 1,050,000

Root Cause

Two independent bugs conspire to produce the wrong context window:

  1. Missing DEFAULT_CONTEXT_LENGTHS entries. gpt-5.4 (and other gpt-5.x variants) were absent from the fallback dict. Lookups fell through to the generic "gpt-5": 128000 catch-all, returning 128k instead of 1,050,000.

  2. Cache poisoning. When connecting via the Codex endpoint, Hermes probes the API and may receive max_output_tokens (32k) where it expects context_length. That value gets written to context_length_cache.yaml. Since the persistent cache is checked first in the resolution order, the bad 32k value overrides everything permanently.
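How the two bugs interact can be illustrated with a minimal sketch. The function name `resolve_context_length`, the plain dict standing in for context_length_cache.yaml, the model-only cache key, and the final fallback value are all assumptions made for illustration, not the actual Hermes code. Only the ordering (explicit config, then persistent cache, then defaults) is taken from this description.

```python
from typing import Optional

# Hypothetical stand-in for the fallback dict as it looked before this PR:
# only the generic catch-all exists for the gpt-5 family.
DEFAULTS_BEFORE_FIX = {"gpt-5": 128_000}


def resolve_context_length(
    model: str, cache: dict, config_override: Optional[int] = None
) -> int:
    """Assumed order: explicit config, then persistent cache, then defaults."""
    if config_override is not None:
        return config_override
    if model in cache:
        # A poisoned entry wins here; the defaults are never consulted.
        return cache[model]
    for key in sorted(DEFAULTS_BEFORE_FIX, key=len, reverse=True):
        if key in model:
            return DEFAULTS_BEFORE_FIX[key]
    return 128_000  # hypothetical global fallback, not from the PR


# Bug 2: the probe stored max_output_tokens (32k) as the context length.
poisoned_cache = {"gpt-5.4": 32_000}
stale = resolve_context_length("gpt-5.4", poisoned_cache)

# Bug 1: even with a clean cache, only the catch-all matches.
fallback = resolve_context_length("gpt-5.4", {})
```

In this sketch `stale` is 32,000 and `fallback` is 128,000, so neither path yields the expected 1,050,000 unless an explicit override is supplied.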

Fix (two-part)

1. Add specific gpt-5.x entries to DEFAULT_CONTEXT_LENGTHS

New entries added before the generic "gpt-5": 128000 catch-all in agent/model_metadata.py:

| Model | Context length (tokens) |
| --- | --- |
| gpt-5.4 | 1,050,000 |
| gpt-5.4-mini | 1,050,000 |
| gpt-5.4-pro | 1,050,000 |
| gpt-5.4-nano | 1,050,000 |
| gpt-5.3-codex | 1,048,576 |
| gpt-5.2-codex | 1,048,576 |
| gpt-5.1-codex-max | 1,048,576 |
| gpt-5.1-codex-mini | 1,048,576 |

The existing sorted() in get_model_context_length ensures longest-key-first matching, so specific variants correctly shadow the catch-all.
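A rough sketch of longest-key-first matching follows. The helper name `lookup_default`, the substring test, and the `key=len, reverse=True` sort are assumptions about how get_model_context_length behaves, and the dict is only a subset of the entries listed above.

```python
from typing import Optional

# Illustrative subset of the fallback dict after this PR; not the full table.
DEFAULT_CONTEXT_LENGTHS = {
    "gpt-5": 128_000,
    "gpt-5.4": 1_050_000,
    "gpt-5.4-mini": 1_050_000,
    "gpt-5.3-codex": 1_048_576,
}


def lookup_default(model: str) -> Optional[int]:
    """Return the value of the longest key that occurs in the model name."""
    # Sorting by key length (descending) means "gpt-5.4-mini" is tried before
    # "gpt-5.4", which in turn is tried before the "gpt-5" catch-all.
    for key in sorted(DEFAULT_CONTEXT_LENGTHS, key=len, reverse=True):
        if key in model:
            return DEFAULT_CONTEXT_LENGTHS[key]
    return None
```

With this kind of matching, the dict's insertion order does not affect the result; only key length decides which entry wins.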

2. Sanity guard in save_context_length()

Added a pre-write check: if the model name contains "gpt-5" and the value being cached is <= 128,000, the write is rejected and a warning is logged. This stops max_output_tokens (32k) from ever being written into context_length_cache.yaml for gpt-5 family models.

The guard does not affect non-gpt-5 models — e.g. llama-3 can still be cached at 32k normally.
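The pre-write check could look roughly like the sketch below. The helper names `is_plausible_context_length` and `save_to_cache`, the dict used in place of the YAML file, and the boolean return value are hypothetical; the real save_context_length() signature is not reproduced here. Only the "gpt-5" substring test and the 128,000 threshold come from the description.

```python
import logging

logger = logging.getLogger(__name__)

# Threshold taken from the PR description: values at or below this are
# treated as implausible for gpt-5 family models.
GPT5_MIN_PLAUSIBLE_CONTEXT = 128_000


def is_plausible_context_length(model: str, length: int) -> bool:
    """Reject values that look like max_output_tokens for gpt-5 models."""
    if "gpt-5" in model and length <= GPT5_MIN_PLAUSIBLE_CONTEXT:
        return False
    return True


def save_to_cache(cache: dict, model: str, length: int) -> bool:
    """Write to the cache only if the value passes the sanity guard."""
    if not is_plausible_context_length(model, length):
        logger.warning(
            "Refusing to cache context length %d for %s (likely max_output_tokens)",
            length,
            model,
        )
        return False
    cache[model] = length
    return True


cache: dict = {}
rejected = save_to_cache(cache, "gpt-5.4", 32_000)      # guard blocks the write
accepted = save_to_cache(cache, "gpt-5.4", 1_050_000)   # plausible value is stored
llama_ok = save_to_cache(cache, "llama-3", 32_000)      # non-gpt-5 model is unaffected
```

Returning a boolean is a design choice for this sketch so that callers and tests can tell whether the write happened.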

Users who need to force a specific value can always set model.context_length in config.yaml, which is checked before the cache.

Tests

Added TestGpt5ContextLengths in tests/agent/test_model_metadata.py:

  • gpt-5.4 -> 1,050,000 via DEFAULT_CONTEXT_LENGTHS
  • gpt-5.4-mini -> 1,050,000 via DEFAULT_CONTEXT_LENGTHS
  • save_context_length("gpt-5.4", ..., 32000) -> rejected (warning logged, nothing written to the cache)
  • save_context_length("gpt-5.4", ..., 1_050_000) -> cached successfully
  • Sanity guard does NOT block llama-3 at 32k

All 80 tests pass.

@alt-glitch added the labels type/bug (Something isn't working), P2 (Medium: degraded but workaround exists), comp/agent (Core agent loop, run_agent.py, prompt builder), and provider/openai (OpenAI / Codex Responses API) on May 1, 2026.