feat(compression): raise compaction trigger to 85% for gpt-5.5 on Codex OAuth #40957
Merged
…ex OAuth

The ChatGPT Codex OAuth backend hard-caps gpt-5.5 at a 272K context window (verified live: a ~330K-token request to chatgpt.com/backend-api/codex/responses is rejected with context_length_exceeded while ~250K succeeds; the same slug exposes 1.05M on the direct OpenAI API / OpenRouter and 400K on Copilot). At the default 50% trigger, auto-compaction fires at ~136K, half the usable window.

Raise the trigger to 85% (~231K) on this exact route only, gated by a new compression.codex_gpt55_autoraise config flag (default true). When it fires, emit a one-time notice (CLI inline print + gateway status_callback replay) with the exact opt-back-out command. gpt-5.5 on any other provider keeps the user's global threshold.

- _is_codex_gpt55() matches the 5.5 family only on provider=openai-codex
- _compression_threshold_for_model() now provider-aware + opt-out param
- config key + _config_version bump (27->28) for backfill
- docs + tests (40 cases in test_arcee_trinity_overrides.py)
changman pushed a commit to changman/hermes-agent that referenced this pull request on Jun 10, 2026
Summary
The ChatGPT Codex OAuth backend hard-caps `gpt-5.5` at a 272K context window. At the default 50% compaction trigger, Hermes starts summarizing at ~136K tokens, half the window the model can actually use. This PR raises the trigger to 85% (~231K) for that one route, gated by a new opt-out config flag, and notifies the user once with the exact revert command.

The same `gpt-5.5` slug exposes a much larger window on other routes (1.05M on the direct OpenAI API and OpenRouter, 400K on GitHub Copilot), so the autoraise is scoped to the Codex OAuth route only; every other provider keeps the user's global `compression.threshold`.

Why 272K is real (not a metadata bug)
Verified live against `chatgpt.com/backend-api/codex`:

- `/models` probe: `context_window: 272000`, `max_context_window: 272000`
- ~250K-token request: accepted (server-counted `input_tokens=250022`)
- ~330K-token request: rejected with `context_length_exceeded`

A request the server itself counted at 250K input tokens went through; bumping to ~330K, which would fit a 400K window, was hard-rejected. The cap is genuine and enforced by the Codex backend.
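The reasoning behind "272K is real" can be checked mechanically: a candidate cap explains the probes only if the accepted request fits under it and the rejected one does not. The sketch below is illustrative only; the constant and function names are hypothetical, and the numbers are the ones reported in this PR (the rejected size is approximate).

```python
# Observations reported in this PR for the Codex OAuth route.
ADVERTISED_WINDOW = 272_000  # context_window returned by the /models probe
ACCEPTED_INPUT = 250_022     # server-counted input_tokens of the accepted request
REJECTED_INPUT = 330_000     # approximate size of the rejected request
COPILOT_WINDOW = 400_000     # window the same slug exposes on GitHub Copilot


def consistent_with_cap(cap: int) -> bool:
    """Return True if a hard cap of `cap` tokens explains both observations."""
    return ACCEPTED_INPUT <= cap < REJECTED_INPUT


print(consistent_with_cap(ADVERTISED_WINDOW))  # True
print(consistent_with_cap(COPILOT_WINDOW))     # False
```

A 400K window would have accepted the ~330K request, so the rejection rules out the larger figure and leaves the advertised 272K consistent with both probes.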
Changes
- `agent/auxiliary_client.py`
  - `_is_codex_gpt55(model, provider)` matches the `gpt-5.5` family (incl. `-pro`, dated snapshots, aggregator-prefixed) only when `provider == "openai-codex"`.
  - `_compression_threshold_for_model()` is now provider-aware and takes an `allow_codex_gpt55_autoraise` flag; it returns `0.85` for Codex `gpt-5.5` and `None` (use global) otherwise. The existing Arcee Trinity `0.75` override is unchanged and unaffected by the new flag.
- `hermes_cli/config.py`: new `compression.codex_gpt55_autoraise` key (default `true`); `_config_version` bumped 27 → 28 so existing configs get the key backfilled on migration.
- `agent/agent_init.py`: reads the flag, passes `provider` into the threshold resolver, and emits a one-time notice (inline print for CLI; `_compression_warning` replay through `status_callback` for gateway) with the opt-out command. New `_build_codex_gpt55_autoraise_notice()` helper builds the shared text.
- `developer-guide/context-compression-and-caching.md` documents the new key and route-scoped behavior.

Opt-out

Setting `compression.codex_gpt55_autoraise` to `false` keeps the global `compression.threshold` on the Codex OAuth route; the one-time notice prints the exact command.
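How the opt-out interacts with threshold resolution can be sketched as below. This is a minimal illustration, not the code in `agent/auxiliary_client.py`: the function names are hypothetical stand-ins, the slug-matching rule is an assumption about what "family" means, and the Arcee Trinity override is omitted.

```python
from typing import Optional

CODEX_PROVIDER = "openai-codex"
CODEX_GPT55_THRESHOLD = 0.85


def is_codex_gpt55(model: str, provider: str) -> bool:
    """Hypothetical matcher: gpt-5.5 family, on the Codex OAuth provider only."""
    if provider != CODEX_PROVIDER:
        return False
    # Assumption: an aggregator prefix looks like "openai/" and can be dropped.
    slug = model.lower().rsplit("/", 1)[-1]
    # Assumption: family members are "gpt-5.5" or "gpt-5.5-<suffix>".
    return slug == "gpt-5.5" or slug.startswith("gpt-5.5-")


def threshold_override(
    model: str, provider: str, allow_codex_gpt55_autoraise: bool = True
) -> Optional[float]:
    """Return 0.85 for Codex gpt-5.5 unless opted out; None means 'use global'."""
    if allow_codex_gpt55_autoraise and is_codex_gpt55(model, provider):
        return CODEX_GPT55_THRESHOLD
    return None


def effective_threshold(
    model: str, provider: str, global_threshold: float, autoraise: bool
) -> float:
    """Fall back to the user's global threshold when no override applies."""
    override = threshold_override(model, provider, autoraise)
    return global_threshold if override is None else override


print(effective_threshold("gpt-5.5", "openai-codex", 0.5, autoraise=True))   # 0.85
print(effective_threshold("gpt-5.5", "openai-codex", 0.5, autoraise=False))  # 0.5
print(effective_threshold("gpt-5.5", "openrouter", 0.5, autoraise=True))     # 0.5
```

Returning `None` rather than the global value keeps the resolver unaware of user config, so the caller decides what "global" means.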
Tests
`tests/agent/test_arcee_trinity_overrides.py` extended with Codex `gpt-5.5` cases (provider gating, family vs sibling-slug matching, opt-out, and the guarantee that opt-out does not disable the Arcee Trinity override). 40/40 pass.

End-to-end verified on a real `AIAgent` against the live 272K context length:

- autoraise on (default): `threshold_tokens = 231,200` (85%), notice + `_compression_warning` set
- autoraise off (opt-out): `threshold_tokens = 136,000` (50%), no notice
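The two `threshold_tokens` figures follow directly from the 272K window. A small arithmetic sketch, assuming the trigger is simply the context length multiplied by the threshold fraction (the helper name is hypothetical and the rounding choice is an assumption):

```python
CODEX_GPT55_WINDOW = 272_000  # context length reported by the Codex backend


def threshold_tokens(context_length: int, threshold: float) -> int:
    """Token count at which compaction would trigger (rounded to an integer)."""
    return round(context_length * threshold)


raised = threshold_tokens(CODEX_GPT55_WINDOW, 0.85)
default = threshold_tokens(CODEX_GPT55_WINDOW, 0.50)

print(raised)            # 231200
print(default)           # 136000
print(raised - default)  # 95200 extra tokens before compaction fires
```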