fix(agent): tolerate large Codex stream prefill #33383

Closed
sanghyuk-seo-nexcube wants to merge 1 commit into NousResearch:main from sanghyuk-seo-nexcube:fix/codex-large-context-ttfb

Conversation

@sanghyuk-seo-nexcube sanghyuk-seo-nexcube commented May 27, 2026

Contributor

What does this PR do?

Large openai-codex subscription requests can spend longer than the fixed TTFB cutoff in backend admission or prompt prefill before the first SSE event is emitted. Hermes treated that as a no-first-byte stream failure and reconnected, which caused long-context Codex turns to retry even though the backend could still complete successfully.

This updates the Codex Responses watchdog policy to distinguish three cases:

  • small requests with no first SSE event still fail fast and reconnect
  • large openai-codex requests wait for backend prefill instead of being killed by the first-byte watchdog
  • streams that emit at least one SSE event and then go idle are handled by a separate event-idle watchdog
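
A minimal sketch of the three-case decision described above. This is not the code in agent/chat_completion_helpers.py: the function name `decide`, the `WatchdogAction` enum, and the 60-second idle cutoff are hypothetical and used only for illustration. The 12-second first-byte cutoff mirrors the value in the log line quoted under Screenshots / Logs.

```python
from enum import Enum


class WatchdogAction(Enum):
    WAIT = "wait"
    RECONNECT = "reconnect"


def decide(
    *,
    events_seen: int,
    seconds_since_start: float,
    seconds_since_last_event: float,
    is_large_request: bool,
    ttfb_cutoff: float = 12.0,   # taken from the quoted log line
    idle_cutoff: float = 60.0,   # hypothetical value
) -> WatchdogAction:
    """Pick a watchdog action for one poll of a streaming request."""
    if events_seen == 0:
        # Case 2: large request, no event yet. Assume the backend is
        # still in admission/prefill and keep waiting.
        if is_large_request:
            return WatchdogAction.WAIT
        # Case 1: small request, no event yet. Fail fast past the cutoff.
        if seconds_since_start > ttfb_cutoff:
            return WatchdogAction.RECONNECT
        return WatchdogAction.WAIT
    # Case 3: at least one event arrived, then the stream went quiet.
    if seconds_since_last_event > idle_cutoff:
        return WatchdogAction.RECONNECT
    return WatchdogAction.WAIT
```

Keeping the first-byte check and the idle check on separate branches means relaxing one (for large prefill) does not weaken the other (for real mid-stream stalls).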

How this differs from recent Codex stream fixes

This complements the existing Codex stream fixes rather than replacing them.

Those fixes address stream consumption once events arrive, or improve the message shown when the TTFB watchdog kills a request. This PR addresses the separate pre-first-event case where hosted Codex accepts a large request but spends a long time in admission/prompt prefill before emitting the first SSE event.

Related Issue

Addresses the no-first-byte / large-context TTFB portion of #33075.
Related: #22986, #7069, #32963, #33042, #33133

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✅ Tests (adding or improving test coverage)

Changes Made

  • Updated agent/chat_completion_helpers.py to detect the hosted openai-codex Responses backend in the streaming watchdog path.
  • Scaled stale and idle stream watchdog thresholds by estimated request context size.
  • Disabled strict no-first-byte TTFB reconnects for large hosted Codex requests by default, while preserving HERMES_CODEX_TTFB_STRICT=1 for strict behavior.
  • Kept fast reconnect behavior for small no-byte stalls and capped stale TTFB env values for small hosted Codex requests.
  • Added a post-first-event idle watchdog so real stream stalls are still retried.
  • Expanded tests/agent/test_codex_ttfb_watchdog.py with regression coverage for small no-byte stalls, first-event-then-idle stalls, large-context prefill delay, and strict-mode behavior.
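
The threshold scaling and the strict-mode override could look roughly like the sketch below. Only the environment variable name HERMES_CODEX_TTFB_STRICT comes from this PR; the helper names, the token step size, the per-step increment, the cap, and the large-request boundary are all assumed values chosen for illustration, not the ones used in the patch.

```python
import os
from typing import Mapping, Optional


def is_strict_ttfb(env: Optional[Mapping[str, str]] = None) -> bool:
    """True when HERMES_CODEX_TTFB_STRICT=1 is set."""
    env = os.environ if env is None else env
    return env.get("HERMES_CODEX_TTFB_STRICT", "") == "1"


def scaled_threshold(
    base_seconds: float,
    estimated_tokens: int,
    *,
    tokens_per_step: int = 50_000,    # hypothetical step size
    seconds_per_step: float = 30.0,   # hypothetical increment
    max_seconds: float = 600.0,       # hypothetical cap
) -> float:
    """Grow a watchdog threshold with estimated context size, up to a cap."""
    steps = estimated_tokens // tokens_per_step
    return min(base_seconds + steps * seconds_per_step, max_seconds)


def first_byte_watchdog_enabled(
    estimated_tokens: int,
    *,
    large_threshold: int = 100_000,   # hypothetical boundary
    env: Optional[Mapping[str, str]] = None,
) -> bool:
    """Strict mode always enforces TTFB; otherwise only small requests do."""
    if is_strict_ttfb(env):
        return True
    return estimated_tokens < large_threshold
```

Passing `env` explicitly keeps the helpers testable without mutating the process environment.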

How to Test

  1. scripts/run_tests.sh tests/agent/test_codex_ttfb_watchdog.py
  2. scripts/run_tests.sh tests/agent/test_codex_ttfb_watchdog.py tests/agent/test_auxiliary_client.py tests/run_agent/test_run_agent_codex_responses.py
  3. Manual verification on macOS: long openai-codex turns with large estimated input context no longer reconnect solely because the hosted backend takes longer than the small-request TTFB cutoff before the first SSE event.

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
  • I searched for existing PRs to make sure this isn't a duplicate
  • My PR contains only changes related to this fix/feature (no unrelated commits)
  • I've run relevant tests and all tests pass
  • I've added tests for my changes
  • I've tested on my platform: macOS

Documentation & Housekeeping

  • Documentation update N/A
  • cli-config.yaml.example update N/A
  • CONTRIBUTING.md or AGENTS.md update N/A
  • Cross-platform impact considered: no file I/O, terminal, process management, or platform-specific behavior changed
  • Tool descriptions/schemas update N/A

Screenshots / Logs

Before this fix, large hosted Codex requests could show repeated reconnects like:

No first byte from provider in 12s (codex stream, model: gpt-5.5). Reconnecting.

After this fix, large hosted Codex requests wait for backend prefill; small no-byte stalls and post-first-event idle stalls are still retried.

@sanghyuk-seo-nexcube sanghyuk-seo-nexcube force-pushed the fix/codex-large-context-ttfb branch from a2d2ceb to 5c58fb1 Compare May 27, 2026 18:07
@alt-glitch alt-glitch added the type/bug (Something isn't working), comp/agent (Core agent loop, run_agent.py, prompt builder), codex, and P3 (Low: cosmetic, nice to have) labels May 27, 2026
@teknium1

Contributor

Salvaged onto current main via #33390 (merged as 3476509). Your authorship is preserved in git log. Thanks @sanghyuk-seo-nexcube — the three-case watchdog policy (small-no-event / large-no-event / event-then-idle) is exactly the right shape, and it directly addresses user reports like CRUSADER's gpt-5.5 long-context TTFB false-positives.

@teknium1 teknium1 closed this May 27, 2026
mathias3 pushed a commit to mathias3/hermes-agent that referenced this pull request May 28, 2026
Bryce-huang pushed a commit to wbkunlun/hermes-agent that referenced this pull request May 29, 2026
mosaiq-systems pushed a commit to mosaiq-systems/hermes-agent that referenced this pull request May 29, 2026
KKT-OPT pushed a commit to KKT-OPT/hermes-agent that referenced this pull request May 31, 2026
gweeteve pushed a commit to gweeteve/hermes-agent that referenced this pull request Jun 2, 2026

3 participants