fix(agent): tolerate large codex stream prefill#33390

Merged: teknium1 merged 2 commits into main from hermes/hermes-5bf34d29 on May 27, 2026
@teknium1 (Contributor)

Salvage of #33383 (@sanghyuk-seo-nexcube) onto current main.

Summary

Large openai-codex subscription requests can spend longer than the fixed TTFB cutoff in backend admission or prompt prefill before the first SSE event is emitted. Hermes treated that as a no-first-byte stream failure and reconnected, which caused long-context Codex turns to retry even though the backend could still complete successfully (the exact pattern in CRUSADER's support-thread report).

Updates the Codex Responses watchdog policy to distinguish three cases:

  • small requests with no first SSE event still fail fast and reconnect
  • large openai-codex requests wait for backend prefill instead of being killed by the first-byte watchdog
  • streams that emit at least one SSE event then go idle are handled by a separate event-idle watchdog
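
The three buckets above can be sketched as a single decision function. This is a minimal illustration, not the code in agent/chat_completion_helpers.py: the names `StreamState`, `Verdict`, and `check_stream`, and all three threshold constants, are hypothetical and chosen only to make the policy concrete. It also assumes the caller enforces any overall request timeout separately.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

# Illustrative thresholds only; the PR does not state the real values.
FIRST_EVENT_TIMEOUT_S = 60.0    # assumed first-byte (TTFB) cutoff
EVENT_IDLE_TIMEOUT_S = 120.0    # assumed idle cutoff after the first event
LARGE_REQUEST_TOKENS = 50_000   # assumed boundary for a "large" request


class Verdict(Enum):
    KEEP_WAITING = "keep_waiting"
    RECONNECT_NO_FIRST_EVENT = "reconnect_no_first_event"
    RECONNECT_EVENT_IDLE = "reconnect_event_idle"


@dataclass
class StreamState:
    provider: str
    estimated_tokens: int
    started_at: float                      # seconds, monotonic clock
    last_event_at: Optional[float] = None  # None until the first SSE event


def check_stream(state: StreamState, now: float) -> Verdict:
    """Classify a stream into one of the three watchdog buckets."""
    if state.last_event_at is not None:
        # Bucket 3: at least one event arrived, so only the
        # event-idle watchdog applies from here on.
        if now - state.last_event_at > EVENT_IDLE_TIMEOUT_S:
            return Verdict.RECONNECT_EVENT_IDLE
        return Verdict.KEEP_WAITING

    is_large_codex = (
        state.provider == "openai-codex"
        and state.estimated_tokens >= LARGE_REQUEST_TOKENS
    )
    if is_large_codex:
        # Bucket 2: large openai-codex request, assume backend prefill
        # and do not apply the first-byte cutoff.
        return Verdict.KEEP_WAITING

    # Bucket 1: small request with no first event, fail fast.
    if now - state.started_at > FIRST_EVENT_TIMEOUT_S:
        return Verdict.RECONNECT_NO_FIRST_EVENT
    return Verdict.KEEP_WAITING
```

A caller would poll `check_stream(state, time.monotonic())` from its watchdog loop and update `state.last_event_at` whenever an SSE event is read. Keeping the decision as a pure function of state and clock makes each bucket easy to unit-test without a live stream.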

How this fits the cluster

Complements (does not replace):

This PR is the next layer: the watchdog now distinguishes 'backend hung' (small request, no first event) from 'backend slow on prefill' (large request, no first event yet but expected to come).

Changes

  • agent/chat_completion_helpers.py: three-case watchdog policy
  • tests/agent/test_codex_ttfb_watchdog.py: 8 tests covering all three policy buckets

Validation

  • 8/8 passing in tests/agent/test_codex_ttfb_watchdog.py

Attribution

Clean cherry-pick from @sanghyuk-seo-nexcube's #33383. AUTHOR_MAP updated in follow-up commit.

@github-actions (Contributor)

🔎 Lint report: hermes/hermes-5bf34d29 vs origin/main

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 9507 on HEAD, 9507 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 5006 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.

@teknium1 merged commit 3476509 into main on May 27, 2026
32 of 33 checks passed
@teknium1 deleted the hermes/hermes-5bf34d29 branch on May 27, 2026 18:19
@alt-glitch added labels on May 27, 2026: type/bug (Something isn't working), P3 (Low: cosmetic, nice to have), comp/agent (Core agent loop, run_agent.py, prompt builder), provider/openai (OpenAI / Codex Responses API)
@yangguangjin

[subagent-4] ⚠️ API call failed (attempt 1/10): APIConnectionError
[subagent-4] 🔌 Provider: openai-codex Model: gpt-5.5
[subagent-4] 🌐 Endpoint: https://chatgpt.com/backend-api/codex
[subagent-4] 📝 Error: Connection error.
[subagent-4] ⏱️ Elapsed: 106.09s Context: 31 msgs, ~82,961 tokens
[subagent-4] ⏳ Retrying in 2.5s (attempt 1/10)...
✓ [4/5] excute Agent-D:code-review。 (537.61s)
[subagent-2] ⚠️ No response from provider for 600s (non-streaming, model: gpt-5.5). Codex backend appears to be silently rejecting 'gpt-5.5' on chatgpt.com/backend-api/codex (no stream events, no error). This is a known backend-side pattern that has affected ChatGPT Plus accounts intermittently. Workaround: try gpt-5.4 on the same OAuth profile, or gpt-5.3-codex, or switch to a different model/provider in your fallback chain. Some ChatGPT Codex accounts do not support gpt-5.4-codex. See hermes-agent#21444 for symptom history.
[subagent-2] ⚠️ API call failed (attempt 1/10): APIConnectionError
[subagent-2] 🔌 Provider: openai-codex Model: gpt-5.5
[subagent-2] 🌐 Endpoint: https://chatgpt.com/backend-api/codex
[subagent-2] 📝 Error: Connection error.
[subagent-2] ⏱️ Elapsed: 600.96s Context: 16 msgs, ~14,152 tokens
[subagent-2] ⏳ Retrying in 2.8s (attempt 1/10)...
[subagent-0] ⚠️ API call failed (attempt 1/10): APIConnectionError
[subagent-0] 🔌 Provider: openai-codex Model: gpt-5.5
[subagent-0] 🌐 Endpoint: https://chatgpt.com/backend-api/codex
[subagent-0] 📝 Error: Connection error.
[subagent-0] ⏱️ Elapsed: 601.48s Context: 21 msgs, ~27,239 tokens
[subagent-0] ⏳ Retrying in 2.2s (attempt 1/10)...

I updated to the latest git version, but the situation has not improved.
