fix(agent): tolerate large codex stream prefill by teknium1 · Pull Request #33390 · NousResearch/hermes-agent

teknium1 · 2026-05-27T18:09:42Z

Salvage of #33383 (@sanghyuk-seo-nexcube) onto current main.

Summary

Large openai-codex subscription requests can spend longer than the fixed TTFB cutoff in backend admission or prompt prefill before the first SSE event is emitted. Hermes treated that as a no-first-byte stream failure and reconnected, which caused long-context Codex turns to retry even though the backend could still complete successfully (the exact pattern in CRUSADER's support-thread report).

Updates the Codex Responses watchdog policy to distinguish three cases:

- small requests with no first SSE event still fail fast and reconnect
- large openai-codex requests wait for backend prefill instead of being killed by the first-byte watchdog
- streams that emit at least one SSE event and then go idle are handled by a separate event-idle watchdog
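The three cases above can be sketched as a single decision function. This is a minimal illustration, not the code in agent/chat_completion_helpers.py: the names `WatchdogPolicy`, `WatchdogAction`, and `decide`, the token threshold, and every timeout value are assumptions made for the example. In particular, giving large requests a longer (rather than unlimited) first-event deadline is a guess, since the PR text does not state the actual limits.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class WatchdogAction(Enum):
    KEEP_WAITING = "keep_waiting"
    RECONNECT_NO_FIRST_EVENT = "reconnect_no_first_event"
    EVENT_IDLE_TIMEOUT = "event_idle_timeout"


@dataclass(frozen=True)
class WatchdogPolicy:
    # Hypothetical thresholds chosen for illustration only.
    ttfb_timeout_s: float = 60.0           # small requests: fail fast
    large_request_tokens: int = 50_000     # what counts as "large"
    large_prefill_timeout_s: float = 600.0 # large requests: allow prefill
    event_idle_timeout_s: float = 120.0    # gap allowed between SSE events


def decide(
    policy: WatchdogPolicy,
    *,
    provider: str,
    estimated_tokens: int,
    elapsed_s: float,
    idle_s: Optional[float],
) -> WatchdogAction:
    """Pick an action for one watchdog tick.

    idle_s is None until the first SSE event arrives; afterwards it is the
    number of seconds since the most recent event.
    """
    if idle_s is not None:
        # Case 3: the stream has started, so only the event-idle rule applies.
        if idle_s > policy.event_idle_timeout_s:
            return WatchdogAction.EVENT_IDLE_TIMEOUT
        return WatchdogAction.KEEP_WAITING

    # No first event yet: pick the deadline by request size (cases 1 and 2).
    is_large = (
        provider == "openai-codex"
        and estimated_tokens >= policy.large_request_tokens
    )
    deadline = (
        policy.large_prefill_timeout_s if is_large else policy.ttfb_timeout_s
    )
    if elapsed_s > deadline:
        return WatchdogAction.RECONNECT_NO_FIRST_EVENT
    return WatchdogAction.KEEP_WAITING
```

With these assumed defaults, a 1,000-token request that has seen no event after 61 seconds would reconnect, while an 80,000-token openai-codex request in the same state would keep waiting.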

How this fits the cluster

Complements (does not replace):

- fix(agent): recover Codex Responses streams with null output #32963 (recovered from response.completed.output = null crashes)
- refactor(codex): drop SDK responses.stream() helper; consume events directly #33042 (removed the SDK responses.stream() helper entirely)
- fix(codex): update silent-hang workaround hint + wire into TTFB watchdog #33133 (corrected the silent-hang hint text and wired it into the TTFB watchdog)

This PR is the next layer: the watchdog now distinguishes 'backend hung' (small request, no first event) from 'backend slow on prefill' (large request, no first event yet but expected to come).
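One way to picture the 'backend hung' versus 'backend slow on prefill' distinction at runtime is a small tracker that keeps the first-event deadline separate from the event-idle deadline. The sketch below is hypothetical: `StreamWatchdog`, `FakeClock`, the status strings, and all timeout values are invented for illustration and are not taken from the Hermes codebase. The clock is injectable so the behaviour can be checked without real waiting.

```python
import time
from typing import Callable, Optional


class StreamWatchdog:
    """Tracks one stream with two independent deadlines (illustrative only)."""

    def __init__(
        self,
        first_event_timeout_s: float,
        event_idle_timeout_s: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._first_event_timeout_s = first_event_timeout_s
        self._event_idle_timeout_s = event_idle_timeout_s
        self._clock = clock
        self._started_at = clock()
        self._last_event_at: Optional[float] = None

    def on_event(self) -> None:
        # Call once for every SSE event received from the backend.
        self._last_event_at = self._clock()

    def status(self) -> str:
        now = self._clock()
        if self._last_event_at is None:
            # Nothing received yet: only the first-event deadline applies.
            if now - self._started_at > self._first_event_timeout_s:
                return "no_first_event"
            return "ok"
        # Stream has started: only the idle gap between events matters.
        if now - self._last_event_at > self._event_idle_timeout_s:
            return "event_idle"
        return "ok"


class FakeClock:
    """Manually advanced clock for deterministic examples."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


clock = FakeClock()
# Assumed deadlines: 60 s for a small request, 600 s for a large one.
small = StreamWatchdog(60.0, 120.0, clock=clock)
large = StreamWatchdog(600.0, 120.0, clock=clock)

clock.now = 90.0
small_status = small.status()  # past the short deadline: treated as hung
large_status = large.status()  # still inside the prefill allowance
```

The caller would choose `first_event_timeout_s` from the request size before opening the stream, so a large prompt gets the longer allowance while the idle rule stays the same for every stream.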

Changes

- agent/chat_completion_helpers.py: three-case watchdog policy
- tests/agent/test_codex_ttfb_watchdog.py: 8 tests covering all three policy buckets

Validation

8/8 passing in tests/agent/test_codex_ttfb_watchdog.py

Attribution

Clean cherry-pick from @sanghyuk-seo-nexcube's #33383. AUTHOR_MAP updated in follow-up commit.

github-actions · 2026-05-27T18:10:34Z

🔎 Lint report: `hermes/hermes-5bf34d29` vs `origin/main`

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 9507 on HEAD, 9507 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 5006 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.

yangguangjin · 2026-05-28T02:35:48Z

[subagent-4] ⚠️ API call failed (attempt 1/10): APIConnectionError
[subagent-4] 🔌 Provider: openai-codex Model: gpt-5.5
[subagent-4] 🌐 Endpoint: https://chatgpt.com/backend-api/codex
[subagent-4] 📝 Error: Connection error.
[subagent-4] ⏱️ Elapsed: 106.09s Context: 31 msgs, ~82,961 tokens
[subagent-4] ⏳ Retrying in 2.5s (attempt 1/10)...
✓ [4/5] excute Agent-D：code-review。 (537.61s)
[subagent-2] ⚠️ No response from provider for 600s (non-streaming, model: gpt-5.5). Codex backend appears to be silently rejecting 'gpt-5.5' on chatgpt.com/backend-api/codex (no stream events, no error). This is a known backend-side pattern that has affected ChatGPT Plus accounts intermittently. Workaround: try gpt-5.4 on the same OAuth profile, or gpt-5.3-codex, or switch to a different model/provider in your fallback chain. Some ChatGPT Codex accounts do not support gpt-5.4-codex. See hermes-agent#21444 for symptom history.
[subagent-2] ⚠️ API call failed (attempt 1/10): APIConnectionError
[subagent-2] 🔌 Provider: openai-codex Model: gpt-5.5
[subagent-2] 🌐 Endpoint: https://chatgpt.com/backend-api/codex
[subagent-2] 📝 Error: Connection error.
[subagent-2] ⏱️ Elapsed: 600.96s Context: 16 msgs, ~14,152 tokens
[subagent-2] ⏳ Retrying in 2.8s (attempt 1/10)...
[subagent-0] ⚠️ API call failed (attempt 1/10): APIConnectionError
[subagent-0] 🔌 Provider: openai-codex Model: gpt-5.5
[subagent-0] 🌐 Endpoint: https://chatgpt.com/backend-api/codex
[subagent-0] 📝 Error: Connection error.
[subagent-0] ⏱️ Elapsed: 601.48s Context: 21 msgs, ~27,239 tokens
[subagent-0] ⏳ Retrying in 2.2s (attempt 1/10)...

I updated to the latest git version, but the situation has not improved.

sanghyuk-seo-nexcube and others added 2 commits May 27, 2026 11:08

fix(agent): tolerate large codex stream prefill

e92954f

chore(release): map sanghyuk-seo-nexcube for #33383 salvage

aa29234

teknium1 merged commit 3476509 into main May 27, 2026
32 of 33 checks passed

teknium1 deleted the hermes/hermes-5bf34d29 branch May 27, 2026 18:19

teknium1 mentioned this pull request May 27, 2026

fix(agent): tolerate large Codex stream prefill #33383

Closed

14 tasks

alt-glitch added the type/bug, P3, comp/agent, and provider/openai labels May 27, 2026

github-actions Bot mentioned this pull request May 28, 2026

🦞 OpenClaw Ecosystem Daily 2026-05-28 ivanweng2077/big_model_radar#102

Open

zhonghui5207 mentioned this pull request May 28, 2026

openai-codex/gpt-5.5 still unstable in Hermes v0.14.0: subagents almost always hit APIConnectionError/TTFB timeout while Codex CLI works #33075

Closed

zqchris mentioned this pull request May 28, 2026

fix(codex): activity watchdog for codex_responses stale detector #32131

Closed

3 tasks

BrewTestBot mentioned this pull request May 28, 2026

hermes-agent 2026.5.28 Homebrew/homebrew-core#285115

Merged

1 task
