Skip to content

fix(codex): add TTFB watchdog for stalled Codex Responses streams (#31984)#32042

Merged
teknium1 merged 2 commits into
mainfrom
hermes/hermes-f0f87402
May 25, 2026
Merged

fix(codex): add TTFB watchdog for stalled Codex Responses streams (#31984)#32042
teknium1 merged 2 commits into
mainfrom
hermes/hermes-f0f87402

Conversation

@teknium1

@teknium1 teknium1 commented May 25, 2026

Copy link
Copy Markdown
Contributor

Salvage of #31984 by @adam91holt onto current main. Cherry-pick was clean on top of today's #29507 + Kasun's Codex-timeout merge.

What this PR does

Adds a time-to-first-byte (TTFB) watchdog on the codex_responses streaming path. While zero stream events have arrived, a much shorter cutoff (HERMES_CODEX_TTFB_TIMEOUT_SECONDS, default 45s, 0 disables) kills the connection so the existing retry loop can reconnect. Once any event arrives the stream is healthy and only the wall-clock stale timeout applies — long reasoning generations are never interrupted.

Why

chatgpt.com/backend-api/codex sometimes accepts the TCP/TLS connection but never emits a single stream event. The wall-clock stale timeout has to stay high (180–900s) to cover legitimate long generations, so a dead-on-arrival connection burned that whole window before retry. A fresh reconnect typically succeeds in ~2s.

Reported in community by @adam91holt with direct backend probing — sequential probes hit the no-first-byte hang, immediate reconnects succeed.

How

  • agent/codex_runtime.py — stamp agent._codex_stream_last_event_ts on every Responses event (any event, so reasoning-only and tool-call-only turns are not mistaken for a stall).
  • agent/chat_completion_helpers.py — TTFB watchdog in interruptible_api_call, gated to api_mode == 'codex_responses'. Before the first event, kill + retry once elapsed exceeds the cutoff; after the first event, behavior is unchanged.
  • tests/agent/test_codex_ttfb_watchdog.py — 3 regression tests (kill on no-event, no-kill when events flow, env-disable).

Scope rationale: this is codex-only by design. The chat_completions streaming path already has HERMES_STREAM_READ_TIMEOUT (default 120s) wired into httpx.Timeout(read=...) plus an in-loop stale-chunk detector. Non-stream paths (chat_completions non-stream, anthropic_messages, bedrock_converse) have no intermediate event signal — TTFB is structurally equivalent to the full stale timeout there. So the gap is just the codex stream path, and that's what this fills.

Validation

tests/agent/test_codex_ttfb_watchdog.py                        3 passed
tests/agent/test_non_stream_stale_timeout.py                  12 passed
tests/run_agent/test_run_agent_codex_responses.py             65 passed
tests/run_agent/test_streaming.py                             40 passed
tests/run_agent/test_interrupt_propagation.py                  7 passed
tests/run_agent/test_openai_client_lifecycle.py                5 passed
tests/run_agent/test_codex_xai_oauth_recovery.py              37 passed
------------------------------------------------------------------------
                                                             169 passed

Closes #31984. Authorship preserved via cherry-pick.

Infographic

ttfb-watchdog

@alt-glitch alt-glitch added type/bug Something isn't working P2 Medium — degraded but workaround exists comp/agent Core agent loop, run_agent.py, prompt builder codex labels May 25, 2026
@hclsys

hclsys commented May 25, 2026

Copy link
Copy Markdown

I reviewed the original #31984 in detail, so confirming this salvage is faithful: the cherry-pick preserves the load-bearing parts — codex_runtime still sets agent._codex_stream_last_event_ts = time.time() on each event (so is None genuinely means 'zero events'), the reset runs before the worker starts (no stale-marker false-positive), gating stays api_mode == "codex_responses" only, and tests/agent/test_codex_ttfb_watchdog.py came across intact. Good etiquette crediting @adam91holt + linking #31984, and the AUTHOR_MAP entry in scripts/release.py is the right housekeeping for a salvage.

Carrying forward the one open note from my #31984 review so it isn't lost in the rebase (still non-blocking): the None-sentinel reset (agent._codex_stream_last_event_ts = None before the worker starts) assumes single-flight per agent — if two _call()s ever overlapped on the same agent, a previous call's worker could set the marker between this call's reset and worker start, masking a real stall. That's not how the turn loop drives it today, so it's theoretical, but a one-line comment noting the single-flight assumption would help a future reader. Logic is sound and unchanged from the version I verified on #31984 — LGTM for the salvage.

@github-actions

github-actions Bot commented May 25, 2026

Copy link
Copy Markdown
Contributor

🔎 Lint report: hermes/hermes-f0f87402 vs origin/main

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 9325 on HEAD, 9320 on base (🆕 +5)

🆕 New issues (3):

Rule Count
unresolved-attribute 1
unresolved-import 1
no-matching-overload 1
First entries
tests/agent/test_codex_ttfb_watchdog.py:50: [unresolved-attribute] unresolved-attribute: Unresolved attribute `api_mode` on type `AIAgent`
tests/agent/test_codex_ttfb_watchdog.py:24: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`
tests/agent/test_codex_ttfb_watchdog.py:29: [no-matching-overload] no-matching-overload: No overload of bound method `MutableMapping.setdefault` matches arguments

✅ Fixed issues: none

Unchanged: 4931 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.

adam91holt and others added 2 commits May 25, 2026 05:23
The chatgpt.com/backend-api/codex endpoint has an intermittent failure mode
where it accepts the connection but never emits a single stream event — the
socket just hangs. Direct sequential probing reproduces it (0 events, no HTTP
status), and a fresh reconnect then succeeds in ~2s. Today the only guard is
the wall-clock stale timeout in interruptible_api_call, so a dead-on-arrival
connection is held for the full stale window (90-900s depending on context /
config) before the retry loop can reconnect — minutes of wasted wall time per
stall, at a rate of ~20% of calls during affected windows.

Add a TTFB watchdog scoped to the codex_responses path:

- codex_runtime.run_codex_stream stamps agent._codex_stream_last_event_ts on
  *every* stream event (not just output-text deltas), so reasoning-only and
  tool-call-only turns are not mistaken for a stall.
- interruptible_api_call resets that marker before the worker starts and, while
  it is still None, kills the connection once elapsed exceeds the TTFB cutoff
  (default 45s, tunable via HERMES_CODEX_TTFB_TIMEOUT_SECONDS, 0 disables). The
  raised TimeoutError flows through the existing retry path unchanged.

Once any event has arrived the stream is healthy and only the existing
wall-clock stale timeout applies, so legitimate long generations are never
interrupted. Gated to codex_responses; the chat_completions non-stream,
anthropic and bedrock branches have no first-event signal and are untouched.

Adds tests/agent/test_codex_ttfb_watchdog.py covering the stall kill, the
events-flowing pass-through, and the env-disable path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@teknium1 teknium1 force-pushed the hermes/hermes-f0f87402 branch from 41ef21d to 20e45ab Compare May 25, 2026 12:24
@teknium1 teknium1 merged commit 2b16de0 into main May 25, 2026
24 of 26 checks passed
@teknium1 teknium1 deleted the hermes/hermes-f0f87402 branch May 25, 2026 12:34
teknium1 added a commit that referenced this pull request May 25, 2026
…32072)

All four failures were broken by the security cluster (#10082 / #10133 /
#4609 / symlink-reject batch) merging on May 25. They were red on
origin/main HEAD when #32042 and #32061 ran, gating PRs that touched
unrelated code.

1) tests/hermes_cli/test_update_zip_symlink_reject.py
   test_update_via_zip_accepts_normal_member called the real
   _update_via_zip without sandboxing PROJECT_ROOT — so the function's
   shutil.copytree() actually copied the fake README from the test ZIP
   over the real repo's README.md, which then made
   test_readme_mentions_powershell_installer fail in any test run that
   happened to pick this test up earlier. Mock PROJECT_ROOT to an
   isolated tmp_path / install_dir, stub subprocess so pip/uv reinstall
   doesn't actually run, and assert the fake README lands in the
   sandbox (not the real tree).

2) tests/tools/test_windows_native_support.py
   test_readme_mentions_powershell_installer was the victim of (1) —
   nothing wrong with the test itself, the fix in (1) clears it.

3) tests/tools/test_file_read_guards.py
   test_proc_fd_other_not_blocked called _is_blocked_device('/proc/self/fd/3')
   expecting False. But _is_blocked_device runs realpath() and on
   pytest xdist workers fd 3 happens to be dup'd to /dev/urandom
   (because the worker subprocess inherits open fds from pytest's
   collection pipe machinery). Switch to the lower-level
   _is_blocked_device_path which is the path-pattern check the test
   actually means to exercise; realpath-resolution coverage already
   lives in test_symlink_to_blocked_device_is_blocked.

4) tests/tools/test_transcription_tools.py
   Module installed a faster_whisper stub via sys.modules without
   setting __spec__, then later @pytest.mark.skipif called
   importlib.util.find_spec('faster_whisper') which raises
   'ValueError: __spec__ is None' for modules with a None spec attr.
   Set __spec__ on the stub to a real ModuleSpec.

Validation: 195/195 green across the 4 affected files.
daletkc pushed a commit to daletkc/hermes-agent that referenced this pull request May 25, 2026
…ousResearch#32072)

All four failures were broken by the security cluster (NousResearch#10082 / NousResearch#10133 /
NousResearch#4609 / symlink-reject batch) merging on May 25. They were red on
origin/main HEAD when NousResearch#32042 and NousResearch#32061 ran, gating PRs that touched
unrelated code.

1) tests/hermes_cli/test_update_zip_symlink_reject.py
   test_update_via_zip_accepts_normal_member called the real
   _update_via_zip without sandboxing PROJECT_ROOT — so the function's
   shutil.copytree() actually copied the fake README from the test ZIP
   over the real repo's README.md, which then made
   test_readme_mentions_powershell_installer fail in any test run that
   happened to pick this test up earlier. Mock PROJECT_ROOT to an
   isolated tmp_path / install_dir, stub subprocess so pip/uv reinstall
   doesn't actually run, and assert the fake README lands in the
   sandbox (not the real tree).

2) tests/tools/test_windows_native_support.py
   test_readme_mentions_powershell_installer was the victim of (1) —
   nothing wrong with the test itself, the fix in (1) clears it.

3) tests/tools/test_file_read_guards.py
   test_proc_fd_other_not_blocked called _is_blocked_device('/proc/self/fd/3')
   expecting False. But _is_blocked_device runs realpath() and on
   pytest xdist workers fd 3 happens to be dup'd to /dev/urandom
   (because the worker subprocess inherits open fds from pytest's
   collection pipe machinery). Switch to the lower-level
   _is_blocked_device_path which is the path-pattern check the test
   actually means to exercise; realpath-resolution coverage already
   lives in test_symlink_to_blocked_device_is_blocked.

4) tests/tools/test_transcription_tools.py
   Module installed a faster_whisper stub via sys.modules without
   setting __spec__, then later @pytest.mark.skipif called
   importlib.util.find_spec('faster_whisper') which raises
   'ValueError: __spec__ is None' for modules with a None spec attr.
   Set __spec__ on the stub to a real ModuleSpec.

Validation: 195/195 green across the 4 affected files.
bridge25 pushed a commit to bridge25/hermes-agent that referenced this pull request May 27, 2026
…ousResearch#32072)

All four failures were broken by the security cluster (NousResearch#10082 / NousResearch#10133 /
NousResearch#4609 / symlink-reject batch) merging on May 25. They were red on
origin/main HEAD when NousResearch#32042 and NousResearch#32061 ran, gating PRs that touched
unrelated code.

1) tests/hermes_cli/test_update_zip_symlink_reject.py
   test_update_via_zip_accepts_normal_member called the real
   _update_via_zip without sandboxing PROJECT_ROOT — so the function's
   shutil.copytree() actually copied the fake README from the test ZIP
   over the real repo's README.md, which then made
   test_readme_mentions_powershell_installer fail in any test run that
   happened to pick this test up earlier. Mock PROJECT_ROOT to an
   isolated tmp_path / install_dir, stub subprocess so pip/uv reinstall
   doesn't actually run, and assert the fake README lands in the
   sandbox (not the real tree).

2) tests/tools/test_windows_native_support.py
   test_readme_mentions_powershell_installer was the victim of (1) —
   nothing wrong with the test itself, the fix in (1) clears it.

3) tests/tools/test_file_read_guards.py
   test_proc_fd_other_not_blocked called _is_blocked_device('/proc/self/fd/3')
   expecting False. But _is_blocked_device runs realpath() and on
   pytest xdist workers fd 3 happens to be dup'd to /dev/urandom
   (because the worker subprocess inherits open fds from pytest's
   collection pipe machinery). Switch to the lower-level
   _is_blocked_device_path which is the path-pattern check the test
   actually means to exercise; realpath-resolution coverage already
   lives in test_symlink_to_blocked_device_is_blocked.

4) tests/tools/test_transcription_tools.py
   Module installed a faster_whisper stub via sys.modules without
   setting __spec__, then later @pytest.mark.skipif called
   importlib.util.find_spec('faster_whisper') which raises
   'ValueError: __spec__ is None' for modules with a None spec attr.
   Set __spec__ on the stub to a real ModuleSpec.

Validation: 195/195 green across the 4 affected files.
mathias3 pushed a commit to mathias3/hermes-agent that referenced this pull request May 28, 2026
…ousResearch#32072)

All four failures were broken by the security cluster (NousResearch#10082 / NousResearch#10133 /
NousResearch#4609 / symlink-reject batch) merging on May 25. They were red on
origin/main HEAD when NousResearch#32042 and NousResearch#32061 ran, gating PRs that touched
unrelated code.

1) tests/hermes_cli/test_update_zip_symlink_reject.py
   test_update_via_zip_accepts_normal_member called the real
   _update_via_zip without sandboxing PROJECT_ROOT — so the function's
   shutil.copytree() actually copied the fake README from the test ZIP
   over the real repo's README.md, which then made
   test_readme_mentions_powershell_installer fail in any test run that
   happened to pick this test up earlier. Mock PROJECT_ROOT to an
   isolated tmp_path / install_dir, stub subprocess so pip/uv reinstall
   doesn't actually run, and assert the fake README lands in the
   sandbox (not the real tree).

2) tests/tools/test_windows_native_support.py
   test_readme_mentions_powershell_installer was the victim of (1) —
   nothing wrong with the test itself, the fix in (1) clears it.

3) tests/tools/test_file_read_guards.py
   test_proc_fd_other_not_blocked called _is_blocked_device('/proc/self/fd/3')
   expecting False. But _is_blocked_device runs realpath() and on
   pytest xdist workers fd 3 happens to be dup'd to /dev/urandom
   (because the worker subprocess inherits open fds from pytest's
   collection pipe machinery). Switch to the lower-level
   _is_blocked_device_path which is the path-pattern check the test
   actually means to exercise; realpath-resolution coverage already
   lives in test_symlink_to_blocked_device_is_blocked.

4) tests/tools/test_transcription_tools.py
   Module installed a faster_whisper stub via sys.modules without
   setting __spec__, then later @pytest.mark.skipif called
   importlib.util.find_spec('faster_whisper') which raises
   'ValueError: __spec__ is None' for modules with a None spec attr.
   Set __spec__ on the stub to a real ModuleSpec.

Validation: 195/195 green across the 4 affected files.
Bryce-huang pushed a commit to wbkunlun/hermes-agent that referenced this pull request May 29, 2026
…ousResearch#32072)

All four failures were broken by the security cluster (NousResearch#10082 / NousResearch#10133 /
NousResearch#4609 / symlink-reject batch) merging on May 25. They were red on
origin/main HEAD when NousResearch#32042 and NousResearch#32061 ran, gating PRs that touched
unrelated code.

1) tests/hermes_cli/test_update_zip_symlink_reject.py
   test_update_via_zip_accepts_normal_member called the real
   _update_via_zip without sandboxing PROJECT_ROOT — so the function's
   shutil.copytree() actually copied the fake README from the test ZIP
   over the real repo's README.md, which then made
   test_readme_mentions_powershell_installer fail in any test run that
   happened to pick this test up earlier. Mock PROJECT_ROOT to an
   isolated tmp_path / install_dir, stub subprocess so pip/uv reinstall
   doesn't actually run, and assert the fake README lands in the
   sandbox (not the real tree).

2) tests/tools/test_windows_native_support.py
   test_readme_mentions_powershell_installer was the victim of (1) —
   nothing wrong with the test itself, the fix in (1) clears it.

3) tests/tools/test_file_read_guards.py
   test_proc_fd_other_not_blocked called _is_blocked_device('/proc/self/fd/3')
   expecting False. But _is_blocked_device runs realpath() and on
   pytest xdist workers fd 3 happens to be dup'd to /dev/urandom
   (because the worker subprocess inherits open fds from pytest's
   collection pipe machinery). Switch to the lower-level
   _is_blocked_device_path which is the path-pattern check the test
   actually means to exercise; realpath-resolution coverage already
   lives in test_symlink_to_blocked_device_is_blocked.

4) tests/tools/test_transcription_tools.py
   Module installed a faster_whisper stub via sys.modules without
   setting __spec__, then later @pytest.mark.skipif called
   importlib.util.find_spec('faster_whisper') which raises
   'ValueError: __spec__ is None' for modules with a None spec attr.
   Set __spec__ on the stub to a real ModuleSpec.

Validation: 195/195 green across the 4 affected files.
#AI commit#
mosaiq-systems pushed a commit to mosaiq-systems/hermes-agent that referenced this pull request May 29, 2026
…ousResearch#32072)

All four failures were broken by the security cluster (NousResearch#10082 / NousResearch#10133 /
NousResearch#4609 / symlink-reject batch) merging on May 25. They were red on
origin/main HEAD when NousResearch#32042 and NousResearch#32061 ran, gating PRs that touched
unrelated code.

1) tests/hermes_cli/test_update_zip_symlink_reject.py
   test_update_via_zip_accepts_normal_member called the real
   _update_via_zip without sandboxing PROJECT_ROOT — so the function's
   shutil.copytree() actually copied the fake README from the test ZIP
   over the real repo's README.md, which then made
   test_readme_mentions_powershell_installer fail in any test run that
   happened to pick this test up earlier. Mock PROJECT_ROOT to an
   isolated tmp_path / install_dir, stub subprocess so pip/uv reinstall
   doesn't actually run, and assert the fake README lands in the
   sandbox (not the real tree).

2) tests/tools/test_windows_native_support.py
   test_readme_mentions_powershell_installer was the victim of (1) —
   nothing wrong with the test itself, the fix in (1) clears it.

3) tests/tools/test_file_read_guards.py
   test_proc_fd_other_not_blocked called _is_blocked_device('/proc/self/fd/3')
   expecting False. But _is_blocked_device runs realpath() and on
   pytest xdist workers fd 3 happens to be dup'd to /dev/urandom
   (because the worker subprocess inherits open fds from pytest's
   collection pipe machinery). Switch to the lower-level
   _is_blocked_device_path which is the path-pattern check the test
   actually means to exercise; realpath-resolution coverage already
   lives in test_symlink_to_blocked_device_is_blocked.

4) tests/tools/test_transcription_tools.py
   Module installed a faster_whisper stub via sys.modules without
   setting __spec__, then later @pytest.mark.skipif called
   importlib.util.find_spec('faster_whisper') which raises
   'ValueError: __spec__ is None' for modules with a None spec attr.
   Set __spec__ on the stub to a real ModuleSpec.

Validation: 195/195 green across the 4 affected files.
gweeteve pushed a commit to gweeteve/hermes-agent that referenced this pull request Jun 2, 2026
…ousResearch#32072)

All four failures were broken by the security cluster (NousResearch#10082 / NousResearch#10133 /
NousResearch#4609 / symlink-reject batch) merging on May 25. They were red on
origin/main HEAD when NousResearch#32042 and NousResearch#32061 ran, gating PRs that touched
unrelated code.

1) tests/hermes_cli/test_update_zip_symlink_reject.py
   test_update_via_zip_accepts_normal_member called the real
   _update_via_zip without sandboxing PROJECT_ROOT — so the function's
   shutil.copytree() actually copied the fake README from the test ZIP
   over the real repo's README.md, which then made
   test_readme_mentions_powershell_installer fail in any test run that
   happened to pick this test up earlier. Mock PROJECT_ROOT to an
   isolated tmp_path / install_dir, stub subprocess so pip/uv reinstall
   doesn't actually run, and assert the fake README lands in the
   sandbox (not the real tree).

2) tests/tools/test_windows_native_support.py
   test_readme_mentions_powershell_installer was the victim of (1) —
   nothing wrong with the test itself, the fix in (1) clears it.

3) tests/tools/test_file_read_guards.py
   test_proc_fd_other_not_blocked called _is_blocked_device('/proc/self/fd/3')
   expecting False. But _is_blocked_device runs realpath() and on
   pytest xdist workers fd 3 happens to be dup'd to /dev/urandom
   (because the worker subprocess inherits open fds from pytest's
   collection pipe machinery). Switch to the lower-level
   _is_blocked_device_path which is the path-pattern check the test
   actually means to exercise; realpath-resolution coverage already
   lives in test_symlink_to_blocked_device_is_blocked.

4) tests/tools/test_transcription_tools.py
   Module installed a faster_whisper stub via sys.modules without
   setting __spec__, then later @pytest.mark.skipif called
   importlib.util.find_spec('faster_whisper') which raises
   'ValueError: __spec__ is None' for modules with a None spec attr.
   Set __spec__ on the stub to a real ModuleSpec.

Validation: 195/195 green across the 4 affected files.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

codex comp/agent Core agent loop, run_agent.py, prompt builder P2 Medium — degraded but workaround exists type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants