fix(codex): surface actionable hint when stale-call detector fires on known silent-reject pattern#32016

Merged
teknium1 merged 1 commit into main from hermes/hermes-f0bc0758
May 25, 2026


@teknium1

Contributor

Summary

When the stale-call detector fires on a known Codex silent-reject pattern (gpt-5.5 family on chatgpt.com/backend-api/codex), the user now sees actionable text instead of just "Aborting call." The hint names the gpt-5.4-codex workaround and points at #21444 for symptom history.

Companion to PR #31967 (which lowered the implicit stale-call default from 300s to 90s). Together, the two changes make fallbacks kick in faster and, when they do, tell the user what to do about it.

Closes #22046.

Salvage notes

PR #22046 (@Tranquil-Flow) was based on a branch about 2 weeks stale against main, so a direct cherry-pick would have reverted unrelated fixes (#29507 stranger-thread close protection, natural-ending emoji/caret, xAI disambiguator #29344, redact-sensitive-text import). Only the substantive contribution, the _codex_silent_hang_hint helper plus the call-site hook, was salvaged onto current main. Authorship is preserved on the commit.

Wording was updated: the original PR linked openai/codex#19654, which was closed on May 1, 2026. The new hint text instead points at hermes-agent#21444 for symptom history and recommends the gpt-5.4-codex workaround in general rather than referencing a now-closed upstream issue.

Changes

  • run_agent.py — new AIAgent._codex_silent_hang_hint(model=...) method. Returns None unless all three guards match:
    • api_mode == "codex_responses"
    • provider is openai-codex OR base URL is chatgpt.com/backend-api/codex
    • model name matches gpt-5.5 family via word-boundary regex (guards against false-positive on gpt-5.50)
  • agent/chat_completion_helpers.py — non-stream stale-call site consults the hint via getattr(...) for robustness. Hint is appended to both _emit_status (terminal warning) and TimeoutError message (retry-loop diagnostics).
  • tests/run_agent/test_codex_silent_hang_hint.py — 10 regression tests covering 4 positive + 6 negative cases.
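The three guards listed above can be sketched as a standalone predicate. This is a minimal sketch under stated assumptions, not the code in run_agent.py: `AgentStub` is a hypothetical stand-in for the few AIAgent attributes the guards read (the ty report below shows the helper touching `api_mode`, `provider`, and `model`; `base_url` is assumed), `\bgpt-5\.5\b` is one plausible word-boundary pattern, and the returned wording is illustrative only.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed pattern: \b on both sides of "gpt-5.5". "gpt-5.50" fails because "0"
# is a word character, so there is no boundary right after "5.5".
_GPT55_FAMILY = re.compile(r"\bgpt-5\.5\b", re.IGNORECASE)
CODEX_BASE_URL_FRAGMENT = "chatgpt.com/backend-api/codex"


@dataclass
class AgentStub:
    """Hypothetical stand-in for the AIAgent attributes the hint inspects."""
    api_mode: str
    provider: str
    base_url: str
    model: Optional[str] = None

    def codex_silent_hang_hint(self, model: Optional[str] = None) -> Optional[str]:
        # Guard 1: only the Codex Responses transport is affected.
        if self.api_mode != "codex_responses":
            return None
        # Guard 2: openai-codex provider OR the ChatGPT Codex backend URL.
        on_codex = (
            self.provider == "openai-codex"
            or CODEX_BASE_URL_FRAGMENT in (self.base_url or "")
        )
        if not on_codex:
            return None
        # Guard 3: gpt-5.5 family; model=None falls back to self.model.
        name = model if model is not None else self.model
        if not name or not _GPT55_FAMILY.search(name):
            return None
        # Illustrative wording; the real hint text is not reproduced here.
        return (
            "Known Codex silent-reject pattern for gpt-5.5. "
            "Try gpt-5.4-codex; see #21444 for symptom history."
        )
```

The `\b` after `5.5` is what rejects `gpt-5.50`, while the `-` in `gpt-5.5-codex` and the `/` in `openai/gpt-5.5` both leave a boundary, so those still match.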

Validation

  • Targeted (new hint tests + non-stream stale timeout + codex_responses + transport + client lifecycle + cli timeouts): 148 passed, 1 skipped
  • E2E (live AIAgent build): hint fires for gpt-5.5/openai-codex; silent for the gpt-5.4-codex workaround; silent for the openrouter route

Infographic: codex-silent-hang-hint

… known silent-reject pattern

The ChatGPT Codex backend (chatgpt.com/backend-api/codex) has historically
silently dropped certain model requests: the connection is accepted but no
stream events are emitted and no error is raised. PR #31967 lowered the
implicit stale-call default from 300s to 90s so fallbacks kick in faster,
but users still see an opaque "No response from provider for 90s
(non-streaming, ...)" message that gives no path forward.
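The stale-call behavior described above amounts to a watchdog around a blocking provider call. The sketch below is a hypothetical illustration, not the Hermes implementation: `call_with_stale_guard` and `STALE_CALL_TIMEOUT_S` are invented names, the 90s default is taken from PR #31967, and a single-worker thread pool stands in for whatever mechanism the agent actually uses. The message keeps only the leading part of the text quoted above.

```python
import concurrent.futures
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Hypothetical constant mirroring the 90s implicit default from PR #31967.
STALE_CALL_TIMEOUT_S = 90.0


def call_with_stale_guard(
    call: Callable[[], T], timeout_s: float = STALE_CALL_TIMEOUT_S
) -> T:
    """Run a blocking (non-streaming) call; raise TimeoutError if it goes silent."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(call)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        # Request accepted, but nothing came back before the deadline.
        raise TimeoutError(
            f"No response from provider for {timeout_s:g}s (non-streaming)"
        ) from None
    finally:
        # Return immediately; the hung worker thread is abandoned, not killed.
        pool.shutdown(wait=False)


if __name__ == "__main__":
    print(call_with_stale_guard(lambda: "ok", timeout_s=1.0))
    try:
        call_with_stale_guard(lambda: time.sleep(0.5), timeout_s=0.05)
    except TimeoutError as exc:
        print(exc)
```

Python threads cannot be cancelled, so the abandoned worker keeps running until the underlying call returns; the watchdog only frees the caller to fall back sooner.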

This patch adds a narrow heuristic (gpt-5.5 family on the Codex backend
via codex_responses api_mode) that appends actionable text to the generic
timeout message, naming the gpt-5.4-codex workaround and pointing at
#21444 for symptom history.

Changes:

- run_agent.py — new ``AIAgent._codex_silent_hang_hint(model=...)`` method.
  Returns ``None`` for any request that does not match all three guards
  (codex_responses api_mode, openai-codex provider or chatgpt.com Codex
  base URL, gpt-5.5-family model name with word-boundary regex anchoring
  to avoid false-positives on e.g. ``gpt-5.50``).
- agent/chat_completion_helpers.py — the non-stream stale-call site
  consults the hint via ``getattr(...)`` so the call site stays robust
  if the helper is ever removed or stubbed in tests. Hint is appended to
  both the ``_emit_status`` warning and the ``TimeoutError`` message so
  the user sees it in their terminal AND it lands in any retry-loop
  diagnostics.
- tests/run_agent/test_codex_silent_hang_hint.py — 10 regression tests
  covering positive cases (bare gpt-5.5, vendor-prefixed openai/gpt-5.5,
  gpt-5.5-codex SKU, model=None fallback to self.model) and negative
  cases (gpt-5.4-codex workaround, gpt-5.50 false-positive guard,
  non-codex api_mode, non-codex provider, empty/None model, unrelated
  models on Codex).
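The call-site hook can be sketched as follows. This is an illustration, not the code in agent/chat_completion_helpers.py: `build_stale_timeout_error` is a hypothetical name, `emit_status` stands in for `_emit_status`, and the base message format is assumed. It shows the two properties described above: the helper is looked up with `getattr(...)`, so a missing or stubbed `_codex_silent_hang_hint` degrades to the plain message, and the same text reaches both the status line and the `TimeoutError`.

```python
from typing import Callable, Optional


def build_stale_timeout_error(
    agent: object,
    elapsed_s: float,
    emit_status: Callable[[str], None],
    model: Optional[str] = None,
) -> TimeoutError:
    """Build the stale-call TimeoutError, appending the Codex hint if one applies."""
    message = f"No response from provider for {elapsed_s:g}s (non-streaming)"
    # getattr with a default: agents (or test stubs) without the helper still work.
    hint_fn = getattr(agent, "_codex_silent_hang_hint", None)
    hint: Optional[str] = hint_fn(model=model) if callable(hint_fn) else None
    if hint:
        message = f"{message}. {hint}"
    emit_status(message)          # terminal warning for the user
    return TimeoutError(message)  # carried into retry-loop diagnostics
```

A caller would `raise build_stale_timeout_error(...)`; returning the exception instead of raising it keeps the sketch easy to test.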

Does NOT fix the backend-side issue (that's an upstream OpenAI/ChatGPT
problem we cannot patch from here). Only converts an opaque timeout into
text that names the workaround so users do not have to dig through logs
or wait for a forum post to learn what to do.

Closes #22046
@github-actions

Contributor

🔎 Lint report: hermes/hermes-f0bc0758 vs origin/main

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 9311 on HEAD, 9256 on base (🆕 +55)

🆕 New issues (14):

| Rule | Count |
|---|---|
| invalid-argument-type | 10 |
| unresolved-attribute | 3 |
| unresolved-import | 1 |
First entries
run_agent.py:955: [unresolved-attribute] unresolved-attribute: Object of type `Self@_codex_silent_hang_hint` has no attribute `provider`
tests/run_agent/test_codex_silent_hang_hint.py:29: [invalid-argument-type] invalid-argument-type: Argument to `AIAgent.__init__` is incorrect: Expected `dict[str, Any]`, found `str | bool`
tests/run_agent/test_codex_silent_hang_hint.py:29: [invalid-argument-type] invalid-argument-type: Argument to `AIAgent.__init__` is incorrect: Expected `int | float | None`, found `str | bool`
tests/run_agent/test_codex_silent_hang_hint.py:29: [invalid-argument-type] invalid-argument-type: Argument to `AIAgent.__init__` is incorrect: Expected `int | float`, found `str | bool`
tests/run_agent/test_codex_silent_hang_hint.py:29: [invalid-argument-type] invalid-argument-type: Argument to `AIAgent.__init__` is incorrect: Expected `list[str]`, found `str | bool`
tests/run_agent/test_codex_silent_hang_hint.py:29: [invalid-argument-type] invalid-argument-type: Argument to `AIAgent.__init__` is incorrect: Expected `str`, found `str | bool`
run_agent.py:952: [unresolved-attribute] unresolved-attribute: Object of type `Self@_codex_silent_hang_hint` has no attribute `api_mode`
tests/run_agent/test_codex_silent_hang_hint.py:29: [invalid-argument-type] invalid-argument-type: Argument to `AIAgent.__init__` is incorrect: Expected `IterationBudget`, found `str | bool`
tests/run_agent/test_codex_silent_hang_hint.py:29: [invalid-argument-type] invalid-argument-type: Argument to `AIAgent.__init__` is incorrect: Expected `bool`, found `str | bool`
tests/run_agent/test_codex_silent_hang_hint.py:29: [invalid-argument-type] invalid-argument-type: Argument to `AIAgent.__init__` is incorrect: Expected `int`, found `str | bool`
tests/run_agent/test_codex_silent_hang_hint.py:13: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`
run_agent.py:963: [unresolved-attribute] unresolved-attribute: Object of type `Self@_codex_silent_hang_hint` has no attribute `model`
tests/run_agent/test_codex_silent_hang_hint.py:29: [invalid-argument-type] invalid-argument-type: Argument to `AIAgent.__init__` is incorrect: Expected `list[str] | None`, found `str | bool`
tests/run_agent/test_codex_silent_hang_hint.py:29: [invalid-argument-type] invalid-argument-type: Argument to `AIAgent.__init__` is incorrect: Expected `list[dict[str, Any]]`, found `str | bool`

✅ Fixed issues: none

Unchanged: 4909 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.

teknium1 merged commit b1adb95 into main on May 25, 2026
25 of 26 checks passed
teknium1 deleted the hermes/hermes-f0bc0758 branch on May 25, 2026 at 11:49
alt-glitch added the labels type/bug (Something isn't working), P2 (Medium — degraded but workaround exists), comp/agent (Core agent loop, run_agent.py, prompt builder), and codex on May 25, 2026
robintong pushed a commit to robintong/hermes-agent that referenced this pull request May 27, 2026
Replaying a batch of prior codex_reasoning_items into the Responses input
hangs the chatgpt.com/backend-api/codex backend on a cold prompt cache:
the SSE stream emits no first event and the call dies at the stale
timeout (issue NousResearch#21444 / NousResearch#11179 family). Isolated on 2026-05-27
with a 198-message resume: full reasoning replay hangs for 80s+; with
reasoning dropped the call returns in ~2s; summary-only replay
(encrypted_content stripped, items kept) still hangs. So the trigger is
the reasoning items in the input, not just the encrypted blob. Upstream
has only mitigated this (PR NousResearch#31967 fails faster; PR NousResearch#32016
hints at gpt-5.4-codex, which is unavailable on a ChatGPT account).

Gate the codex reasoning replay behind _codex_replay_reasoning_enabled()
(default off; HERMES_CODEX_REPLAY_REASONING=1 restores it). Mirrors the
existing is_xai_responses strip. The model re-reasons from visible
history each turn; summaries remain in the saved session.
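The gate described in this commit can be sketched like this. The function names, the exact env-var parsing (only the literal `1` enables replay here), and the item shape (`{"type": "reasoning", ...}`, following the OpenAI Responses API convention for reasoning items) are assumptions for illustration; the commit itself only names `_codex_replay_reasoning_enabled()` and `HERMES_CODEX_REPLAY_REASONING=1`.

```python
import os
from typing import Any, Dict, List


def codex_replay_reasoning_enabled() -> bool:
    # Hypothetical parsing: default off; HERMES_CODEX_REPLAY_REASONING=1 turns replay on.
    return os.environ.get("HERMES_CODEX_REPLAY_REASONING") == "1"


def build_responses_input(items: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return the Responses input, dropping replayed reasoning items unless enabled."""
    if codex_replay_reasoning_enabled():
        return list(items)
    # Visible messages pass through; the model re-reasons from them each turn.
    return [item for item in items if item.get("type") != "reasoning"]
```

With the flag unset, only the reasoning items are removed from the request; the saved session is untouched, matching the commit's note that summaries remain stored.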

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

3 participants