openai-codex/gpt-5.5 still unstable in Hermes v0.14.0: subagents almost always hit APIConnectionError/TTFB timeout while Codex CLI works

## Summary

`openai-codex` / `gpt-5.5` is still highly unstable in Hermes Agent on the current v0.14.0 release, while the official Codex CLI remains usable on the same Windows machine, same network, and same ChatGPT/Codex login family.

This is a fresh late-May reproduction, not just a follow-up to the old April thread. The older report (#13834) has been open since April, and the problem still severely affects real usage as of late May.

The failure is especially severe with Hermes subagents/delegation:

- Main Hermes agent: roughly 50% chance of hitting this failure family in normal usage.
- Hermes subagents: nearly 100% chance when several subagents are running concurrently.
- Official Codex CLI on the same host/network/login remains usable for normal interaction.

This makes Hermes delegation almost unusable with `openai-codex` / `gpt-5.5`.

## Environment

- OS: Windows 10, Git Bash/MSYS terminal
  - `MINGW64_NT-10.0-26200`
- Hermes Agent: `v0.14.0 (2026.5.16)`
- Hermes project path shown by version output: `C:\Users\yangg\AppData\Local\hermes\hermes-agent`
- Python shown by Hermes: `3.11.13`
- OpenAI SDK shown by Hermes: `2.24.0`
- Official Codex CLI: `codex-cli 0.134.0`
- GitHub CLI: `gh version 2.92.0`

Sanitized Hermes auth status:

```text
openai-codex (1 credentials):
  #1  openai-codex-oauth-1 oauth   device_code ←
```

Sanitized relevant Hermes config:

```yaml
model:
  default: gpt-5.5
  provider: openai-codex
  base_url: https://chatgpt.com/backend-api/codex
  context_length: 1000000

delegation:
  provider: openai-codex
  model: gpt-5.5
  context_length: 1000000
  api_mode: codex_responses
  max_iterations: 50
  child_timeout_seconds: 3600
  max_concurrent_children: 10
  max_spawn_depth: 1
  orchestrator_enabled: true

agent:
  api_max_retries: 10
  reasoning_effort: xhigh
```

## Expected behavior

If official Codex CLI can complete normal prompts on the same machine/network/account family, Hermes `openai-codex` should also be able to complete normal main-agent and subagent requests reliably, or at least fail in a structured way that does not make delegation unusable.

In particular:

- Subagents should not almost always fail before producing a response.
- The retry loop should not amplify connection/TTFB failures into repeated stalls and eventual 429s.
- Hermes' Codex transport should behave close enough to official Codex CLI that the same host/network/login does not show a massive reliability gap.

## Actual behavior

Hermes frequently fails against:

```text
https://chatgpt.com/backend-api/codex
```

Observed failure modes in the same run:

- `APIConnectionError` after ~16-22 seconds.
- `No first byte from provider in 45s` / TTFB watchdog timeout.
- `TimeoutError: Codex stream produced no bytes within 45s`.
- HTTP 429 rate limit after many concurrent retries.

The 429 looks like a secondary amplification effect: many subagent requests fail/stall, retry concurrently, and then hit rate limiting. The primary reliability problem appears to happen before 429: Codex requests frequently fail or stall before first byte.
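To make the amplification concern concrete, here is a rough back-of-the-envelope sketch. It assumes the `max_concurrent_children: 10` and `api_max_retries: 10` values from the config above apply to every child, and it borrows one elapsed/backoff pair from the log sample. The function names are hypothetical and this is not how Hermes computes anything internally; I also do not know the backend's actual rate-limit threshold.

```python
def worst_case_attempts(children: int, max_retries: int) -> int:
    """Upper bound on API attempts if every child exhausts its retries."""
    return children * max_retries


def attempts_per_minute(children: int, seconds_per_attempt: float,
                        backoff_seconds: float) -> float:
    """Sustained request rate when every child fails and retries in a loop."""
    cycle = seconds_per_attempt + backoff_seconds
    return children * 60.0 / cycle


if __name__ == "__main__":
    # Config values from this report (assumed to apply per child).
    print(worst_case_attempts(children=10, max_retries=10))
    # One observed APIConnectionError: 16.24s elapsed, then a 2.5s retry delay.
    print(round(attempts_per_minute(10, 16.24, 2.5), 1))
```

Under those assumptions, a fully failing delegation round could issue up to 100 attempts, each carrying a large context, which would be consistent with 429s appearing only after the connection/TTFB failures.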

## Fresh sanitized log sample

```text
[subagent-0] API call failed (attempt 1/10): APIConnectionError
[subagent-0] Provider: openai-codex  Model: gpt-5.5
[subagent-0] Endpoint: https://chatgpt.com/backend-api/codex
[subagent-0] Error: Connection error.
[subagent-0] Elapsed: 16.24s  Context: 10 msgs, ~8,607 tokens
[subagent-0] Retrying in 2.5s (attempt 1/10)...

[subagent-3] API call failed (attempt 1/10): APIConnectionError
[subagent-3] Provider: openai-codex  Model: gpt-5.5
[subagent-3] Endpoint: https://chatgpt.com/backend-api/codex
[subagent-3] Error: Connection error.
[subagent-3] Elapsed: 16.90s  Context: 18 msgs, ~45,777 tokens
[subagent-3] Retrying in 2.8s (attempt 1/10)...

[subagent-3] API call failed (attempt 1/10): APIConnectionError
[subagent-3] Provider: openai-codex  Model: gpt-5.5
[subagent-3] Endpoint: https://chatgpt.com/backend-api/codex
[subagent-3] Error: Connection error.
[subagent-3] Elapsed: 22.57s  Context: 24 msgs, ~49,033 tokens
[subagent-3] Retrying in 2.7s (attempt 1/10)...

[subagent-1] API call failed (attempt 1/10): APIConnectionError
[subagent-1] Provider: openai-codex  Model: gpt-5.5
[subagent-1] Endpoint: https://chatgpt.com/backend-api/codex
[subagent-1] Error: Connection error.
[subagent-1] Elapsed: 16.38s  Context: 52 msgs, ~88,056 tokens
[subagent-1] Retrying in 2.9s (attempt 1/10)...

[subagent-3] No first byte from provider in 45s (codex stream, model: gpt-5.5). Reconnecting.
[subagent-3] API call failed (attempt 1/10): TimeoutError
[subagent-3] Provider: openai-codex  Model: gpt-5.5
[subagent-3] Endpoint: https://chatgpt.com/backend-api/codex
[subagent-3] Error: Codex stream produced no bytes within 45s (TTFB threshold: 45s)
[subagent-3] Elapsed: 47.18s  Context: 32 msgs, ~59,753 tokens
[subagent-3] Retrying in 2.8s (attempt 1/10)...

[subagent-0] API call failed (attempt 1/10): APIConnectionError
[subagent-0] Provider: openai-codex  Model: gpt-5.5
[subagent-0] Endpoint: https://chatgpt.com/backend-api/codex
[subagent-0] Error: Connection error.
[subagent-0] Elapsed: 20.66s  Context: 67 msgs, ~109,001 tokens
[subagent-0] Retrying in 2.4s (attempt 1/10)...

[subagent-3] No first byte from provider in 45s (codex stream, model: gpt-5.5). Reconnecting.
[subagent-3] API call failed (attempt 1/10): TimeoutError
[subagent-3] Provider: openai-codex  Model: gpt-5.5
[subagent-3] Endpoint: https://chatgpt.com/backend-api/codex
[subagent-3] Error: Codex stream produced no bytes within 45s (TTFB threshold: 45s)
[subagent-3] Elapsed: 47.20s  Context: 42 msgs, ~68,758 tokens
[subagent-3] Retrying in 2.3s (attempt 1/10)...

[subagent-2] No first byte from provider in 45s (codex stream, model: gpt-5.5). Reconnecting.
[subagent-2] API call failed (attempt 1/10): TimeoutError
[subagent-2] Provider: openai-codex  Model: gpt-5.5
[subagent-2] Endpoint: https://chatgpt.com/backend-api/codex
[subagent-2] Error: Codex stream produced no bytes within 45s (TTFB threshold: 45s)
[subagent-2] Elapsed: 47.16s  Context: 70 msgs, ~98,074 tokens
[subagent-2] Retrying in 2.0s (attempt 1/10)...

[subagent-1] API call failed (attempt 1/10): APIConnectionError
[subagent-1] Provider: openai-codex  Model: gpt-5.5
[subagent-1] Endpoint: https://chatgpt.com/backend-api/codex
[subagent-1] Error: Connection error.
[subagent-1] Elapsed: 17.62s  Context: 76 msgs, ~144,687 tokens
[subagent-1] Retrying in 2.8s (attempt 1/10)...

[subagent-0] API call failed (attempt 1/10): RateLimitError [HTTP 429]
[subagent-0] Provider: openai-codex  Model: gpt-5.5
[subagent-0] Endpoint: https://chatgpt.com/backend-api/codex
[subagent-0] Error: HTTP 429: Error code: 429 - {'detail': 'Rate limit exceeded'}
[subagent-0] Details: {'detail': 'Rate limit exceeded'}
[subagent-0] Elapsed: 3.23s  Context: 88 msgs, ~136,681 tokens
[subagent-0] Rate limited. Waiting 1.0s (attempt 2/10)...

[subagent-1] No first byte from provider in 45s (codex stream, model: gpt-5.5). Reconnecting.
[subagent-1] API call failed (attempt 1/10): TimeoutError
[subagent-1] Provider: openai-codex  Model: gpt-5.5
[subagent-1] Endpoint: https://chatgpt.com/backend-api/codex
[subagent-1] Error: Codex stream produced no bytes within 45s (TTFB threshold: 45s)
[subagent-1] Elapsed: 47.03s  Context: 100 msgs, ~180,332 tokens
[subagent-1] Retrying in 2.5s (attempt 1/10)...
```
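For anyone comparing runs, a small tally script along these lines may help quantify the mix of failure types. It is a minimal sketch that assumes the log line shape shown above (`[agent] API call failed (attempt N/M): ErrorName`); the function name and regex are mine, not part of Hermes.

```python
import re
from collections import Counter

# Matches lines such as:
#   [subagent-0] API call failed (attempt 1/10): APIConnectionError
FAILURE_RE = re.compile(
    r"^\[(?P<agent>[^\]]+)\] API call failed \(attempt \d+/\d+\): (?P<error>\w+)"
)


def tally_failures(log_text: str):
    """Count failed API calls by error type and by agent."""
    by_error = Counter()
    by_agent = Counter()
    for line in log_text.splitlines():
        match = FAILURE_RE.match(line.strip())
        if match:
            by_error[match["error"]] += 1
            by_agent[match["agent"]] += 1
    return by_error, by_agent


if __name__ == "__main__":
    import sys

    # Usage: python tally.py < hermes.log
    errors, agents = tally_failures(sys.stdin.read())
    for name, count in errors.most_common():
        print(f"{name}: {count}")
    for name, count in agents.most_common():
        print(f"{name}: {count}")
```

Run against the sample above, the connection and TTFB failures should clearly outnumber the single 429, which is why I read the 429 as secondary.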

## Why this seems Hermes-specific or Hermes-amplified

The official Codex CLI remains usable on the same host/network/account family, but Hermes' `openai-codex` OAuth path becomes unreliable, especially under delegation/concurrency.

The old April report (#13834) already described the same general gap: official Codex CLI works while Hermes fails against the Codex backend. This fresh reproduction shows the issue is still present on the current v0.14.0 release and is severe enough to break subagent workflows.

## Impact

This is not a minor intermittent warning. It severely affects normal Hermes usage:

- Main agent randomly becomes unreliable.
- Subagent/delegation workflows are almost unusable with Codex OAuth.
- Retrying many concurrent failed subagent calls can amplify into 429s.
- Long stalls and retries make the user experience very poor.

## Related issues / context

- Older broad report from April: #13834
- Related closed TTFB/no-first-byte issue: #32373
- Mentioned fix in prior thread: #32963
- Possible related newer work mentioned by another user: #33042

This new issue is filed because the older April report is still unresolved in practice near June, and the failure still reproduces on current Hermes v0.14.0.

## Questions / possible areas to inspect

1. Are Hermes subagents creating independent Codex clients/transports in a way that differs from the main agent and/or official Codex CLI?
2. Is Hermes' `openai-codex` path missing a concurrency limiter or smarter backoff policy for delegation?
3. Does the Codex Responses transport differ from official Codex CLI in connection reuse, websocket/SSE handling, headers, session affinity, Cloudflare/browser-like behavior, or request payload shape?
4. Could Hermes mirror the official Codex CLI runtime path more closely for primary and concurrent agent calls?
5. Can Hermes detect this silent/TTFB failure family earlier and prevent concurrent retries from self-amplifying into 429s?
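To illustrate what I mean in questions 2 and 5, here is a minimal sketch of a shared concurrency limiter combined with jittered exponential backoff. Everything in it is hypothetical (`TransientError`, `backoff_delay`, `call_with_limit`, and the `request` callable are illustrative names); it does not reflect Hermes internals or the official Codex CLI, only the general technique.

```python
import asyncio
import random


class TransientError(Exception):
    """Stand-in for connection, TTFB, or 429 failures (hypothetical)."""


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Full-jitter backoff: uniform in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))


async def call_with_limit(semaphore: asyncio.Semaphore, request,
                          max_attempts: int = 5, base: float = 1.0,
                          cap: float = 60.0):
    """Run `request()` under a semaphore shared by all subagents.

    The slot is released before sleeping, so a backing-off child does not
    block others, and the random delay spreads retries out in time.
    """
    for attempt in range(max_attempts):
        try:
            async with semaphore:
                return await request()
        except TransientError:
            if attempt == max_attempts - 1:
                raise
            await asyncio.sleep(backoff_delay(attempt, base, cap))


async def main():
    semaphore = asyncio.Semaphore(2)  # at most 2 in-flight requests overall

    async def fake_request():
        # Placeholder for a real provider call; fails randomly.
        await asyncio.sleep(0.01)
        if random.random() < 0.5:
            raise TransientError("simulated connection error")
        return "ok"

    tasks = [call_with_limit(semaphore, fake_request, base=0.01, cap=0.05)
             for _ in range(6)]
    print(await asyncio.gather(*tasks, return_exceptions=True))


if __name__ == "__main__":
    asyncio.run(main())
```

The point of the design is that the retry delays in the log (2.0s to 2.9s regardless of how many children are failing) would instead grow and decorrelate, and the total number of in-flight Codex requests would be bounded independently of `max_concurrent_children`.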

I can run additional diagnostics if maintainers have a recommended way to compare Hermes' Codex OAuth transport against official Codex CLI behavior without exposing tokens.

