Summary
When preflight context compression uses the main openai-codex auxiliary route and the Responses stream exceeds the configured auxiliary timeout, the timeout path closes the underlying auxiliary client. The sync auxiliary client can remain in the cache afterward, so later auxiliary calls reuse a closed/poisoned client and fail quickly with Connection error.
This affected context compression and memory/background auxiliary tasks in a long-running Discord gateway session. The main model route continued to work, so this does not look like a global network/auth outage.
Observed Behavior
Live log sequence from a long-running gateway session:
Preflight compression: ~234,406 tokens >= 217,600 threshold (model gpt-5.5, ctx 272,000)
context compression started: session=20260510_171239_804511 messages=255 tokens=~234,406 model=gpt-5.5 focus=None
Auxiliary compression: using main (gpt-5.5) at https://chatgpt.com/backend-api/codex/
Failed to generate context summary: Connection error.. Further summary attempts paused for 30 seconds.
context compression done: session=20260510_213719_f58dae messages=255->8 tokens=~27,120
Earlier in the same run there was a timeout on the same auxiliary path:
Auxiliary compression: using main (gpt-5.5) at https://chatgpt.com/backend-api/codex/
Failed to generate context summary: Codex auxiliary Responses stream exceeded 20.0s total timeout. Further summary attempts paused for 30 seconds.
After that timeout, repeated downstream auxiliary tasks using provider: main started failing with:
Auxiliary flush_memories: using main (gpt-5.5) at https://chatgpt.com/backend-api/codex/
Brainstack explicit capture validation extractor failed: Connection error.
Meanwhile, normal main-agent calls to the same provider/model were succeeding in the same time window, including large requests of around 168k-193k input tokens. That makes a stale/closed auxiliary client more likely than a global provider outage.
Config Shape
Relevant config:
model:
  default: gpt-5.5
  provider: openai-codex
  base_url: https://chatgpt.com/backend-api/codex
auxiliary:
  compression:
    provider: main
    model: ""
    timeout: 20
No stale stepfun/step-3.5-flash route was involved in this observed failure. The compression task inherited the main model route as intended.
Suspected Cause
The timeout handler in the Codex auxiliary Responses stream closes the real client on timeout. The sync cache path can still return the cached client later, because sync cache hits do not validate whether the cached client was previously closed/poisoned.
Relevant live-source areas inspected:
agent/auxiliary_client.py: _CodexCompletionsAdapter timeout path calls client close on timeout.
agent/auxiliary_client.py: _get_cached_client(...) returns sync cached clients without a liveness check.
agent/context_compressor.py: _generate_summary(...) calls call_llm(task="compression", main_runtime=...) and then records _last_summary_error when the auxiliary call fails.
run_agent.py: preflight compression emits the user-facing fallback marker when _last_summary_error is set.
Expected Behavior
After an auxiliary timeout or connection error:
- the poisoned/closed cached client should be evicted;
- the next auxiliary call should build a fresh client;
- compression should optionally retry once with a fresh client before inserting a fallback context marker;
- downstream auxiliary memory/background tasks should not inherit the broken cached client state.
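The behavior listed above could look roughly like the following sketch: evict on timeout or connection failure, rebuild on the next lookup, and retry once. The names `StubClient`, `get_client`, `evict`, and `call_with_retry` are illustrative assumptions, not existing Hermes functions.

```python
from typing import Callable, Dict


class StubClient:
    """Hypothetical stand-in for the cached auxiliary client."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


_clients: Dict[str, StubClient] = {}


def get_client(key: str) -> StubClient:
    client = _clients.get(key)
    # Liveness guard: never hand out a client that was already closed.
    if client is None or client.closed:
        client = StubClient()
        _clients[key] = client
    return client


def evict(key: str) -> None:
    client = _clients.pop(key, None)
    if client is not None:
        client.close()


def call_with_retry(key: str, send: Callable[[StubClient], str], retries: int = 1) -> str:
    attempt = 0
    while True:
        client = get_client(key)
        try:
            return send(client)
        except (TimeoutError, ConnectionError):
            # Drop the possibly poisoned entry before retrying or re-raising.
            evict(key)
            if attempt >= retries:
                raise
            attempt += 1
```

Evicting before re-raising means that even when the retry budget is exhausted, later memory/background tasks start from a fresh client rather than the broken one.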
Suggested Fix
At minimum:
- On timeout/connection failure from a cached sync auxiliary client, evict that cache entry before raising.
- Add a liveness/closed-state guard for sync cached clients, similar in spirit to the existing async loop validation.
- Add a regression test where:
  - auxiliary compression times out and closes the wrapped client;
  - a later provider: main auxiliary call is made;
  - the later call must create/use a fresh client rather than reusing the closed one.
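A regression test along these lines might be shaped like the sketch below. It runs against a hypothetical `GuardedCache` and `RecordingClient` rather than the real `_get_cached_client(...)` and `_CodexCompletionsAdapter`, so it only shows the assertions such a test would need and would have to be adapted to the real code paths.

```python
import unittest


class RecordingClient:
    """Hypothetical stand-in for the wrapped client."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

    def complete(self, simulate_timeout=False):
        if self.closed:
            raise ConnectionError("Connection error.")
        if simulate_timeout:
            # Assumption: the timeout path closes the wrapped client.
            self.close()
            raise TimeoutError("stream exceeded total timeout")
        return "ok"


class GuardedCache:
    """Hypothetical sync cache with a closed-state guard on hits."""

    def __init__(self, factory):
        self._factory = factory
        self._entries = {}

    def get(self, key):
        client = self._entries.get(key)
        if client is None or client.closed:
            client = self._factory()
            self._entries[key] = client
        return client


class ClosedClientRegressionTest(unittest.TestCase):
    def test_later_call_gets_fresh_client_after_timeout(self):
        cache = GuardedCache(RecordingClient)

        # Step 1: compression times out and closes the wrapped client.
        first = cache.get("main")
        with self.assertRaises(TimeoutError):
            first.complete(simulate_timeout=True)
        self.assertTrue(first.closed)

        # Step 2: a later provider: main auxiliary call is made.
        second = cache.get("main")

        # Step 3: it must use a fresh client, not the closed one.
        self.assertIsNot(second, first)
        self.assertEqual(second.complete(), "ok")
```

Saved as a module, this can be run with `python -m unittest`. Without the `client.closed` check in `GuardedCache.get`, step 3 fails, which is the failure the test is meant to catch.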
Impact
This can cause context compression to drop middle turns into a static fallback marker even though the main model route is still healthy. It can also make memory/background auxiliary tasks appear broken after a single auxiliary timeout.
Notes
I am reporting this as a Hermes auxiliary-client/runtime issue, not as a memory-provider storage issue. The memory provider was only the downstream consumer that made the poisoned cached auxiliary route visible.