
Auxiliary compression timeout can poison cached sync client, causing later auxiliary calls to fail #23432

@yepyhun

Summary

When preflight context compression uses the main openai-codex auxiliary route and the Responses stream exceeds the configured auxiliary timeout, the timeout path closes the underlying auxiliary client. The closed client can remain in the sync client cache afterward, so later auxiliary calls reuse the closed/poisoned client and fail immediately with "Connection error."

This affected context compression and memory/background auxiliary tasks in a long-running Discord gateway session. The main model route continued to work, so this does not look like a global network/auth outage.

Observed Behavior

Live log sequence from a long-running gateway session:

Preflight compression: ~234,406 tokens >= 217,600 threshold (model gpt-5.5, ctx 272,000)
context compression started: session=20260510_171239_804511 messages=255 tokens=~234,406 model=gpt-5.5 focus=None
Auxiliary compression: using main (gpt-5.5) at https://chatgpt.com/backend-api/codex/
Failed to generate context summary: Connection error.. Further summary attempts paused for 30 seconds.
context compression done: session=20260510_213719_f58dae messages=255->8 tokens=~27,120

Earlier in the same run there was a timeout on the same auxiliary path:

Auxiliary compression: using main (gpt-5.5) at https://chatgpt.com/backend-api/codex/
Failed to generate context summary: Codex auxiliary Responses stream exceeded 20.0s total timeout. Further summary attempts paused for 30 seconds.

After that timeout, repeated downstream auxiliary tasks using provider: main started failing with:

Auxiliary flush_memories: using main (gpt-5.5) at https://chatgpt.com/backend-api/codex/
Brainstack explicit capture validation extractor failed: Connection error.

Meanwhile, normal main-agent calls to the same provider/model were succeeding in the same time window, including large requests of roughly 168k-193k input tokens. That makes a stale/closed auxiliary client a more likely cause than a global provider outage.

Config Shape

Relevant config:

model:
  default: gpt-5.5
  provider: openai-codex
  base_url: https://chatgpt.com/backend-api/codex
auxiliary:
  compression:
    provider: main
    model: ""
    timeout: 20

No stale stepfun/step-3.5-flash route was involved in this observed failure. The compression task inherited the main model route as intended.

Suspected Cause

The timeout handler in the Codex auxiliary Responses stream closes the real client on timeout. The sync cache path can still return the cached client later, because sync cache hits do not validate whether the cached client was previously closed/poisoned.

Relevant live-source areas inspected:

  • agent/auxiliary_client.py: the _CodexCompletionsAdapter timeout path closes the client on timeout.
  • agent/auxiliary_client.py: _get_cached_client(...) returns sync cached clients without a liveness check.
  • agent/context_compressor.py: _generate_summary(...) calls call_llm(task="compression", main_runtime=...) and then records _last_summary_error when the auxiliary call fails.
  • run_agent.py: preflight compression emits the user-facing fallback marker when _last_summary_error is set.

Expected Behavior

After an auxiliary timeout or connection error:

  • the poisoned/closed cached client should be evicted;
  • the next auxiliary call should build a fresh client;
  • compression should optionally retry once with a fresh client before inserting a fallback context marker;
  • downstream auxiliary memory/background tasks should not inherit the broken cached client state.
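
One way the expected behavior above could look is a small retry wrapper that evicts the failing client and tries once more with a fresh one. This is a sketch under assumptions: call_with_fresh_retry and its get_client, evict, and do_call callables are hypothetical names, and the real timeout or connection exceptions in the codebase may be different types from the built-in TimeoutError and ConnectionError used here.

```python
def call_with_fresh_retry(get_client, evict, do_call, retries=1):
    """Run do_call(client); on timeout/connection failure, evict and retry.

    get_client: returns a (possibly cached) client.
    evict:      removes the given client from the cache.
    do_call:    performs the auxiliary request with the client.
    retries:    how many extra attempts to make with a fresh client.
    """
    attempt = 0
    while True:
        client = get_client()
        try:
            return do_call(client)
        except (TimeoutError, ConnectionError):
            # Always evict, so later callers never see the broken client.
            evict(client)
            if attempt >= retries:
                raise
            attempt += 1
```

Evicting before re-raising matters even when the retry budget is exhausted: it keeps downstream memory/background tasks from inheriting the broken client, which is the last point in the list above.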

Suggested Fix

At minimum:

  1. On timeout/connection failure from a cached sync auxiliary client, evict that cache entry before raising.
  2. Add a liveness/closed-state guard for sync cached clients, similar in spirit to the existing async loop validation.
  3. Add a regression test where:
    • auxiliary compression times out and closes the wrapped client;
    • a later provider: main auxiliary call is made;
    • the later call must create/use a fresh client rather than reusing the closed one.
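
A rough sketch of items 1-3, assuming the cached client exposes some closed-state signal. Everything here is hypothetical: ClientCache, StubClient, and the closed attribute are illustration names, not the actual _get_cached_client(...) implementation, and a real client may expose its closed state differently.

```python
import threading


class ClientCache:
    """Hypothetical sync client cache with a closed-state guard and eviction."""

    def __init__(self, factory):
        self._factory = factory
        self._lock = threading.Lock()
        self._clients = {}

    def get(self, key):
        with self._lock:
            client = self._clients.get(key)
            # Liveness guard: drop a cached client that reports itself closed.
            if client is not None and getattr(client, "closed", False):
                del self._clients[key]
                client = None
            if client is None:
                client = self._factory()
                self._clients[key] = client
            return client

    def evict(self, key, client=None):
        # Evict only if the entry is still the failing client, so a fresh
        # client installed by another caller is not thrown away.
        with self._lock:
            if client is None or self._clients.get(key) is client:
                self._clients.pop(key, None)


class StubClient:
    """Test double; `closed` is an assumed attribute, not a real SDK field."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


def test_closed_client_is_not_reused():
    cache = ClientCache(StubClient)
    first = cache.get("main")
    first.close()  # simulates the timeout path closing the wrapped client
    second = cache.get("main")
    assert second is not first
    assert not second.closed
```

The guard in get() covers clients closed by any path, while evict() lets the timeout handler remove the entry explicitly before raising; the two are complementary rather than alternatives.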

Impact

This can cause context compression to drop middle turns into a static fallback marker even though the main model route is still healthy. It can also make memory/background auxiliary tasks appear broken after a single auxiliary timeout.

Notes

I am reporting this as a Hermes auxiliary-client/runtime issue, not as a memory-provider storage issue. The memory provider was only the downstream consumer that made the poisoned cached auxiliary route visible.

Labels

P2 (Medium: degraded but workaround exists), comp/agent (Core agent loop, run_agent.py, prompt builder), type/bug (Something isn't working)
