Auxiliary compression timeout can poison cached sync client, causing later auxiliary calls to fail

## Summary

When preflight context compression uses the main `openai-codex` auxiliary route and the Responses stream exceeds the configured auxiliary timeout, the timeout path closes the underlying auxiliary client. The sync auxiliary client can remain in the cache afterward, so later auxiliary calls reuse a closed/poisoned client and fail quickly with `Connection error`.

This affected context compression and memory/background auxiliary tasks in a long-running Discord gateway session. The main model route continued to work, so this does not look like a global network/auth outage.

## Observed Behavior

Live log sequence from a long-running gateway session:

```text
Preflight compression: ~234,406 tokens >= 217,600 threshold (model gpt-5.5, ctx 272,000)
context compression started: session=20260510_171239_804511 messages=255 tokens=~234,406 model=gpt-5.5 focus=None
Auxiliary compression: using main (gpt-5.5) at https://chatgpt.com/backend-api/codex/
Failed to generate context summary: Connection error.. Further summary attempts paused for 30 seconds.
context compression done: session=20260510_213719_f58dae messages=255->8 tokens=~27,120
```

Earlier in the same run there was a timeout on the same auxiliary path:

```text
Auxiliary compression: using main (gpt-5.5) at https://chatgpt.com/backend-api/codex/
Failed to generate context summary: Codex auxiliary Responses stream exceeded 20.0s total timeout. Further summary attempts paused for 30 seconds.
```

After that timeout, repeated downstream auxiliary tasks using `provider: main` started failing with:

```text
Auxiliary flush_memories: using main (gpt-5.5) at https://chatgpt.com/backend-api/codex/
Brainstack explicit capture validation extractor failed: Connection error.
```

Meanwhile normal main-agent calls to the same provider/model were succeeding in the same time window, including large requests around 168k-193k input tokens. That makes a stale/closed auxiliary client more likely than a global provider outage.

## Config Shape

Relevant config:

```yaml
model:
  default: gpt-5.5
  provider: openai-codex
  base_url: https://chatgpt.com/backend-api/codex
auxiliary:
  compression:
    provider: main
    model: ""
    timeout: 20
```

No stale `stepfun/step-3.5-flash` route was involved in this observed failure. The compression task inherited the main model route as intended.

## Suspected Cause

The timeout handler in the Codex auxiliary Responses stream closes the real client on timeout. The sync cache path can still return the cached client later, because sync cache hits do not validate whether the cached client was previously closed/poisoned.

Relevant live-source areas inspected:

- `agent/auxiliary_client.py`: `_CodexCompletionsAdapter` timeout path calls client close on timeout.
- `agent/auxiliary_client.py`: `_get_cached_client(...)` returns sync cached clients without a liveness check.
- `agent/context_compressor.py`: `_generate_summary(...)` calls `call_llm(task="compression", main_runtime=...)` and then records `_last_summary_error` when the auxiliary call fails.
- `run_agent.py`: preflight compression emits the user-facing fallback marker when `_last_summary_error` is set.

## Expected Behavior

After an auxiliary timeout or connection error:

- the poisoned/closed cached client should be evicted;
- the next auxiliary call should build a fresh client;
- compression should optionally retry once with a fresh client before inserting a fallback context marker;
- downstream auxiliary memory/background tasks should not inherit the broken cached client state.
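
The expected behavior above could look roughly like the following. This is a sketch under assumptions: `Client`, `ClientCache`, and `call_with_fresh_retry` are hypothetical names that do not exist in the codebase, and the built-in `TimeoutError` and `ConnectionError` stand in for whatever exceptions the real auxiliary path raises.

```python
class Client:
    """Hypothetical stand-in for the wrapped sync client."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


class ClientCache:
    """Keyed client cache with explicit eviction."""

    def __init__(self, factory):
        self._factory = factory
        self._clients = {}

    def get(self, key):
        if key not in self._clients:
            self._clients[key] = self._factory()
        return self._clients[key]

    def evict(self, key):
        # Drop the entry so the next get() builds a fresh client.
        client = self._clients.pop(key, None)
        if client is not None:
            client.close()


def call_with_fresh_retry(cache, key, fn, retries=1):
    """Run fn(client); on timeout/connection failure evict, then retry."""
    attempt = 0
    while True:
        client = cache.get(key)
        try:
            return fn(client)
        except (TimeoutError, ConnectionError):
            # Evict before re-raising so no later caller inherits this client.
            cache.evict(key)
            if attempt >= retries:
                raise
            attempt += 1
```

Evicting before the retry check means the cache is clean even when the final attempt fails, so downstream memory/background tasks start from a fresh client either way.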

## Suggested Fix

At minimum:

1. On timeout/connection failure from a cached sync auxiliary client, evict that cache entry before raising.
2. Add a liveness/closed-state guard for sync cached clients, similar in spirit to the existing async loop validation.
3. Add a regression test where:
   - auxiliary compression times out and closes the wrapped client;
   - a later `provider: main` auxiliary call is made;
   - the later call must create/use a fresh client rather than reusing the closed one.
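
Items 2 and 3 might be sketched as follows. The names `FakeClient`, `GuardedCache`, and the test function are hypothetical, and the `is_closed()` method is an assumption about how closed state would be exposed; the real client may need a different check.

```python
class FakeClient:
    """Hypothetical stand-in that exposes its closed state."""

    def __init__(self):
        self._closed = False

    def close(self):
        self._closed = True

    def is_closed(self):
        return self._closed


class GuardedCache:
    """Sync client cache that refuses to return a closed client."""

    def __init__(self, factory):
        self._factory = factory
        self._clients = {}

    def get(self, key):
        client = self._clients.get(key)
        # Liveness guard: treat a closed cached client as a cache miss.
        if client is None or client.is_closed():
            client = self._factory()
            self._clients[key] = client
        return client


def test_closed_client_is_not_reused():
    built = []

    def factory():
        client = FakeClient()
        built.append(client)
        return client

    cache = GuardedCache(factory)
    key = ("main", "sync")

    first = cache.get(key)
    first.close()  # simulate the timeout path closing the wrapped client

    second = cache.get(key)  # later `provider: main` auxiliary call
    assert second is not first
    assert not second.is_closed()
    assert len(built) == 2
```

The guard complements eviction rather than replacing it: eviction handles failures the caller sees, while the guard also covers a client closed by some other code path.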

## Impact

This can cause context compression to replace middle turns with a static fallback marker even though the main model route is still healthy. It can also make memory/background auxiliary tasks appear broken after a single auxiliary timeout.

## Notes

I am reporting this as a Hermes auxiliary-client/runtime issue, not as a memory-provider storage issue. The memory provider was only the downstream consumer that made the poisoned cached auxiliary route visible.
