Bug: auxiliary_client._client_cache accumulates AsyncOpenAI clients indefinitely, causing fd exhaustion in long-running gateway #10200

## Summary

`auxiliary_client._client_cache` (line ~1685) accumulates `AsyncOpenAI` client entries indefinitely in long-running gateway processes. Each cached async client holds an `httpx.AsyncClient` bound to a specific event loop, which keeps the loop's kqueue selector and self-pipe unix sockets alive. Over days of operation, this exhausts the process file descriptor limit.

This is a deeper root cause than #8043, which addressed `model_tools.py` event loop cleanup but did not address the `_client_cache` accumulation.

## Root Cause

The cache key for async clients includes `id(asyncio.get_event_loop())` (line ~1817):

```python
cache_key = (provider, async_mode, base_url, api_key, loop_id)
```

When worker threads are recycled by `ThreadPoolExecutor` (e.g., during cron job execution or gateway message handling), new threads get new event loops with new `loop_id` values. Each unique `loop_id` creates a new cache entry with a new `AsyncOpenAI` client. Old entries are never evicted: `cleanup_stale_async_clients()` (line ~1769) only removes entries whose loop reports `.is_closed()`, and these loops never do. Nothing closes them explicitly when their thread exits, and because the cached client holds a reference to the loop, the loop is never garbage-collected, so its finalizer never closes it either.
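The growth pattern can be illustrated without the real module. The following is a minimal sketch, assuming a plain dict (`demo_cache`) as a stand-in for `_client_cache` and a placeholder object in place of an `AsyncOpenAI` client; `cached_lookup` and `run_short_lived_workers` are hypothetical names, not functions from `auxiliary_client`.

```python
import asyncio
import concurrent.futures

# Hypothetical stand-in for _client_cache: key -> (placeholder client, loop).
demo_cache = {}


def cached_lookup(provider, base_url, api_key):
    # A fresh worker thread brings a fresh event loop with a new id().
    loop = asyncio.new_event_loop()
    loop.run_until_complete(asyncio.sleep(0))
    key = (provider, True, base_url, api_key, id(loop))
    # The entry keeps the loop reachable, so it is neither closed nor collected.
    demo_cache.setdefault(key, (object(), loop))


def run_short_lived_workers(count):
    # One single-worker pool per iteration forces a new thread each time.
    for _ in range(count):
        with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
            pool.submit(cached_lookup, "custom", "https://example.com/v1", "key").result()
    return len(demo_cache)
```

Every call lands on a new key even though provider, URL, and API key are identical, and none of the stored loops reports `is_closed()`, so a cleanup pass that only checks `is_closed()` has nothing to remove.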

## Reproduction

```python
import asyncio
import concurrent.futures
import gc
import os
import subprocess

from agent.auxiliary_client import _client_cache, _client_cache_lock, _get_cached_client

pid = os.getpid()


def fd_count():
    """Return (total lsof rows, KQUEUE rows) for this process (macOS)."""
    r = subprocess.run(["lsof", "-p", str(pid)], capture_output=True, text=True)
    lines = r.stdout.strip().split("\n")[1:]  # skip the lsof header row
    kq = sum(1 for line in lines if "KQUEUE" in line)
    return len(lines), kq


def run():
    # Each worker thread creates its own loop, so each call sees a new loop_id.
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    loop.run_until_complete(asyncio.sleep(0))
    _get_cached_client("custom", "test", async_mode=True,
                       base_url="https://example.com/v1", api_key="key")


total, kq = fd_count()
print(f"Before: total={total}, KQUEUE={kq}")

# A fresh single-worker pool per iteration forces a new thread each time.
for i in range(10):
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    pool.submit(run).result()
    pool.shutdown(wait=True)

gc.collect()
total, kq = fd_count()
print(f"After: total={total}, KQUEUE={kq}")
# Result: 10 KQUEUE fds leaked (1 per unique loop_id)

with _client_cache_lock:
    print(f"Cache entries: {len(_client_cache)}")  # 10 entries, never cleaned
```

## Observed Impact

On a macOS gateway running for ~4 days with 6 daily cron jobs + interactive chat:
- **56 KQUEUE** fds (one per leaked event loop)
- **113 unix socket** fds (self-pipe pairs, 2 per loop)
- **67 IPv4** fds (httpx connection pools)
- **Total: 323 fds** (the three categories above account for 236 of them), exceeding the macOS `launchctl limit maxfiles` soft limit of 256
- All cron deliveries and new connections failed with `[Errno 24] Too many open files`
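
For watching this kind of growth from inside the process, a lighter check than shelling out to `lsof` may be enough. This is a sketch only, assuming a Unix system that exposes `/dev/fd` (macOS and Linux do); `open_fd_count` and `fd_headroom` are hypothetical helper names, not part of hermes-agent.

```python
import os
import resource


def open_fd_count():
    """Count this process's open descriptors by listing /dev/fd."""
    # The listing itself briefly uses one descriptor, so the result
    # may be one higher than the steady-state count.
    return len(os.listdir("/dev/fd"))


def fd_headroom():
    """Return (open fds, soft limit, remaining); remaining is None if unlimited."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    used = open_fd_count()
    remaining = None if soft == resource.RLIM_INFINITY else soft - used
    return used, soft, remaining
```

Logging `fd_headroom()` at intervals would show the remaining count shrinking as cache entries pile up, well before `[Errno 24]` appears.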

## Suggested Fix

1. **LRU/TTL eviction for `_client_cache`**: Cap the number of async client cache entries (e.g., 16) and close evicted clients explicitly.
2. **Thread-aware cleanup**: In `cleanup_stale_async_clients()`, also check whether the thread that created each cached loop is still alive. If the thread is dead, close the client and remove the entry.
3. **Periodic cleanup in gateway**: Call `cleanup_stale_async_clients()` periodically from the cron ticker or a dedicated cleanup task, not just after agent turns.
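
Items 1 and 2 could be combined roughly as follows. This is a sketch under stated assumptions, not the project's implementation: `BoundedClientCache` and `StubAsyncClient` are hypothetical, each entry is assumed to record the loop and the creating thread, and clients are assumed to expose an awaitable `close()`.

```python
import asyncio
import threading
from collections import OrderedDict


class StubAsyncClient:
    """Stand-in for an async client; only an awaitable close() is assumed."""

    def __init__(self):
        self.closed = False

    async def close(self):
        self.closed = True


class BoundedClientCache:
    """Hypothetical LRU cache that closes clients when it drops them."""

    def __init__(self, max_entries=16):
        self._max = max_entries
        self._lock = threading.Lock()
        self._entries = OrderedDict()  # key -> (client, loop, owner_thread)

    def __len__(self):
        with self._lock:
            return len(self._entries)

    def get_or_create(self, key, factory, loop):
        dropped = []
        with self._lock:
            if key in self._entries:
                self._entries.move_to_end(key)  # mark as most recently used
                return self._entries[key][0]
            client = factory()
            self._entries[key] = (client, loop, threading.current_thread())
            while len(self._entries) > self._max:
                dropped.append(self._entries.popitem(last=False)[1])
        for entry in dropped:  # dispose outside the lock
            self._dispose(*entry)
        return client

    def cleanup_dead_threads(self):
        """Drop entries whose creating thread has exited; return how many."""
        with self._lock:
            dead = [k for k, (_, _, t) in self._entries.items() if not t.is_alive()]
            dropped = [self._entries.pop(k) for k in dead]
        for entry in dropped:
            self._dispose(*entry)
        return len(dropped)

    @staticmethod
    def _dispose(client, loop, owner):
        if loop.is_closed():
            return  # nothing left to run close() on
        if loop.is_running():
            # Loop is active: hand the close over to it.
            asyncio.run_coroutine_threadsafe(client.close(), loop)
        elif not owner.is_alive():
            # Owner is gone, so this thread may drive the loop and then
            # release its selector and self-pipe descriptors.
            loop.run_until_complete(client.close())
            loop.close()
        # Otherwise the owner is alive but idle; signalling it is left
        # out of this sketch.
```

For item 3, `cleanup_dead_threads()` is the kind of call that could run from a periodic hook such as the cron ticker. The size cap bounds the worst case, while the dead-thread check reclaims descriptors even when the cache is below the cap.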

## Environment

- macOS 15, Python 3.11
- hermes-agent latest main (post-#8043 merge)
- Gateway mode with POPO + WeChat adapters, 6 cron jobs

## Related

- #8043 — addressed `model_tools.py` event loop cleanup (partial fix, does not cover `_client_cache`)
- #8073 — PR for #8043 (reviewer noted `_worker_thread_local` cleanup was missing)
