fix: bound auxiliary client cache to prevent fd exhaustion in long-running gateways #10470

Merged
teknium1 merged 1 commit into main from fix/client-cache-fd-exhaustion on Apr 15, 2026

@teknium1

Summary

Fixes #10200

_client_cache in auxiliary_client.py accumulated unbounded entries because the event loop's id() was part of the cache key. Every new worker-thread event loop created a new entry for the same provider config. In long-running gateways where threads recycle frequently, this exhausted file descriptors after days of operation.

Root Cause

The cache key included loop_id = id(current_loop). When gateway worker threads create new event loops (via _run_async()/asyncio.run()), each loop gets a unique id(). The cache held a reference to the old loop object, preventing GC and ensuring new loops always got different IDs. Old entries with dead loops piled up — each holding an unclosed AsyncOpenAI client with its httpx connection pool (KQUEUE fds, unix sockets, IPv4 fds).
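
The growth pattern described above can be reproduced with a small, self-contained sketch. This is not the code from auxiliary_client.py: `get_client_old`, the two-element key, and the `object()` placeholder client are hypothetical stand-ins, used only to show why a key containing `id(loop)` never produces a hit for a new loop.

```python
import asyncio

# Hypothetical stand-in for the pre-fix cache: id(loop) is part of the key.
cache = {}


def get_client_old(provider, loop):
    """Return a cached placeholder client keyed by (provider, id(loop))."""
    key = (provider, id(loop))
    if key not in cache:
        # The entry keeps a reference to the loop, so the loop object is
        # never garbage-collected and its id() can never be reused.
        cache[key] = (object(), loop)
    return cache[key][0]


def simulate(n):
    """Create n short-lived loops that all ask for the same provider."""
    for _ in range(n):
        loop = asyncio.new_event_loop()
        try:
            get_client_old("openai", loop)
        finally:
            loop.close()
    return len(cache)


entries = simulate(20)
print(entries)  # prints 20: one entry per loop, although the provider is the same
```

In the real cache each of those entries would hold an unclosed async client, which is where the leaked file descriptors come from.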

Fix

  • Remove loop_id from cache key — the logical key is now (provider, async_mode, base_url, api_key, api_mode, runtime_key)
  • Validate loop at hit time — on async cache hits, check that the cached loop is the current, open loop. If the loop changed or was closed, force-close the stale client and replace the entry in-place
  • Add _CLIENT_CACHE_MAX_SIZE = 64 safety belt — FIFO eviction as defense-in-depth

This bounds cache growth to one entry per unique provider config rather than one per (config × event-loop). Cross-loop safety is preserved: different loops still get different client instances (validated by the existing TestCrossLoopCacheIsolation suite).

E2E Verification

Simulated 20 sequential worker threads with different event loops for the same provider:

  • Before: 20 cache entries (one per loop) → unbounded growth → fd exhaustion
  • After: 1 cache entry (replaced in-place) + 20 unique clients (cross-loop safe)
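
The "After" scenario can be approximated with the sketch below. It is not the verification script used for this PR: the `cache` dict, `get_client`, and the two-element key are hypothetical, and closing the stale client is omitted to keep the example short.

```python
import asyncio
import threading

cache = {}         # logical key -> (client, loop); stand-in for _client_cache
seen_clients = []  # every client handed out, in order


def get_client(key):
    loop = asyncio.get_running_loop()
    entry = cache.get(key)
    if entry is not None and entry[1] is loop and not loop.is_closed():
        return entry[0]
    client = object()  # placeholder for an async client bound to `loop`
    cache[key] = (client, loop)  # replaces any stale entry in place
    return client


async def handle_request():
    seen_clients.append(get_client(("provider", True)))


def worker():
    # asyncio.run() creates a fresh event loop for this thread and closes it.
    asyncio.run(handle_request())


# 20 sequential worker threads, each with its own event loop.
for _ in range(20):
    t = threading.Thread(target=worker)
    t.start()
    t.join()

print(len(cache), len(set(seen_clients)))  # prints: 1 20
```

The cache stays at one entry while every loop still receives its own client instance.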

Test Results

  • 14 targeted tests pass (9 in test_async_httpx_del_neuter.py + 5 in test_crossloop_client_cache.py)
  • 3 new tests: TestClientCacheBoundedGrowth — stale loop replacement, no-growth verification, max-size eviction
  • 1885 passing in broader agent/run_agent suite (9 pre-existing failures unrelated to this change)

…nning gateways (#10200)

The _client_cache used event loop id() as part of the cache key, so
every new worker-thread event loop created a new entry for the same
provider config.  In long-running gateways where threads are recycled
frequently, this caused unbounded cache growth — each stale entry
held an unclosed AsyncOpenAI client with its httpx connection pool,
eventually exhausting file descriptors.

Fix: remove loop_id from the cache key and instead validate on each
async cache hit that the cached loop is the current, open loop.  If
the loop changed or was closed, the stale entry is replaced in-place
rather than creating an additional entry.  This bounds cache growth
to at most one entry per unique provider config.

Also adds a _CLIENT_CACHE_MAX_SIZE (64) safety belt with FIFO
eviction as defense-in-depth against any remaining unbounded growth.

Cross-loop safety is preserved: different event loops still get
different client instances (validated by existing test suite).

Closes #10200
teknium1 merged commit 6391b46 into main on Apr 15, 2026
4 of 5 checks passed
teknium1 deleted the fix/client-cache-fd-exhaustion branch on April 15, 2026 20:16
kagura-agent pushed a commit to kagura-agent/hermes-agent that referenced this pull request Apr 16, 2026
ulasbilgen pushed a commit to ulasbilgen/hermes-adhd-agent that referenced this pull request May 1, 2026
aj-nt pushed a commit to aj-nt/hermes-agent that referenced this pull request May 1, 2026
02356abc pushed a commit to 02356abc/hermes-agent that referenced this pull request May 14, 2026
gweeteve pushed a commit to gweeteve/hermes-agent that referenced this pull request Jun 2, 2026
Egavasyug pushed a commit to Egavasyug/hermes-agent that referenced this pull request Jun 10, 2026
Development

Successfully merging this pull request may close these issues.

Bug: auxiliary_client._client_cache accumulates AsyncOpenAI clients indefinitely, causing fd exhaustion in long-running gateway
