Summary
During normal CLI usage, stale AsyncOpenAI / AsyncAnthropic client objects that get garbage-collected mid-session trigger an unhandled RuntimeError: Event loop is closed exception. This activates prompt_toolkit's exception handler, which prints the traceback and halts the session with "Press ENTER to continue..." — forcing the user to intervene manually.
This can happen repeatedly in a single session if multiple stale clients accumulate (e.g., during heavy tool use), making the CLI unusable until the user presses ENTER each time.
Root Cause
The crash chain (7 steps):

1. Tool execution creates async clients on the tool loop. _run_async() in model_tools.py runs coroutines on a persistent _tool_loop via run_until_complete(). Some tools (e.g., mixture_of_agents via openrouter_client.py, or auxiliary_client.py for summarization/compression) create AsyncOpenAI clients during execution. These clients internally create httpx.AsyncClient instances whose connections (and their asyncio transports) are bound to _tool_loop.
2. Clients escape the cache. openrouter_client.py stores a global _client outside auxiliary_client._client_cache, and any AsyncOpenAI client created by calling resolve_provider_client() directly (rather than through _get_cached_client()) is also untracked. shutdown_cached_clients() cannot see either kind.
3. Stale clients get garbage-collected. When a client reference is dropped (module reload, replacement, scope exit), Python's GC eventually collects it, and the AsyncHttpxClientWrapper.__del__ method (in both openai/_base_client.py:1429 and anthropic/_base_client.py:1537) fires.
4. __del__ schedules aclose() on the wrong loop. The __del__ method calls:
   ```python
   asyncio.get_running_loop().create_task(self.aclose())
   ```
   If the finalizer runs on the thread where prompt_toolkit's event loop is active (which it almost always is during a CLI session), get_running_loop() returns prompt_toolkit's loop, not the _tool_loop the client was created on.
5. aclose() cascades into a dead transport. The aclose() task runs on prompt_toolkit's loop and walks the chain: httpx.AsyncClient.aclose() → httpcore.AsyncConnectionPool.aclose() → AsyncHTTP11Connection.aclose() → anyio.TLSStream.aclose() → asyncio._SelectorSocketTransport.close(). At this final step, the transport calls self._loop.call_soon(self._call_connection_lost, None). But self._loop is the loop the transport was created on, not the loop now running aclose(), and that loop may already be closed.
6. call_soon hits the closed loop. base_events.py:795 calls self._check_closed(), which raises RuntimeError('Event loop is closed').
7. prompt_toolkit catches the error and halts. Because prompt_toolkit installed _handle_exception as the event loop's exception handler (line 830 of application.py), asyncio passes the unhandled exception to it. The handler prints the traceback and awaits _do_wait_for_enter("Press ENTER to continue..."), blocking the entire CLI until the user presses ENTER.
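The chain can be reproduced in isolation, without httpx or a network. This is a sketch under simplifying assumptions, not code from openai, anthropic, or this repo: FakeTransport and FakeClientWrapper are hypothetical stand-ins that copy only the relevant behavior (the transport remembers its creation loop, and __del__ uses the same get_running_loop().create_task(self.aclose()) pattern), and a recording exception handler plays the role of prompt_toolkit's _handle_exception.

```python
import asyncio
import gc


class FakeTransport:
    """Hypothetical stand-in for asyncio's socket transport: it keeps the loop it was created on."""

    def __init__(self, loop: asyncio.AbstractEventLoop) -> None:
        self._loop = loop

    def close(self) -> None:
        # Like the real transport's close(): schedule a callback on the *creation* loop.
        self._loop.call_soon(lambda: None)


class FakeClientWrapper:
    """Hypothetical stand-in using the same __del__ pattern as AsyncHttpxClientWrapper."""

    def __init__(self, transport: FakeTransport) -> None:
        self._transport = transport
        self.is_closed = False

    async def aclose(self) -> None:
        self._transport.close()
        self.is_closed = True

    def __del__(self) -> None:
        if self.is_closed:
            return
        try:
            # Picks whichever loop is running *now*, not the creation loop.
            asyncio.get_running_loop().create_task(self.aclose())
        except Exception:
            pass


def reproduce() -> list:
    """Return the exceptions that reached the 'UI' loop's exception handler."""
    tool_loop = asyncio.new_event_loop()
    client = FakeClientWrapper(FakeTransport(tool_loop))
    tool_loop.close()  # the loop the transport is bound to is now closed

    seen = []

    async def ui_main() -> None:
        nonlocal client
        loop = asyncio.get_running_loop()
        # Plays the role of prompt_toolkit's _handle_exception.
        loop.set_exception_handler(lambda _loop, ctx: seen.append(ctx.get("exception")))
        client = None  # last reference dropped while the UI loop is running
        await asyncio.sleep(0)  # let the aclose() task run and fail
        gc.collect()  # make sure the failed task is freed so asyncio reports its exception

    asyncio.run(ui_main())
    gc.collect()
    return seen


print(reproduce())  # the RuntimeError from the closed tool loop lands in the UI loop's handler
```

Nothing in the sketch awaits the aclose() task, so asyncio reports its failure through the running loop's exception handler, which is exactly where prompt_toolkit intercepts it.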
Full traceback
```
Unhandled exception in event loop:
  File "...site-packages/httpx/_client.py", line 1985, in aclose
    await self._transport.aclose()
  File "...site-packages/httpx/_transports/default.py", line 406, in aclose
    await self._pool.aclose()
  File "...site-packages/httpcore/_async/connection_pool.py", line 353, in aclose
    await self._close_connections(closing_connections)
  File "...site-packages/httpcore/_async/connection_pool.py", line 345, in _close_connections
    await connection.aclose()
  File "...site-packages/httpcore/_async/connection.py", line 173, in aclose
    await self._connection.aclose()
  File "...site-packages/httpcore/_async/http11.py", line 258, in aclose
    await self._network_stream.aclose()
  File "...site-packages/httpcore/_backends/anyio.py", line 53, in aclose
    await self._stream.aclose()
  File "...site-packages/anyio/streams/tls.py", line 241, in aclose
    await self.transport_stream.aclose()
  File "...site-packages/anyio/_backends/_asyncio.py", line 1329, in aclose
    self._transport.close()
  File ".../asyncio/selector_events.py", line 1211, in close
    super().close()
  File ".../asyncio/selector_events.py", line 875, in close
    self._loop.call_soon(self._call_connection_lost, None)
  File ".../asyncio/base_events.py", line 795, in call_soon
    self._check_closed()
  File ".../asyncio/base_events.py", line 541, in _check_closed
    raise RuntimeError('Event loop is closed')

Exception Event loop is closed
Press ENTER to continue...
```
Versions
- hermes-agent 0.4.0
- httpx 0.28.1
- httpcore 1.0.9
- openai (has AsyncHttpxClientWrapper.__del__)
- anthropic (has AsyncHttpxClientWrapper.__del__)
- prompt_toolkit 3.0.52
- Python 3.12
Existing Mitigations (and why they're insufficient)
The codebase already has two defenses:
- _force_close_async_httpx() + shutdown_cached_clients() in auxiliary_client.py mark cached AsyncOpenAI clients as CLOSED, so that __del__ returns early instead of scheduling aclose(). But this only covers clients in _client_cache and only runs at shutdown. Clients outside the cache (e.g., openrouter_client._client) and clients collected mid-session are unprotected.
- The persistent _tool_loop in model_tools.py keeps the event loop alive so cached clients don't reference a dead loop. But __del__ schedules aclose() on prompt_toolkit's loop, not the tool loop, and the transport's internal self._loop still points at whichever loop created it, which may be stale.
Proposed Fix (Option B — root cause)
1. Register ALL async clients for cleanup
Create a central registry in auxiliary_client.py that tracks every AsyncOpenAI/AsyncAnthropic client created anywhere, not just cached ones:
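```python
# auxiliary_client.py (add near _client_cache)
import threading
import weakref
from typing import Any

# Weak references to every async client created anywhere, so tracking
# never keeps a client alive on its own.
_all_async_clients: list[weakref.ref] = []
_all_async_clients_lock = threading.Lock()


def _track_async_client(client: Any) -> None:
    """Register an async client for cleanup on shutdown."""
    with _all_async_clients_lock:
        # Prune refs to already-collected clients so the list cannot grow forever,
        # and skip duplicates (a client may be tracked by both callee and caller).
        _all_async_clients[:] = [r for r in _all_async_clients if r() is not None]
        if any(r() is client for r in _all_async_clients):
            return
        _all_async_clients.append(weakref.ref(client))


def _force_close_all_async_clients() -> None:
    """Mark ALL tracked async clients as closed to prevent __del__ crashes."""
    with _all_async_clients_lock:
        for ref in _all_async_clients:
            client = ref()
            if client is not None:
                _force_close_async_httpx(client)
        _all_async_clients.clear()
```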
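The registry's semantics can be sanity-checked in isolation. This sketch swaps the list of weakref.ref objects for weakref.WeakSet, which drops collected clients automatically and ignores duplicate registrations. AsyncClientRegistry, DummyClient, and the close callback (standing in for _force_close_async_httpx) are hypothetical names. It assumes tracked clients are hashable and weak-referenceable, which holds for ordinary class instances that neither override __eq__ nor use __slots__.

```python
import threading
import weakref
from typing import Callable


class AsyncClientRegistry:
    """Hypothetical stand-in for the proposed module-level registry."""

    def __init__(self) -> None:
        self._clients: "weakref.WeakSet[object]" = weakref.WeakSet()
        self._lock = threading.Lock()

    def track(self, client: object) -> None:
        with self._lock:
            self._clients.add(client)  # weak: never keeps the client alive; duplicates ignored

    def force_close_all(self, close: Callable[[object], None]) -> int:
        with self._lock:
            live = list(self._clients)  # snapshot of clients that are still alive
            self._clients.clear()
        for client in live:
            close(client)
        return len(live)


class DummyClient:
    """Placeholder for an AsyncOpenAI/AsyncAnthropic instance."""
    closed = False


registry = AsyncClientRegistry()
a, b = DummyClient(), DummyClient()
registry.track(a)
registry.track(b)
registry.track(a)  # duplicate registration is ignored
del b              # CPython frees b immediately; its entry vanishes from the WeakSet
count = registry.force_close_all(lambda c: setattr(c, "closed", True))
print(count, a.closed)  # 1 True
```

A limit worth keeping in mind: when a client is collected before shutdown, its __del__ runs before its weak reference is cleared, so the registry only ever protects clients that are still alive when cleanup runs.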
Update shutdown_cached_clients() to also call _force_close_all_async_clients().
2. Track clients at creation points
In auxiliary_client.py resolve_provider_client(), after creating an async client:
```python
if async_mode and client is not None:
    _track_async_client(client)
```
In openrouter_client.py:
```python
def get_async_client():
    global _client
    if _client is None:
        from agent.auxiliary_client import resolve_provider_client, _track_async_client
        client, _model = resolve_provider_client("openrouter", async_mode=True)
        if client is None:
            raise ValueError("OPENROUTER_API_KEY environment variable not set")
        _track_async_client(client)
        _client = client
    return _client
```
3. Install a custom exception handler on prompt_toolkit's loop (defense-in-depth)
Even with perfect client tracking, third-party code could create untracked async clients. Install a wrapper around prompt_toolkit's exception handler that suppresses RuntimeError: Event loop is closed during aclose():
```python
# cli.py
import logging

logger = logging.getLogger(__name__)


def _make_safe_exception_handler(original_handler):
    """Wrap prompt_toolkit's exception handler to suppress aclose() 'Event loop is closed' crashes."""
    def safe_handler(loop, context):
        exception = context.get("exception")
        if isinstance(exception, RuntimeError) and "Event loop is closed" in str(exception):
            # Suppress: this is a harmless GC cleanup failure from a stale
            # httpx/openai/anthropic async client. The connections will be
            # dropped by the OS. Log at debug level for diagnostics.
            logger.debug(
                "Suppressed 'Event loop is closed' from async client GC cleanup: %s",
                context.get("message", ""),
            )
            return
        # All other exceptions: delegate to prompt_toolkit's handler
        if original_handler is not None:
            original_handler(loop, context)
        else:
            loop.default_exception_handler(context)
    return safe_handler
```
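Install it once prompt_toolkit's loop is running. Application.run_async() installs _handle_exception on entry and restores the previous handler on exit, so a wrapper set on the loop before the app starts is replaced while the app runs; asyncio.get_event_loop() called at that point may also return a different loop from the one the app ends up running on. A pre_run callback runs on the app's loop after _handle_exception is in place (app below stands for the CLI's prompt_toolkit Application):
```python
def _install_safe_exception_handler() -> None:
    loop = asyncio.get_running_loop()  # prompt_toolkit's loop; _handle_exception is already set
    loop.set_exception_handler(_make_safe_exception_handler(loop.get_exception_handler()))

app.run(pre_run=_install_safe_exception_handler)
```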
4. Neuter all async clients at shutdown
Update _run_cleanup() to neuter async clients before the event loop closes:
```python
def _run_cleanup():
    global _cleanup_done
    if _cleanup_done:
        return
    _cleanup_done = True
    # ... existing cleanup ...
    # Close ALL async clients (cached + tracked) to prevent __del__ crashes.
    # Once step 1 lands, shutdown_cached_clients() already calls
    # _force_close_all_async_clients(); the explicit second call is a harmless safeguard.
    try:
        from agent.auxiliary_client import shutdown_cached_clients, _force_close_all_async_clients
        shutdown_cached_clients()
        _force_close_all_async_clients()
    except Exception:
        pass  # best effort: cleanup errors must not mask the real exit path
```
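Neutering works because the wrapper's __del__ returns immediately when is_closed is true, so nothing is ever scheduled on whichever loop happens to be running. The sketch below uses a hypothetical stand-in (WrapperLike) with the same early-return guard and a hypothetical force_close() that only flips the flag; it models what _force_close_async_httpx() is described as doing (marking the client CLOSED) without touching httpx internals.

```python
import asyncio


class WrapperLike:
    """Hypothetical stand-in with the same early-return guard as AsyncHttpxClientWrapper.__del__."""

    scheduled = 0  # how many aclose() tasks __del__ has scheduled

    def __init__(self) -> None:
        self.is_closed = False

    async def aclose(self) -> None:
        self.is_closed = True

    def __del__(self) -> None:
        if self.is_closed:
            return  # neutered: nothing is scheduled on the running loop
        try:
            asyncio.get_running_loop().create_task(self.aclose())
            WrapperLike.scheduled += 1
        except Exception:
            pass


def force_close(client: WrapperLike) -> None:
    """Hypothetical neutering step: mark the client closed without awaiting anything."""
    client.is_closed = True


async def main() -> int:
    neutered, live = WrapperLike(), WrapperLike()
    force_close(neutered)
    del neutered, live  # both are finalized while this loop is running
    await asyncio.sleep(0)  # let any scheduled aclose() run
    return WrapperLike.scheduled


print(asyncio.run(main()))  # 1: only the client that was not neutered scheduled aclose()
```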
Impact
- User-facing: CLI halts mid-session requiring manual ENTER press. Can happen repeatedly. Particularly triggered by heavy tool use (image generation, vision analysis, mixture-of-agents) that creates async HTTP connections.
- Triggered by: Any operation that creates and later discards an AsyncOpenAI/AsyncAnthropic client while prompt_toolkit's event loop is running. Also triggered by httpx.AsyncClient instances whose underlying transports reference a closed or different event loop.
- Frequency: Intermittent — depends on GC timing. More likely during sessions with many tool calls.
Reproduction
- Start CLI session
- Use tools heavily that trigger async client creation (image_generate, mixture_of_agents, vision_analyze in rapid succession)
- Wait — GC will eventually collect a stale client
- Observe "Unhandled exception in event loop" + "Press ENTER to continue..."
The fastest reproduction path is anything that causes a long-running HTTP connection to time out or be abandoned while the tool loop is busy — e.g., a curl upload to an unresponsive host via terminal_tool while httpx.AsyncClient instances exist from prior tool calls.