Bug Description
When using an HTTP/StreamableHTTP MCP server, the streamable_http_client context manager throws RuntimeError("The current task is not holding this lock") during cleanup when the session is torn down for reconnect. This is treated as a connection failure, consumes retry attempts, and eventually causes the MCP server to be permanently disconnected after 5 retries.
Steps to Reproduce
- Configure an HTTP MCP server in config.yaml
- Start Hermes Gateway
- Wait for keepalive to trigger reconnection (default 180s)
- Connection is lost after ~3 minutes, retries fail, server gives up
Root Cause
In tools/mcp_tool.py, _run_http() uses async with streamable_http_client(...). When the MCP session tears down for reconnect, streamable_http_client.__aexit__ triggers anyio TaskGroup cleanup. With anyio >= 4.x on asyncio, the anyio Lock acquired inside the TaskGroup cannot be released from the asyncio task driving cleanup, because the lock was acquired by a different task:
RuntimeError: The current task is not holding this lock
anyio/_core/_synchronization.py:166
anyio/_backends/_asyncio.py:1854
This propagates to the except Exception handler in run(), where it is counted as a reconnect failure. After 5 retries (all hitting the same bug), the server is given up on permanently.
Environment
- Hermes Agent: v0.14.0 (c016949)
- Python: 3.11.14 / mcp SDK: 1.24.0 / anyio: 4.12.0
- MCP server: gbrain v0.37.1.0 (HTTP/StreamableHTTP via ngrok)
Log Evidence
keepalive failed, triggering reconnect
reconnect requested — tearing down HTTP session
connection lost (attempt 1/5): unhandled errors in a TaskGroup
...
failed after 5 reconnection attempts, giving up
Workaround
Skip retry counting for this known anyio cleanup error when the server was already connected:
if isinstance(exc, RuntimeError) and "not holding this lock" in str(exc):
    if self._ready.is_set():
        continue  # expected anyio cleanup, retry immediately
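The log line above reports "unhandled errors in a TaskGroup", which suggests the RuntimeError may sometimes arrive wrapped in an exception group rather than bare. A minimal sketch of a matcher that covers both cases follows. The helper name is_lock_cleanup_error is hypothetical and not part of Hermes, and whether wrapping actually occurs on this code path is an assumption.

```python
def is_lock_cleanup_error(exc: BaseException) -> bool:
    """Return True if exc is, or wraps, the anyio lock-ownership RuntimeError.

    Hypothetical helper for illustration; not part of Hermes or anyio.
    """
    if isinstance(exc, RuntimeError) and "not holding this lock" in str(exc):
        return True
    # Exception groups (PEP 654) expose their members via `.exceptions`.
    # Duck typing keeps this usable on Python versions without ExceptionGroup.
    nested = getattr(exc, "exceptions", None)
    if isinstance(nested, (tuple, list)):
        return any(is_lock_cleanup_error(inner) for inner in nested)
    return False
```

The workaround condition could then call is_lock_cleanup_error(exc) instead of the isinstance check, while keeping the self._ready.is_set() guard unchanged.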
Suggested Fix
Handle the streamable_http_client cleanup error at the _run_http() boundary when exiting for reconnect, or restructure the async context management so that the Lock is released by the same task that acquired it.
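The first option might look roughly like the sketch below. It is a self-contained simulation, not the Hermes code: fake_streamable_http_client is a stand-in that raises the same RuntimeError on exit, and run_http_once and the reconnect_requested event are hypothetical names. The idea is to swallow the cleanup error only when a reconnect was deliberately requested, and to let it propagate otherwise.

```python
import asyncio
from contextlib import asynccontextmanager


@asynccontextmanager
async def fake_streamable_http_client():
    """Stand-in for the real client: raises the lock error during cleanup."""
    yield "session"
    raise RuntimeError("The current task is not holding this lock")


async def run_http_once(request_reconnect: bool) -> str:
    """Hypothetical _run_http()-style boundary around the client context."""
    reconnect_requested = asyncio.Event()
    try:
        async with fake_streamable_http_client():
            if request_reconnect:
                # Simulates the keepalive failure path asking for a teardown.
                reconnect_requested.set()
    except RuntimeError as exc:
        expected = (
            reconnect_requested.is_set()
            and "not holding this lock" in str(exc)
        )
        if expected:
            # Known cleanup noise during an intentional teardown:
            # report a clean exit so no retry attempt is consumed.
            return "clean-teardown"
        raise  # anything else is still a real failure
    return "closed"


if __name__ == "__main__":
    print(asyncio.run(run_http_once(True)))
```

Gating on the reconnect flag keeps the suppression narrow: the same error raised outside an intentional teardown still reaches the retry logic.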