MCP HTTP transport: anyio RuntimeError during streamable_http_client cleanup causes reconnect failure loop #31987

@madlouse

Bug Description

When using an HTTP/StreamableHTTP MCP server, the streamable_http_client context manager raises RuntimeError("The current task is not holding this lock") during cleanup when the session is torn down for reconnect. The error is treated as a connection failure and consumes a retry attempt each time it occurs. After 5 failed retries the MCP server is permanently disconnected.

Steps to Reproduce

  1. Configure an HTTP MCP server in config.yaml
  2. Start Hermes Gateway
  3. Wait for keepalive to trigger reconnection (default 180s)
  4. Observe that the connection is lost after ~3 minutes, every retry fails, and the server gives up

Root Cause

In tools/mcp_tool.py, _run_http() wraps the connection in async with streamable_http_client(...). When the MCP session is torn down for reconnect, streamable_http_client.__aexit__ triggers anyio TaskGroup cleanup. With anyio 4.x on the asyncio backend, an anyio Lock acquired inside the TaskGroup is then released from the asyncio task driving cleanup, which is not the task that acquired it. anyio locks are owned by the acquiring task, so the release raises:

RuntimeError: The current task is not holding this lock
  anyio/_core/_synchronization.py:166
  anyio/_backends/_asyncio.py:1854

The exception propagates to the except Exception handler in run(), where it is counted as a reconnect failure. All 5 retries hit the same error, after which the server is permanently abandoned.

Environment

  • Hermes Agent: v0.14.0 (c016949)
  • Python: 3.11.14 / mcp SDK: 1.24.0 / anyio: 4.12.0
  • MCP server: gbrain v0.37.1.0 (HTTP/StreamableHTTP via ngrok)

Log Evidence

keepalive failed, triggering reconnect
reconnect requested — tearing down HTTP session
connection lost (attempt 1/5): unhandled errors in a TaskGroup
...
failed after 5 reconnection attempts, giving up

Workaround

Skip retry counting for this known anyio cleanup error when the server was already connected:

if isinstance(exc, RuntimeError) and "not holding this lock" in str(exc):
    if self._ready.is_set():
        continue  # expected anyio cleanup, retry immediately
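
One caveat with the check above: the log line "unhandled errors in a TaskGroup" suggests the RuntimeError may arrive wrapped in an exception group, in which case isinstance(exc, RuntimeError) on the outer exception would not match. The following is a minimal sketch of a matcher that also looks inside groups. The helper name is_lock_cleanup_error is hypothetical and is not part of Hermes, the mcp SDK, or anyio.

```python
def is_lock_cleanup_error(exc: BaseException) -> bool:
    """Return True if exc is, or consists only of, the anyio lock-ownership error.

    Hypothetical helper for illustration; not part of Hermes or anyio.
    """
    if isinstance(exc, RuntimeError) and "not holding this lock" in str(exc):
        return True
    # Exception groups (PEP 654) expose their members via `.exceptions`.
    # Duck-typing on that attribute also covers the `exceptiongroup` backport.
    children = getattr(exc, "exceptions", None)
    if not isinstance(children, (tuple, list)) or not children:
        return False
    # Require every member to match, so a group that also carries a real
    # connection error is still counted as a failure.
    return all(is_lock_cleanup_error(child) for child in children)
```

In the workaround, the condition would then read if is_lock_cleanup_error(exc) and self._ready.is_set(). Requiring all members to match is a deliberately conservative choice: it avoids hiding a genuine network error that happens to be raised alongside the cleanup error.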

Suggested Fix

Handle the streamable_http_client cleanup error at the _run_http() boundary when exiting for reconnect, or restructure the async context management so that the lock is acquired and released by the same task.
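
As a rough illustration of the first option, the sketch below catches the teardown error at the boundary of a single session run and reports it as a requested reconnect instead of a failure. Everything here is an assumption made for illustration: fake_transport stands in for streamable_http_client and simply raises on exit, and run_http_once and its reconnect_requested event are hypothetical names, not the actual _run_http() implementation.

```python
import asyncio
from contextlib import asynccontextmanager

LOCK_MSG = "not holding this lock"


@asynccontextmanager
async def fake_transport():
    """Hypothetical stand-in for streamable_http_client: fails during exit."""
    yield
    raise RuntimeError("The current task is not holding this lock")


async def run_http_once(reconnect_requested: asyncio.Event,
                        transport=fake_transport) -> str:
    """Run one session; return 'reconnect' for known teardown noise."""
    body_done = False
    try:
        async with transport():
            # The session stays up until a reconnect is requested.
            await reconnect_requested.wait()
            body_done = True
    except RuntimeError as exc:
        # Swallow the error only if the session body finished normally,
        # a reconnect was requested, and the message matches.
        if body_done and reconnect_requested.is_set() and LOCK_MSG in str(exc):
            return "reconnect"
        raise
    return "closed"
```

The caller's retry loop could then treat a "reconnect" result as a clean exit that does not consume a retry attempt. If the error arrives wrapped in an exception group, the except clause would also need to inspect the group's members. Any other exception still propagates and counts as a failure.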

Labels

  • P2: Medium (degraded but workaround exists)
  • comp/tools: Tool registry, model_tools, toolsets
  • tool/mcp: MCP client and OAuth
  • type/bug: Something isn't working
