[Bug]: MCP server session expires during long-running gateway — no auto-reconnect #13383

@Alex-yang00

Bug Description

When the Hermes gateway runs for an extended period, MCP servers using the Streamable HTTP transport lose their server-side session. Subsequent tool calls fail with:

Invalid params: Invalid or expired session

The MCP client neither detects this condition nor re-establishes the session automatically. The only recovery is a full gateway restart, which interrupts every connected messaging platform.

Note: This is NOT an OAuth token expiry issue — the access token remains valid (direct API calls return HTTP 200). The failure is at the MCP transport session layer.

Steps to Reproduce

  1. Configure a Streamable HTTP MCP server (e.g. WordPress.com MCP)
  2. Run hermes gateway run and leave it running for several days
  3. Invoke any tool on that MCP server
  4. Observe the "Invalid or expired session" error

Expected Behavior

When a tool call returns "Invalid or expired session", the MCP client should automatically re-establish the session using the still-valid credentials and retry the call transparently.

Actual Behavior

Every subsequent tool call on the affected MCP server fails. The server remains broken until the gateway is manually restarted.

Relevant log entries:

ERROR tools.mcp_tool: MCP tool wpcom-mcp/wpcom-mcp-content-authoring call failed: Invalid params: Invalid or expired session
WARNING tools.mcp_tool: Failed to connect to MCP server 'wpcom-mcp': Client error '401 Unauthorized'
WARNING tools.mcp_oauth: MCP OAuth for 'wpcom-mcp': non-interactive environment and no cached tokens found.

Affected Component

  • Tools (MCP client)
  • Agent Core (gateway long-running stability)

Environment

  • OS: Linux x86_64
  • Hermes Version: 0.10.0 (2026.4.16)
  • Python: 3.11.13
  • MCP transport: Streamable HTTP

Root Cause Analysis

The MCP client in tools/mcp_tool.py does not treat "Invalid or expired session" as a reconnect trigger. The 3-attempt retry logic runs only at gateway startup, not on mid-session failures. Session expiry during normal operation falls through as a plain tool error with no recovery path.
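
To make the missing reconnect trigger concrete, here is a minimal sketch of how the failure could be recognized at call time. It assumes the error reaches the handler as a Python exception whose message contains the text from the log above. The names `SESSION_EXPIRED_MARKERS` and `is_session_expired` are hypothetical and do not exist in tools/mcp_tool.py.

```python
# Hypothetical helper, not part of Hermes: classify an exception as a
# transport-session expiry by matching the message seen in the logs.
SESSION_EXPIRED_MARKERS = ("invalid or expired session",)


def is_session_expired(exc: BaseException) -> bool:
    """Return True if exc, or any exception it wraps, looks like session expiry."""
    seen = set()
    current = exc
    while current is not None and id(current) not in seen:
        seen.add(id(current))  # guard against cycles in the exception chain
        message = str(current).lower()
        if any(marker in message for marker in SESSION_EXPIRED_MARKERS):
            return True
        # Follow explicit ("raise ... from") and implicit chaining.
        current = current.__cause__ or current.__context__
    return False
```

Matching on message text is brittle. If the MCP client library exposes a structured error code for this case, checking that would be preferable; I have not verified whether it does.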

Proposed Fix

In the MCP tool call handler, catch "Invalid or expired session" errors, tear down and re-initialize the MCP client for that server, then retry the original call once. The OAuth token remains valid — only the transport-layer session needs to be re-established.
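
A rough sketch of the reconnect-and-retry-once behavior described above, under stated assumptions: `call` and `reconnect` are hypothetical coroutines standing in for whatever tools/mcp_tool.py actually uses to invoke a tool and to tear down and re-initialize a server's client. The class name and the lock/generation counter are my own additions, included so that concurrent calls hitting the same expired session trigger only one reconnect.

```python
import asyncio
from typing import Any, Awaitable, Callable


class ReconnectingToolCaller:
    """Wraps a tool-call coroutine and rebuilds the session once on expiry."""

    def __init__(
        self,
        call: Callable[[str, dict], Awaitable[Any]],
        reconnect: Callable[[], Awaitable[None]],
    ) -> None:
        self._call = call            # hypothetical: performs one MCP tool call
        self._reconnect = reconnect  # hypothetical: tears down and re-initializes the client
        self._lock = asyncio.Lock()  # serializes reconnects across concurrent calls
        self._generation = 0         # incremented after each successful reconnect

    async def call_tool(self, name: str, arguments: dict) -> Any:
        generation = self._generation
        try:
            return await self._call(name, arguments)
        except Exception as exc:
            if "invalid or expired session" not in str(exc).lower():
                raise  # unrelated failure: surface it unchanged
        # Session expired: reconnect unless another task already did so.
        async with self._lock:
            if generation == self._generation:
                await self._reconnect()
                self._generation += 1
        # Retry exactly once; a second failure propagates to the caller.
        return await self._call(name, arguments)
```

Retrying only once keeps a permanently broken server from looping, and any error other than session expiry is raised unchanged so existing error handling still applies.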

Willing to submit a PR?

Not at this time, but the fix scope is well-defined above.

Metadata

Labels

  • P1 (High: major feature broken, no workaround)
  • comp/gateway (Gateway runner, session dispatch, delivery)
  • tool/mcp (MCP client and OAuth)
  • type/bug (Something isn't working)
