Fix #29726: MCP server startup hang fix #29853

Open
zombi3butt wants to merge 1 commit into NousResearch:main from zombi3butt:fix/mcp-server-startup-hang
Conversation

@zombi3butt

Fix #29726 — One unhealthy optional MCP integration should not make the whole Hermes session unavailable

Problem

When an MCP server is configured under mcp_servers, Hermes attempts to connect and discover tools during startup. If that MCP server is down, slow, misconfigured, or fails during initialize / list_tools, Hermes startup could hang indefinitely before the normal chat session becomes usable.

Root Cause

The MCPServerTask.start() method called await self._ready.wait() with no timeout wrapper. If the server's internal run() loop never fires _ready.set() (e.g., due to a silently swallowed exception in a nested transport handler), start() blocks forever — and since MCP discovery runs serially per-server, one dead server freezes the entire startup.

Fix

  1. MCPServerTask.start(): wrapped await self._ready.wait() in asyncio.wait_for(timeout=connect_timeout). On timeout, it cancels the orphaned run-task to prevent resource leaks, then raises TimeoutError.
  2. hermes_cli/main.py: changed the MCP startup error log from logger.debug to logger.warning, so users see a clear "MCP tool discovery failed" message instead of silently losing tools.
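
The timeout guard in item 1 can be sketched roughly as follows. This is a minimal illustration, not the Hermes code: ServerTaskSketch, its hang flag, and the try_start helper are hypothetical stand-ins, and only the asyncio.wait_for / cancel / raise TimeoutError pattern is taken from the description above.

```python
import asyncio


class ServerTaskSketch:
    """Hypothetical stand-in for MCPServerTask (not the real class)."""

    def __init__(self, connect_timeout: float = 60.0, hang: bool = False):
        self.connect_timeout = connect_timeout
        self._hang = hang  # illustration only: simulate a server that never becomes ready
        self._ready = asyncio.Event()
        self._task = None

    async def run(self):
        if self._hang:
            await asyncio.sleep(3600)  # never reaches _ready.set()
        self._ready.set()
        await asyncio.sleep(3600)  # pretend to keep serving

    async def start(self):
        self._task = asyncio.create_task(self.run())
        try:
            # Bound the wait instead of awaiting the event unconditionally.
            await asyncio.wait_for(self._ready.wait(), timeout=self.connect_timeout)
        except asyncio.TimeoutError:
            # Cancel the orphaned run-task so it does not leak, then surface the failure.
            await self.stop()
            raise TimeoutError(
                f"server not ready after {self.connect_timeout}s"
            ) from None

    async def stop(self):
        if self._task is not None:
            self._task.cancel()
            await asyncio.gather(self._task, return_exceptions=True)


async def try_start(hang: bool, connect_timeout: float) -> str:
    """Hypothetical helper: report whether start() succeeded or timed out."""
    server = ServerTaskSketch(connect_timeout=connect_timeout, hang=hang)
    try:
        await server.start()
    except TimeoutError:
        return "timed out"
    await server.stop()
    return "ready"
```

Calling asyncio.run(try_start(hang=True, connect_timeout=0.05)) returns after the timeout instead of blocking, which is the behavior the fix is after.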

Behavior Change

  • Before: A hung/stuck MCP server could block Hermes startup indefinitely.
  • After: Per-server connection timeout (default 60s, configurable via connect_timeout) limits wait time. Failed servers log a visible warning. The rest of Hermes starts normally with available servers only.

Notes

  • No breaking API changes. All existing MCP configs work identically.
  • discover_mcp_tools() already used asyncio.gather(return_exceptions=True) and logged warnings for per-server failures — this just adds the missing timeout guard at the lowest level (start()).
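
For context, the gather-based isolation mentioned in the last note might look something like the sketch below. The function names (connect, discover), the delay-based fake servers, and the log wording are assumptions made for illustration; only asyncio.gather(return_exceptions=True) and the per-server warning come from the note itself.

```python
import asyncio
import logging

logger = logging.getLogger("mcp_sketch")


async def connect(name: str, delay: float, connect_timeout: float) -> str:
    """Hypothetical connect step: 'delay' simulates how long the server takes."""
    await asyncio.wait_for(asyncio.sleep(delay), timeout=connect_timeout)
    return name


async def discover(servers: dict, connect_timeout: float = 0.05):
    """Connect to every server concurrently; one failure does not affect the rest."""
    names = list(servers)
    results = await asyncio.gather(
        *(connect(n, servers[n], connect_timeout) for n in names),
        return_exceptions=True,  # failures come back as values instead of propagating
    )
    available, failed = [], []
    for name, result in zip(names, results):
        if isinstance(result, Exception):
            logger.warning("MCP tool discovery failed for %s: %r", name, result)
            failed.append(name)
        else:
            available.append(name)
    return available, failed
```

With a per-server timeout in place, a stuck server ends up in the failed list while the healthy ones remain available.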

@alt-glitch added labels on May 21, 2026: type/bug (Something isn't working), tool/mcp (MCP client and OAuth), comp/cli (CLI entry point, hermes_cli/, setup wizard), P1 (High: major feature broken, no workaround)
Development

Successfully merging this pull request may close these issues.

MCP server initialization failure should not prevent Hermes startup

3 participants