Lazy import of model_tools blocks asyncio event loop on first gateway message when an MCP server is slow/unreachable #16856

## Summary

`model_tools.py` runs `discover_mcp_tools()` as a module-level side effect (line 143). The gateway lazy-imports `run_agent`, which imports `model_tools`, the first time a user message reaches `_handle_message_with_agent` (the import itself sits in `_run_agent`, per the stack trace below). As a result, the very first message after gateway start triggers MCP discovery **inside the asyncio event loop thread**. Because `_run_on_mcp_loop` waits with a blocking `future.result(timeout=120)` rather than `await`, the Discord/Telegram/etc. WebSocket heartbeat freezes for up to 120 seconds whenever any configured MCP server is unreachable. After ~50s Discord force-closes the shard.

This is distinct from #10138 (which is about a nested-call deadlock inside `register_mcp_servers`). Even if #10138 is fixed, a slow/unreachable MCP server will still freeze the loop because the discovery is invoked synchronously from an async context.
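The stall can be reproduced outside Hermes with a small, self-contained sketch. All helper names here (`start_background_loop`, `slow_discovery`, `blocking_run`, `heartbeat`, `measure_max_gap`) are hypothetical stand-ins, not Hermes code. `blocking_run` only mimics the `future.result(timeout=...)` pattern visible in the stack trace, under the assumption that the MCP loop lives on its own thread.

```python
import asyncio
import threading
import time


def start_background_loop() -> asyncio.AbstractEventLoop:
    """Start a second event loop on a daemon thread (stand-in for the MCP loop)."""
    loop = asyncio.new_event_loop()
    threading.Thread(target=loop.run_forever, daemon=True).start()
    return loop


async def slow_discovery(delay: float) -> str:
    """Stand-in for discovery against a slow or unreachable server."""
    await asyncio.sleep(delay)
    return "done"


def blocking_run(loop: asyncio.AbstractEventLoop, coro, timeout: float):
    """Submit coro to another loop, then block the calling thread on the result."""
    future = asyncio.run_coroutine_threadsafe(coro, loop)
    return future.result(timeout=timeout)


async def heartbeat(ticks: list, interval: float) -> None:
    """Record a timestamp every `interval` seconds while the loop is responsive."""
    while True:
        ticks.append(time.monotonic())
        await asyncio.sleep(interval)


async def measure_max_gap(delay: float = 0.5, interval: float = 0.05) -> float:
    """Return the largest gap between heartbeat ticks around a blocking wait."""
    bg_loop = start_background_loop()
    ticks: list = []
    hb = asyncio.create_task(heartbeat(ticks, interval))
    await asyncio.sleep(interval * 3)  # let the heartbeat tick a few times
    # Synchronous wait inside a coroutine: the heartbeat cannot run meanwhile.
    blocking_run(bg_loop, slow_discovery(delay), timeout=5)
    await asyncio.sleep(interval * 3)  # let the heartbeat resume
    hb.cancel()
    try:
        await hb
    except asyncio.CancelledError:
        pass
    bg_loop.call_soon_threadsafe(bg_loop.stop)
    return max(b - a for a, b in zip(ticks, ticks[1:]))


if __name__ == "__main__":
    print(f"largest heartbeat gap: {asyncio.run(measure_max_gap()):.2f}s")
```

With these illustrative numbers the largest gap tracks the discovery delay rather than the heartbeat interval, which is the same shape as the escalating `heartbeat blocked` warnings.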

## Reproduction

1. Add an unreachable MCP server URL to `config.yaml`:
   ```yaml
   mcp_servers:
     unreachable:
       url: http://10.99.99.99:9999/mcp
   ```
2. Start the gateway. Startup discovery completes and logs `MCP: registered N tool(s) from M server(s) (1 failed)` after a short retry window.
3. Send the **first** Discord/Telegram message after gateway start.
4. Within ~10s, the platform logs `Shard ID None heartbeat blocked for more than 10 seconds.` Heartbeat-block warnings escalate every 10s. The first message hangs for ~120s before either responding or the shard reconnects.

A subsequent message in the same gateway process is fine — `model_tools` is now imported and the side-effect doesn't re-run.
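The first-message-only behaviour follows from Python caching imported modules in `sys.modules`, so module-level code runs once per process. A minimal sketch of that effect, using a throwaway module (`demo_slow_tools`, a hypothetical name) whose import-time `time.sleep` stands in for slow discovery:

```python
import importlib
import sys
import tempfile
import time
from pathlib import Path

# Hypothetical module body: the sleep stands in for slow discovery at import.
SLOW_MODULE_SOURCE = "import time\ntime.sleep(0.2)\n"


def timed_import(name: str) -> float:
    """Import a module by name and return how long the import took."""
    start = time.monotonic()
    importlib.import_module(name)
    return time.monotonic() - start


def first_and_second_import_cost() -> tuple:
    """Import the same slow module twice and return both durations."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "demo_slow_tools.py").write_text(SLOW_MODULE_SOURCE)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            first = timed_import("demo_slow_tools")   # runs the module body
            second = timed_import("demo_slow_tools")  # served from sys.modules
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("demo_slow_tools", None)
    return first, second


if __name__ == "__main__":
    first, second = first_and_second_import_cost()
    print(f"first import: {first:.3f}s, second import: {second:.3f}s")
```

Only the first import pays for the side effect, so only the message that triggers the first import is delayed.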

## Stack trace (Hermes 0.11.0 / v2026.4.23, Python 3.11.15)

```
2026-04-28 05:54:59 WARNING discord.gateway: Shard ID None heartbeat blocked for more than 40 seconds.
Loop thread traceback (most recent call last):
  ...
  File "gateway/platforms/base.py", line 2072, in _process_message_background
    response = await self._message_handler(event)
  File "gateway/run.py", line 3871, in _handle_message
    return await self._handle_message_with_agent(...)
  File "gateway/run.py", line 4516, in _handle_message_with_agent
    agent_result = await self._run_agent(...)
  File "gateway/run.py", line 9334, in _run_agent
    from run_agent import AIAgent              # lazy import
  File "run_agent.py", line 67, in <module>
    from model_tools import (...)              # transitive
  File "model_tools.py", line 143, in <module>
    discover_mcp_tools()                       # module-level side effect
  File "tools/mcp_tool.py", line 2455, in discover_mcp_tools
    tool_names = register_mcp_servers(servers)
  File "tools/mcp_tool.py", line 2408, in register_mcp_servers
    _run_on_mcp_loop(_discover_all(), timeout=120)
  File "tools/mcp_tool.py", line 1577, in _run_on_mcp_loop
    return future.result(timeout=wait_timeout)  # BLOCKS asyncio loop
  File ".../concurrent/futures/_base.py", line 451, in result
    self._condition.wait(timeout)
```

## Why it manifests now

In a clean dev session, MCP discovery has already happened at gateway startup, so the lazy import on first message is cheap. The bug surfaces when:

- An MCP server is configured but unreachable (network timeout, dead host, wrong port, etc.) — startup discovery records "(1 failed)" but doesn't blacklist it, and
- The lazy import path re-invokes `discover_mcp_tools` which retries the failed server with the full 120s budget.

I'd guess most users haven't hit this because their MCP servers are local/reachable.

## Suggested fixes

Either of these resolves the symptom; ideally both:

1. **Remove the module-level call.** `model_tools.py:143` calling `discover_mcp_tools()` at import is a side effect that's unsafe from any async context. Discovery already runs at gateway startup; a second invocation from within a message handler shouldn't be needed. If a re-discovery hook is wanted, expose it as an explicit function and call it from a non-async lifecycle event.

2. **Make `_run_on_mcp_loop` async-aware.** When called from an event loop, schedule the coroutine and `await` the future via `asyncio.wrap_future` rather than `future.result(timeout=...)`. Today's blocking-wait pattern silently freezes whatever loop happens to be running.
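A rough sketch of what fix 2 could look like, assuming the MCP loop runs on a separate thread and work is submitted to it with `asyncio.run_coroutine_threadsafe`. `run_on_loop_async` and the other helper names are hypothetical, and this is not the actual `_run_on_mcp_loop` implementation.

```python
import asyncio
import threading


def start_background_loop() -> asyncio.AbstractEventLoop:
    """Start a second event loop on a daemon thread (stand-in for the MCP loop)."""
    loop = asyncio.new_event_loop()
    threading.Thread(target=loop.run_forever, daemon=True).start()
    return loop


async def slow_discovery(delay: float) -> str:
    """Stand-in for discovery against a slow or unreachable server."""
    await asyncio.sleep(delay)
    return "done"


async def run_on_loop_async(bg_loop: asyncio.AbstractEventLoop, coro, timeout: float):
    """Await a coroutine running on another loop without blocking the caller's loop."""
    future = asyncio.run_coroutine_threadsafe(coro, bg_loop)
    # wrap_future turns the concurrent.futures.Future into an awaitable;
    # wait_for enforces the timeout while the calling loop keeps running.
    return await asyncio.wait_for(asyncio.wrap_future(future), timeout=timeout)


async def count_ticks_during_discovery(delay: float = 0.5, interval: float = 0.05) -> tuple:
    """Return (discovery result, heartbeat ticks observed while waiting)."""
    bg_loop = start_background_loop()
    ticks = 0

    async def heartbeat() -> None:
        nonlocal ticks
        while True:
            ticks += 1
            await asyncio.sleep(interval)

    hb = asyncio.create_task(heartbeat())
    result = await run_on_loop_async(bg_loop, slow_discovery(delay), timeout=5)
    hb.cancel()
    try:
        await hb
    except asyncio.CancelledError:
        pass
    bg_loop.call_soon_threadsafe(bg_loop.stop)
    return result, ticks


if __name__ == "__main__":
    print(asyncio.run(count_ticks_during_discovery()))
```

In this sketch the heartbeat keeps ticking while discovery is pending. Synchronous callers would still need a blocking variant, so a real change would likely have to pick the code path based on whether an event loop is running in the calling thread.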

## Workaround

Remove the slow/unreachable server from `mcp_servers` in `config.yaml`. With it gone, discovery completes in ~2s and the import-time call returns fast enough not to trip the heartbeat watchdog. This is what we did locally.

## Environment

- Hermes Agent **v0.11.0** (v2026.4.23)
- Python 3.11.15 on Linux (Debian/LXC)
- Gateway: hermes-gateway systemd user service
- Platform: Discord (`discord.py`); the same blocking-wait pattern would affect any platform whose handler runs in the asyncio loop
