
fix(mcp): clear stale thread interrupt before MCP discovery#21276

Merged: teknium1 merged 2 commits into main from hermes/hermes-b625eb32 on May 7, 2026


@teknium1 teknium1 commented May 7, 2026


Salvage of #10287 by @AJV20 onto current main. Cherry-picked clean.

Summary

Clears the calling thread's stale interrupt flag before running MCP server discovery so discovery isn't immediately cancelled by interrupt state left over from a prior agent session.

Root cause

tools.interrupt tracks interrupts per thread ident in _interrupted_threads. asyncio's executor thread pool reuses threads across sessions. When a prior agent session set its thread's interrupt flag (via Ctrl+C, gateway timeout, etc.) and the thread got pooled and later reused to run register_mcp_servers, the FIRST poll iteration inside _run_on_mcp_loop sees is_interrupted() == True, cancels the discovery future, and raises InterruptedError. MCP discovery never runs, so MCP tools never get registered, so they silently don't appear in subsequent sessions.
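The thread-reuse hazard described above can be reproduced in isolation. The following is a minimal sketch, not the project's code: `_interrupted_threads` and `is_interrupted` mirror the names in the description, while `set_interrupt`, `prior_session`, `next_session`, and `demo_stale_flag` are hypothetical stand-ins. A single-worker `ThreadPoolExecutor` is used so that the second task is guaranteed to land on the same pooled thread.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Stand-in for the per-thread registry described above (not the real module).
_interrupted_threads: set[int] = set()
_lock = threading.Lock()


def set_interrupt(active: bool) -> None:
    """Hypothetical helper: set or clear the flag for the calling thread."""
    ident = threading.get_ident()
    with _lock:
        if active:
            _interrupted_threads.add(ident)
        else:
            _interrupted_threads.discard(ident)


def is_interrupted() -> bool:
    """Report whether the calling thread currently carries an interrupt flag."""
    with _lock:
        return threading.get_ident() in _interrupted_threads


def prior_session() -> int:
    """Simulates a session that is interrupted and never clears its flag."""
    set_interrupt(True)
    return threading.get_ident()


def next_session() -> tuple[int, bool]:
    """Simulates unrelated later work that happens to reuse the pooled thread."""
    return threading.get_ident(), is_interrupted()


def demo_stale_flag() -> tuple[bool, bool]:
    # One worker, so both tasks run on the same reused thread.
    with ThreadPoolExecutor(max_workers=1) as pool:
        first_ident = pool.submit(prior_session).result()
        second_ident, stale = pool.submit(next_session).result()
    return first_ident == second_ident, stale
```

Calling `demo_stale_flag()` returns whether the thread was reused and whether the later task saw the flag left behind by the earlier one. Because the registry is keyed by thread ident rather than by session, nothing ties the flag's lifetime to the work that set it.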

Fix

Around _run_on_mcp_loop(_discover_all(), timeout=120) in register_mcp_servers: snapshot the current thread's interrupt state, temporarily clear it for the duration of discovery, restore on exit via finally. The user's actual interrupt semantics (if any) are preserved.
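The snapshot, clear, and restore steps can be sketched as follows. This is an illustration under stated assumptions rather than the merged diff: `set_interrupt`, `run_discovery`, and `register_with_guard` are hypothetical names, and `run_discovery` merely stands in for the `_run_on_mcp_loop(_discover_all(), timeout=120)` call.

```python
import threading

# Stand-in registry keyed by thread ident (not the real tools.interrupt module).
_interrupted_threads: set[int] = set()


def is_interrupted() -> bool:
    return threading.get_ident() in _interrupted_threads


def set_interrupt(active: bool) -> None:
    """Hypothetical helper: set or clear the calling thread's flag."""
    ident = threading.get_ident()
    if active:
        _interrupted_threads.add(ident)
    else:
        _interrupted_threads.discard(ident)


def run_discovery() -> str:
    """Stand-in for the discovery call; aborts if the thread looks interrupted."""
    if is_interrupted():
        raise InterruptedError("stale interrupt cancelled discovery")
    return "discovered"


def register_with_guard() -> str:
    # Snapshot the flag, clear it for the duration of discovery,
    # and restore it afterwards even if discovery raises.
    was_interrupted = is_interrupted()
    if was_interrupted:
        set_interrupt(False)
    try:
        return run_discovery()
    finally:
        if was_interrupted:
            set_interrupt(True)
```

Restoring in `finally` means a flag that was set before the call is still set after it, so any code that later checks the interrupt on this thread sees the same state it would have seen without the guard.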

Changes

  • tools/mcp_tool.py: +13 / -1 around the discovery call site
  • scripts/release.py: AUTHOR_MAP entry for @AJV20

Note on #9930 reference

The original PR header says "Fixes #9930", but #9930 is a DIFFERENT bug (Python 3.11+ CancelledError escaping except Exception in MCPServerTask.run(), also still live on main). Both bugs are real and independent — this PR only fixes the stale-interrupt path. Leaving #9930 open for a follow-up.
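For context on why the #9930 path is a separate bug: `asyncio.CancelledError` has derived from `BaseException` rather than `Exception` since Python 3.8, which includes the 3.11+ range named above, so an `except Exception` handler does not catch it. The sketch below shows only that general behaviour; `worker`, `cancel_worker`, and `demo_cancel` are hypothetical names and do not reproduce `MCPServerTask.run()`.

```python
import asyncio


async def worker() -> str:
    try:
        await asyncio.sleep(10)
    except Exception:
        # Not reached on cancellation: CancelledError is not an Exception.
        return "handled"
    return "finished"


async def cancel_worker() -> str:
    task = asyncio.create_task(worker())
    await asyncio.sleep(0)  # yield once so the worker reaches its await
    task.cancel()
    try:
        return await task
    except asyncio.CancelledError:
        return "cancelled escaped"


def demo_cancel() -> str:
    return asyncio.run(cancel_worker())
```

Here the cancellation passes straight through the worker's `except Exception` block and surfaces in the caller, which is the class of failure the follow-up would need to handle explicitly.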

Validation

| Check | Result |
| --- | --- |
| scripts/run_tests.sh tests/tools/test_mcp_tool*.py | 199/199 passed |
| E2E positive: simulate stale interrupt on current thread, run register_mcp_servers with the patch | discovery runs with clear flag, original interrupt state restored on exit |
| E2E negative control: simulate stale interrupt, call _run_on_mcp_loop directly with no guard | raises InterruptedError: User sent a new message; confirms bug is real |
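The negative control in the table can be approximated without the real code base. The sketch below assumes a poll loop of roughly the shape the root-cause section describes (check the flag, cancel the future, raise `InterruptedError`); `run_on_loop`, `start_background_loop`, `fake_discovery`, and `demo_negative_control` are hypothetical, and the actual `_run_on_mcp_loop` may differ in its details.

```python
import asyncio
import concurrent.futures
import threading
import time

# Stand-in registry keyed by thread ident (not the real tools.interrupt module).
_interrupted_threads: set[int] = set()


def is_interrupted() -> bool:
    return threading.get_ident() in _interrupted_threads


def start_background_loop() -> tuple[asyncio.AbstractEventLoop, threading.Thread]:
    """Run an event loop on a daemon thread, as a stand-in for the MCP loop."""
    loop = asyncio.new_event_loop()
    thread = threading.Thread(target=loop.run_forever, daemon=True)
    thread.start()
    return loop, thread


def run_on_loop(loop, coro, timeout: float, poll: float = 0.05):
    """Hypothetical poll loop: block the caller, checking the flag each pass."""
    future = asyncio.run_coroutine_threadsafe(coro, loop)
    deadline = time.monotonic() + timeout
    while not future.done():
        if is_interrupted():
            future.cancel()
            raise InterruptedError("User sent a new message")
        if time.monotonic() >= deadline:
            future.cancel()
            raise TimeoutError("operation timed out")
        concurrent.futures.wait([future], timeout=poll)
    return future.result()


async def fake_discovery() -> str:
    await asyncio.sleep(0.1)
    return "tools registered"


def demo_negative_control() -> tuple[str, str]:
    loop, thread = start_background_loop()
    ident = threading.get_ident()
    try:
        clean = run_on_loop(loop, fake_discovery(), timeout=5)
        _interrupted_threads.add(ident)  # simulate the stale flag
        try:
            run_on_loop(loop, fake_discovery(), timeout=5)
            stale = "ran"
        except InterruptedError as exc:
            stale = f"InterruptedError: {exc}"
        return clean, stale
    finally:
        _interrupted_threads.discard(ident)
        loop.call_soon_threadsafe(loop.stop)
        thread.join()
        loop.close()
```

With no flag set the call returns the discovery result; with the stale flag set and no guard, the first pass through the loop cancels the work before it can finish.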

Closes #10287 (not #9930). @AJV20's authorship preserved via rebase-merge.

AJV20 and others added 2 commits May 7, 2026 06:23
Fixes #9930

When an agent session is interrupted (Ctrl+C or gateway timeout), the
current thread's interrupt flag is set in _interrupted_threads. asyncio
executor threads are pooled and reused across sessions, so a thread that
carried an interrupt flag from a prior session will immediately cancel
any new asyncio work dispatched to it — including MCP server discovery.

Fix: in register_mcp_servers(), temporarily clear the interrupt flag on
the current thread before running _discover_all(), then restore it
afterward in a finally block so the original interrupt state is not lost.
@teknium1 teknium1 merged commit 46d1fc1 into main May 7, 2026
9 of 11 checks passed
@teknium1 teknium1 deleted the hermes/hermes-b625eb32 branch May 7, 2026 13:25

github-actions Bot commented May 7, 2026


🔎 Lint report: hermes/hermes-b625eb32 vs origin/main

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 7531 on HEAD, 7531 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 3953 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.

@alt-glitch alt-glitch added the type/bug (Something isn't working), P2 (Medium: degraded but workaround exists), and tool/mcp (MCP client and OAuth) labels May 7, 2026

Development

Successfully merging this pull request may close these issues.

MCP reconnect fails with asyncio.CancelledError on Python 3.11+
