Skip to content

fix(weixin): rebuild aiohttp session for cross-loop send_* calls#17267

Closed
hackiey wants to merge 1 commit into
NousResearch:mainfrom
hackiey:fix/weixin-cross-loop-session
Closed

fix(weixin): rebuild aiohttp session for cross-loop send_* calls#17267
hackiey wants to merge 1 commit into
NousResearch:mainfrom
hackiey:fix/weixin-cross-loop-session

Conversation

@hackiey

@hackiey hackiey commented Apr 29, 2026

Copy link
Copy Markdown

What does this PR do?

Fix the long-standing RuntimeError: Timeout context manager should be used inside a task raised by the Weixin adapter whenever the agent uses the send_message tool to send media (files, images, voice, video).

Root cause (verified with runtime debug logging on macOS, Python 3.11.15, aiohttp 3.13.5):

When the agent invokes the send_message tool, model_tools._run_async() detects an active event loop in the gateway's main thread and dispatches the call to a worker thread that calls asyncio.run(). This creates a fresh event loop in the worker thread. Inside that loop, _send_to_platform → WeixinAdapter.send_document → _send_file → _api_post reuses self._send_session, which is an aiohttp.ClientSession bound to the gateway's main loop. aiohttp's TimerContext.__enter__ then calls asyncio.current_task() against the session's original loop, gets None, and raises.

The bug only fires for media sends, because the gateway delivers normal text replies inline on its own loop — that path never crosses loops.

Debug capture from a failing call:

[WEIXIN-DBG] task=<Task-33 coro=<_send_to_platform() at send_message_tool.py:507>
             cb=[_run_until_complete_cb()]>
             loop=<_UnixSelectorEventLoop ...>
             session_loop=<_UnixSelectorEventLoop ...>
             thread='asyncio_1'                       ← worker thread, fresh loop
ERROR: Timeout context manager should be used inside a task

loop and session_loop reprs look identical but are different instances (one in MainThread, one in asyncio_1).

Fix: add WeixinAdapter._loop_safe_session(), an async context manager that swaps in a per-call aiohttp.ClientSession bound to the running loop when it detects the cached session belongs to a different loop. Each send_* method wraps its body with async with self._loop_safe_session():. When the call is on the original loop the helper is a no-op (no new session, no extra connection).

Related Issue

Refs #13099, #12154, #13281, #16293

(Partial fix — addresses the immediate Weixin symptom in all four issues. The deeper model_tools._run_async cross-loop architecture concern raised in #13281 is left for a separate change.)

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 🔒 Security fix
  • 📝 Documentation update
  • ✅ Tests (adding or improving test coverage)
  • ♻️ Refactor (no behavior change)
  • 🎯 New skill (bundled or hub)

Changes Made

  • gateway/platforms/weixin.py:
    • Add import contextlib
    • Add WeixinAdapter._loop_safe_session() async context manager
    • Wrap method bodies with async with self._loop_safe_session(): in send, send_image, send_image_file, send_document, send_video, send_voice
    • send_typing is left untouched (gateway-internal, never crosses loops)
  • tests/gateway/test_weixin.py: add TestWeixinLoopSafeSession regression class with three cases (foreign-loop session swap, same-loop pass-through, no-session no-op)

How to Test

  1. pytest tests/gateway/test_weixin.py -q — 45 passed (42 existing + 3 new)
  2. End-to-end repro on macOS:
    • Configure Weixin gateway, start hermes gateway, chat with the bot from WeChat
    • Ask the agent to send a file (e.g. "请把 README 发给我")
    • Before: [Weixin] send_document failed to=...: Timeout context manager should be used inside a task
    • After: file is delivered successfully

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits (fix(weixin): ...)
  • I searched for existing PRs to make sure this isn't a duplicate
  • My PR contains only changes related to this fix (no unrelated commits)
  • I've run pytest tests/gateway/test_weixin.py -q and all tests pass
  • I've added tests for my changes
  • I've tested on my platform: macOS 15.x, Python 3.11.15

Documentation & Housekeeping

  • I've updated relevant documentation — N/A (internal behavior fix, no user-facing surface change)
  • I've updated cli-config.yaml.example if I added/changed config keys — N/A
  • I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — N/A
  • I've considered cross-platform impact — fix is OS-independent (the bug shows up on macOS but the cross-loop issue is platform-agnostic; Linux just happens to schedule timing-sensitive paths differently)
  • I've updated tool descriptions/schemas if I changed tool behavior — N/A

Screenshots / Logs

Debug log capturing the cross-loop root cause

```
[WEIXIN-DBG] task=<Task pending name='weixin-poll' coro=<WeixinAdapter._poll_loop()>>
loop=<_UnixSelectorEventLoop running=True closed=False debug=False>
session_loop=<_UnixSelectorEventLoop running=True closed=False debug=False>
thread='MainThread'
[WEIXIN-DBG] task=<Task pending name='Task-25' coro=<WeixinAdapter.send_typing()>>
... thread='MainThread' ← gateway-owned, OK
[WEIXIN-DBG] task=<Task pending name='Task-33'
coro=<_send_to_platform() running at .../send_message_tool.py:507>
cb=[_run_until_complete_cb()]>
loop=<_UnixSelectorEventLoop running=True closed=False debug=False>
session_loop=<_UnixSelectorEventLoop running=True closed=False debug=False>
thread='asyncio_1' ← worker thread, fresh loop
ERROR gateway.platforms.weixin: [Weixin] send_document failed to=o9cq80xK:
Timeout context manager should be used inside a task
Traceback (most recent call last):
File ".../weixin.py", line 1715, in send_document
message_id = await self._send_file(chat_id, file_path, caption or "")
File ".../weixin.py", line 1795, in _send_file
upload_response = await _get_upload_url(...)
File ".../weixin.py", line 508, in _get_upload_url
return await _api_post(...)
File ".../weixin.py", line 366, in _api_post
async with session.post(url, ..., timeout=timeout) as response:
File ".../aiohttp/client.py", line 632, in _request
with timer:
File ".../aiohttp/helpers.py", line 678, in enter
raise RuntimeError("Timeout context manager should be used inside a task")
```

After the fix, the same call path on thread='asyncio_1' swaps in a fresh aiohttp.ClientSession bound to that thread's loop, and the upload completes normally.

Reusing the gateway-owned aiohttp.ClientSession across event loops
triggers aiohttp's TimerContext to raise "Timeout context manager
should be used inside a task" because asyncio.current_task() returns
None when checked against the session's original loop.

This fires whenever the agent invokes the send_message tool with media:
model_tools._run_async() spawns a worker thread and calls asyncio.run(),
which creates a fresh event loop. send_document/send_image_file/etc.
then try to use self._send_session, which was bound to the gateway's
main loop, and aiohttp blows up before any bytes go over the wire.

Regular text replies are unaffected because the gateway delivers them
inline on its own loop.

Add WeixinAdapter._loop_safe_session(), an async context manager that
detects cross-loop usage and swaps in a per-call ClientSession bound
to the running loop. Wrap the body of every send_* method with it
(send, send_image, send_image_file, send_document, send_video,
send_voice). send_typing is gateway-internal and never crosses loops,
so it is left untouched.

Refs: NousResearch#13099, NousResearch#12154, NousResearch#13281, NousResearch#16293
@alt-glitch alt-glitch added type/bug Something isn't working P2 Medium — degraded but workaround exists platform/wecom WeCom / WeChat Work adapter labels Apr 29, 2026
@alt-glitch

Copy link
Copy Markdown
Collaborator

Likely duplicate of #14873 — same root cause: cross-event-loop aiohttp ClientSession reuse in Weixin adapter. Multiple competing PRs (#15911, #14384, #14481) address the same issue.

@teknium1

Copy link
Copy Markdown
Contributor

Thanks for the detailed repro and runtime notes. This is now covered on current main, so I’m closing this as implemented by an automated hermes-sweeper review.

Evidence:

  • tools/send_message_tool.py:1549 routes Weixin send_message delivery through send_weixin_direct(..., media_files=media_files).
  • gateway/platforms/weixin.py:2286 only reuses the live adapter’s _send_session when send_session._loop is asyncio.get_running_loop(), preventing the cross-loop session reuse described here.
  • gateway/platforms/weixin.py:2315 falls back to a fresh aiohttp.ClientSession bound to the current loop, then sends media through the temporary adapter.
  • gateway/platforms/weixin.py:381 and tests/gateway/test_weixin.py:1106 show the related timeout migration: _api_post uses asyncio.wait_for() and regression coverage asserts no aiohttp timeout= kwarg is passed.

The maintainer comment noting overlap with the other Weixin cross-loop PRs was useful context; current main has taken the guarded/fresh-session route for this path.

@teknium1 teknium1 closed this Jun 10, 2026
@teknium1 teknium1 added the sweeper:implemented-on-main Sweeper: behavior already present on current main label Jun 10, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

P2 Medium — degraded but workaround exists platform/wecom WeCom / WeChat Work adapter sweeper:implemented-on-main Sweeper: behavior already present on current main type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants