fix(weixin): Replace cross-loop session reuse with thread-local persistent session by ikelvingo · Pull Request #18251 · NousResearch/hermes-agent

ikelvingo · 2026-05-01T05:24:24Z

What does this PR do?

Symptom

WeChat image/file delivery via the send_message tool fails consistently with:
RuntimeError: Timeout context manager should be used inside a task
The error occurs because aiohttp.TimerContext.current_task() returns None when the session was created on a different event loop than the one calling session.post().
Text-only messages sent through the gateway's native adapter path were unaffected (they run inside the gateway's own event loop).

Root Cause

v2026.4.23 introduced a "lazy recreate session" optimization in send_weixin_direct().
The optimization reused the gateway's long-lived aiohttp.ClientSession (from _LIVE_ADAPTERS) for one-shot sends triggered by the send_message tool.
These tool calls execute inside a ThreadPoolExecutor worker thread via _run_async().
The gateway's ClientSession was created on the gateway's event loop, but session.post(timeout=...) was called from the worker thread — a cross-event-loop access that aiohttp.TimerContext does not support.
TimerContext.__enter__() calls asyncio.current_task(loop=session._loop). Since no task is running on the gateway's loop inside the worker thread, it returns None, raising the RuntimeError.

Intent of the Original Change

Avoid re-creating TCP/SSL connections (handshake overhead) on every send_message tool invocation.
Reduce latency for repeated WeChat sends (text + image uploads make 3-4 API calls per message).

Risks of the Original Approach

Thread-safety violation: aiohttp.ClientSession is explicitly documented as not thread-safe.
SSL connection corruption: Concurrent writes to the same TCP connector socket can cause response cross-talk between worker threads.
Cookie/token state pollution: Session-level state can leak between concurrent sends for different users.
Authentication boundary break: Shared session flattens the isolation between different chat contexts.

Type of Change

🐛 Bug fix (non-breaking change that fixes an issue)
✨ New feature (non-breaking change that adds functionality)
🔒 Security fix
📝 Documentation update
✅ Tests (adding or improving test coverage)
♻️ Refactor (no behavior change)
🎯 New skill (bundled or hub)

Changes Made

Fix

Added _get_worker_session() to model_tools.py:
- Each worker thread maintains its own persistent aiohttp.ClientSession via threading.local().
- The session is bound to the thread's persistent event loop (_get_worker_loop()), so TimerContext.current_task() always finds a valid task.
- Sessions are lazily created on first use and reused across subsequent calls within the same thread.
- SSL connector setup mirrors _make_ssl_connector() from gateway/platforms/weixin.py.
Removed lazy recreate branch from send_weixin_direct():
- Deleted the _LIVE_ADAPTERS.get(resolved_token) shortcut (26 lines).
- Replaced async with aiohttp.ClientSession(trust_env=True, connector=...) with session = _get_worker_session().
- The adapter is still created fresh per call (adapter lifecycle unchanged).
- Function reduced from 98 to 72 lines.

Behavior After Fix

Worker threads send WeChat messages through their own session, bound to their own event loop.
TCP/SSL connections are still reused within each thread — no per-call handshake penalty.
No cross-event-loop access: TimerContext.current_task() always returns the running task on the worker's loop.
The gateway's own adapter sending path is unchanged — it still uses its original session on the gateway loop.

Conflict Resolution

During review, a parallel approach was identified: replacing ClientTimeout with asyncio.wait_for() (chenlinfeng, May 3). This masks the TimerContext error but leaves the underlying ClientSession thread-safety violations intact.

	Thread-local session (this PR)	asyncio.wait_for
Fixes `TimerContext` error	✅	✅
Fixes `ClientSession` thread-safety	✅	❌
Fixes SSL connection corruption	✅	❌
Fixes auth/state boundary break	✅	❌
Retains per-thread connection reuse	✅	❌

The thread-local session approach eliminates all root-cause risks simultaneously. asyncio.wait_for() only replaces the error type (TimeoutError vs RuntimeError) without fixing the architectural problem.

Files Changed

model_tools.py (+13 lines): Added _get_worker_session() with thread-local persistent session management.
gateway/platforms/weixin.py (-21 lines): Removed lazy recreate block; replaced async with ClientSession with persistent worker session getter.

How to Test

Verification

send_message tool with WeChat image attachments: ✅ success: true, valid message_id returned.
Gateway logs: No RuntimeError or send failed entries post-fix.
Thread isolation confirmed: Different worker threads get different sessions; same thread reuses the same session.

Checklist

Code

I've read the Contributing Guide
My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
I searched for existing PRs to make sure this isn't a duplicate
My PR contains only changes related to this fix/feature (no unrelated commits)
I've run pytest tests/ -q and all tests pass
I've added tests for my changes (required for bug fixes, strongly encouraged for features)
I've tested on my platform:

Documentation & Housekeeping

I've updated relevant documentation (README, docs/, docstrings) — or N/A
I've updated cli-config.yaml.example if I added/changed config keys — or N/A
I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — or N/A
I've considered cross-platform impact (Windows, macOS) per the compatibility guide — or N/A
I've updated tool descriptions/schemas if I changed tool behavior — or N/A

For New Skills

This skill is broadly useful to most users (if bundled) — see Contributing Guide
SKILL.md follows the standard format (frontmatter, trigger conditions, steps, pitfalls)
No external dependencies that aren't already available (prefer stdlib, curl, existing Hermes tools)
I've tested the skill end-to-end: hermes --toolsets skills -q "Use the X skill to do Y"

Screenshots / Logs

alt-glitch · 2026-05-01T05:28:21Z

Likely duplicate of #14873, #14530, #12810 — same cross-event-loop aiohttp ClientSession reuse in send_weixin_direct(). At least 5 open PRs fix this identical root cause.

alt-glitch · 2026-05-01T05:29:14Z

Likely duplicate of #14873, #14530, #12810.

ikelvingo · 2026-05-01T05:35:13Z

Because in version v2026.4.23, a PR was merged that is neither secure nor compliant with aiohttp's usage guidelines, and this was also not fixed in version v2026.4.30.

…stent session Port of NousResearch#18251: - model_tools.py: add _get_worker_session() with thread-local persistent aiohttp.ClientSession bound to each worker thread's event loop - gateway/platforms/weixin.py: remove _LIVE_ADAPTERS lazy-reuse block in send_weixin_direct(), use _get_worker_session() instead fix(honcho): sync_turn handles multimodal (list) content from image-bearing messages Port of NousResearch#22768: - plugins/memory/honcho/__init__.py: expand type annotation to str | list for user_content and assistant_content; add _extract_text() helper to extract text parts from multimodal lists before sanitize_context() - image references are excluded from memory (memory providers should not store image URIs)

Port of upstream f24b7ed (fix: make Honcho startup fail open): - plugins/memory/honcho/__init__.py: refactor startup to background thread via _start_session_init_background(), add _session_ready() guard, and new _init_thread/_init_lock/_init_error fields for fail-open behavior - tests/test_honcho_startup_fail_open.py: add 357-line test suite Port of NousResearch#22768 (fix(honcho): sync_urn crashes on list input from image-bearing messages): - plugins/memory/honcho/__init__.py: expand sync_urn type annotation to str | list for user_content and assistant_content; add _extract_text() helper to extract text parts from multimodal lists; image references are excluded from memory (memory providers should not store image URIs)

alt-glitch added type/bug Something isn't working P2 Medium — degraded but workaround exists comp/gateway Gateway runner, session dispatch, delivery comp/tools Tool registry, model_tools, toolsets platform/wecom WeCom / WeChat Work adapter labels May 1, 2026

ikelvingo force-pushed the main branch 2 times, most recently from f998cad to 6fca9a3 Compare May 9, 2026 19:57

ikelvingo force-pushed the main branch from 6fca9a3 to efd0317 Compare May 31, 2026 00:42

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(weixin): Replace cross-loop session reuse with thread-local persistent session#18251

fix(weixin): Replace cross-loop session reuse with thread-local persistent session#18251
ikelvingo wants to merge 2 commits into
NousResearch:mainfrom
ikelvingo:main

ikelvingo commented May 1, 2026 •

edited

Loading

Uh oh!

alt-glitch commented May 1, 2026

Uh oh!

alt-glitch commented May 1, 2026

Uh oh!

ikelvingo commented May 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

ikelvingo commented May 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do?

Symptom

Root Cause

Intent of the Original Change

Risks of the Original Approach

Type of Change

Changes Made

Fix

Behavior After Fix

Conflict Resolution

Files Changed

How to Test

Verification

Checklist

Code

Documentation & Housekeeping

For New Skills

Screenshots / Logs

Uh oh!

alt-glitch commented May 1, 2026

Uh oh!

alt-glitch commented May 1, 2026

Uh oh!

ikelvingo commented May 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

ikelvingo commented May 1, 2026 •

edited

Loading