Skip to content

fix(hindsight): preserve shared event loop across provider shutdowns#14109

Closed
perlowja wants to merge 1 commit into
NousResearch:mainfrom
perlowja:fix/hindsight-session-leaks-multi-provider
Closed

fix(hindsight): preserve shared event loop across provider shutdowns#14109
perlowja wants to merge 1 commit into
NousResearch:mainfrom
perlowja:fix/hindsight-session-leaks-multi-provider

Conversation

@perlowja

@perlowja perlowja commented Apr 22, 2026

Copy link
Copy Markdown
Contributor

What does this PR do?

Fixes an aiohttp session leak in the Hindsight memory plugin that shows up as Unclosed client session / Unclosed connector errors in any long-running gateway hosting more than one concurrent chat.

Root cause. The module-global _loop / _loop_thread pair in plugins/memory/hindsight/__init__.py is shared across every HindsightMemoryProvider instance in the process. The plugin loader creates one provider per AIAgent (run_agent.py:1462), and the gateway creates one AIAgent per concurrent chat session (Telegram/Discord/Slack/WhatsApp/CLI). But HindsightMemoryProvider.shutdown() stopped the shared loop when any single session ended:

# Before (main)
if _loop is not None and _loop.is_running():
    _loop.call_soon_threadsafe(_loop.stop)
    _loop_thread.join(timeout=5.0)
    _loop = None
    _loop_thread = None

This stranded the aiohttp.ClientSession + TCPConnector owned by every sibling provider — their sessions were created on the loop (via hindsight_client_api.rest.RESTClientObject._ensure_session, which is one-session-per-client by design), and they cannot be closed once their loop is dead. They surface minutes later as the warnings reported in #11923.

Fix. Don't stop the shared loop on per-provider shutdown. Per-provider cleanup still closes that provider's own client via self._client.aclose() before self._client = None. The loop runs on a daemon thread and is reclaimed on process exit; keeping it alive between provider shutdowns lets sibling providers drain their own sessions cleanly.

Related Issue

Fixes #11923. Builds on (does not undo) the prior work in #4762 / #5094.

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 🔒 Security fix
  • 📝 Documentation update
  • ✅ Tests (adding or improving test coverage)
  • ♻️ Refactor (no behavior change)
  • 🎯 New skill (bundled or hub)

Changes Made

  • plugins/memory/hindsight/__init__.pyHindsightMemoryProvider.shutdown() no longer stops the module-global event loop. Drops the now-unused global _loop, _loop_thread declaration. Replaces the loop-stop block with a comment explaining the sharing invariant.
  • tests/plugins/memory/test_hindsight_provider.py — adds TestSharedEventLoopLifecycle with two regression tests:
    • test_shutdown_does_not_stop_shared_event_loop — primes the shared loop, creates two independent providers (two concurrent chat sessions), shuts down one, verifies the loop + thread are still the same live objects and that the second provider can still dispatch async work. Fails on main with the exact #11923 failure mode; passes with the fix.
    • test_client_aclose_called_on_cloud_mode_shutdown — confirms per-provider session cleanup still runs (mock_client.aclose.assert_called_once()).

How to Test

  1. pytest tests/plugins/memory/test_hindsight_provider.py -q -o addopts= — 46 passed (44 existing + 2 new regression tests).
  2. pytest tests/plugins/ -q -o addopts= — 221 passed.
  3. pytest tests/ -q -o addopts= --ignore=tests/integration --ignore=tests/e2e --ignore=tests/acp --ignore=tests/tools/test_tts_kittentts.py — 14,109 passed (71 failed + 45 errored, all confined to tests/hermes_cli/test_web_server.py — an unrelated test file that also fails on pristine main; they do not touch plugins/memory/) (the excluded paths require optional-dep modules — acp, kittentts — not installed via pip install -e ".[dev]"; they also fail to collect on pristine main).
  4. To see the regression test catch the bug on unfixed code:
    git revert HEAD -- plugins/memory/hindsight/__init__.py
    pytest tests/plugins/memory/test_hindsight_provider.py::TestSharedEventLoopLifecycle -q -o addopts=
    test_shutdown_does_not_stop_shared_event_loop fails with:
    AssertionError: shutdown() swapped out the shared event loop —
    sibling providers would have their aiohttp ClientSession orphaned (#11923)
    

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits (fix(hindsight): ...)
  • I searched for existing PRs — no duplicate. (The only PR that matches the naive "aiohttp" search is fix(website): restore Docusaurus startup with webpack 5.106 #9043, a docusaurus/webpack website fix — unrelated.)
  • My PR contains only changes related to this fix (no unrelated commits)
  • I've run pytest tests/ -q and all tests pass (see "How to Test" above — full suite passes; the 8 optional-dep collection errors are pre-existing and unrelated)
  • I've added tests for my changes (TestSharedEventLoopLifecycle)
  • I've tested on my platform: macOS 15 / darwin-arm64 / Python 3.12.7

Documentation & Housekeeping

  • I've updated relevant documentation (README, docs/, docstrings) — N/A (no user-visible config or behavior change beyond "the warning stops happening")
  • I've updated cli-config.yaml.example if I added/changed config keys — N/A
  • I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — N/A
  • I've considered cross-platform impact (Windows, macOS) — no platform-specific code touched; pure asyncio lifecycle change
  • I've updated tool descriptions/schemas if I changed tool behavior — N/A

The module-global `_loop` / `_loop_thread` pair is shared across every
`HindsightMemoryProvider` instance in the process — the plugin loader
creates one provider per `AIAgent`, and the gateway creates one `AIAgent`
per concurrent chat session (Telegram/Discord/Slack/CLI).

`HindsightMemoryProvider.shutdown()` stopped the shared loop when any one
session ended. That stranded the aiohttp `ClientSession` and `TCPConnector`
owned by every sibling provider on a now-dead loop — they were never
reachable for close and surfaced as the `Unclosed client session` /
`Unclosed connector` warnings reported in NousResearch#11923.

Fix: stop stopping the shared loop in `shutdown()`. Per-provider cleanup
still closes that provider's own client via `self._client.aclose()`. The
loop runs on a daemon thread and is reclaimed on process exit; keeping
it alive between provider shutdowns means sibling providers can drain
their own sessions cleanly.

Regression tests in `tests/plugins/memory/test_hindsight_provider.py`
(`TestSharedEventLoopLifecycle`):

- `test_shutdown_does_not_stop_shared_event_loop` — two providers share
  the loop; shutting down one leaves the loop live for the other. This
  test reproduces the NousResearch#11923 leak on `main` and passes with the fix.
- `test_client_aclose_called_on_cloud_mode_shutdown` — each provider's
  own aiohttp session is still closed via `aclose()`.

Fixes NousResearch#11923.
@alt-glitch alt-glitch added type/bug Something isn't working P2 Medium — degraded but workaround exists comp/plugins Plugin system and bundled plugins tool/memory Memory tool and memory providers labels Apr 22, 2026

@nicoloboschi nicoloboschi left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

teknium1 added a commit that referenced this pull request Apr 24, 2026
Adds AUTHOR_MAP entries for perlowja, tangyuanjc, harryplusplus
ahead of merging PRs #14109, #13153, #13090.
nekorytaylor666 pushed a commit to nekorytaylor666/hermes-agent that referenced this pull request Apr 24, 2026
justrhoto pushed a commit to justrhoto/hermes-agent that referenced this pull request Apr 24, 2026
ulasbilgen pushed a commit to ulasbilgen/hermes-adhd-agent that referenced this pull request May 1, 2026
…arch#15070)

Adds AUTHOR_MAP entries for perlowja, tangyuanjc, harryplusplus
ahead of merging PRs NousResearch#14109, NousResearch#13153, NousResearch#13090.
aj-nt pushed a commit to aj-nt/hermes-agent that referenced this pull request May 1, 2026
…arch#15070)

Adds AUTHOR_MAP entries for perlowja, tangyuanjc, harryplusplus
ahead of merging PRs NousResearch#14109, NousResearch#13153, NousResearch#13090.
donald131 pushed a commit to donald131/hermes-agent that referenced this pull request May 2, 2026
…arch#15070)

Adds AUTHOR_MAP entries for perlowja, tangyuanjc, harryplusplus
ahead of merging PRs NousResearch#14109, NousResearch#13153, NousResearch#13090.
02356abc pushed a commit to 02356abc/hermes-agent that referenced this pull request May 14, 2026
…arch#15070)

Adds AUTHOR_MAP entries for perlowja, tangyuanjc, harryplusplus
ahead of merging PRs NousResearch#14109, NousResearch#13153, NousResearch#13090.
gweeteve pushed a commit to gweeteve/hermes-agent that referenced this pull request Jun 2, 2026
…arch#15070)

Adds AUTHOR_MAP entries for perlowja, tangyuanjc, harryplusplus
ahead of merging PRs NousResearch#14109, NousResearch#13153, NousResearch#13090.
Egavasyug pushed a commit to Egavasyug/hermes-agent that referenced this pull request Jun 10, 2026
…arch#15070)

Adds AUTHOR_MAP entries for perlowja, tangyuanjc, harryplusplus
ahead of merging PRs NousResearch#14109, NousResearch#13153, NousResearch#13090.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/plugins Plugin system and bundled plugins P2 Medium — degraded but workaround exists tool/memory Memory tool and memory providers type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Hindsight still leaks aiohttp ClientSession/connector after fix #4762

3 participants