[Bug]: Hindsight sync can race interpreter shutdown after successful one-shot CLI exit #15073

@stepanov1975

Bug Description

When Hermes is configured with the Hindsight memory provider, a simple one-shot CLI run can succeed and still emit Hindsight cleanup errors during process exit.

In my case, this reproduces with:

hermes chat -q "Reply with OK only." -Q

The command returns OK, but shutdown still logs:

  • Hindsight sync failed: cannot schedule new futures after interpreter shutdown
  • Unclosed client session
  • sometimes Unclosed connector

This looks distinct from the already-reported long-running gateway leak in #11923: here the failure happens on single-query CLI exit during interpreter teardown.

Before submitting, I checked the closest existing issues/PRs I could find. I did not find an existing issue specifically for the "cannot schedule new futures after interpreter shutdown" race.

Steps to Reproduce

  1. Configure Hermes to use the Hindsight memory provider.
  2. Confirm that hermes version reports Up to date.
  3. Run:
    hermes chat -q "Reply with OK only." -Q
  4. Observe that the command prints OK and exits 0.
  5. Inspect ~/.hermes/logs/errors.log.
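
Step 5 can be scripted. The following is a minimal sketch that scans log text for the three messages listed in the bug description. The helper name find_shutdown_errors is hypothetical and not part of Hermes; the log path is the one used in this report.

```python
from pathlib import Path

# Messages from this report that indicate the shutdown race.
MARKERS = (
    "cannot schedule new futures after interpreter shutdown",
    "Unclosed client session",
    "Unclosed connector",
)


def find_shutdown_errors(log_text: str) -> list[str]:
    """Return the log lines that contain any of the known markers."""
    return [
        line
        for line in log_text.splitlines()
        if any(marker in line for marker in MARKERS)
    ]


if __name__ == "__main__":
    # Path taken from this report; adjust if your Hermes home differs.
    log_path = Path.home() / ".hermes" / "logs" / "errors.log"
    if log_path.exists():
        for line in find_shutdown_errors(log_path.read_text(errors="replace")):
            print(line)
```

Because errors.log is appended to, it helps to note the last line number before running the one-shot command and only look at lines added after it.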

Expected Behavior

Hermes should fully drain or cancel Hindsight background retention work before interpreter teardown, and process exit should not emit Hindsight warnings or aiohttp resource-leak errors.

Actual Behavior

The CLI succeeds, but exit appends a traceback like this to ~/.hermes/logs/errors.log:

WARNING plugins.memory.hindsight: Hindsight sync failed: cannot schedule new futures after interpreter shutdown
Traceback (most recent call last):
  File "/home/alex/.hermes/hermes-agent/plugins/memory/hindsight/__init__.py", line 920, in _sync
    _run_sync(client.aretain_batch(
  ...
  File ".../asyncio/base_events.py", line 830, in run_in_executor
    executor.submit(func, *args), loop=self)
  File ".../concurrent/futures/thread.py", line 169, in submit
    raise RuntimeError('cannot schedule new futures after interpreter shutdown')
RuntimeError: cannot schedule new futures after interpreter shutdown
ERROR asyncio: Unclosed client session
client_session: <aiohttp.client.ClientSession object at 0x...>

Fresh repro from today also produced:

2026-04-24 10:30:30,947 WARNING plugins.memory.hindsight: Hindsight sync failed: cannot schedule new futures after interpreter shutdown
2026-04-24 10:30:32,488 ERROR [20260424_103025_10670d] asyncio: Unclosed client session

Affected Component

  • CLI (interactive chat / one-shot CLI)
  • Agent Core (conversation loop, memory shutdown/lifecycle)

Messaging Platform (if gateway-related)

  • N/A (CLI only)

Debug Report

hermes debug share output:

Report     https://paste.rs/Vk9oT
agent.log  https://paste.rs/HJ3qW

Environment

  • Operating System: Ubuntu 24.04.4 LTS
  • Python Version: Hermes runtime Python 3.11.15 (hermes version); system python3 is 3.12.3
  • Hermes Version: Hermes Agent v0.11.0 (2026.4.23), reports Up to date

Additional Logs / Traceback

Relevant current source/log locations:

  • plugins/memory/hindsight/__init__.py:905-933: sync_turn() starts a daemon background thread and calls client.aretain_batch() via _run_sync(...)
  • plugins/memory/hindsight/__init__.py:1012-1039: shutdown() joins background threads for only 5 seconds, then closes the client and stops the shared event loop
  • ~/.hermes/logs/errors.log:6429-6466: latest local repro showing the interpreter-shutdown traceback followed by Unclosed client session

Root Cause Analysis

The likely race is:

  1. sync_turn() launches a daemon hindsight-sync thread that performs _run_sync(client.aretain_batch(...)).
  2. Hermes teardown reaches shutdown() late in process exit.
  3. shutdown() only waits up to 5 seconds for self._sync_thread / self._prefetch_thread.
  4. If the retain thread is still active or starts additional async work after interpreter teardown begins, aiohttp eventually reaches run_in_executor(...) and fails with RuntimeError: cannot schedule new futures after interpreter shutdown.
  5. The failed retain path then leaves the aiohttp client session/connector unclosed, causing the follow-on warnings.
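
The core of step 4 can be shown without Hermes or aiohttp. The sketch below is not the Hindsight code path; it is a standalone illustration, assuming CPython 3.9 or later, of a daemon thread that submits work to a ThreadPoolExecutor only after interpreter shutdown has begun. All names in it (CHILD, late_sync, release_worker, run_demo) are made up for the example. The child program runs in a subprocess so its output during exit can be captured.

```python
import subprocess
import sys
import textwrap

# Child program: a daemon thread submits executor work only after the
# interpreter has begun shutting down.
CHILD = textwrap.dedent(
    """
    import atexit
    import threading
    from concurrent.futures import ThreadPoolExecutor

    executor = ThreadPoolExecutor(max_workers=1)
    exiting = threading.Event()

    def late_sync():
        exiting.wait()  # stand-in for a retain call that is still in flight
        try:
            executor.submit(print, "retained")
        except RuntimeError as exc:
            print(f"sync failed: {exc}", flush=True)

    worker = threading.Thread(target=late_sync, daemon=True)
    worker.start()

    def release_worker():
        # On CPython 3.9+, atexit callbacks run after concurrent.futures
        # has already marked itself as shut down, so submit() is too late.
        exiting.set()
        worker.join(timeout=5)

    atexit.register(release_worker)
    """
)


def run_demo() -> str:
    """Run the child program and return what it printed while exiting."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        capture_output=True,
        text=True,
        timeout=30,
    )
    return result.stdout


if __name__ == "__main__":
    print(run_demo())
```

The atexit hook is only there to make the timing deterministic for the demo. In the real failure the ordering is a race, which would explain why Unclosed connector shows up only sometimes.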

I also reproduced the timing problem with a controlled fake-client test locally: shutdown() returned after about 5 seconds while the sync thread was still alive, which strongly suggests the fixed 5-second join is not sufficient to guarantee clean exit.
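
The shape of that fake-client check is roughly as follows. This is a scaled-down sketch rather than the exact test I ran: FakeSlowClient and join_with_timeout are hypothetical names, the sleep stands in for a slow aretain_batch() call, and the timings are shortened so it runs quickly.

```python
import threading
import time


class FakeSlowClient:
    """Hypothetical stand-in for the Hindsight client; retain just sleeps."""

    def __init__(self, delay: float) -> None:
        self.delay = delay

    def retain_batch(self) -> None:
        time.sleep(self.delay)


def join_with_timeout(delay: float, join_timeout: float) -> bool:
    """Return True if the sync thread is still alive after the timed join."""
    client = FakeSlowClient(delay)
    sync_thread = threading.Thread(target=client.retain_batch, daemon=True)
    sync_thread.start()
    sync_thread.join(timeout=join_timeout)  # mirrors the fixed-timeout join
    still_alive = sync_thread.is_alive()
    sync_thread.join()  # let the fake work finish so the demo exits quietly
    return still_alive


if __name__ == "__main__":
    # Retain takes longer than the join timeout: the thread outlives the join.
    print(join_with_timeout(delay=0.5, join_timeout=0.05))
```

With a delay longer than the join timeout, the join returns while the thread is still running, which is the state shutdown() is left in before it closes the client and stops the loop.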

This appears related to, but not identical with, #11923 / #14109 / #14605. Those focus on shared loop/session cleanup; this variant is specifically about CLI interpreter shutdown racing a still-active background retain thread.

Proposed Fix

A safe fix likely needs one or more of these:

  1. Prevent new background Hindsight retain work from being scheduled once shutdown begins.
  2. Drain or cancel the active retain thread deterministically instead of relying on a daemon thread plus a fixed 5-second join timeout.
  3. Move Hindsight memory shutdown earlier in CLI/session teardown so cleanup completes before Python interpreter shutdown starts.
  4. Add a regression test that runs a one-shot CLI session with Hindsight enabled and asserts that exit does not log cannot schedule new futures after interpreter shutdown, Unclosed client session, or Unclosed connector.
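
To make items 1 and 2 concrete, here is one possible shape for a shutdown gate. It is a sketch under assumptions, not a patch against the plugin: SyncGate, submit, and close are hypothetical names, and using non-daemon threads is my assumption about what "deterministic" could mean here.

```python
import threading
from typing import Callable, Optional


class SyncGate:
    """Hypothetical wrapper: rejects new work after close() and drains the rest."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._closing = False
        self._threads: list[threading.Thread] = []

    def submit(self, work: Callable[[], None]) -> bool:
        """Start work in a background thread unless shutdown has begun."""
        with self._lock:
            if self._closing:
                return False
            # Non-daemon, so the interpreter also waits for it at exit.
            thread = threading.Thread(target=work, name="hindsight-sync")
            self._threads.append(thread)
            thread.start()
            return True

    def close(self, timeout: Optional[float] = None) -> bool:
        """Stop accepting work, then join; True if everything finished."""
        with self._lock:
            self._closing = True
            threads = list(self._threads)
        for thread in threads:
            thread.join(timeout)  # timeout applies per thread; None waits
        return not any(thread.is_alive() for thread in threads)
```

The trade-off is that a retain call that hangs would now block exit, so a real fix would probably pair this with a request-level timeout or cancellation. If close() returns False, the caller knows work is still in flight and can avoid closing the client out from under it.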

Are you willing to submit a PR for this?

I have a local diagnosis and reproduction and can help test a fix, but I am not attaching a PR with this report.

Labels

  • P2: Medium (degraded but workaround exists)
  • comp/plugins: Plugin system and bundled plugins
  • tool/memory: Memory tool and memory providers
  • type/bug: Something isn't working
