fix(hindsight): drain retain queue cleanly on shutdown#17005
Merged
teknium1 merged 1 commit intoApr 29, 2026
Conversation
Collaborator
nicoloboschi
added a commit
to vectorize-io/hindsight
that referenced
this pull request
Apr 28, 2026
Two improvements to make the smoke test reliable end-to-end:
1. Default model: gpt-4o-mini -> gpt-4o. The smaller model is unreliable
on Hindsight's fact-extraction structured-output schema — the LLM
returns ~163 output tokens but an empty `facts` array, so retain
succeeds (200 OK) yet the bank stays empty and recall surfaces
nothing. gpt-4o produces consistent extractions for this content.
2. Realistic content: replace synthetic Q&A ("What's my favorite
language?" / "The user's favorite language is Rust") with a
first-person user statement, which is what the fact extractor is
tuned to mine.
Also switch the post-retain wait from `_sync_thread.join(60s)` to
`_retain_queue.join()` — the canonical wait-for-drain idiom under the
plugin's new single-writer model (NousResearch/hermes-agent#17005).
ed8965e to
93ca4ef
Compare
The plugin used to spawn one daemon thread per sync_turn() to do the
aretain_batch network write. On CLI exit, that pattern raced interpreter
shutdown — the last retain could reach aiohttp after asyncio's
"cannot schedule new futures" guard had fired, producing noisy logs and
silently losing the final unsaved turn:
WARNING ... Hindsight sync failed: cannot schedule new futures after
interpreter shutdown
ERROR asyncio: Unclosed client session
client_session: <aiohttp.client.ClientSession object at 0x...>
Switch to a single-writer model: each provider owns one long-lived
writer thread plus a queue. sync_turn() snapshots state and enqueues a
job; the writer drains sequentially. Once shutdown() is called:
- new sync_turn() / queue_prefetch() calls are dropped, not enqueued
- a sentinel wakes the writer so it finishes in-flight work
- shutdown joins the writer (10s) before nulling the client
Also register an idempotent atexit hook from the first sync_turn(), so
exit paths that don't go through MemoryManager.shutdown_all() (Ctrl-C,
abrupt exit) still get a chance to drain.
Tests: keep _sync_thread as a legacy alias to the writer, swap join()
calls to _retain_queue.join() (canonical wait-for-drain), add a new
TestShutdownRace suite covering single-writer reuse, post-shutdown drop,
queue draining, and shutdown idempotency.
93ca4ef to
8b9e948
Compare
2 tasks
Contributor
Author
|
Consolidated into #17447 (same fix + the session-switch buffer-flush + prefetch-clear cousins, all sharing the writer queue surface). Closing. |
This was referenced Jun 4, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Hindsight memory works during normal Hermes runtime, but on CLI / session exit the plugin's background retain path can race interpreter shutdown:
Mostly exit-time log noise, but it can also silently drop the last unsaved turn — typically the most important one.
Root cause
sync_turn()previously spawned one ad-hoc daemon thread per call to do thearetain_batchnetwork write. When that thread is still mid-flight while interpreter shutdown begins, asyncio's "cannot schedule new futures" guard fires and aiohttp's session is never cleanly closed.Fix
Per-provider single-writer model:
queue.Queueper provider.sync_turn()snapshots state and enqueues a callable; no thread spawn per call.queue_prefetch()andsync_turn()early-return onceshutdown()has fired — "once shutdown starts, stop accepting new writes."shutdown()sets the gate, sends a sentinel, joins the writer (10s) so in-flight work drains, then closes the client.atexithook registered from the firstsync_turn()so exit paths that don't go throughMemoryManager.shutdown_all()(Ctrl-C, abrupt exit) still get a chance to drain._sync_threadis kept as a legacy alias pointing at the writer for any external callers; tests use_retain_queue.join()(the canonical wait-for-drain idiom) instead.Test plan
uv run pytest tests/plugins/memory/test_hindsight_provider.py— 86/86 passeduv run pytest tests/agent/test_memory_provider.py tests/plugins/memory/— 194/194 passedTestShutdownRacesuite covers:sync_turn()calls (no per-call spawn)sync_turnaftershutdown()is dropped (no enqueue, no client call)queue_prefetchaftershutdown()is droppedshutdown()drains pending retains before returningshutdown()is idempotent