Bug Description
create_conclusion() in the Honcho plugin intermittently fails with no error propagated to the user. The root cause is a race condition between the sync_turn background thread and the main thread accessing HonchoSessionManager._cache (a plain dict) without any synchronization.
Steps to Reproduce
- Enable Honcho memory plugin with writeFrequency: "async"
- Make rapid consecutive honcho_conclude tool calls (e.g., 10+ in a row)
- Some calls return "Failed to save conclusion" at random positions
- Retrying the exact same text succeeds (timing-dependent)
- Direct Honcho SDK calls work fine for the same content
Root Cause
_cache in session.py:94 is an unprotected dict accessed from multiple threads:
- Main thread (tool calls): handle_tool_call → create_conclusion → self._cache.get(session_key) (session.py:916)
- Background thread (sync_turn): spawned at __init__.py:569, calls self._manager.get_or_create(self._session_key) (line 559) which reads/writes _cache
- Background thread (on_memory_write): spawned at __init__.py:589, calls create_conclusion which reads _cache
get_or_create() is not atomic — it checks membership, creates objects, then assigns to _cache (session.py:234-284). If the sync thread is mid-operation, the main thread's _cache.get() can miss the entry.
While CPython's GIL makes individual dict operations atomic, the multi-step read-modify-write in get_or_create() is not protected, leading to intermittent cache misses.
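The check-then-assign window described above can be reproduced deterministically in isolation. The following is a minimal sketch, not the plugin's actual code: UnsafeManager, its two Event attributes, and the dict used as a session object are hypothetical stand-ins, and the events exist only to force the interleaving that occurs by chance in production.

```python
import threading


class UnsafeManager:
    """Hypothetical stand-in for HonchoSessionManager (not the real class)."""

    def __init__(self):
        self._cache = {}
        self.creating = threading.Event()    # set once creation has started
        self.may_finish = threading.Event()  # released by the main thread

    def get_or_create(self, key):
        if key in self._cache:               # step 1: membership check
            return self._cache[key]
        self.creating.set()                  # step 2: slow object creation
        self.may_finish.wait(timeout=5)
        session = {"key": key}
        self._cache[key] = session           # step 3: assignment
        return session

    def create_conclusion(self, key):
        # Plain lookup with no synchronization, as in the reported code path.
        return self._cache.get(key) is not None


manager = UnsafeManager()
worker = threading.Thread(target=manager.get_or_create, args=("s1",))
worker.start()

manager.creating.wait(timeout=5)             # background thread is mid-operation
missed = manager.create_conclusion("s1")     # lookup lands between steps 1 and 3

manager.may_finish.set()                     # let the background thread finish
worker.join()
retried = manager.create_conclusion("s1")    # identical call after the write

print(missed, retried)
```

The first lookup fails and the identical retry succeeds, which matches the timing-dependent behavior in the reproduction steps.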
Impact
High — silent knowledge loss. Conclusions (user facts, preferences, corrections) are dropped without any indication to the user. This also affects:
- on_memory_write() which mirrors built-in memory writes as conclusions (same race)
- sync_turn() which records conversation turns
- Any operation depending on _cache.get(session_key) succeeding
This undermines the core value proposition of the Honcho plugin as a long-term memory provider.
Proposed Fix
Add a threading.Lock to protect _cache access in HonchoSessionManager:
# session.py __init__
self._cache_lock = threading.Lock()

# All _cache reads/writes wrapped:
with self._cache_lock:
    if key in self._cache:
        return self._cache[key]
Alternatively, join the sync_thread before processing tool calls in handle_tool_call().
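For the lock-based option, a slightly fuller sketch may help show where the lock has to sit. It assumes the whole check, create, and assign sequence can run under one lock. LockedSessionCache, the factory argument, and make_session are hypothetical names for illustration; only _cache and _cache_lock come from the proposal above.

```python
import threading


class LockedSessionCache:
    """Hypothetical sketch of a lock-protected cache (not the plugin's code)."""

    def __init__(self, factory):
        self._cache = {}
        self._cache_lock = threading.Lock()
        self._factory = factory  # assumed callable that builds a session

    def get(self, key):
        # Readers take the same lock, so they wait for an in-progress create.
        with self._cache_lock:
            return self._cache.get(key)

    def get_or_create(self, key):
        # Check, create, and assign happen as one critical section.
        with self._cache_lock:
            if key in self._cache:
                return self._cache[key]
            session = self._factory(key)
            self._cache[key] = session
            return session


created = []


def make_session(key):
    # Records each construction so duplicate creations would be visible.
    created.append(key)
    return {"key": key}


cache = LockedSessionCache(make_session)
threads = [
    threading.Thread(target=cache.get_or_create, args=("s1",))
    for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(created, cache.get("s1"))
```

Holding the lock during creation serializes all cache access while a session is being built. If creation involves a network call, that may add latency to tool calls, so the actual fix might need a per-key lock or a narrower critical section.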
Environment
- Hermes Agent v0.7.0
- honcho-ai SDK (latest)
- Gateway mode (Telegram platform)
- writeFrequency: "async" in honcho.json