fix(honcho): thread-safe session cache via RLock#13510
Closed
hekaru-agent wants to merge 1 commit into
Closed
Conversation
…ache
- cron/jobs.py: remove_job() now cleans output directory to prevent
orphaned ~/.hermes/cron/output/{job_id}/ dirs accumulating
- plugins/memory/honcho/session.py: protect _cache with RLock; both
main thread and async writer loop access it without synchronization
Lock scope: cache reads/writes only; I/O (peer lookups, Honcho calls)
stays outside lock to avoid serializing expensive operations
Collaborator
1 similar comment
Collaborator
This was referenced Apr 24, 2026
erosika
pushed a commit
to erosika/hermes-agent
that referenced
this pull request
Apr 24, 2026
Wraps _session_cache mutations in threading.RLock. Without this, concurrent gateway sessions (e.g., Telegram + Discord hitting Honcho at the same time) can race on the cache and silently lose conclusions or memory writes. Adopted from NousResearch#13510 by @hekaru-agent; the off-topic cron/jobs.py cleanup hunk from that PR is dropped here for scope isolation. Resolved a small conflict with the pinPeerName guard (kept both).
erosika
added a commit
to erosika/hermes-agent
that referenced
this pull request
Apr 24, 2026
new_session() was popping the old cached session, releasing the lock, calling get_or_create, then re-acquiring the lock to insert. A concurrent caller could observe the empty-cache window and race-create its own session, producing two divergent session objects for the same key. _cache_lock is an RLock, so nested reacquisition inside get_or_create is safe. Hold it across the whole pop/create/insert sequence. Follow-up to NousResearch#13510 (@hekaru-agent).
erosika
pushed a commit
to erosika/hermes-agent
that referenced
this pull request
Apr 27, 2026
Wraps _session_cache mutations in threading.RLock. Without this, concurrent gateway sessions (e.g., Telegram + Discord hitting Honcho at the same time) can race on the cache and silently lose conclusions or memory writes. Adopted from NousResearch#13510 by @hekaru-agent; the off-topic cron/jobs.py cleanup hunk from that PR is dropped here for scope isolation. Resolved a small conflict with the pinPeerName guard (kept both).
erosika
added a commit
to erosika/hermes-agent
that referenced
this pull request
Apr 27, 2026
new_session() was popping the old cached session, releasing the lock, calling get_or_create, then re-acquiring the lock to insert. A concurrent caller could observe the empty-cache window and race-create its own session, producing two divergent session objects for the same key. _cache_lock is an RLock, so nested reacquisition inside get_or_create is safe. Hold it across the whole pop/create/insert sequence. Follow-up to NousResearch#13510 (@hekaru-agent).
kshitijk4poor
pushed a commit
that referenced
this pull request
Apr 27, 2026
Wraps _session_cache mutations in threading.RLock. Without this, concurrent gateway sessions (e.g., Telegram + Discord hitting Honcho at the same time) can race on the cache and silently lose conclusions or memory writes. Adopted from #13510 by @hekaru-agent; the off-topic cron/jobs.py cleanup hunk from that PR is dropped here for scope isolation. Resolved a small conflict with the pinPeerName guard (kept both).
kshitijk4poor
pushed a commit
that referenced
this pull request
Apr 27, 2026
new_session() was popping the old cached session, releasing the lock, calling get_or_create, then re-acquiring the lock to insert. A concurrent caller could observe the empty-cache window and race-create its own session, producing two divergent session objects for the same key. _cache_lock is an RLock, so nested reacquisition inside get_or_create is safe. Hold it across the whole pop/create/insert sequence. Follow-up to #13510 (@hekaru-agent).
ulasbilgen
pushed a commit
to ulasbilgen/hermes-adhd-agent
that referenced
this pull request
May 1, 2026
Wraps _session_cache mutations in threading.RLock. Without this, concurrent gateway sessions (e.g., Telegram + Discord hitting Honcho at the same time) can race on the cache and silently lose conclusions or memory writes. Adopted from NousResearch#13510 by @hekaru-agent; the off-topic cron/jobs.py cleanup hunk from that PR is dropped here for scope isolation. Resolved a small conflict with the pinPeerName guard (kept both).
ulasbilgen
pushed a commit
to ulasbilgen/hermes-adhd-agent
that referenced
this pull request
May 1, 2026
new_session() was popping the old cached session, releasing the lock, calling get_or_create, then re-acquiring the lock to insert. A concurrent caller could observe the empty-cache window and race-create its own session, producing two divergent session objects for the same key. _cache_lock is an RLock, so nested reacquisition inside get_or_create is safe. Hold it across the whole pop/create/insert sequence. Follow-up to NousResearch#13510 (@hekaru-agent).
donald131
pushed a commit
to donald131/hermes-agent
that referenced
this pull request
May 2, 2026
Wraps _session_cache mutations in threading.RLock. Without this, concurrent gateway sessions (e.g., Telegram + Discord hitting Honcho at the same time) can race on the cache and silently lose conclusions or memory writes. Adopted from NousResearch#13510 by @hekaru-agent; the off-topic cron/jobs.py cleanup hunk from that PR is dropped here for scope isolation. Resolved a small conflict with the pinPeerName guard (kept both).
donald131
pushed a commit
to donald131/hermes-agent
that referenced
this pull request
May 2, 2026
new_session() was popping the old cached session, releasing the lock, calling get_or_create, then re-acquiring the lock to insert. A concurrent caller could observe the empty-cache window and race-create its own session, producing two divergent session objects for the same key. _cache_lock is an RLock, so nested reacquisition inside get_or_create is safe. Hold it across the whole pop/create/insert sequence. Follow-up to NousResearch#13510 (@hekaru-agent).
teknium1
pushed a commit
that referenced
this pull request
May 8, 2026
remove_job() deletes the job from cron/jobs.json but leaves the per-job
output directory at ~/.hermes/cron/output/{job_id}/ behind. Over time
this accumulates orphaned dirs that never get reclaimed.
Adopted from #13510 by @hekaru-agent; the honcho RLock half of that PR
was already salvaged in commit dad0217 so this lands the remaining
cron cleanup hunk on its own.
Contributor
|
Thanks @hekaru-agent! Both halves of this PR are now on main:
Both contributions fully absorbed. Appreciate the thread-safety catch. |
RationallyPrime
pushed a commit
to RationallyPrime/hermes-agent
that referenced
this pull request
May 8, 2026
remove_job() deletes the job from cron/jobs.json but leaves the per-job
output directory at ~/.hermes/cron/output/{job_id}/ behind. Over time
this accumulates orphaned dirs that never get reclaimed.
Adopted from NousResearch#13510 by @hekaru-agent; the honcho RLock half of that PR
was already salvaged in commit dad0217 so this lands the remaining
cron cleanup hunk on its own.
JZKK720
pushed a commit
to JZKK720/hermes-agent
that referenced
this pull request
May 11, 2026
remove_job() deletes the job from cron/jobs.json but leaves the per-job
output directory at ~/.hermes/cron/output/{job_id}/ behind. Over time
this accumulates orphaned dirs that never get reclaimed.
Adopted from NousResearch#13510 by @hekaru-agent; the honcho RLock half of that PR
was already salvaged in commit dad0217 so this lands the remaining
cron cleanup hunk on its own.
rmulligan
pushed a commit
to rmulligan/hermes-agent
that referenced
this pull request
May 11, 2026
remove_job() deletes the job from cron/jobs.json but leaves the per-job
output directory at ~/.hermes/cron/output/{job_id}/ behind. Over time
this accumulates orphaned dirs that never get reclaimed.
Adopted from NousResearch#13510 by @hekaru-agent; the honcho RLock half of that PR
was already salvaged in commit dad0217 so this lands the remaining
cron cleanup hunk on its own.
JinyuID
pushed a commit
to JinyuID/hermes-agent
that referenced
this pull request
May 11, 2026
remove_job() deletes the job from cron/jobs.json but leaves the per-job
output directory at ~/.hermes/cron/output/{job_id}/ behind. Over time
this accumulates orphaned dirs that never get reclaimed.
Adopted from NousResearch#13510 by @hekaru-agent; the honcho RLock half of that PR
was already salvaged in commit 6dcd66d so this lands the remaining
cron cleanup hunk on its own.
02356abc
pushed a commit
to 02356abc/hermes-agent
that referenced
this pull request
May 14, 2026
Wraps _session_cache mutations in threading.RLock. Without this, concurrent gateway sessions (e.g., Telegram + Discord hitting Honcho at the same time) can race on the cache and silently lose conclusions or memory writes. Adopted from NousResearch#13510 by @hekaru-agent; the off-topic cron/jobs.py cleanup hunk from that PR is dropped here for scope isolation. Resolved a small conflict with the pinPeerName guard (kept both).
02356abc
pushed a commit
to 02356abc/hermes-agent
that referenced
this pull request
May 14, 2026
new_session() was popping the old cached session, releasing the lock, calling get_or_create, then re-acquiring the lock to insert. A concurrent caller could observe the empty-cache window and race-create its own session, producing two divergent session objects for the same key. _cache_lock is an RLock, so nested reacquisition inside get_or_create is safe. Hold it across the whole pop/create/insert sequence. Follow-up to NousResearch#13510 (@hekaru-agent).
jsboige
pushed a commit
to jsboige/hermes-agent
that referenced
this pull request
May 14, 2026
remove_job() deletes the job from cron/jobs.json but leaves the per-job
output directory at ~/.hermes/cron/output/{job_id}/ behind. Over time
this accumulates orphaned dirs that never get reclaimed.
Adopted from NousResearch#13510 by @hekaru-agent; the honcho RLock half of that PR
was already salvaged in commit d418f45 so this lands the remaining
cron cleanup hunk on its own.
dannyJ848
pushed a commit
to dannyJ848/hermes-agent
that referenced
this pull request
May 17, 2026
Wraps _session_cache mutations in threading.RLock. Without this, concurrent gateway sessions (e.g., Telegram + Discord hitting Honcho at the same time) can race on the cache and silently lose conclusions or memory writes. Adopted from NousResearch#13510 by @hekaru-agent; the off-topic cron/jobs.py cleanup hunk from that PR is dropped here for scope isolation. Resolved a small conflict with the pinPeerName guard (kept both).
dannyJ848
pushed a commit
to dannyJ848/hermes-agent
that referenced
this pull request
May 17, 2026
new_session() was popping the old cached session, releasing the lock, calling get_or_create, then re-acquiring the lock to insert. A concurrent caller could observe the empty-cache window and race-create its own session, producing two divergent session objects for the same key. _cache_lock is an RLock, so nested reacquisition inside get_or_create is safe. Hold it across the whole pop/create/insert sequence. Follow-up to NousResearch#13510 (@hekaru-agent).
gweeteve
pushed a commit
to gweeteve/hermes-agent
that referenced
this pull request
Jun 2, 2026
Wraps _session_cache mutations in threading.RLock. Without this, concurrent gateway sessions (e.g., Telegram + Discord hitting Honcho at the same time) can race on the cache and silently lose conclusions or memory writes. Adopted from NousResearch#13510 by @hekaru-agent; the off-topic cron/jobs.py cleanup hunk from that PR is dropped here for scope isolation. Resolved a small conflict with the pinPeerName guard (kept both).
gweeteve
pushed a commit
to gweeteve/hermes-agent
that referenced
this pull request
Jun 2, 2026
new_session() was popping the old cached session, releasing the lock, calling get_or_create, then re-acquiring the lock to insert. A concurrent caller could observe the empty-cache window and race-create its own session, producing two divergent session objects for the same key. _cache_lock is an RLock, so nested reacquisition inside get_or_create is safe. Hold it across the whole pop/create/insert sequence. Follow-up to NousResearch#13510 (@hekaru-agent).
gweeteve
pushed a commit
to gweeteve/hermes-agent
that referenced
this pull request
Jun 2, 2026
remove_job() deletes the job from cron/jobs.json but leaves the per-job
output directory at ~/.hermes/cron/output/{job_id}/ behind. Over time
this accumulates orphaned dirs that never get reclaimed.
Adopted from NousResearch#13510 by @hekaru-agent; the honcho RLock half of that PR
was already salvaged in commit dad0217 so this lands the remaining
cron cleanup hunk on its own.
Egavasyug
pushed a commit
to Egavasyug/hermes-agent
that referenced
this pull request
Jun 10, 2026
Wraps _session_cache mutations in threading.RLock. Without this, concurrent gateway sessions (e.g., Telegram + Discord hitting Honcho at the same time) can race on the cache and silently lose conclusions or memory writes. Adopted from NousResearch#13510 by @hekaru-agent; the off-topic cron/jobs.py cleanup hunk from that PR is dropped here for scope isolation. Resolved a small conflict with the pinPeerName guard (kept both).
Egavasyug
pushed a commit
to Egavasyug/hermes-agent
that referenced
this pull request
Jun 10, 2026
new_session() was popping the old cached session, releasing the lock, calling get_or_create, then re-acquiring the lock to insert. A concurrent caller could observe the empty-cache window and race-create its own session, producing two divergent session objects for the same key. _cache_lock is an RLock, so nested reacquisition inside get_or_create is safe. Hold it across the whole pop/create/insert sequence. Follow-up to NousResearch#13510 (@hekaru-agent).
Egavasyug
pushed a commit
to Egavasyug/hermes-agent
that referenced
this pull request
Jun 10, 2026
remove_job() deletes the job from cron/jobs.json but leaves the per-job
output directory at ~/.hermes/cron/output/{job_id}/ behind. Over time
this accumulates orphaned dirs that never get reclaimed.
Adopted from NousResearch#13510 by @hekaru-agent; the honcho RLock half of that PR
was already salvaged in commit e991cf0 so this lands the remaining
cron cleanup hunk on its own.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Adds
threading.RLock()to protectHonchoSessionManager._cacheaccess against data races between the main thread and the async writer loop.Problem
HonchoSessionManageruses an async writer loop (_async_writer_loop) that writes messages to Honcho's persistence layer, while other threads callget_or_create(),_flush_session(),delete(), andnew_session()— all of which read/writeself._cachewithout synchronization.Python's
dictis not thread-safe for concurrent read-write. The race manifests as:Noneafter write)Fix
All accesses to
self._cacheare now wrapped:get_or_create(cache hit)get_or_create(cache miss — write)_flush_sessionflush_alldeletenew_sessionLock design decision: I/O operations (peer lookups, Honcho API calls) are kept outside the lock to avoid serializing expensive operations. Only the cache read/write is locked. Concurrent cache misses will race on Honcho session creation — but Honcho's own persistence is the source of truth, so this is safe.
Testing
python3 -m py_compile plugins/memory/honcho/session.pyruntime_user_peer_nameparam after our lock change; merged cleanlyFiles changed
plugins/memory/honcho/session.py— RLock addition + 6 synchronized cache access sites