feat(honcho): adaptive HTTP timeout + circuit breaker + sync worker (draft)#15404
Closed
erosika wants to merge 15 commits into
Closed
feat(honcho): adaptive HTTP timeout + circuit breaker + sync worker (draft)#15404erosika wants to merge 15 commits into
erosika wants to merge 15 commits into
Conversation
…NousResearch#14984) When a gateway drives Hermes (Telegram, Discord, Slack, ...), it passes the platform-native user ID as ``runtime_user_peer_name`` into the Honcho session manager. That ID wins over ``peer_name`` in ``honcho.json``, so a single user who connects over three platforms ends up as three separate Honcho peers — one per platform — with fragmented memory and no cross- platform context continuity. For multi-user bots this is correct (and must not change): each user gets their own peer scope. For the vast majority of personal Hermes deployments the configured ``peer_name`` is an unambiguous identity, though, so the reporter asked for an opt-in knob that pins the user peer to that value. Fix: new ``pinPeerName`` boolean on the host config, default ``false``. When ``true`` AND ``peerName`` is set, the configured peer_name beats the gateway's runtime identity; every other resolution case is unchanged. honcho.json: { "peerName": "Igor", "hosts": { "hermes": { "pinPeerName": true } } } session.py (resolution order, pinned case): runtime_user_peer_name → skipped (opt-in flag active) config.peer_name → WINS "Igor" session-key fallback → unreached Parsing follows the same host-block-overrides-root pattern as every other flag in HonchoClientConfig.from_global_config (``_resolve_bool`` helper). Tests (tests/honcho_plugin/test_pin_peer_name.py — 13 cases, 5 groups): - Config parsing: default, root true, host-block true, host overrides root, explicit false. - Peer resolution: runtime wins by default (regression guard for multi- user bots), config wins when pinned, pin-without-peer_name is a no-op (prevents silent peer-id collapse to session-key fallback), CLI path where runtime is absent, deepest fallback intact, assistant peer untouched by the flag. - Cross-platform unification: Telegram UID + Discord snowflake collapse to one peer when pinned; negative control confirms two distinct runtime IDs still produce two peers when unpinned. 244 honcho_plugin tests pass, 3 pre-existing skips, zero regressions. Defensive detail: session.py uses ``getattr(self._config, "pin_peer_name", False)`` so callers building partial config objects (several test fixtures across the codebase do this) don't break if they haven't updated yet. Runtime cost: one attr lookup per new session. Closes NousResearch#14984 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ck configs (NousResearch#15162) CI caught that ``test_session_manager_prefers_runtime_user_id_over_config_peer_name`` in ``tests/agent/test_memory_user_id.py`` failed after this branch: that test passes a ``MagicMock`` for ``config``, where ``mock.pin_peer_name`` silently returns another ``MagicMock`` — truthy by default. My ``getattr(..., "pin_peer_name", False)`` fallback was supposed to guard against callers that haven't added the new attr, but MagicMock *does* have the attr — it just returns a live mock for it. Tightened the gate to ``getattr(..., False) is True``. Real configs built via ``HonchoClientConfig.from_global_config`` always yield a proper boolean, so strict equality matches the pinned case and rejects both the unset-attr fallback and MagicMock stand-ins. Added a comment explaining why ``is True`` is intentional, not paranoid. Also tightened the ``peer_name`` existence check to ``getattr(..., None)`` so a MagicMock with ``peer_name`` left at its default (also truthy) doesn't spuriously enable pinning either. Verified against both the new ``test_pin_peer_name.py`` suite (13/13 pass) and the previously-failing ``TestHonchoUserIdScoping`` (3/3 pass). Zero behaviour change for real ``HonchoClientConfig`` values. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…r limit (NousResearch#13868) Gateway session keys (Matrix "!room:server" + thread event IDs, Telegram supergroup reply chains, Slack thread IDs with long workspace prefixes) can exceed Honcho's 100-character session ID limit after sanitization. Every Honcho API call for those sessions then 400s with "session_id too long". Add a helper that enforces the 100-char limit after sanitization: short keys (the common case) short-circuit unchanged; over-limit keys keep a prefix and append a deterministic `-<8 hex>` SHA-256 suffix over the original key so two long keys sharing a leading segment can't collide onto the same truncated ID. Adds 7 regression tests in tests/honcho_plugin/test_client.py covering short / exact-limit / long / deterministic / collision-resistant / allowlist-preserving / hash-suffix-present cases.
Wraps _session_cache mutations in threading.RLock. Without this, concurrent gateway sessions (e.g., Telegram + Discord hitting Honcho at the same time) can race on the cache and silently lose conclusions or memory writes. Adopted from NousResearch#13510 by @hekaru-agent; the off-topic cron/jobs.py cleanup hunk from that PR is dropped here for scope isolation. Resolved a small conflict with the pinPeerName guard (kept both).
_resolve_api_key() only checks for apiKey / HONCHO_API_KEY, so all CLI subcommands (identity --show, status, migrate, etc.) bail with "No API key configured" on self-hosted instances that use baseUrl without an API key. Return "local" when baseUrl or HONCHO_BASE_URL is set, matching the client.py behavior that already handles this case for the SDK. Tested on: macOS, self-hosted Honcho (Docker, localhost:8000).
When no explicit timeout is configured (HonchoClientConfig.timeout, honcho.timeout / requestTimeout, or HONCHO_TIMEOUT), get_honcho_client previously constructed the SDK with no timeout kwarg, letting the underlying httpx client hang indefinitely if the Honcho backend became unreachable mid-request. This is a silent-failure hazard on the post-response path of run_conversation: the memory_manager.sync_all() / queue_prefetch_all() calls fire after the agent has already generated its final reply, so a stalled Honcho request blocks run_conversation from returning. The gateway never logs "response ready" and never delivers the response to the platform (Telegram, etc.), even though the text is already saved to the session file. Repro: unplug the network or block app.honcho.dev mid-turn after the model has produced its final message. Without this change, _run_agent never returns. With it, the call aborts after 30s, run_conversation returns, and the gateway delivers the response (Honcho sync failure is logged and swallowed as before). The default applies only when nothing is configured, so any deployment that has explicitly set timeout / HONCHO_TIMEOUT / honcho.timeout / honcho.requestTimeout keeps its existing value. Self-hosted deployments that genuinely need a longer ceiling can still override via any of those knobs.
new_session() was popping the old cached session, releasing the lock, calling get_or_create, then re-acquiring the lock to insert. A concurrent caller could observe the empty-cache window and race-create its own session, producing two divergent session objects for the same key. _cache_lock is an RLock, so nested reacquisition inside get_or_create is safe. Hold it across the whole pop/create/insert sequence. Follow-up to NousResearch#13510 (@hekaru-agent).
sanitize_context() uses a non-greedy block regex that needs both <memory-context> open and close tags present in a single string. When a provider streams the fenced memory block across multiple deltas (typical for recalled-context leaks — the payload often arrives in 10+ 1-80 char chunks), the per-delta sanitize stripped the lone open/close tags via _FENCE_TAG_RE but let the payload in between flow straight to the UI. Adds StreamingContextScrubber: a small stateful scrubber that tracks open/close tag pairs across deltas, holds back partial-tag tails at chunk boundaries, and discards span contents wholesale (including the system-note line that fragments across deltas). Wired into _fire_stream_delta; reset per user turn; benign trailing partial-tag tails are flushed at the end of each model call. Mid-span interruption (provider drops closing tag) drops the orphaned content rather than leaking it — truncated answer > leaked memory. Follow-up to NousResearch#13672 (@dontcallmejames).
…tput fixes NousResearch#5719 The auxiliary vision LLM called by gateway._enrich_message_with_vision can echo its injected Honcho system prompt back into the image description. That description gets embedded verbatim into the enriched user message, so recalled memory (personal facts, dialectic output) surfaces into a user-visible bubble. Strips both forms of leak before embedding: - <memory-context>...</memory-context> fenced blocks (sanitize_context) - trailing '## Honcho Context' sections (header + everything after) Plus regression tests: - tests/agent/test_streaming_context_scrubber.py — 13 tests on the stateful scrubber (whole block, split tags, false-positive partial tags, unterminated span, reset, case-insensitivity) - tests/run_agent/test_run_agent_codex_responses.py — 2 new tests on _fire_stream_delta covering the realistic 7-chunk leak scenario and the cross-turn scrubber reset - tests/gateway/test_vision_memory_leak.py — 4 tests covering the vision auto-analysis boundary (clean pass-through, '## Honcho Context' header, fenced block, both patterns together)
…local' sentinel Two small follow-ups to the PR review: - Hoist hashlib import from _enforce_session_id_limit() to module top. stdlib imports are free after first cache, but keeping all imports at module top matches the rest of the codebase. - _resolve_api_key now URL-parses baseUrl and requires http/https + non-empty netloc before returning the 'local' sentinel. A typo like baseUrl: 'true' (or bare 'localhost') no longer silently passes the credential guard; the CLI correctly reports 'not configured'. Three new tests cover the new validation (garbage strings, non-http schemes, valid https).
…itives
Foundational building blocks for the Honcho sync path rework. Provider
integration lands in a follow-up commit; this commit is standalone unit-
tested primitives that expose clean seams for the integration tests:
SyncWorker
Persistent daemon thread draining a bounded task queue. Replaces
the per-turn threading.Thread(target=_sync).start() pattern so
sync_turn() returns immediately, never coordinating thread handoff
on the user-facing path. Queue overflow drops the OLDEST task
(with on_failure callback) rather than blocking the producer —
preserves responsiveness under load.
HonchoLatencyTracker
Rolling p95 observer with a warmup phase (returns default until
N samples collected) and a floor. timeout() = max(floor, headroom
* p95). Hosted Honcho settles to ~1-3s; self-hosted cold starts
scale up naturally. No hardcoded 30s ceiling visible to users.
CircuitBreaker
Closed → Open after N consecutive failures. Probe interval lets
Open → HalfOpen one request through; success closes, failure
reopens. Thread-safe. Time source is injectable for determinism
in tests.
24 tests covering all three primitives plus their integration (worker
feeds latency observations to the tracker and success/failure to the
breaker; breaker-open enqueue drops the task via on_failure).
No behavioural change to runtime yet — nothing in the codebase
imports these primitives in this commit.
…r into provider
Replaces the per-turn threading.Thread(target=_sync).start() pattern in
HonchoMemoryProvider with a persistent SyncWorker. sync_turn() and
on_memory_write() both enqueue SyncTasks on the shared worker and return
immediately — run_conversation's post-response path is no longer coupled
to Honcho latency.
Three behavioural changes land here:
Layer 1 — fire-and-forget sync
No more join(timeout=5.0) on prior turn's thread. Back-to-back
sync_turn() calls return in microseconds regardless of backend
latency. Worker runs tasks serially per-provider (intentional:
session writes must be ordered), uses a bounded queue with
oldest-drop backpressure.
Layer 2 — adaptive timeout
SyncWorker feeds successful call latencies into HonchoLatencyTracker.
After each turn, _drain_backlog_if_healthy() invokes
rebuild_honcho_client_with_timeout() which rebuilds the SDK client
iff the tracker's p95-derived timeout differs >20% from the active
one. Hosted Honcho converges on ~1-3s timeouts; self-hosted cold
starts scale naturally. 30s default still applies during warmup.
Layer 3 — circuit breaker + in-memory backlog
CircuitBreaker trips open after 3 consecutive failures; SyncWorker
refuses breaker-open tasks via their on_failure callback. Provider
wraps each task's on_failure with _enqueue_with_backlog() so
breaker-open and queue-full tasks land in a bounded backlog (256
tasks max). On recovery (probe succeeds, state → closed), the next
sync_turn() drains the backlog through the worker. Tasks that
crashed inside Honcho itself are NOT backlogged — replay won't help.
Updates one existing test (test_session.py) that poked at the now-
removed _sync_thread attribute; replaced with the worker's shutdown().
5 new integration tests verify the provider-level wiring:
- sync_turn returns in < 100ms even when flush blocks 2s
- 5 back-to-back sync_turns in < 200ms total (old code: up to 25s)
- breaker-open enqueue lands in backlog, not on the worker
- recovery drains backlog + new task on next sync_turn
- backlog respects _BACKLOG_MAX and stops growing during long outages
No change to run_conversation or any agent-facing API.
This was referenced May 26, 2026
Contributor
Author
|
Closing this old stacked draft now that #15381 landed. The useful SyncWorker/circuit-breaker pieces should be salvaged later as a clean reliability PR rather than keeping this stale conflicted stack open. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Draft — stacked on top of #15381. Until that merges this PR's diff will show all 15 commits; the 2 new ones (top of the stack) are the actual content. Retarget or rebase once #15381 lands.
Layered timeout-ceiling rework for the Honcho sync path, cherry-picked out of #15381 so the consolidation can ship on its own merits without the architectural debate over this piece.
The two commits
feat(honcho): SyncWorker + HonchoLatencyTracker + CircuitBreaker primitives— 24 unit tests, no runtime changefeat(honcho): wire fire-and-forget worker + adaptive timeout + breaker into provider— replaces per-turn thread spawn with the persistent worker, feeds latencies into the tracker, trips the breaker on consecutive failures, adaptive client rebuild viarebuild_honcho_client_with_timeout(), in-memory backlog of 256 tasks (5 integration tests)Known concerns to resolve before un-drafting
In-memory backlog is not durable. A process crash during an outage loses every backlogged write. The old per-turn-thread pattern had independent threads, so some would complete before the crash. As-shipped this is a regression for durability in the crash case. Options:
hermes_state.py(existing SQLite store) for restart-safe replayAdaptive-rebuild runs only when healthy.
_drain_backlog_if_healthy()early-returns when the breaker is open, so the timeout never shrinks during an actual outage — backwards from the motivation. Either move the rebuild outside the breaker gate, or drop Layer 2's client-rebuild machinery and keep the tracker as a passive observer only (e.g.,hermes honcho statussurface).Singleton rebuild throws away connection pool. The 20% threshold (
ratio < 0.8 or > 1.2) is magic. Each rebuild costs a TCP+TLS handshake on the next call. Not clear the complexity is worth it vs. a static but sensible default.What's worth keeping regardless
SyncWorker+ persistent queue) is the real UX win. The 30s timeout default becomes invisible because the user never waits on Honcho at all.sync_turnreturns in < 100ms with a 2s-blocking backend (tested).CircuitBreakeras a passive tripwire that skips sync during outages is straightforward and defensible.HonchoLatencyTrackeris useful for observability even if we drop the auto-rebuild.Opening as draft so the architectural decisions can be debated without blocking #15381.