feat(honcho): adaptive HTTP timeout + circuit breaker + sync worker (draft) by erosika · Pull Request #15404 · NousResearch/hermes-agent

erosika · 2026-04-24T23:05:33Z

Draft — stacked on top of #15381. Until that merges this PR's diff will show all 15 commits; the 2 new ones (top of the stack) are the actual content. Retarget or rebase once #15381 lands.

Layered timeout-ceiling rework for the Honcho sync path, cherry-picked out of #15381 so the consolidation can ship on its own merits without the architectural debate over this piece.

The two commits

feat(honcho): SyncWorker + HonchoLatencyTracker + CircuitBreaker primitives — 24 unit tests, no runtime change
feat(honcho): wire fire-and-forget worker + adaptive timeout + breaker into provider — replaces per-turn thread spawn with the persistent worker, feeds latencies into the tracker, trips the breaker on consecutive failures, adaptive client rebuild via rebuild_honcho_client_with_timeout(), in-memory backlog of 256 tasks (5 integration tests)

Known concerns to resolve before un-drafting

In-memory backlog is not durable. A process crash during an outage loses every backlogged write. The old per-turn-thread pattern had independent threads, so some would complete before the crash. As-shipped this is a regression for durability in the crash case. Options:
- Drop the backlog entirely (breaker-open = skip sync, accept loss)
- Persist to hermes_state.py (existing SQLite store) for restart-safe replay
- Keep in-memory + warn when dropping; document the crash-window loss
Adaptive-rebuild runs only when healthy. _drain_backlog_if_healthy() early-returns when the breaker is open, so the timeout never shrinks during an actual outage — backwards from the motivation. Either move the rebuild outside the breaker gate, or drop Layer 2's client-rebuild machinery and keep the tracker as a passive observer only (e.g., hermes honcho status surface).
Singleton rebuild throws away connection pool. The 20% threshold (ratio < 0.8 or > 1.2) is magic. Each rebuild costs a TCP+TLS handshake on the next call. Not clear the complexity is worth it vs. a static but sensible default.

What's worth keeping regardless

Layer 1 (SyncWorker + persistent queue) is the real UX win. The 30s timeout default becomes invisible because the user never waits on Honcho at all. sync_turn returns in < 100ms with a 2s-blocking backend (tested).
CircuitBreaker as a passive tripwire that skips sync during outages is straightforward and defensible.
HonchoLatencyTracker is useful for observability even if we drop the auto-rebuild.

Opening as draft so the architectural decisions can be debated without blocking #15381.

…NousResearch#14984) When a gateway drives Hermes (Telegram, Discord, Slack, ...), it passes the platform-native user ID as ``runtime_user_peer_name`` into the Honcho session manager. That ID wins over ``peer_name`` in ``honcho.json``, so a single user who connects over three platforms ends up as three separate Honcho peers — one per platform — with fragmented memory and no cross- platform context continuity. For multi-user bots this is correct (and must not change): each user gets their own peer scope. For the vast majority of personal Hermes deployments the configured ``peer_name`` is an unambiguous identity, though, so the reporter asked for an opt-in knob that pins the user peer to that value. Fix: new ``pinPeerName`` boolean on the host config, default ``false``. When ``true`` AND ``peerName`` is set, the configured peer_name beats the gateway's runtime identity; every other resolution case is unchanged. honcho.json: { "peerName": "Igor", "hosts": { "hermes": { "pinPeerName": true } } } session.py (resolution order, pinned case): runtime_user_peer_name → skipped (opt-in flag active) config.peer_name → WINS "Igor" session-key fallback → unreached Parsing follows the same host-block-overrides-root pattern as every other flag in HonchoClientConfig.from_global_config (``_resolve_bool`` helper). Tests (tests/honcho_plugin/test_pin_peer_name.py — 13 cases, 5 groups): - Config parsing: default, root true, host-block true, host overrides root, explicit false. - Peer resolution: runtime wins by default (regression guard for multi- user bots), config wins when pinned, pin-without-peer_name is a no-op (prevents silent peer-id collapse to session-key fallback), CLI path where runtime is absent, deepest fallback intact, assistant peer untouched by the flag. - Cross-platform unification: Telegram UID + Discord snowflake collapse to one peer when pinned; negative control confirms two distinct runtime IDs still produce two peers when unpinned. 244 honcho_plugin tests pass, 3 pre-existing skips, zero regressions. Defensive detail: session.py uses ``getattr(self._config, "pin_peer_name", False)`` so callers building partial config objects (several test fixtures across the codebase do this) don't break if they haven't updated yet. Runtime cost: one attr lookup per new session. Closes NousResearch#14984 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ck configs (NousResearch#15162) CI caught that ``test_session_manager_prefers_runtime_user_id_over_config_peer_name`` in ``tests/agent/test_memory_user_id.py`` failed after this branch: that test passes a ``MagicMock`` for ``config``, where ``mock.pin_peer_name`` silently returns another ``MagicMock`` — truthy by default. My ``getattr(..., "pin_peer_name", False)`` fallback was supposed to guard against callers that haven't added the new attr, but MagicMock *does* have the attr — it just returns a live mock for it. Tightened the gate to ``getattr(..., False) is True``. Real configs built via ``HonchoClientConfig.from_global_config`` always yield a proper boolean, so strict equality matches the pinned case and rejects both the unset-attr fallback and MagicMock stand-ins. Added a comment explaining why ``is True`` is intentional, not paranoid. Also tightened the ``peer_name`` existence check to ``getattr(..., None)`` so a MagicMock with ``peer_name`` left at its default (also truthy) doesn't spuriously enable pinning either. Verified against both the new ``test_pin_peer_name.py`` suite (13/13 pass) and the previously-failing ``TestHonchoUserIdScoping`` (3/3 pass). Zero behaviour change for real ``HonchoClientConfig`` values. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…r limit (NousResearch#13868) Gateway session keys (Matrix "!room:server" + thread event IDs, Telegram supergroup reply chains, Slack thread IDs with long workspace prefixes) can exceed Honcho's 100-character session ID limit after sanitization. Every Honcho API call for those sessions then 400s with "session_id too long". Add a helper that enforces the 100-char limit after sanitization: short keys (the common case) short-circuit unchanged; over-limit keys keep a prefix and append a deterministic `-<8 hex>` SHA-256 suffix over the original key so two long keys sharing a leading segment can't collide onto the same truncated ID. Adds 7 regression tests in tests/honcho_plugin/test_client.py covering short / exact-limit / long / deterministic / collision-resistant / allowlist-preserving / hash-suffix-present cases.

@hekaru-agent

Wraps _session_cache mutations in threading.RLock. Without this, concurrent gateway sessions (e.g., Telegram + Discord hitting Honcho at the same time) can race on the cache and silently lose conclusions or memory writes. Adopted from NousResearch#13510 by @hekaru-agent; the off-topic cron/jobs.py cleanup hunk from that PR is dropped here for scope isolation. Resolved a small conflict with the pinPeerName guard (kept both).

_resolve_api_key() only checks for apiKey / HONCHO_API_KEY, so all CLI subcommands (identity --show, status, migrate, etc.) bail with "No API key configured" on self-hosted instances that use baseUrl without an API key. Return "local" when baseUrl or HONCHO_BASE_URL is set, matching the client.py behavior that already handles this case for the SDK. Tested on: macOS, self-hosted Honcho (Docker, localhost:8000).

When no explicit timeout is configured (HonchoClientConfig.timeout, honcho.timeout / requestTimeout, or HONCHO_TIMEOUT), get_honcho_client previously constructed the SDK with no timeout kwarg, letting the underlying httpx client hang indefinitely if the Honcho backend became unreachable mid-request. This is a silent-failure hazard on the post-response path of run_conversation: the memory_manager.sync_all() / queue_prefetch_all() calls fire after the agent has already generated its final reply, so a stalled Honcho request blocks run_conversation from returning. The gateway never logs "response ready" and never delivers the response to the platform (Telegram, etc.), even though the text is already saved to the session file. Repro: unplug the network or block app.honcho.dev mid-turn after the model has produced its final message. Without this change, _run_agent never returns. With it, the call aborts after 30s, run_conversation returns, and the gateway delivers the response (Honcho sync failure is logged and swallowed as before). The default applies only when nothing is configured, so any deployment that has explicitly set timeout / HONCHO_TIMEOUT / honcho.timeout / honcho.requestTimeout keeps its existing value. Self-hosted deployments that genuinely need a longer ceiling can still override via any of those knobs.

@hekaru-agent

new_session() was popping the old cached session, releasing the lock, calling get_or_create, then re-acquiring the lock to insert. A concurrent caller could observe the empty-cache window and race-create its own session, producing two divergent session objects for the same key. _cache_lock is an RLock, so nested reacquisition inside get_or_create is safe. Hold it across the whole pop/create/insert sequence. Follow-up to NousResearch#13510 (@hekaru-agent).

@dontcallmejames

sanitize_context() uses a non-greedy block regex that needs both <memory-context> open and close tags present in a single string. When a provider streams the fenced memory block across multiple deltas (typical for recalled-context leaks — the payload often arrives in 10+ 1-80 char chunks), the per-delta sanitize stripped the lone open/close tags via _FENCE_TAG_RE but let the payload in between flow straight to the UI. Adds StreamingContextScrubber: a small stateful scrubber that tracks open/close tag pairs across deltas, holds back partial-tag tails at chunk boundaries, and discards span contents wholesale (including the system-note line that fragments across deltas). Wired into _fire_stream_delta; reset per user turn; benign trailing partial-tag tails are flushed at the end of each model call. Mid-span interruption (provider drops closing tag) drops the orphaned content rather than leaking it — truncated answer > leaked memory. Follow-up to NousResearch#13672 (@dontcallmejames).

…tput fixes NousResearch#5719 The auxiliary vision LLM called by gateway._enrich_message_with_vision can echo its injected Honcho system prompt back into the image description. That description gets embedded verbatim into the enriched user message, so recalled memory (personal facts, dialectic output) surfaces into a user-visible bubble. Strips both forms of leak before embedding: - <memory-context>...</memory-context> fenced blocks (sanitize_context) - trailing '## Honcho Context' sections (header + everything after) Plus regression tests: - tests/agent/test_streaming_context_scrubber.py — 13 tests on the stateful scrubber (whole block, split tags, false-positive partial tags, unterminated span, reset, case-insensitivity) - tests/run_agent/test_run_agent_codex_responses.py — 2 new tests on _fire_stream_delta covering the realistic 7-chunk leak scenario and the cross-turn scrubber reset - tests/gateway/test_vision_memory_leak.py — 4 tests covering the vision auto-analysis boundary (clean pass-through, '## Honcho Context' header, fenced block, both patterns together)

…local' sentinel Two small follow-ups to the PR review: - Hoist hashlib import from _enforce_session_id_limit() to module top. stdlib imports are free after first cache, but keeping all imports at module top matches the rest of the codebase. - _resolve_api_key now URL-parses baseUrl and requires http/https + non-empty netloc before returning the 'local' sentinel. A typo like baseUrl: 'true' (or bare 'localhost') no longer silently passes the credential guard; the CLI correctly reports 'not configured'. Three new tests cover the new validation (garbage strings, non-http schemes, valid https).

…itives Foundational building blocks for the Honcho sync path rework. Provider integration lands in a follow-up commit; this commit is standalone unit- tested primitives that expose clean seams for the integration tests: SyncWorker Persistent daemon thread draining a bounded task queue. Replaces the per-turn threading.Thread(target=_sync).start() pattern so sync_turn() returns immediately, never coordinating thread handoff on the user-facing path. Queue overflow drops the OLDEST task (with on_failure callback) rather than blocking the producer — preserves responsiveness under load. HonchoLatencyTracker Rolling p95 observer with a warmup phase (returns default until N samples collected) and a floor. timeout() = max(floor, headroom * p95). Hosted Honcho settles to ~1-3s; self-hosted cold starts scale up naturally. No hardcoded 30s ceiling visible to users. CircuitBreaker Closed → Open after N consecutive failures. Probe interval lets Open → HalfOpen one request through; success closes, failure reopens. Thread-safe. Time source is injectable for determinism in tests. 24 tests covering all three primitives plus their integration (worker feeds latency observations to the tracker and success/failure to the breaker; breaker-open enqueue drops the task via on_failure). No behavioural change to runtime yet — nothing in the codebase imports these primitives in this commit.

…r into provider Replaces the per-turn threading.Thread(target=_sync).start() pattern in HonchoMemoryProvider with a persistent SyncWorker. sync_turn() and on_memory_write() both enqueue SyncTasks on the shared worker and return immediately — run_conversation's post-response path is no longer coupled to Honcho latency. Three behavioural changes land here: Layer 1 — fire-and-forget sync No more join(timeout=5.0) on prior turn's thread. Back-to-back sync_turn() calls return in microseconds regardless of backend latency. Worker runs tasks serially per-provider (intentional: session writes must be ordered), uses a bounded queue with oldest-drop backpressure. Layer 2 — adaptive timeout SyncWorker feeds successful call latencies into HonchoLatencyTracker. After each turn, _drain_backlog_if_healthy() invokes rebuild_honcho_client_with_timeout() which rebuilds the SDK client iff the tracker's p95-derived timeout differs >20% from the active one. Hosted Honcho converges on ~1-3s timeouts; self-hosted cold starts scale naturally. 30s default still applies during warmup. Layer 3 — circuit breaker + in-memory backlog CircuitBreaker trips open after 3 consecutive failures; SyncWorker refuses breaker-open tasks via their on_failure callback. Provider wraps each task's on_failure with _enqueue_with_backlog() so breaker-open and queue-full tasks land in a bounded backlog (256 tasks max). On recovery (probe succeeds, state → closed), the next sync_turn() drains the backlog through the worker. Tasks that crashed inside Honcho itself are NOT backlogged — replay won't help. Updates one existing test (test_session.py) that poked at the now- removed _sync_thread attribute; replaced with the worker's shutdown(). 5 new integration tests verify the provider-level wiring: - sync_turn returns in < 100ms even when flush blocks 2s - 5 back-to-back sync_turns in < 200ms total (old code: up to 25s) - breaker-open enqueue lands in backlog, not on the worker - recovery drains backlog + new task on next sync_turn - backlog respects _BACKLOG_MAX and stops growing during long outages No change to run_conversation or any agent-facing API.

erosika · 2026-05-28T16:12:26Z

Closing this old stacked draft now that #15381 landed. The useful SyncWorker/circuit-breaker pieces should be salvaged later as a clean reliability PR rather than keeping this stale conflicted stack open.

briandevans and others added 15 commits April 24, 2026 18:48

fix: strip leaked memory context from commentary

3ce4767

fix: harden memory-context leak boundaries

936ac82

Fix Honcho HOME-aware global config fallback

ab0d8c9

erosika mentioned this pull request Apr 24, 2026

fix(honcho): bug-fix consolidation — 7 upstream PRs, preserved authorship #15381

Merged

This was referenced May 26, 2026

Honcho PR map: cleanup pass #32863

Open

fix(honcho): harden self-hosted setup paths #33954

Closed

erosika closed this May 28, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(honcho): adaptive HTTP timeout + circuit breaker + sync worker (draft)#15404

feat(honcho): adaptive HTTP timeout + circuit breaker + sync worker (draft)#15404
erosika wants to merge 15 commits into
NousResearch:mainfrom
erosika:feat/honcho-adaptive-timeout-breaker

erosika commented Apr 24, 2026

Uh oh!

erosika commented May 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

9 participants

Conversation

erosika commented Apr 24, 2026

The two commits

Known concerns to resolve before un-drafting

What's worth keeping regardless

Uh oh!

erosika commented May 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

9 participants