fix(honcho): bug-fix consolidation — 7 upstream PRs, preserved authorship#15381
Conversation
|
Pushed four review follow-ups on top of the 9 adopted commits:
Tests added:
805/805 pass on honcho_plugin + streaming + vision + run_agent + hermes_state. |
a9c62c3 to
5a6a0e2
Compare
|
Pushed two more follow-up commits addressing the 30s timeout architectural critique. Rebased onto main (which includes #15218's Three-layer rework so the 30s default becomes invisible and adaptive:
Tests added (29 new):
Not in this PR (explicitly deferred):
850/850 tests pass on the targeted suite ( |
f512fdf to
5a6a0e2
Compare
5a6a0e2 to
40d7140
Compare
…NousResearch#14984) When a gateway drives Hermes (Telegram, Discord, Slack, ...), it passes the platform-native user ID as ``runtime_user_peer_name`` into the Honcho session manager. That ID wins over ``peer_name`` in ``honcho.json``, so a single user who connects over three platforms ends up as three separate Honcho peers — one per platform — with fragmented memory and no cross- platform context continuity. For multi-user bots this is correct (and must not change): each user gets their own peer scope. For the vast majority of personal Hermes deployments the configured ``peer_name`` is an unambiguous identity, though, so the reporter asked for an opt-in knob that pins the user peer to that value. Fix: new ``pinPeerName`` boolean on the host config, default ``false``. When ``true`` AND ``peerName`` is set, the configured peer_name beats the gateway's runtime identity; every other resolution case is unchanged. honcho.json: { "peerName": "Igor", "hosts": { "hermes": { "pinPeerName": true } } } session.py (resolution order, pinned case): runtime_user_peer_name → skipped (opt-in flag active) config.peer_name → WINS "Igor" session-key fallback → unreached Parsing follows the same host-block-overrides-root pattern as every other flag in HonchoClientConfig.from_global_config (``_resolve_bool`` helper). Tests (tests/honcho_plugin/test_pin_peer_name.py — 13 cases, 5 groups): - Config parsing: default, root true, host-block true, host overrides root, explicit false. - Peer resolution: runtime wins by default (regression guard for multi- user bots), config wins when pinned, pin-without-peer_name is a no-op (prevents silent peer-id collapse to session-key fallback), CLI path where runtime is absent, deepest fallback intact, assistant peer untouched by the flag. - Cross-platform unification: Telegram UID + Discord snowflake collapse to one peer when pinned; negative control confirms two distinct runtime IDs still produce two peers when unpinned. 244 honcho_plugin tests pass, 3 pre-existing skips, zero regressions. Defensive detail: session.py uses ``getattr(self._config, "pin_peer_name", False)`` so callers building partial config objects (several test fixtures across the codebase do this) don't break if they haven't updated yet. Runtime cost: one attr lookup per new session. Closes NousResearch#14984 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ck configs (NousResearch#15162) CI caught that ``test_session_manager_prefers_runtime_user_id_over_config_peer_name`` in ``tests/agent/test_memory_user_id.py`` failed after this branch: that test passes a ``MagicMock`` for ``config``, where ``mock.pin_peer_name`` silently returns another ``MagicMock`` — truthy by default. My ``getattr(..., "pin_peer_name", False)`` fallback was supposed to guard against callers that haven't added the new attr, but MagicMock *does* have the attr — it just returns a live mock for it. Tightened the gate to ``getattr(..., False) is True``. Real configs built via ``HonchoClientConfig.from_global_config`` always yield a proper boolean, so strict equality matches the pinned case and rejects both the unset-attr fallback and MagicMock stand-ins. Added a comment explaining why ``is True`` is intentional, not paranoid. Also tightened the ``peer_name`` existence check to ``getattr(..., None)`` so a MagicMock with ``peer_name`` left at its default (also truthy) doesn't spuriously enable pinning either. Verified against both the new ``test_pin_peer_name.py`` suite (13/13 pass) and the previously-failing ``TestHonchoUserIdScoping`` (3/3 pass). Zero behaviour change for real ``HonchoClientConfig`` values. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…r limit (NousResearch#13868) Gateway session keys (Matrix "!room:server" + thread event IDs, Telegram supergroup reply chains, Slack thread IDs with long workspace prefixes) can exceed Honcho's 100-character session ID limit after sanitization. Every Honcho API call for those sessions then 400s with "session_id too long". Add a helper that enforces the 100-char limit after sanitization: short keys (the common case) short-circuit unchanged; over-limit keys keep a prefix and append a deterministic `-<8 hex>` SHA-256 suffix over the original key so two long keys sharing a leading segment can't collide onto the same truncated ID. Adds 7 regression tests in tests/honcho_plugin/test_client.py covering short / exact-limit / long / deterministic / collision-resistant / allowlist-preserving / hash-suffix-present cases.
Wraps _session_cache mutations in threading.RLock. Without this, concurrent gateway sessions (e.g., Telegram + Discord hitting Honcho at the same time) can race on the cache and silently lose conclusions or memory writes. Adopted from NousResearch#13510 by @hekaru-agent; the off-topic cron/jobs.py cleanup hunk from that PR is dropped here for scope isolation. Resolved a small conflict with the pinPeerName guard (kept both).
_resolve_api_key() only checks for apiKey / HONCHO_API_KEY, so all CLI subcommands (identity --show, status, migrate, etc.) bail with "No API key configured" on self-hosted instances that use baseUrl without an API key. Return "local" when baseUrl or HONCHO_BASE_URL is set, matching the client.py behavior that already handles this case for the SDK. Tested on: macOS, self-hosted Honcho (Docker, localhost:8000).
When no explicit timeout is configured (HonchoClientConfig.timeout, honcho.timeout / requestTimeout, or HONCHO_TIMEOUT), get_honcho_client previously constructed the SDK with no timeout kwarg, letting the underlying httpx client hang indefinitely if the Honcho backend became unreachable mid-request. This is a silent-failure hazard on the post-response path of run_conversation: the memory_manager.sync_all() / queue_prefetch_all() calls fire after the agent has already generated its final reply, so a stalled Honcho request blocks run_conversation from returning. The gateway never logs "response ready" and never delivers the response to the platform (Telegram, etc.), even though the text is already saved to the session file. Repro: unplug the network or block app.honcho.dev mid-turn after the model has produced its final message. Without this change, _run_agent never returns. With it, the call aborts after 30s, run_conversation returns, and the gateway delivers the response (Honcho sync failure is logged and swallowed as before). The default applies only when nothing is configured, so any deployment that has explicitly set timeout / HONCHO_TIMEOUT / honcho.timeout / honcho.requestTimeout keeps its existing value. Self-hosted deployments that genuinely need a longer ceiling can still override via any of those knobs.
new_session() was popping the old cached session, releasing the lock, calling get_or_create, then re-acquiring the lock to insert. A concurrent caller could observe the empty-cache window and race-create its own session, producing two divergent session objects for the same key. _cache_lock is an RLock, so nested reacquisition inside get_or_create is safe. Hold it across the whole pop/create/insert sequence. Follow-up to NousResearch#13510 (@hekaru-agent).
sanitize_context() uses a non-greedy block regex that needs both <memory-context> open and close tags present in a single string. When a provider streams the fenced memory block across multiple deltas (typical for recalled-context leaks — the payload often arrives in 10+ 1-80 char chunks), the per-delta sanitize stripped the lone open/close tags via _FENCE_TAG_RE but let the payload in between flow straight to the UI. Adds StreamingContextScrubber: a small stateful scrubber that tracks open/close tag pairs across deltas, holds back partial-tag tails at chunk boundaries, and discards span contents wholesale (including the system-note line that fragments across deltas). Wired into _fire_stream_delta; reset per user turn; benign trailing partial-tag tails are flushed at the end of each model call. Mid-span interruption (provider drops closing tag) drops the orphaned content rather than leaking it — truncated answer > leaked memory. Follow-up to NousResearch#13672 (@dontcallmejames).
…tput fixes NousResearch#5719 The auxiliary vision LLM called by gateway._enrich_message_with_vision can echo its injected Honcho system prompt back into the image description. That description gets embedded verbatim into the enriched user message, so recalled memory (personal facts, dialectic output) surfaces into a user-visible bubble. Strips both forms of leak before embedding: - <memory-context>...</memory-context> fenced blocks (sanitize_context) - trailing '## Honcho Context' sections (header + everything after) Plus regression tests: - tests/agent/test_streaming_context_scrubber.py — 13 tests on the stateful scrubber (whole block, split tags, false-positive partial tags, unterminated span, reset, case-insensitivity) - tests/run_agent/test_run_agent_codex_responses.py — 2 new tests on _fire_stream_delta covering the realistic 7-chunk leak scenario and the cross-turn scrubber reset - tests/gateway/test_vision_memory_leak.py — 4 tests covering the vision auto-analysis boundary (clean pass-through, '## Honcho Context' header, fenced block, both patterns together)
…local' sentinel Two small follow-ups to the PR review: - Hoist hashlib import from _enforce_session_id_limit() to module top. stdlib imports are free after first cache, but keeping all imports at module top matches the rest of the codebase. - _resolve_api_key now URL-parses baseUrl and requires http/https + non-empty netloc before returning the 'local' sentinel. A typo like baseUrl: 'true' (or bare 'localhost') no longer silently passes the credential guard; the CLI correctly reports 'not configured'. Three new tests cover the new validation (garbage strings, non-http schemes, valid https).
main's 6a957a7 added an optional 'metadata' kwarg to MemoryProvider.on_memory_write so providers can distinguish tool-driven memory writes from background-review writes. MemoryManager already does a getfullargspec-based introspection, so the old 3-arg signature didn't break at runtime — but it missed the origin hint entirely. Updates HonchoMemoryProvider.on_memory_write to accept the kwarg. The metadata isn't yet threaded into Honcho's create_conclusion payload — that's worth its own PR once the consolidation lands and the new metadata shape stabilises.
The scheme-validation commit (e77a3f2c) was too strict: a user with
legacy ''baseUrl: localhost:8000'' (no ''http://'' prefix) in their
''~/.honcho/config.json'' would get ''No API key configured'' from the
CLI after that change, even though their setup worked before.
urlparse on a schemeless host:port treats the host segment as the
scheme and leaves netloc empty, so the http/https check rejected it.
Falls back to a lenient check for schemeless strings that look like
hosts: contain '.' or ':', aren't a boolean/null literal, aren't pure
digits. The SDK still rejects truly malformed URLs at connect time
with a clearer error than ours.
Three new tests: legacy schemeless hosts accepted; obvious garbage
literals (''true'', ''null'', ''12345'') still rejected. Reviewer
noted concern NousResearch#1: schemeless regression for self-hosters with old
configs.
Closed PR NousResearch#5137 addressed the retrieval path (peer cards via get_card() instead of the session-scoped lookup that returned empty for per-session messaging flows) — that architectural fix is already in main as _fetch_peer_card / _fetch_peer_context. What never got fixed is the user-visible side: honcho_profile returning a flat 'No profile facts available yet.' leaves the model to guess at why. The model then often surfaces it to the user as a cryptic error. Adds a diagnostic hint next to the existing 'result' message, enumerating the likely causes in rough order of frequency: 1. Observation disabled for this peer (user_observe_me/others off) 2. Peer card hasn't accumulated yet (fresh peer / dialectic cadence hasn't fired enough turns — cards build over time) 3. Generic fallback: self-hosted Honcho < 3.x lacks peer cards The hint also suggests alternative tools (honcho_reasoning / honcho_search) so the model can route around the empty card rather than giving up. Schema description updated so the model knows the hint field exists and that an empty card is NOT an error state. 7 tests cover the hint paths: warmup, observation-disabled for user + ai, generic fallback, populated card still returns plain result (no hint), alternative-tool suggestion present.
Adds AUTHOR_MAP entries for the 5 cherry-picked authors in NousResearch#15381 so the contributor-attribution CI check passes.
5de4aa2 to
0ed3b75
Compare
kshitijk4poor
left a comment
There was a problem hiding this comment.
Thanks for consolidating these Honcho fixes — the plugin-side work looks useful, and the new StreamingContextScrubber / generic streamed fence cleanup are the right direction.
I'm not comfortable merging the current core-file changes as-is though. The remaining cleanup is small, but I think it should happen before merge:
-
gateway/run.py hardcodes
## Honcho Contexttruncation in a shared gateway path.- Generic
sanitize_context(...)cleanup is fine. - The literal Honcho header split is plugin-specific policy in core code and can also truncate legitimate output if that heading appears naturally.
- Please remove that special-case from the gateway core path, or move any extra Honcho-only cleanup closer to the plugin/provider layer.
- Generic
-
The new context scrubbing is applied very broadly across core user/assistant/replay paths.
- The streaming leak-prevention path makes sense.
- But broad mutation of persisted/replayed user+assistant text risks silently altering legitimate content containing literal
<memory-context>...</memory-context>markers. - Please narrow the scrub surface so we clean known injected internal wrappers, not arbitrary historical/user text globally.
-
run_agent.py appears to strip leading newlines from ordinary streamed deltas.
- That can subtly break markdown/code/list formatting depending on chunk boundaries.
- Please preserve legitimate leading newlines and only manage the intentional paragraph-break insertion logic.
So: the core changes are close, but I'd like one follow-up pass to keep the fix generic and avoid Honcho-specific behavior leaking into shared runtime paths.
Reviewer pushback on the original boundary-hardening commits — three overreach points pulled plugin-specific policy into shared core paths: 1. gateway/run.py hardcoded a '## Honcho Context' literal split for vision-LLM output. Plugin-format heading in framework code; could truncate legitimate output naturally containing that header. Drop the literal split; keep generic sanitize_context (the wrapper strip is plugin-agnostic). Plugin-specific cleanup belongs at the provider boundary, not the shared gateway path. 2. run_agent.run_conversation scrubbed user_message and persist_user_message before the conversation loop. User text is sacred — if a user types a literal <memory-context> tag we must not silently delete it. The producer (build_memory_context_block) is the only legitimate emitter; user input should never need the reverse op. 3. _build_assistant_message scrubbed model output before persistence. Same hazard: would silently mutate legitimate documentation/code the model emits containing the literal markers. The streaming scrubber catches real leaks delta-by-delta before content is concatenated; persist-time scrub was redundant belt-and-suspenders. 4. _fire_stream_delta stripped leading newlines from every delta unless a paragraph break flag was set. Mid-stream '\n' is legitimate markdown — lists, code fences, paragraph breaks — and chunk boundaries are arbitrary. Narrow lstrip to the very first delta of the stream only (so stale provider preamble still gets cleaned on turn start, but mid-stream formatting survives). Plus: build_memory_context_block now logs a warning when its defensive sanitize_context strips something — surfaces buggy providers returning pre-wrapped text instead of silently double-fencing. Net architectural change: scrub surface collapses from 8 sites to 3 (StreamingContextScrubber on output deltas, plugin→backend send, build_memory_context_block input-validation). Plugin-specific strings stay out of shared runtime paths. User input and persisted assistant output are no longer mutated. Tests: rescoped TestMemoryContextSanitization (helper-correctness only, no source-inspection of removed call sites), updated vision tests to drop '## Honcho Context' literal-split assertions, updated _build_assistant_message persistence test to assert preservation. Added: cross-turn scrubber reset, build_memory_context_block warn-on- violation, mid-stream newline preservation (plain + code fence).
Same layering concern as the persisted-assistant scrub already removed: _emit_interim_assistant_message and the final_response return path were mutating model output broadly. Streaming scrubber covers real leaks delta-by-delta; these post-stream scrubs were redundant.
|
Pushed the layering pass. Three asks, addressed individually plus a consistency follow-up: 1. 2. Scrub surface narrowed to known-wrapper boundaries. Removed the broad mutations of user / persisted-assistant text:
The stateful Net surface: down from 8 sites to 3 in core paths (streaming scrubber, plugin→backend send, framework wrapper input-validation). 3. Streaming newline preservation. Plus, while in there: the Tests: 1582 pass across honcho_plugin / run_agent / gateway / agent / hermes_state suites. The one remaining failure is the known Ready for another pass. |
kshitijk4poor
left a comment
There was a problem hiding this comment.
Follow-up looks good. The requested cleanup is addressed:
- the Honcho-specific
## Honcho Contexttruncation is gone fromgateway/run.py - the scrub surface is much narrower, with the streaming scrubber now doing the real runtime leak-prevention work
- streamed leading-newline handling is fixed so mid-stream markdown/code newlines are preserved
I re-checked the core diff and reran targeted coverage on the updated branch:
tests/run_agent/test_run_agent.pytests/run_agent/test_run_agent_codex_responses.pytests/test_hermes_state.pytests/agent/test_streaming_context_scrubber.py
These passed locally on the PR branch (547 passed + 17 passed). The remaining failing full-suite CI checks look unrelated to this Honcho cleanup. Happy with this version.
Adds AUTHOR_MAP entries for the 5 cherry-picked authors in #15381 so the contributor-attribution CI check passes.
Adds AUTHOR_MAP entries for the 5 cherry-picked authors in NousResearch#15381 so the contributor-attribution CI check passes.
Adds AUTHOR_MAP entries for the 5 cherry-picked authors in NousResearch#15381 so the contributor-attribution CI check passes.
Adds AUTHOR_MAP entries for the 5 cherry-picked authors in NousResearch#15381 so the contributor-attribution CI check passes.
Adds AUTHOR_MAP entries for the 5 cherry-picked authors in NousResearch#15381 so the contributor-attribution CI check passes.
Adds AUTHOR_MAP entries for the 5 cherry-picked authors in NousResearch#15381 so the contributor-attribution CI check passes.
Consolidates seven Honcho bug fixes from the open PR queue into one stack. Each upstream PR is a single commit (two for #15162 which itself was two) with original author + date preserved.
Tracked fixes (in cherry-pick order):
Dropped from the original queue as already-shipped or not applicable:
context_tokensbudget already implemented atplugins/memory/honcho/__init__.py:684(_truncate_to_budget). Can be closed upstream.Files touched (9 commits total):
1442 of 1443 tests pass across honcho_plugin + run_agent + hermes_state. The single failure (
tests/run_agent/test_tool_arg_coercion.py::TestCoerceNumber::test_inf_stays_string_for_integer_only) is a preexisting flake on main, unrelated to these changes.Also landed separately and merged earlier (not part of this PR):
run_agent.py:9223. Close issue.run_agent.py:7791. Keep open.