feat(session_search): toolkit of fast / guided / summary modes for low-cost, high-fidelity recall#26419
Closed
yoniebans wants to merge 30 commits into
Closed
feat(session_search): toolkit of fast / guided / summary modes for low-cost, high-fidelity recall#26419yoniebans wants to merge 30 commits into
yoniebans wants to merge 30 commits into
Conversation
Add mode parameter to session_search tool supporting two modes: - fast (default): returns FTS5 snippets + context immediately (~0.02s), no LLM call — ideal for quick recall lookups - summary: preserves original behavior with LLM-generated session summaries (~10-30s) — use when fast mode is insufficient Changes: - tools/session_search_tool.py: implement fast mode path that returns FTS hits with snippets/context without calling auxiliary model; add mode parameter to schema (enum: fast|summary); apply parent session source/metadata resolution in fast mode (same pattern as upstream fix 6b4ccb9 in summary mode) - run_agent.py: pass mode argument from function_args in two call sites (direct tool call + subagent path) - tests/tools/test_session_search.py: add test coverage for fast mode output format, summary mode preservation, backwards compatibility, and run_agent.py mode forwarding verification The tool schema description is updated to recommend fast-first usage.
abcdjmm970703@gmail.com → JabberELF for the session_search fast/summary dual-mode salvage.
…pt-in Reverses the default introduced by the salvaged dual-mode commit. Why: profiled four representative queries against a real 280-session state.db (workspace harness, not committed). Summary mode is 1,299x-6,293x slower than fast (median ~30s vs ~10ms; 99%+ in the auxiliary LLM call) and produces 2.9x-3.9x larger result blobs, but it answers a materially different question. The user's typical 'what did we work on for X?' is the summary question — fast surfaces only what FTS5 directly matched while summary surfaces cross-session synthesis (e.g. work sessions referenced inside the matched cron jobs). Backwards-compatible default; fast remains opt-in for cheap discovery via mode='fast'. Changes: - tools/session_search_tool.py: default parameter, defensive coercion fallbacks, and registry handler all default to 'summary'. Schema description rewritten with measured trade-offs and the 'use fast for discovery, summary for recall' framing. - run_agent.py: both direct call sites mirror the new default. - tests/tools/test_session_search.py: split the old default-test into test_default_search_returns_summary_mode_recap (asserts new default) and test_explicit_fast_mode_returns_snippets... (covers fast path without mocking the default away). Invalid-mode test now asserts fallback to summary. Source-grep test updated.
Adds SessionDB.get_messages_around(session_id, around_message_id, window) which returns up to 'window' messages before the anchor, the anchor itself, and up to 'window' after — all from the same session, ordered by id ascending. Used by the upcoming session_search mode='guided' (anchored drill-down) to surface a focused conversation window without summarisation cost or the 100k-char truncation gamble of mode='summary'. Boundaries are honoured (fewer messages at session start/end), the anchor is verified to exist in the named session before fetching (cheap guard against cross-session id confusion), and content/tool_calls decoding mirrors get_messages() so callers can swap between the two without surprises. Tested: 9 new cases in tests/hermes_state/test_get_messages_around.py (middle-of-session, first-message, last-message, anchor-not-in-session, no cross-session leakage, window > session, window=0, window negative, content decoding parity with get_messages). 62/62 passing including the existing 53 session_search tests.
Adds 'match_message_id' to each fast-mode result entry, carrying through the FTS5 message id (already populated in the underlying search_messages result; just unsurfaced until now). This is the composition handle for the upcoming mode='guided' drill-down: the calling agent reads a fast hit, picks a promising session, and passes session_id + match_message_id back as around_message_id for an anchored window. Lossless for non-guided callers (additive field, no schema changes). One new test (test_fast_mode_includes_match_message_id_for_guided_drilldown). 63/63 passing.
Adds a third mode to session_search: guided returns a window of messages
around a specific message id in a specific session. No FTS5, no
auxiliary LLM, no 100k-char truncation — one DB query (~ms latency).
Designed to compose with mode=fast: the calling agent does cheap FTS5
discovery, picks a promising hit, then calls back with mode='guided',
session_id from the result, and around_message_id=match_message_id from
the same result. The agent gets the actual conversation around the
anchor — the back-and-forth that fast's snippet teases but doesn't
deliver, and that summary distils into prose at 30s+ wall-clock cost.
Mechanics:
- New _guided_drill_down() helper handles the guided dispatch path
- Mode aliases ('drill', 'drilldown', 'drill-down', 'anchor', 'around')
normalise to 'guided'
- Validates required args (session_id + around_message_id), session
existence, and anchor-in-session, returning specific tool_error
messages for each failure mode
- Window clamped silently to [1, 20] (matches existing limit-clamp pattern)
- Rejects drill-down into the calling session's lineage — those messages
are already in the agent's active context (same convention as fast/
summary's _resolve_to_parent skip)
- Anchor row carries 'anchor': true so the agent can locate it in the
ordered window without re-checking ids
- Returns messages_before/messages_after counts so the agent sees boundary
effects ('this is the first 3, no more available before') without a
follow-up call
Schema:
- mode enum extended to ['fast', 'summary', 'guided']
- Three new optional parameters: session_id, around_message_id, window
- Description rewritten to teach the discover→drill flow with example
question shapes per mode
Dispatch:
- run_agent.py's two session_search dispatch sites updated to forward
the new optional kwargs
- Brittle source-grep test in test_session_search.py updated for the
new dispatch shape and now also pins the guided-mode kwargs
Tests:
- 11 new cases in TestGuidedMode covering happy path, missing-arg errors,
window clamps (low + high), session-not-found, anchor-not-in-session,
session-boundary partial windows, current-lineage rejection, mode
aliases, schema advertising, and metadata propagation
- 74/74 passing including the existing 53 + 9 hermes_state unit tests
End-to-end verified against a real DB snapshot: fast → read
match_message_id+session_id off the top hit → guided returns 7 messages
(3 before + anchor + 3 after) at ~40 KB payload, vs summary's ~220 KB
auxiliary-LLM input for the same query.
The original ceiling of 5 was sized for summary mode where each result costs a parallel auxiliary LLM call (~30s wall total). With the steering reframing of guided mode (see investigation page §6), fast becomes the 'discover and let the user pick' surface, and the user benefits from seeing more candidates before committing to a drill-down. Bumping the ceiling to 10 lets callers ask for a wider hit list when that's the goal. Default stays at 3 (one-shot recall is unchanged). Schema description updated to teach the LLM when to bump higher: 'when the user wants to be in the retrieval loop and pick the right anchor for a guided drill-down'. For summary mode this means up to 10 parallel aux calls instead of 5; the existing concurrency semaphore already bounds the actual wall time, and most users won't hit the higher cap unless they're using fast. 65/65 passing.
Extends mode='guided' to accept a list of anchors instead of a single
session+message pair. The agent calls fast with a wider limit, picks
the most promising K hits from the result list, and drills into all of
them in a single guided call — one window per anchor in the response.
This is the steering improvement flagged in the investigation page §6:
'5 results, pick top 3, strip tools' (strip-tools is a separate later
follow-up). Letting the agent inspect multiple windows in one turn
reduces the back-and-forth between fast and guided when the user
genuinely wants to look at several candidate sessions before committing.
Two input shapes (use one):
* Single anchor (back-compat): session_id + around_message_id
* Multi-anchor: anchors=[{session_id, around_message_id}, ...]
Single-anchor calls (the back-compat path) continue to work unchanged
and the response mirrors legacy fields at the top level when there's
exactly one window. Multi-anchor responses carry only 'windows' as the
authoritative list. Per-anchor failures (missing session, anchor not
in session, current-lineage rejection) become inline error entries
inside 'windows' rather than aborting the whole call — the agent can
still use successful drills if one anchor was malformed.
Window is shared across all anchors and clamped once to [1, 20].
Schema description updated to teach when to bump fast's limit higher
(5–10 for steering use cases) and how to compose anchors=[...] from
those results.
Tests:
- 7 new cases in TestGuidedModeMultiAnchor covering: two anchors both
succeed, one-fails-one-succeeds doesn't abort, single anchor via
anchors list normalises to legacy shape, empty/non-list anchors
return tool_error, window clamp shared across anchors, per-anchor
current-lineage rejection
- Brittle source-grep test updated to also pin the new anchors=
forwarding in run_agent.py
- 81/81 passing including the existing 65 + 7 new + brittle update + 9
hermes_state unit tests
End-to-end verified against real DB snapshot: 5 fast hits → top 3 as
anchors → 3 windows of 7 messages each (~100 kB total).
Live-test surfaced a real bug: fast-mode results paired the resolved lineage-root session_id with the raw FTS5 row's message_id. The (sid, match_message_id) handle was self-inconsistent because the message lives in the child (delegation/compression) session, not the parent — so the agent's follow-up mode='guided' call hit 'around_message_id N not in session_id ROOT' and the drill failed. Repro: ask the TUI to fast-search a topic that appears in a compressed child session of the current lineage, then ask it to drill in. Today's session is exactly that shape — message 18425 lives in 20260512_102257_d5048c (child) but fast returned its parent 20260511_101921_a7dd34 paired with id=18425. Fix has two layers: 1) Fast-mode output now pairs session_id (raw FTS5 sid) with match_message_id consistently. The lineage root is exposed as a separate parent_session_id field (omitted when there's no delegation/compression above). Dedup grouping still happens by lineage root, so the user still sees one entry per conversation, but the per-entry handle is now a valid pair the agent can hand straight to mode='guided'. - #15909 source-from-parent invariant preserved: source/model/title still promote from the resolved parent for display. 2) Defensive rebind in mode='guided': if (a_sid, a_msg_id) doesn't resolve, look up the actual owning session for a_msg_id. If it's a descendant in the same lineage as a_sid, transparently rebind and refetch. Records the rebind in a warning field on the returned window (also flattened to top level for single-anchor responses). Cross-lineage rebinds are refused — that path stays an error. This keeps the tool forgiving for legacy callers, memory snippets, or any other source that still emits the old (parent_sid, child_id) shape. 3) Schema description tightened: explicit note that the agent must pass (session_id, match_message_id) verbatim from a single fast result — do NOT substitute parent_session_id (it's display-only). Tests: updated the existing #15909 regression to assert the new pair shape, plus four new tests: - test_fast_pair_session_id_with_match_message_id (positive) - test_fast_no_parent_session_id_field_when_session_is_already_root (tidy output for non-delegation case) - test_guided_rebinds_anchor_when_message_lives_in_descendant_session (safety net fires correctly within a lineage) - test_guided_does_not_rebind_across_lineages (refuses cross-lineage rebind — no silent drill into unrelated session) 85/85 session_search + get_messages_around tests passing. Live-DB smoke test against /tmp/state-smoke.db (snapshot of ~/.hermes/state.db) confirms the user's failing case now rebinds: success: True top-level warning: 'around_message_id 18425 lives in 20260512_102257_d5048c (child of 20260511_101921_a7dd34); rebound transparently' returned session_id: 20260512_102257_d5048c window before/after: 5 / 5
The default mode is normally 'summary' (LLM recap of matched sessions).
This commit lets a user override that via:
# ~/.hermes/config.yaml
tools:
session_search:
default_mode: fast
Useful for power users who want to live with fast-as-default for a few
days and see how it feels — without having to pass mode='fast' on every
call. The summary path is still one explicit kwarg away.
Resolution order at call time:
1. Explicit mode= argument from the LLM (always wins)
2. tools.session_search.default_mode in ~/.hermes/config.yaml
3. 'summary' (final fallback)
Implementation:
- New helper _resolve_user_default_mode() in tools/session_search_tool.py
reads the value via hermes_cli.config.load_config(). Wrapped in
functools.lru_cache so the YAML read happens at most once per process
(config changes need a CLI / TUI restart, which is the existing
convention).
- Validates: must be a string, must be 'fast' or 'summary'. Anything
else (including 'guided', which needs anchors and can't stand alone)
logs a warning and falls back to 'summary'. The user gets feedback
when they typo their config.
- session_search()'s mode normaliser checks for None/empty/non-string
first and resolves the user default before applying alias mapping.
Explicit modes still take precedence over config.
- Both dispatch sites in run_agent.py changed from
mode=function_args.get('mode', 'summary') → mode=function_args.get('mode').
Hardcoding 'summary' at dispatch would shadow the new config-default
layer. Added a guard assert in test_run_agent_special_session_search_paths_forward_mode
so a regression to the old shape fails loudly.
- Schema description gets one extra sentence acknowledging the
user-configurable default so the LLM's own description of the tool
reflects reality.
Tests (+8):
- test_unset_mode_falls_back_to_summary_when_config_missing
- test_user_can_configure_fast_as_default
- test_user_can_configure_summary_as_default_explicitly
- test_invalid_default_mode_warns_and_falls_back (typo test)
- test_guided_as_default_mode_is_rejected
- test_non_string_default_mode_falls_back (bogus YAML types)
- test_explicit_mode_argument_overrides_user_default
- test_unset_mode_with_config_default_fast_runs_fast_path (e2e)
93/93 session_search + get_messages_around tests passing.
This is thread 2 of the prompt-tuning / default-mode plan from the
spike: thread 1 was the schema-description iteration (still in progress
on the spike page); thread 2 lets users carry the experiment around in
their own config while we converge on whether to flip the global default
in the schema.
…ame as two starts + one follow-up
Live-test conversation surfaced that the 'three modes (fast, summary,
guided)' framing makes the modes sound like peers when they aren't.
Guided literally cannot be a default — _resolve_user_default_mode()
already rejects it and forces summary. The honest shape is two
starting moves (fast, summary) plus one follow-up move (guided) that
needs anchors from a prior call.
Two cleanups follow from that:
1) Schema description rewritten with the 'two starts + one follow-up'
framing. Old MODES 1/2/3 list replaced with a structured 'Starting
moves' / 'Follow-up move' block. Recommended flows section folded
in (the per-question heuristics are now under each move's bullet).
2) Single-anchor schema parameters (session_id, around_message_id)
REMOVED from the LLM-facing schema. After multi-anchor shipped,
one-element anchors=[{...}] handles the single-anchor case
identically. Keeping both shapes in the schema was confusing — the
LLM occasionally tried to pair them or asked which to use.
The Python session_search() function still accepts session_id /
around_message_id kwargs for direct callers and test fixtures
(back-compat); only the LLM-facing schema lost them. Parameter
surface dropped from 6 LLM-visible knobs to 4 (query, role_filter,
limit, mode + anchors, window).
The mode parameter's description also got tightened — short summary of
each mode, points to the top-level description for when-to-use
guidance. The old description was duplicating the top-level mode
explanation in a more verbose form.
Updated test_schema_advertises_guided_mode:
- Asserts match_message_id pairing guidance now lives on the
anchors parameter, not the top-level description.
- Explicitly asserts session_id / around_message_id are NOT in the
schema (regression-proof against re-adding them).
93/93 session_search + get_messages_around tests passing.
This is the param-surface cleanup discussed yesterday alongside the
default_mode config commit. Closes the schema-surface side of the
'fast vs guided is confusing' user feedback; the spike doc §6.7 / §7
get matching updates in a separate commit on the architecture branch.
…-guided refusal Two schema description tweaks driven by smoke-test findings (PLAN.md v1.8): 1. S09 (search-fidelity FAIL) — agent skipped session_search entirely when asked 'what's the status of the commons-messaging PR on yoniebans.github.io?' and went straight to gh pr list. Technically correct that no PR existed, but missed two prior sessions and today's planning doc that referenced the branch. Fix: lead the USE THIS PROACTIVELY list with an explicit instruction to call session_search BEFORE external tools (gh, GitHub API, web, file inspection) when the question references prior work. The session DB carries what was DISCUSSED and DECIDED; external tools only show current world state. Use session_search to find context, external tools to verify reality. 2. S08 (schema-teaching weak case) — agent was asked to drill cold with multi-anchor guided. Did NOT refuse. Improvised recent → fast → fast → guided in one turn. Functionally correct (self-fed anchors from its own preceding fast calls), but the schema's 'cannot be a starting move' framing was followed in spirit, not articulated. The agent should EITHER refuse and ask, OR explicitly call fast first as a prerequisite — not silently improvise. Fix: reword 'Cannot be a starting move on its own' to a directive 'REQUIRES anchors from a prior fast or summary call. If you have no prior fast hit, call fast FIRST and use its match_message_id values as anchors. Never invent anchors or guess session_ids.' Same change echoed in the per-parameter mode description for the second-read reinforcement. Other 12 scenarios were clean. Schema base is good; these are surgical fixes for the two cases where the framing didn't land hard enough. 93/93 session_search + get_messages_around tests still pass.
…ribution
Summary mode invokes an auxiliary LLM (same Opus-tier model in default
'auto' routing) once per session summarised, with up to ~28K input
tokens (MAX_SESSION_CHARS=100K chars) and up to 10K output tokens
(MAX_SUMMARY_TOKENS) per call. That cost was being silently discarded:
_summarize_session() consumed response.usage only for the content string
and threw the usage data away. Smoke-test cost reporting showed
summary-mode scenarios at a fraction of their real spend because of it.
This patch:
- Changes _summarize_session() to return (content, usage) where usage
is a normalised dict {model, input_tokens, output_tokens,
cache_read_tokens, cache_creation_tokens} or None when the provider
didn't surface usage.
- Adds _extract_aux_usage() that handles both OpenAI-style
(prompt_tokens/completion_tokens, prompt_tokens_details.cached_tokens)
and Anthropic-style (input_tokens/output_tokens,
cache_read_input_tokens, cache_creation_input_tokens) usage shapes.
- The summary-mode caller aggregates per-session usage into both an
entry-level 'aux_usage' field and a top-level 'aux_usage_total'
carrying a call_count. The aggregate is omitted from the payload
entirely when no usage data was captured (test mocks, providers that
don't report it) so consumers can distinguish 'no data' from
'all zero'.
Note: this surfaces aux cost in the tool RESPONSE, where downstream
metrics extraction can pick it up. It does NOT yet attribute the cost
back to the parent session row (sessions.input_tokens / output_tokens /
estimated_cost_usd) — that's a wider fix to async_call_llm and the
session DB, out of scope here. Aggregator scripts (smoke-test
extractor, dashboards) get the data they need from the tool payload
without that wider change.
The registry handler hardcoded mode=args.get("mode", "summary") and the
function signature defaulted to "summary", which together made the
tools.session_search.default_mode config knob structurally unreachable
from real tool calls — _resolve_user_default_mode() only fires when
mode is None/empty, but neither path ever delivered None.
Drop both "summary" fallbacks so an omitted mode flows through as None
and the config-resolution branch can run.
Adds two tests: a static guard on the registry handler source pattern
(mirroring the existing run_agent.py one) and an end-to-end regression
that dispatches through the registry with default_mode='fast' configured
and asserts result["mode"] == "fast".
The previous fix wired _resolve_user_default_mode() to look up tools.session_search.default_mode, but the config schema has no top-level 'tools' section. The closest analogue is auxiliary.<tool>, which already groups per-tool config by tool name (auxiliary.vision has download_timeout, auxiliary.session_search has max_concurrency — neither is strictly aux-LLM routing). This moves the lookup to auxiliary.session_search.default_mode so the knob lives next to max_concurrency and the existing session_search config block. Adds default_mode to the default config scaffold so it shows up in fresh installs. Updates docstring, tool description string, warning messages, and all 7 mock-config tests to the new path. 88/88 tests passing.
…t→guided The prior tool description routed 'catch me up on X' / 'what did we decide' questions to summary mode by default, which was the failure mode the fast/guided rework was meant to fix. Summary stays available and is honoured when users configure it explicitly; the description now teaches fast→guided as the default recall path and calls out summary as opt-in synthesis. Schema mode.default flipped summary → fast. Resolver/scaffold fallback unchanged (still 'summary') for backward compatibility. No logic changes, no test updates needed; 88/88 passing.
…l noise
Three coordinated changes to make guided mode actually answer 'catch me up
on X' questions without needing summary:
1. New SessionDB.get_anchored_view() helper: returns the anchored window
plus the first/last N user+assistant messages of the session as
'bookend_start' / 'bookend_end'. Bookends are skipped when the window
already overlaps the session head or tail, so the response stays tight.
Default bookend=3, keep_roles=('user','assistant'). Tool messages are
dropped from the window EXCEPT the anchor itself (which may legitimately
be a tool message — dropping it would break the contract).
2. session_search mode='guided' switched to get_anchored_view (both primary
path and the child-session rebind fallback). Response shape gains
bookend_start + bookend_end alongside the existing messages array;
single-anchor response mirrors them at the top level for back-compat.
3. session_search mode='fast' now defaults role_filter to 'user,assistant'
when the caller doesn't pass one. Tool messages are mostly noise for
FTS5 (large outputs, serialised tool calls). Callers can opt back in
via role_filter='user,assistant,tool' for debugging or 'tool' for tool
output only.
Schema description updated to document bookends + tool filtering, and the
role_filter param description spells out the new default.
Test coverage:
- tests/hermes_state/test_get_anchored_view.py (12 tests): window/bookend
contract, role filtering, anchor-as-tool preservation, session isolation
- tests/tools/test_session_search.py: existing _make_db fixtures bridged
get_anchored_view → get_messages_around so the old guided tests still
pass; new TestGuidedBookendsInResponse asserts response shape; new
TestFastModeRoleFilterDefault pins the role_filter default.
122/122 passing across tests/hermes_state/ + tests/tools/test_session_search.py.
Single-commit revert-friendly.
Bookends were eating slots with tool-call-only assistant turns (content=''
with tool_calls populated). On long sessions whose tail is dominated by
orchestration heartbeats — poll, terminal, pgrep, etc. — bookend_end was
returning 3 empty rows instead of the actual prose closer.
Fix: add 'length(content) > 0' to both bookend SQL queries. Tool-call-only
assistants are skipped at the DB level; the closing prose ('Gateway
replaced...', 'Committed and pushed', etc.) survives into bookend_end.
User messages are never affected — the column is always populated for
user-role rows (verified against the live DB: 22 NULL-content rows total,
zero of them user-role).
Test: tests/hermes_state/test_get_anchored_view.py adds
test_bookends_skip_empty_content_assistant_turns — seeds a session with
the heartbeat pattern that exposed the bug and asserts the actual
opener/closer survive into bookend_start/bookend_end.
106/106 passing.
…ineage awareness Three additions to the tool description so the LLM uses the machinery that already exists: 1. MULTI-SESSION CATCH-UP: explicit instruction that when a topic spans multiple sessions, drill the top 2-3 fast hits as a single multi-anchor guided call — not just the top one. The multi-anchor shape was already supported but agents were anchoring on the top hit only and missing work in adjacent sessions. 2. READING GUIDED RESPONSES: explicit callout that every guided window carries three slices (bookend_start, messages, bookend_end) and the resolution lives in bookend_end. Reduces the risk of the LLM glossing the new bookend fields. 3. LINEAGE AWARENESS: notes that a child session's first messages are a post-compaction handoff, not the original arc opener — spot via parent_session_id. Tells the LLM how to recover the real opener when it matters (rare, but free to teach). anchors param description updated to reinforce multi-anchor catch-up at the point-of-use. No behavioural change — schema description only. 106/106 tests passing.
When fast returns hits whose snippets all look like the same keywords echoing (because the searched topic IS the subject of those sessions — e.g. searching 'session_search' in sessions about session_search), the snippets are decorative, not signal. The temptation is to pivot to find/grep/raw SQL — same shape failure as reflexive summary, just with manual archaeology instead of LLM telephone. New schema section instructs: don't pivot, drill. bookend_end carries the session's prose resolution that the snippets routinely miss. Observed failure that motivated this: an assistant asked to find a recently-drafted PR body got fast results with the right session in the top 5, but the snippets were wall-to-wall '>>>session_search<<<' markers, so it pivoted to find/sqlite3 and burned ~10 minutes. The right session's bookend_end contained 'Draft written to <path>' — exactly the artefact being searched for. No behavioural change; schema-only. 106/106 passing.
Fast mode currently orders results by FTS5 BM25 rank only. That's correct
when the user's question is exploratory ('what do we know about X') —
relevance leads, time is neutral — but it actively hurts two other common
question shapes:
1. Recency-shaped: 'where did we leave X', 'latest status of Y'. Same-rank
matches from years ago and yesterday are tied; FTS5 picks arbitrarily.
A reactivated old session can outrank a fresh one with no signal.
2. Origin-shaped: 'how did X start', 'first time we discussed Y'. The
originating session is usually short and gets out-scored by later
sessions that revisit the topic with more context — the origin hides
under its own descendants.
Adding a temporal tie-breaker by default would silently bias every query
toward 'latest', breaking the origin-shaped case. So sort is opt-in and
bidirectional, matching the existing 'agent picks the mode that fits the
question shape' pattern.
What this adds:
- session_search() gains a sort parameter accepting 'newest', 'oldest',
or None (default = current FTS5 rank-only behaviour preserved).
- db.search_messages() honours sort across all three SQL paths: main
FTS5 (timestamp DESC/ASC primary, rank tiebreaker), trigram CJK
(same), LIKE fallback (timestamp direction flip; no rank to combine).
- Tool layer normalises sort case-insensitively, falls back to None on
garbage values rather than failing the search, and silently strips
sort outside fast mode (with a debug log). Summary's session
selection deliberately stays time-neutral — agents wanting temporal
narrative drive fast with sort, then drill anchors with guided.
- Schema description gains a TEMPORAL DIRECTION section with concrete
question-shape examples, and a sort property on the parameters
block enumerating the valid values.
Tests:
- 6 new tool-layer tests covering default behaviour, both directions,
case-insensitivity, garbage fallback, and silent-ignore in summary.
- 4 new SQL-layer tests against the real DB exercising 'newest' /
'oldest' / unset (BM25 rank preserved) / invalid (rank fallback).
- 95→102 passing on tools/test_session_search.py before this commit;
108 passing after.
…in schema Smoke-test v2 surfaced that S13 (auxiliary.session_search.default_mode: summary) went fast→guided 5/5 iterations instead of respecting the user's configured summary default. The agent passed mode='fast' explicitly on every first call, ignoring the config. Root cause: the 'respect the configured default' guidance lived at the very bottom of the schema description, after all the 'fast → guided is best' teaching. The general guidance was louder than the user-preference clause. Fix: hoist USER-CONFIGURED DEFAULT to the top of the description, framed as something the agent should check FIRST. Strengthen the language: honour the user's configured default on the first call unless the question shape categorically requires a different mode. Don't override the user just because the general guidance says fast→guided is best. Replace the redundant bottom paragraph with a brief pointer to the top. No code changes — schema description only. Tests still 99/99.
…rst call Previous patch (71558e7) hoisted USER-CONFIGURED DEFAULT to the top of the schema with 'honour unless question shape categorically requires'. Re-running S13 with default_mode: summary still went fast→guided 5/5 — the agent rationalised that synthesis questions categorically require fast→guided. The schema teaching needs the escape clause removed. The user paying for the call has the better context on which trade they want; the agent shouldn't override based on its read of the question shape. After the first call, the agent can chain freely (e.g. guided drill into fast results), but the first mode comes from the configured default. Still no resolver-level hard lock. If schema teaching at this strength still fails to make the agent respect the user's preference, that's a separate follow-up — but at minimum the user's preference is now loud in the prompt. 99/99 tests still passing.
Teach the agent to use session_search effectively. Covers the three modes (fast/guided/summary), levers for tuning each call, composition patterns including multi-anchor catch-up, worked examples for named- artefact lookup and multi-session arc recall, and pitfalls.
The tool-description prose had accumulated playbook-style guidance over the course of development (pre-flight rules, mode-picking policy, multi-anchor recipe, anti-pattern teaching, reading-order advice). That material now lives in the session-recall skill where it can be loaded on demand rather than shipping in every system prompt. Schema description now covers only what the tool IS: what each mode returns, default-mode resolution, anchor contract, FTS5 syntax, and a one-paragraph 'when to use'. Mode enum description shrunk to three one-line entries. Cost claims generalised — no fixed dollar figures since aux-LLM cost depends on the user's configured aux model. Net: ~9.5 KB -> ~3 KB of description prose. One schema-content assertion in tests updated to match the new phrasing while keeping the same intent (cross-session language exists; no current-session nudge).
…odes # Conflicts: # scripts/release.py
…a + PR body) The schema description and the JSON-schema `mode.default` advertise `fast` as the default mode. The implementation was advertising one default and running another: DEFAULT_CONFIG shipped `default_mode: summary`, the resolver's six fallback paths all returned `summary`, and the invalid-mode coercion at the dispatch site hard-coded `summary` too. Net effect was the model being told 'default is fast' while the server ran summary — exactly the cost behaviour this work is meant to avoid. Changes: - hermes_cli/config.py: DEFAULT_CONFIG default_mode `summary` → `fast`. - tools/session_search_tool.py: every `return "summary"` fallback in _resolve_user_default_mode() now returns "fast" (six paths: ImportError, general Exception, raw is None, non-string raw, invalid value, and the function-level fallback). Warning log strings updated to match. - tools/session_search_tool.py: invalid-`mode=` arg at the dispatch site now falls back to _resolve_user_default_mode() instead of hard-coding "summary". Silent coercion of typos now still respects the user's configured default. - tests: 11 tests updated to match the new default (six in the resolver fallback class, three test methods renamed, plus the parametrised invalid-mode test and the positional-db backward-compat test). The new test names reflect what's being verified rather than the old default value.
…invalid-mode path
Three small follow-ups from the default-mode fix review:
1. Extract the literal 'fast' fallback into a module-level
_FALLBACK_DEFAULT_MODE constant. Six call sites in
_resolve_user_default_mode() now reference the constant, removing
the drift risk of changing the default in some paths but not
others.
2. New integration test: bogus mode= string at the dispatch site
with no config falls back to the resolver-resolved default ('fast').
Proves the dispatch site calls the resolver rather than hardcoding
a literal.
3. New integration test: bogus mode= string with default_mode=summary
in config lands on summary. Proves the dispatch-site coercion
honours the user's configured default for unknown modes too — not
just for unset modes.
The DEFAULT_CONFIG entry was added in this PR but the example config file wasn't kept in sync. Per CONTRIBUTING.md, config changes need to mirror into cli-config.yaml.example so users can see the knob and its documented values.
Contributor
🔎 Lint report:
|
| Rule | Count |
|---|---|
invalid-argument-type |
17 |
invalid-return-type |
2 |
unresolved-attribute |
2 |
unresolved-import |
2 |
invalid-parameter-default |
2 |
unsupported-operator |
2 |
invalid-assignment |
2 |
First entries
tests/tools/test_session_search.py:1941: [invalid-argument-type] invalid-argument-type: Argument to function `session_search` is incorrect: Expected `str`, found `None`
tests/tools/test_session_search.py:754: [invalid-argument-type] invalid-argument-type: Argument to function `session_search` is incorrect: Expected `str`, found `Literal["garbage", "", "RANDOM", 42] | None`
tests/tools/test_session_search.py:1747: [invalid-argument-type] invalid-argument-type: Method `__getitem__` of type `Overload[(i: SupportsIndex, /) -> Unknown, (s: slice[SupportsIndex | None, SupportsIndex | None, SupportsIndex | None], /) -> list[Unknown]]` cannot be called with key of type `Literal["anchors"]` on object of type `list[Unknown]`
tools/session_search_tool.py:1048: [invalid-return-type] invalid-return-type: Return type does not match returned value: expected `list[tuple[Unknown, ...] | Exception]`, found `list[Unknown | BaseException]`
tests/tools/test_session_search.py:1739: [invalid-argument-type] invalid-argument-type: Method `__getitem__` of type `Overload[(i: SupportsIndex, /) -> Unknown, (s: slice[SupportsIndex | None, SupportsIndex | None, SupportsIndex | None], /) -> list[Unknown]]` cannot be called with key of type `Literal["mode"]` on object of type `list[Unknown]`
run_agent.py:10694: [invalid-argument-type] invalid-argument-type: Argument to function `session_search` is incorrect: Expected `int`, found `Unknown | None`
tools/session_search_tool.py:549: [invalid-argument-type] invalid-argument-type: Argument to constructor `int.__new__` is incorrect: Expected `str | Buffer | SupportsInt | SupportsIndex | SupportsTrunc`, found `Any | None | str`
tests/tools/test_session_search.py:911: [unresolved-attribute] unresolved-attribute: Function `_resolve_user_default_mode` has no attribute `cache_clear`
tests/tools/test_session_search.py:1957: [invalid-argument-type] invalid-argument-type: Argument to function `session_search` is incorrect: Expected `list[Unknown]`, found `Literal["not_a_list"]`
tests/hermes_state/test_get_messages_around.py:11: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`
tests/tools/test_session_search.py:1746: [invalid-argument-type] invalid-argument-type: Method `__getitem__` of type `Overload[(key: SupportsIndex | slice[SupportsIndex | None, SupportsIndex | None, SupportsIndex | None], /) -> LiteralString, (key: SupportsIndex | slice[SupportsIndex | None, SupportsIndex | None, SupportsIndex | None], /) -> str]` cannot be called with key of type `Literal["properties"]` on object of type `str`
tests/tools/test_session_search.py:1743: [unresolved-attribute] unresolved-attribute: Attribute `lower` is not defined on `dict[str, str | dict[str, dict[str, str] | dict[str, str | int] | dict[str, str | list[str]] | dict[str, str | dict[str, str | dict[str, dict[str, str]] | list[str]]]] | list[Unknown]]` in union `str | dict[str, str | dict[str, dict[str, str] | dict[str, str | int] | dict[str, str | list[str]] | dict[str, str | dict[str, str | dict[str, dict[str, str]] | list[str]]]] | list[Unknown]]`
run_agent.py:11333: [invalid-argument-type] invalid-argument-type: Argument to function `session_search` is incorrect: Expected `list[Unknown]`, found `Any | None`
tools/session_search_tool.py:784: [invalid-parameter-default] invalid-parameter-default: Default value of type `None` is not assignable to annotated parameter type `list[Unknown]`
tests/tools/test_session_search.py:1747: [invalid-argument-type] invalid-argument-type: Method `__getitem__` of type `Overload[(key: SupportsIndex | slice[SupportsIndex | None, SupportsIndex | None, SupportsIndex | None], /) -> LiteralString, (key: SupportsIndex | slice[SupportsIndex | None, SupportsIndex | None, SupportsIndex | None], /) -> str]` cannot be called with key of type `Literal["anchors"]` on object of type `str`
tools/session_search_tool.py:894: [invalid-argument-type] invalid-argument-type: Argument to bound method `SessionDB.search_messages` is incorrect: Expected `str`, found `None | str`
tests/tools/test_session_search.py:1740: [unsupported-operator] unsupported-operator: Operator `in` is not supported between objects of type `Literal["guided"]` and `Unknown | str | int | list[str] | dict[str, str | dict[str, dict[str, str]] | list[str]]`
tests/hermes_state/test_get_anchored_view.py:14: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`
tests/tools/test_session_search.py:1739: [invalid-argument-type] invalid-argument-type: Method `__getitem__` of type `Overload[(key: SupportsIndex | slice[SupportsIndex | None, SupportsIndex | None, SupportsIndex | None], /) -> LiteralString, (key: SupportsIndex | slice[SupportsIndex | None, SupportsIndex | None, SupportsIndex | None], /) -> str]` cannot be called with key of type `Literal["mode"]` on object of type `str`
tools/session_search_tool.py:253: [invalid-return-type] invalid-return-type: Function can implicitly return `None`, which is not assignable to return type `tuple[str | None, dict[str, Any] | None]`
run_agent.py:13774: [invalid-argument-type] invalid-argument-type: Argument to function `_is_oauth_token` is incorrect: Expected `str`, found `str | dict[Unknown | str, Unknown | str | dict[str, str]] | Any | ... omitted 3 union elements`
tests/tools/test_session_search.py:1747: [unsupported-operator] unsupported-operator: Operator `in` is not supported between objects of type `Literal["match_message_id"]` and `Unknown | str | int | list[str] | dict[str, str | dict[str, dict[str, str]] | list[str]]`
tools/session_search_tool.py:1137: [invalid-assignment] invalid-assignment: Invalid subscript assignment with key of type `Literal["aux_usage_total"]` and value of type `dict[str, None | int]` on object of type `dict[str, int | str | list[Unknown]]`
run_agent.py:11331: [invalid-argument-type] invalid-argument-type: Argument to function `session_search` is incorrect: Expected `int`, found `Any | None`
run_agent.py:10696: [invalid-argument-type] invalid-argument-type: Argument to function `session_search` is incorrect: Expected `list[Unknown]`, found `Unknown | None`
... and 4 more
✅ Fixed issues (5):
| Rule | Count |
|---|---|
invalid-argument-type |
4 |
invalid-return-type |
1 |
First entries
run_agent.py:7482: [invalid-argument-type] invalid-argument-type: Argument to function `build_anthropic_client` is incorrect: Expected `str`, found `str | dict[Unknown, Unknown] | Any | ... omitted 3 union elements`
run_agent.py:13767: [invalid-argument-type] invalid-argument-type: Argument to function `len` is incorrect: Expected `Sized`, found `(str & ~AlwaysFalsy) | (dict[Unknown, Unknown] & ~AlwaysFalsy) | (Any & ~AlwaysFalsy) | ... omitted 3 union elements`
run_agent.py:13764: [invalid-argument-type] invalid-argument-type: Argument to function `_is_oauth_token` is incorrect: Expected `str`, found `str | dict[Unknown, Unknown] | Any | ... omitted 3 union elements`
tools/session_search_tool.py:375: [invalid-argument-type] invalid-argument-type: Argument to bound method `SessionDB.search_messages` is incorrect: Expected `list[str]`, found `None | list[str]`
tools/session_search_tool.py:474: [invalid-return-type] invalid-return-type: Return type does not match returned value: expected `list[str | Exception]`, found `list[str | None | BaseException]`
Unchanged: 4319 pre-existing issues carried over.
Diagnostics are surfaced as warnings — this check never fails the build.
Pass over comments added during the iterative development of this PR, trimming where they restated the code, repeated themselves, or read as journal-style narration. Net -22 comment lines; behaviour unchanged, 123 tests still passing. Notable trims: - DEFAULT_CONFIG module header: 9 lines → 4. Dropped the 'auxiliary started as aux-LLM routing but in practice groups per-tool config' digression — irrelevant to readers of this module. - get_anchored_view bookend-SQL filter block: 8 lines → 5. The 'let me check…-shaped assistant messages' over-narration is gone; the SQL filter rationale survives. - Fast-mode lineage-grouping IMPORTANT block: 12 lines → 8. The '#regression introduced by the original match_message_id rollout' meta-note removed (the comment now states the contract directly). - Fast-mode result-emission comment: 8 lines → 3. The 'lineage_root is the dict key…' explanation was restating the variables; the load-bearing one-liner (emit raw_sid + match_message_id) stays. - sort normalisation comment: 4 lines → 3. - role_filter parse comment: 5 lines → 3. - ORDER BY comment in search_messages: 3 lines → 2. - LIKE fallback ordering comment: 4 lines → 2.
2 tasks
teknium1
added a commit
that referenced
this pull request
May 18, 2026
…e — no LLM (#27590) * feat(session_search): single-shape tool with discovery, scroll, browse — no LLM Replaces the LLM-summarized session_search with a single-shape tool that returns actual messages from the DB. Three calling shapes inferred from args (no mode parameter): 1. Discovery — pass query. FTS5 + anchored ±5 window + bookends per hit, all in one call. ~20ms on a real DB instead of ~90s for the previous three aux-LLM calls. 2. Scroll — pass session_id + around_message_id. Returns a window centered on the anchor. To paginate, re-anchor on the first/last id of the returned window. Boundary message appears in both windows as the orientation marker. ~1ms per scroll call. 3. Browse — no args. Recent sessions chronologically. Bookend_start (first 3 user+assistant msgs) and bookend_end (last 3) give the agent goal + resolution on every discovery hit, so a single tool call reconstructs a long session's arc without loading the whole transcript. The aux-LLM summary path is gone: it cost ~$0.30/call, took ~30s, and laundered FTS5 hits through a model that could confabulate when the right session wasn't in the hit list. The merged shape returns byte-for-byte content from SQLite. History: - PR #20238 (JabberELF) seeded the fast/summary dual-mode split. - PR #26419 (yoniebans) expanded to fast/guided/summary with bookends, multi-anchor drill-down, default-mode config, and a teaching skill. This PR collapses that toolkit into one shape with explicit scroll support, drops the summary path, drops the mode parameter, drops the config knob, drops the skill. JabberELF's seed work is acknowledged via the AUTHOR_MAP entry. Validation: - 38/38 tool tests pass (tests/tools/test_session_search.py) - 12/12 get_messages_around tests pass (tests/hermes_state/) - 11/11 get_anchored_view tests pass (tests/hermes_state/) - Full tests/tools/ run: 5168 passing, 2 failures pre-exist on main (test ordering in test_delegate.py, unrelated) - E2E against live state DB: discovery 20ms, scroll 1ms, browse 280ms; pagination forward+backward works with boundary-message orientation; error paths return clean tool_error responses Co-authored-by: JabberELF <abcdjmm970703@gmail.com> Co-authored-by: yoniebans <jonny@nousresearch.com> * chore(session_search): prune dead LLM-summary config and docs Companion to the single-shape rewrite. The auxiliary.session_search config block, max_concurrency / extra_body tunables, and matching docs sections all referenced the removed LLM summarization path. Removing them so users don't try to tune knobs that nothing reads. - hermes_cli/config.py: drop dead auxiliary.session_search block from DEFAULT_CONFIG. Leftover keys in user config.yaml are harmless and ignored. - hermes_cli/tips.py: drop two tips referencing the removed max_concurrency / extra_body knobs. - website/docs/user-guide/configuration.md: drop 'Session Search Tuning' section and the auxiliary.session_search block from the example. - website/docs/user-guide/features/fallback-providers.md: drop session_search rows from the auxiliary-tasks tables and the dedicated tuning subsection. - website/docs/reference/tools-reference.md: rewrite the session_search entry to describe the new three-shape behaviour. - CONTRIBUTING.md: update the file-tree description. - tests/tools/test_llm_content_none_guard.py: remove TestSessionSearchContentNone class and test_session_search_tool_guarded — both guard against an unguarded .content.strip() call site in _summarize_session() that no longer exists. Validation: 97/97 targeted tests still pass (hermes_state + session_search + llm_content_none_guard). Config tests 55/55. --------- Co-authored-by: JabberELF <abcdjmm970703@gmail.com> Co-authored-by: yoniebans <jonny@nousresearch.com>
Lillard01
pushed a commit
to Lillard01/hermes-agent
that referenced
this pull request
May 21, 2026
…e — no LLM (NousResearch#27590) * feat(session_search): single-shape tool with discovery, scroll, browse — no LLM Replaces the LLM-summarized session_search with a single-shape tool that returns actual messages from the DB. Three calling shapes inferred from args (no mode parameter): 1. Discovery — pass query. FTS5 + anchored ±5 window + bookends per hit, all in one call. ~20ms on a real DB instead of ~90s for the previous three aux-LLM calls. 2. Scroll — pass session_id + around_message_id. Returns a window centered on the anchor. To paginate, re-anchor on the first/last id of the returned window. Boundary message appears in both windows as the orientation marker. ~1ms per scroll call. 3. Browse — no args. Recent sessions chronologically. Bookend_start (first 3 user+assistant msgs) and bookend_end (last 3) give the agent goal + resolution on every discovery hit, so a single tool call reconstructs a long session's arc without loading the whole transcript. The aux-LLM summary path is gone: it cost ~$0.30/call, took ~30s, and laundered FTS5 hits through a model that could confabulate when the right session wasn't in the hit list. The merged shape returns byte-for-byte content from SQLite. History: - PR NousResearch#20238 (JabberELF) seeded the fast/summary dual-mode split. - PR NousResearch#26419 (yoniebans) expanded to fast/guided/summary with bookends, multi-anchor drill-down, default-mode config, and a teaching skill. This PR collapses that toolkit into one shape with explicit scroll support, drops the summary path, drops the mode parameter, drops the config knob, drops the skill. JabberELF's seed work is acknowledged via the AUTHOR_MAP entry. Validation: - 38/38 tool tests pass (tests/tools/test_session_search.py) - 12/12 get_messages_around tests pass (tests/hermes_state/) - 11/11 get_anchored_view tests pass (tests/hermes_state/) - Full tests/tools/ run: 5168 passing, 2 failures pre-exist on main (test ordering in test_delegate.py, unrelated) - E2E against live state DB: discovery 20ms, scroll 1ms, browse 280ms; pagination forward+backward works with boundary-message orientation; error paths return clean tool_error responses Co-authored-by: JabberELF <abcdjmm970703@gmail.com> Co-authored-by: yoniebans <jonny@nousresearch.com> * chore(session_search): prune dead LLM-summary config and docs Companion to the single-shape rewrite. The auxiliary.session_search config block, max_concurrency / extra_body tunables, and matching docs sections all referenced the removed LLM summarization path. Removing them so users don't try to tune knobs that nothing reads. - hermes_cli/config.py: drop dead auxiliary.session_search block from DEFAULT_CONFIG. Leftover keys in user config.yaml are harmless and ignored. - hermes_cli/tips.py: drop two tips referencing the removed max_concurrency / extra_body knobs. - website/docs/user-guide/configuration.md: drop 'Session Search Tuning' section and the auxiliary.session_search block from the example. - website/docs/user-guide/features/fallback-providers.md: drop session_search rows from the auxiliary-tasks tables and the dedicated tuning subsection. - website/docs/reference/tools-reference.md: rewrite the session_search entry to describe the new three-shape behaviour. - CONTRIBUTING.md: update the file-tree description. - tests/tools/test_llm_content_none_guard.py: remove TestSessionSearchContentNone class and test_session_search_tool_guarded — both guard against an unguarded .content.strip() call site in _summarize_session() that no longer exists. Validation: 97/97 targeted tests still pass (hermes_state + session_search + llm_content_none_guard). Config tests 55/55. --------- Co-authored-by: JabberELF <abcdjmm970703@gmail.com> Co-authored-by: yoniebans <jonny@nousresearch.com>
gweeteve
pushed a commit
to gweeteve/hermes-agent
that referenced
this pull request
Jun 2, 2026
…e — no LLM (NousResearch#27590) * feat(session_search): single-shape tool with discovery, scroll, browse — no LLM Replaces the LLM-summarized session_search with a single-shape tool that returns actual messages from the DB. Three calling shapes inferred from args (no mode parameter): 1. Discovery — pass query. FTS5 + anchored ±5 window + bookends per hit, all in one call. ~20ms on a real DB instead of ~90s for the previous three aux-LLM calls. 2. Scroll — pass session_id + around_message_id. Returns a window centered on the anchor. To paginate, re-anchor on the first/last id of the returned window. Boundary message appears in both windows as the orientation marker. ~1ms per scroll call. 3. Browse — no args. Recent sessions chronologically. Bookend_start (first 3 user+assistant msgs) and bookend_end (last 3) give the agent goal + resolution on every discovery hit, so a single tool call reconstructs a long session's arc without loading the whole transcript. The aux-LLM summary path is gone: it cost ~$0.30/call, took ~30s, and laundered FTS5 hits through a model that could confabulate when the right session wasn't in the hit list. The merged shape returns byte-for-byte content from SQLite. History: - PR NousResearch#20238 (JabberELF) seeded the fast/summary dual-mode split. - PR NousResearch#26419 (yoniebans) expanded to fast/guided/summary with bookends, multi-anchor drill-down, default-mode config, and a teaching skill. This PR collapses that toolkit into one shape with explicit scroll support, drops the summary path, drops the mode parameter, drops the config knob, drops the skill. JabberELF's seed work is acknowledged via the AUTHOR_MAP entry. Validation: - 38/38 tool tests pass (tests/tools/test_session_search.py) - 12/12 get_messages_around tests pass (tests/hermes_state/) - 11/11 get_anchored_view tests pass (tests/hermes_state/) - Full tests/tools/ run: 5168 passing, 2 failures pre-exist on main (test ordering in test_delegate.py, unrelated) - E2E against live state DB: discovery 20ms, scroll 1ms, browse 280ms; pagination forward+backward works with boundary-message orientation; error paths return clean tool_error responses Co-authored-by: JabberELF <abcdjmm970703@gmail.com> Co-authored-by: yoniebans <jonny@nousresearch.com> * chore(session_search): prune dead LLM-summary config and docs Companion to the single-shape rewrite. The auxiliary.session_search config block, max_concurrency / extra_body tunables, and matching docs sections all referenced the removed LLM summarization path. Removing them so users don't try to tune knobs that nothing reads. - hermes_cli/config.py: drop dead auxiliary.session_search block from DEFAULT_CONFIG. Leftover keys in user config.yaml are harmless and ignored. - hermes_cli/tips.py: drop two tips referencing the removed max_concurrency / extra_body knobs. - website/docs/user-guide/configuration.md: drop 'Session Search Tuning' section and the auxiliary.session_search block from the example. - website/docs/user-guide/features/fallback-providers.md: drop session_search rows from the auxiliary-tasks tables and the dedicated tuning subsection. - website/docs/reference/tools-reference.md: rewrite the session_search entry to describe the new three-shape behaviour. - CONTRIBUTING.md: update the file-tree description. - tests/tools/test_llm_content_none_guard.py: remove TestSessionSearchContentNone class and test_session_search_tool_guarded — both guard against an unguarded .content.strip() call site in _summarize_session() that no longer exists. Validation: 97/97 targeted tests still pass (hermes_state + session_search + llm_content_none_guard). Config tests 55/55. --------- Co-authored-by: JabberELF <abcdjmm970703@gmail.com> Co-authored-by: yoniebans <jonny@nousresearch.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What does this PR do?
Turns
session_searchfrom a single-mode summary-only tool into a three-mode toolkit so the model can match the recall question to the right cost/fidelity trade. Plus the supporting bits needed for that to actually work in practice: a real-time anchored drill-down, session bookends that surface goal + resolution on long sessions, opinionated tool-noise filtering, multi-anchor catch-up across same-topic sessions, a config knob for the per-user default, and a shipped skill that teaches composition.Built on top of the JabberELF salvage (PR #20238 / commits
7d628eaa3+aa2d3e2ee) which seeded the fast/summary split. Their two commits are preserved at the head of this branch; everything else is additive.Spike notes: session-search-modes spike — results — background, design, and measurements.
Motivation
The pre-existing
session_searchhad one mode: load up to N matched sessions, truncate each to 100K chars, hand them to an aux LLM for synthesis. Three problems:The three modes
fast(default)session_id+match_message_idanchors. Optionalsort='newest' / 'oldest'for temporal bias.guided(follow-up)summary(opt-in)Default mode is
fast; user can override per-process viaauxiliary.session_search.default_mode: summaryin~/.hermes/config.yaml.Related Issue
N/A — JabberELF salvage (PR #20238) seeded the original fast/summary split; this PR is the follow-on toolkit. No tracking issue.
Type of Change
Changes Made
Core feature work
mode='fast'— FTS5 snippets without aux LLM (7d628eaa3). Returnssession_id,match_message_id, context window. Default since1a00d730e.mode='guided'— anchored drill-down (1e29fa886,41c13ba71). Single or multi-anchor (anchors=[{session_id, around_message_id}, ...]). Returns raw message window per anchor. Built on newSessionDB.get_messages_aroundprimitive (2b606d20e).mode='summary'— retained as the original synthesis path, now opt-in.match_message_idin fast output (e74a682b0) — so the LLM can pair(session_id, match_message_id)and feed it straight into guided.41c13ba71) — catch-up across same-topic sessions in one call.36c5b188b) — fast is cheap; let the model widen the net when picking anchors for a multi-anchor drill.sortparam on fast (4f7e64c84) —sort='newest'for recency-shaped questions,sort='oldest'for origin-shaped questions, omit for relevance-led exploratory recall. Silently ignored in summary / guided / recent modes.Reliability + correctness
(session_id, match_message_id)pairing fix (8a31985e4) — pre-fix, fast returned the lineage-root session_id paired with a message_id that lived in a descendant session, so the agent's follow-up guided call failed. Now fast emits the raw owning sid; guided rebinds transparently for callers that still pass the parent.f4c43f088) — tool-call-only assistant turns no longer eat bookend slots. The session's prose closer survives intobookend_endeven after long orchestration-heartbeat tails.54d817f88) — guided with no prior fast call is rejected with a schema-level hint to call fast first.Configurability
auxiliary.session_search.default_modein~/.hermes/config.yaml(02a54e01c→76f40e644) — per-user default for the resolved mode. Resolver wired so explicitmode=arg always wins; unset / unknown modes flow to the resolved default. Documented incli-config.yaml.example(2ecad4911).role_filterdefaults touser,assistantonfast(b54b24607) — tool messages are usually noise. Caller can passrole_filter='user,assistant,tool'for tool-output debugging orrole_filter='tool'for tool output only.Bookends + tool-noise filtering on guided (
b54b24607)role='tool').SessionDB.get_anchored_viewreturns{window, bookend_start, bookend_end}:bookend_start: first 3 user+assistant prose messages of the session (empty if window already overlaps the session head).bookend_end: last 3 user+assistant prose messages (empty if window covers the tail).Cost attribution
8709e1ebe) — summary mode now reports the aux LLM input/output/cache tokens used per call in the response.Schema description
The tool description shipped with
session_searchis a compact specification: what each mode returns, default-mode resolution, anchor contract, FTS5 syntax, and a one-paragraph "when to use". Long-form playbook prose (mode-picker policy, multi-anchor recipe, reading-order advice, anti-pattern teaching) lives in thesession-recallskill rather than in the schema, so it loads on demand rather than shipping in every system prompt.Shipped skill:
session-recall(af1ea1f4e)New shipped skill at
skills/memory/session-recall/SKILL.md(with a newskills/memory/category andDESCRIPTION.md). Teaches the agent to usesession_searcheffectively: pre-flight rules, mode picker table, levers table, composition patterns, three worked examples (named artefact lookup, multi-session arc catch-up, drilling a known session), guidance on reading guided responses, pitfalls, and a closing note acknowledging the prose-teaching ceiling.File paths touched
tools/session_search_tool.py— three modes, dispatch, resolver, schemahermes_state.py—get_messages_around,get_anchored_viewprimitiveshermes_cli/config.py—auxiliary.session_search.default_modeknobcli-config.yaml.example— documents the new knobskills/memory/DESCRIPTION.md— new categoryskills/memory/session-recall/SKILL.md— bundled skilltests/tools/test_session_search.py— three-mode contract, resolver, multi-anchor, lineage rebind, role-filter, sort, bookend shapetests/hermes_state/test_get_anchored_view.py— anchored view + bookend overlap, role filtering, empty-content skip, session isolationtests/hermes_state/test_get_messages_around.py— anchored window primitivescripts/release.py—AUTHOR_MAPentry for JabberELFBehaviour change summary
mainhas onlysummarymode today. This PR addsfastandguided, makesfastthe default for new calls, and changes a few aspects of howsummarybehaves.summaryonlyfast,guided,summarymode=, no config)summaryfastauxiliary.session_search.default_modein~/.hermes/config.yaml; explicitmode=always wins; unset / unknown flows to resolved defaultrole_filterdefault onfast/summaryuser,assistant(caller can override to include or restrict totool)summaryaux_usageper call +aux_usage_totalper batchguided)session_id(raw owning),match_message_id,parent_session_idwhen different from session_idHow to Test
Run the targeted suite:
source venv/bin/activate pytest tests/hermes_state/test_get_anchored_view.py \ tests/hermes_state/test_get_messages_around.py \ tests/tools/test_session_search.py -qExpected: 123 passing.
Try the default in a TUI session:
Verify the shipped skill loads:
memory: session-recall.skill_view('session-recall')and follow the fast → guided composition described in the worked examples.Opt into summary as default (legacy behaviour):
Tested on Linux.
Checklist
Code
fix(scope):,feat(scope):, etc.)Documentation & Housekeeping
cli-config.yaml.examplefor the newauxiliary.session_search.default_modeknobCONTRIBUTING.mdorAGENTS.mdif I changed architecture or workflows — N/ASESSION_SEARCH_SCHEMAdescription as a tight spec, updatedmodeenum description)For New Skills
session-recallteaches every user how to usesession_searcheffectively; recall is core to long-running agent workname,description,metadata.hermes.category: memory; pre-flight, mode picker, levers, composition patterns, worked examples, pitfalls, closing note)session_search(already a Hermes core tool)