feat(session_search): toolkit of fast / guided / summary modes for low-cost, high-fidelity recall by yoniebans · Pull Request #26419 · NousResearch/hermes-agent

yoniebans · 2026-05-15T15:13:19Z

What does this PR do?

Turns session_search from a single-mode summary-only tool into a three-mode toolkit so the model can match the recall question to the right cost/fidelity trade. Plus the supporting bits needed for that to actually work in practice: a real-time anchored drill-down, session bookends that surface goal + resolution on long sessions, opinionated tool-noise filtering, multi-anchor catch-up across same-topic sessions, a config knob for the per-user default, and a shipped skill that teaches composition.

Built on top of the JabberELF salvage (PR #20238 / commits 7d628eaa3 + aa2d3e2ee) which seeded the fast/summary split. Their two commits are preserved at the head of this branch; everything else is additive.

Spike notes: session-search-modes spike — results — background, design, and measurements.

Motivation

The pre-existing session_search had one mode: load up to N matched sessions, truncate each to 100K chars, hand them to an aux LLM for synthesis. Three problems:

Cost. Every recall call — even "did we touch X?" — paid the configured aux model's per-call rate for ~30s of generation across hundreds of thousands of input tokens.
Retrieval gap. The 100K-char truncation routinely covered only a fraction of long work sessions. Everything outside was discarded. The summary still came back confident — three layers of LLM telephone laundering a partial-retrieval failure into authoritative prose.
Modes-as-hierarchy framing. The "fast / summary" split treated summary as the recall you usually want and fast as a discovery shortcut. In practice fast → guided is the right composition for state recall, and summary should be the opt-in synthesis tool. The schema needs to teach that.

The three modes

Mode	Latency	Cost	Use for
`fast` (default)	~10 ms	$0	Any recall question — discovery AND state reconstruction. FTS5 snippets + 1 message of context. Returns `session_id` + `match_message_id` anchors. Optional `sort='newest' / 'oldest'` for temporal bias.
`guided` (follow-up)	~ms	$0	Drill into one or more anchors from a prior fast call. Returns the window around each anchor + session bookends (first/last user+assistant prose). No LLM, no truncation.
`summary` (opt-in)	~30 s	aux-LLM call per hit	Cross-session prose synthesis when a fast → guided walk would be too many round-trips, or when the user explicitly asks for synthesis.

Default mode is fast; user can override per-process via auxiliary.session_search.default_mode: summary in ~/.hermes/config.yaml.

Related Issue

N/A — JabberELF salvage (PR #20238) seeded the original fast/summary split; this PR is the follow-on toolkit. No tracking issue.

Type of Change

✨ New feature (non-breaking change that adds functionality)
🎯 New skill (bundled or hub)

Changes Made

Core feature work

mode='fast' — FTS5 snippets without aux LLM (7d628eaa3). Returns session_id, match_message_id, context window. Default since 1a00d730e.
mode='guided' — anchored drill-down (1e29fa886, 41c13ba71). Single or multi-anchor (anchors=[{session_id, around_message_id}, ...]). Returns raw message window per anchor. Built on new SessionDB.get_messages_around primitive (2b606d20e).
mode='summary' — retained as the original synthesis path, now opt-in.
match_message_id in fast output (e74a682b0) — so the LLM can pair (session_id, match_message_id) and feed it straight into guided.
Multi-anchor guided (41c13ba71) — catch-up across same-topic sessions in one call.
Limit ceiling 5 → 10 (36c5b188b) — fast is cheap; let the model widen the net when picking anchors for a multi-anchor drill.
sort param on fast (4f7e64c84) — sort='newest' for recency-shaped questions, sort='oldest' for origin-shaped questions, omit for relevance-led exploratory recall. Silently ignored in summary / guided / recent modes.

Reliability + correctness

Fast (session_id, match_message_id) pairing fix (8a31985e4) — pre-fix, fast returned the lineage-root session_id paired with a message_id that lived in a descendant session, so the agent's follow-up guided call failed. Now fast emits the raw owning sid; guided rebinds transparently for callers that still pass the parent.
Empty-content filter on bookends (f4c43f088) — tool-call-only assistant turns no longer eat bookend slots. The session's prose closer survives into bookend_end even after long orchestration-heartbeat tails.
Cold-guided refusal (54d817f88) — guided with no prior fast call is rejected with a schema-level hint to call fast first.

Configurability

auxiliary.session_search.default_mode in ~/.hermes/config.yaml (02a54e01c → 76f40e644) — per-user default for the resolved mode. Resolver wired so explicit mode= arg always wins; unset / unknown modes flow to the resolved default. Documented in cli-config.yaml.example (2ecad4911).
role_filter defaults to user,assistant on fast (b54b24607) — tool messages are usually noise. Caller can pass role_filter='user,assistant,tool' for tool-output debugging or role_filter='tool' for tool output only.

Bookends + tool-noise filtering on guided (`b54b24607`)

Guided windows now strip tool-role messages but always preserve the anchor (even if anchor is role='tool').
New SessionDB.get_anchored_view returns {window, bookend_start, bookend_end}:
- bookend_start: first 3 user+assistant prose messages of the session (empty if window already overlaps the session head).
- bookend_end: last 3 user+assistant prose messages (empty if window covers the tail).
Both bookend slices skip empty-content rows, so tool-call-only "let me check..."-shaped assistant turns don't crowd out actual openings/closings.

Cost attribution

Summary aux-usage surfacing (8709e1ebe) — summary mode now reports the aux LLM input/output/cache tokens used per call in the response.

Schema description

The tool description shipped with session_search is a compact specification: what each mode returns, default-mode resolution, anchor contract, FTS5 syntax, and a one-paragraph "when to use". Long-form playbook prose (mode-picker policy, multi-anchor recipe, reading-order advice, anti-pattern teaching) lives in the session-recall skill rather than in the schema, so it loads on demand rather than shipping in every system prompt.

Shipped skill: `session-recall` (`af1ea1f4e`)

New shipped skill at skills/memory/session-recall/SKILL.md (with a new skills/memory/ category and DESCRIPTION.md). Teaches the agent to use session_search effectively: pre-flight rules, mode picker table, levers table, composition patterns, three worked examples (named artefact lookup, multi-session arc catch-up, drilling a known session), guidance on reading guided responses, pitfalls, and a closing note acknowledging the prose-teaching ceiling.

File paths touched

tools/session_search_tool.py — three modes, dispatch, resolver, schema
hermes_state.py — get_messages_around, get_anchored_view primitives
hermes_cli/config.py — auxiliary.session_search.default_mode knob
cli-config.yaml.example — documents the new knob
skills/memory/DESCRIPTION.md — new category
skills/memory/session-recall/SKILL.md — bundled skill
tests/tools/test_session_search.py — three-mode contract, resolver, multi-anchor, lineage rebind, role-filter, sort, bookend shape
tests/hermes_state/test_get_anchored_view.py — anchored view + bookend overlap, role filtering, empty-content skip, session isolation
tests/hermes_state/test_get_messages_around.py — anchored window primitive
scripts/release.py — AUTHOR_MAP entry for JabberELF

Behaviour change summary

main has only summary mode today. This PR adds fast and guided, makes fast the default for new calls, and changes a few aspects of how summary behaves.

Area	Before this PR	After this PR
Available modes	`summary` only	`fast`, `guided`, `summary`
Default mode (no explicit `mode=`, no config)	`summary`	`fast`
User-configurable default	none	`auxiliary.session_search.default_mode` in `~/.hermes/config.yaml`; explicit `mode=` always wins; unset / unknown flows to resolved default
Per-call limit cap	5	10
`role_filter` default on `fast` / `summary`	n/a (no fast); summary all roles	`user,assistant` (caller can override to include or restrict to `tool`)
Cost reporting on `summary`	none	`aux_usage` per call + `aux_usage_total` per batch
Anchored drill-down (`guided`)	not available	new — single or multi-anchor; returns window + bookend_start + bookend_end; tool-role messages filtered (anchor preserved)
Fast result fields	n/a	`session_id` (raw owning), `match_message_id`, `parent_session_id` when different from session_id

How to Test

Run the targeted suite:

source venv/bin/activate
pytest tests/hermes_state/test_get_anchored_view.py \
       tests/hermes_state/test_get_messages_around.py \
       tests/tools/test_session_search.py -q

Expected: 123 passing.

Try the default in a TUI session:

hermes gateway run --replace &
# New TUI session — ask a recall-shaped prompt and inspect the tool calls.
# Default behaviour should be a fast call (no LLM); follow-ups should
# drill via guided.

Verify the shipped skill loads:
- Start a new session; the "Available Skills" banner should include memory: session-recall.
- Ask a recall-shaped prompt — the agent should skill_view('session-recall') and follow the fast → guided composition described in the worked examples.

Opt into summary as default (legacy behaviour):

# Add to ~/.hermes/config.yaml:
# auxiliary:
#   session_search:
#     default_mode: summary
hermes gateway run --replace &
# New session; default recall now runs through summary.

Tested on Linux.

Checklist

Code

I've read the Contributing Guide
My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
I searched for existing PRs to make sure this isn't a duplicate
My PR contains only changes related to this fix/feature (no unrelated commits)
I've run the targeted test suite and all tests pass (123 in the focused surface)
I've added tests for my changes
I've tested on my platform: Linux

Documentation & Housekeeping

I've updated relevant documentation (skill SKILL.md + new category DESCRIPTION.md, schema description, docstrings)
I've updated cli-config.yaml.example for the new auxiliary.session_search.default_mode knob
I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — N/A
I've considered cross-platform impact (Windows, macOS) per the compatibility guide — N/A (no file-I/O, process, or terminal changes; SQLite + pure-Python only)
I've updated tool descriptions/schemas (rewrote SESSION_SEARCH_SCHEMA description as a tight spec, updated mode enum description)

For New Skills

This skill is broadly useful to most users — session-recall teaches every user how to use session_search effectively; recall is core to long-running agent work
SKILL.md follows the standard format (frontmatter with name, description, metadata.hermes.category: memory; pre-flight, mode picker, levers, composition patterns, worked examples, pitfalls, closing note)
No external dependencies — pure prose + tool-call examples against session_search (already a Hermes core tool)
I've tested the skill end-to-end: in a fresh TUI session, asked a multi-facet recall question, agent loaded the skill, executed the fast → multi-anchor guided composition per Example B, and self-narrated the design choices

Add mode parameter to session_search tool supporting two modes: - fast (default): returns FTS5 snippets + context immediately (~0.02s), no LLM call — ideal for quick recall lookups - summary: preserves original behavior with LLM-generated session summaries (~10-30s) — use when fast mode is insufficient Changes: - tools/session_search_tool.py: implement fast mode path that returns FTS hits with snippets/context without calling auxiliary model; add mode parameter to schema (enum: fast|summary); apply parent session source/metadata resolution in fast mode (same pattern as upstream fix 6b4ccb9 in summary mode) - run_agent.py: pass mode argument from function_args in two call sites (direct tool call + subagent path) - tests/tools/test_session_search.py: add test coverage for fast mode output format, summary mode preservation, backwards compatibility, and run_agent.py mode forwarding verification The tool schema description is updated to recommend fast-first usage.

abcdjmm970703@gmail.com → JabberELF for the session_search fast/summary dual-mode salvage.

…pt-in Reverses the default introduced by the salvaged dual-mode commit. Why: profiled four representative queries against a real 280-session state.db (workspace harness, not committed). Summary mode is 1,299x-6,293x slower than fast (median ~30s vs ~10ms; 99%+ in the auxiliary LLM call) and produces 2.9x-3.9x larger result blobs, but it answers a materially different question. The user's typical 'what did we work on for X?' is the summary question — fast surfaces only what FTS5 directly matched while summary surfaces cross-session synthesis (e.g. work sessions referenced inside the matched cron jobs). Backwards-compatible default; fast remains opt-in for cheap discovery via mode='fast'. Changes: - tools/session_search_tool.py: default parameter, defensive coercion fallbacks, and registry handler all default to 'summary'. Schema description rewritten with measured trade-offs and the 'use fast for discovery, summary for recall' framing. - run_agent.py: both direct call sites mirror the new default. - tests/tools/test_session_search.py: split the old default-test into test_default_search_returns_summary_mode_recap (asserts new default) and test_explicit_fast_mode_returns_snippets... (covers fast path without mocking the default away). Invalid-mode test now asserts fallback to summary. Source-grep test updated.

Adds SessionDB.get_messages_around(session_id, around_message_id, window) which returns up to 'window' messages before the anchor, the anchor itself, and up to 'window' after — all from the same session, ordered by id ascending. Used by the upcoming session_search mode='guided' (anchored drill-down) to surface a focused conversation window without summarisation cost or the 100k-char truncation gamble of mode='summary'. Boundaries are honoured (fewer messages at session start/end), the anchor is verified to exist in the named session before fetching (cheap guard against cross-session id confusion), and content/tool_calls decoding mirrors get_messages() so callers can swap between the two without surprises. Tested: 9 new cases in tests/hermes_state/test_get_messages_around.py (middle-of-session, first-message, last-message, anchor-not-in-session, no cross-session leakage, window > session, window=0, window negative, content decoding parity with get_messages). 62/62 passing including the existing 53 session_search tests.

Adds 'match_message_id' to each fast-mode result entry, carrying through the FTS5 message id (already populated in the underlying search_messages result; just unsurfaced until now). This is the composition handle for the upcoming mode='guided' drill-down: the calling agent reads a fast hit, picks a promising session, and passes session_id + match_message_id back as around_message_id for an anchored window. Lossless for non-guided callers (additive field, no schema changes). One new test (test_fast_mode_includes_match_message_id_for_guided_drilldown). 63/63 passing.

Adds a third mode to session_search: guided returns a window of messages around a specific message id in a specific session. No FTS5, no auxiliary LLM, no 100k-char truncation — one DB query (~ms latency). Designed to compose with mode=fast: the calling agent does cheap FTS5 discovery, picks a promising hit, then calls back with mode='guided', session_id from the result, and around_message_id=match_message_id from the same result. The agent gets the actual conversation around the anchor — the back-and-forth that fast's snippet teases but doesn't deliver, and that summary distils into prose at 30s+ wall-clock cost. Mechanics: - New _guided_drill_down() helper handles the guided dispatch path - Mode aliases ('drill', 'drilldown', 'drill-down', 'anchor', 'around') normalise to 'guided' - Validates required args (session_id + around_message_id), session existence, and anchor-in-session, returning specific tool_error messages for each failure mode - Window clamped silently to [1, 20] (matches existing limit-clamp pattern) - Rejects drill-down into the calling session's lineage — those messages are already in the agent's active context (same convention as fast/ summary's _resolve_to_parent skip) - Anchor row carries 'anchor': true so the agent can locate it in the ordered window without re-checking ids - Returns messages_before/messages_after counts so the agent sees boundary effects ('this is the first 3, no more available before') without a follow-up call Schema: - mode enum extended to ['fast', 'summary', 'guided'] - Three new optional parameters: session_id, around_message_id, window - Description rewritten to teach the discover→drill flow with example question shapes per mode Dispatch: - run_agent.py's two session_search dispatch sites updated to forward the new optional kwargs - Brittle source-grep test in test_session_search.py updated for the new dispatch shape and now also pins the guided-mode kwargs Tests: - 11 new cases in TestGuidedMode covering happy path, missing-arg errors, window clamps (low + high), session-not-found, anchor-not-in-session, session-boundary partial windows, current-lineage rejection, mode aliases, schema advertising, and metadata propagation - 74/74 passing including the existing 53 + 9 hermes_state unit tests End-to-end verified against a real DB snapshot: fast → read match_message_id+session_id off the top hit → guided returns 7 messages (3 before + anchor + 3 after) at ~40 KB payload, vs summary's ~220 KB auxiliary-LLM input for the same query.

The original ceiling of 5 was sized for summary mode where each result costs a parallel auxiliary LLM call (~30s wall total). With the steering reframing of guided mode (see investigation page §6), fast becomes the 'discover and let the user pick' surface, and the user benefits from seeing more candidates before committing to a drill-down. Bumping the ceiling to 10 lets callers ask for a wider hit list when that's the goal. Default stays at 3 (one-shot recall is unchanged). Schema description updated to teach the LLM when to bump higher: 'when the user wants to be in the retrieval loop and pick the right anchor for a guided drill-down'. For summary mode this means up to 10 parallel aux calls instead of 5; the existing concurrency semaphore already bounds the actual wall time, and most users won't hit the higher cap unless they're using fast. 65/65 passing.

Extends mode='guided' to accept a list of anchors instead of a single session+message pair. The agent calls fast with a wider limit, picks the most promising K hits from the result list, and drills into all of them in a single guided call — one window per anchor in the response. This is the steering improvement flagged in the investigation page §6: '5 results, pick top 3, strip tools' (strip-tools is a separate later follow-up). Letting the agent inspect multiple windows in one turn reduces the back-and-forth between fast and guided when the user genuinely wants to look at several candidate sessions before committing. Two input shapes (use one): * Single anchor (back-compat): session_id + around_message_id * Multi-anchor: anchors=[{session_id, around_message_id}, ...] Single-anchor calls (the back-compat path) continue to work unchanged and the response mirrors legacy fields at the top level when there's exactly one window. Multi-anchor responses carry only 'windows' as the authoritative list. Per-anchor failures (missing session, anchor not in session, current-lineage rejection) become inline error entries inside 'windows' rather than aborting the whole call — the agent can still use successful drills if one anchor was malformed. Window is shared across all anchors and clamped once to [1, 20]. Schema description updated to teach when to bump fast's limit higher (5–10 for steering use cases) and how to compose anchors=[...] from those results. Tests: - 7 new cases in TestGuidedModeMultiAnchor covering: two anchors both succeed, one-fails-one-succeeds doesn't abort, single anchor via anchors list normalises to legacy shape, empty/non-list anchors return tool_error, window clamp shared across anchors, per-anchor current-lineage rejection - Brittle source-grep test updated to also pin the new anchors= forwarding in run_agent.py - 81/81 passing including the existing 65 + 7 new + brittle update + 9 hermes_state unit tests End-to-end verified against real DB snapshot: 5 fast hits → top 3 as anchors → 3 windows of 7 messages each (~100 kB total).

Live-test surfaced a real bug: fast-mode results paired the resolved lineage-root session_id with the raw FTS5 row's message_id. The (sid, match_message_id) handle was self-inconsistent because the message lives in the child (delegation/compression) session, not the parent — so the agent's follow-up mode='guided' call hit 'around_message_id N not in session_id ROOT' and the drill failed. Repro: ask the TUI to fast-search a topic that appears in a compressed child session of the current lineage, then ask it to drill in. Today's session is exactly that shape — message 18425 lives in 20260512_102257_d5048c (child) but fast returned its parent 20260511_101921_a7dd34 paired with id=18425. Fix has two layers: 1) Fast-mode output now pairs session_id (raw FTS5 sid) with match_message_id consistently. The lineage root is exposed as a separate parent_session_id field (omitted when there's no delegation/compression above). Dedup grouping still happens by lineage root, so the user still sees one entry per conversation, but the per-entry handle is now a valid pair the agent can hand straight to mode='guided'. - #15909 source-from-parent invariant preserved: source/model/title still promote from the resolved parent for display. 2) Defensive rebind in mode='guided': if (a_sid, a_msg_id) doesn't resolve, look up the actual owning session for a_msg_id. If it's a descendant in the same lineage as a_sid, transparently rebind and refetch. Records the rebind in a warning field on the returned window (also flattened to top level for single-anchor responses). Cross-lineage rebinds are refused — that path stays an error. This keeps the tool forgiving for legacy callers, memory snippets, or any other source that still emits the old (parent_sid, child_id) shape. 3) Schema description tightened: explicit note that the agent must pass (session_id, match_message_id) verbatim from a single fast result — do NOT substitute parent_session_id (it's display-only). Tests: updated the existing #15909 regression to assert the new pair shape, plus four new tests: - test_fast_pair_session_id_with_match_message_id (positive) - test_fast_no_parent_session_id_field_when_session_is_already_root (tidy output for non-delegation case) - test_guided_rebinds_anchor_when_message_lives_in_descendant_session (safety net fires correctly within a lineage) - test_guided_does_not_rebind_across_lineages (refuses cross-lineage rebind — no silent drill into unrelated session) 85/85 session_search + get_messages_around tests passing. Live-DB smoke test against /tmp/state-smoke.db (snapshot of ~/.hermes/state.db) confirms the user's failing case now rebinds: success: True top-level warning: 'around_message_id 18425 lives in 20260512_102257_d5048c (child of 20260511_101921_a7dd34); rebound transparently' returned session_id: 20260512_102257_d5048c window before/after: 5 / 5

The default mode is normally 'summary' (LLM recap of matched sessions). This commit lets a user override that via: # ~/.hermes/config.yaml tools: session_search: default_mode: fast Useful for power users who want to live with fast-as-default for a few days and see how it feels — without having to pass mode='fast' on every call. The summary path is still one explicit kwarg away. Resolution order at call time: 1. Explicit mode= argument from the LLM (always wins) 2. tools.session_search.default_mode in ~/.hermes/config.yaml 3. 'summary' (final fallback) Implementation: - New helper _resolve_user_default_mode() in tools/session_search_tool.py reads the value via hermes_cli.config.load_config(). Wrapped in functools.lru_cache so the YAML read happens at most once per process (config changes need a CLI / TUI restart, which is the existing convention). - Validates: must be a string, must be 'fast' or 'summary'. Anything else (including 'guided', which needs anchors and can't stand alone) logs a warning and falls back to 'summary'. The user gets feedback when they typo their config. - session_search()'s mode normaliser checks for None/empty/non-string first and resolves the user default before applying alias mapping. Explicit modes still take precedence over config. - Both dispatch sites in run_agent.py changed from mode=function_args.get('mode', 'summary') → mode=function_args.get('mode'). Hardcoding 'summary' at dispatch would shadow the new config-default layer. Added a guard assert in test_run_agent_special_session_search_paths_forward_mode so a regression to the old shape fails loudly. - Schema description gets one extra sentence acknowledging the user-configurable default so the LLM's own description of the tool reflects reality. Tests (+8): - test_unset_mode_falls_back_to_summary_when_config_missing - test_user_can_configure_fast_as_default - test_user_can_configure_summary_as_default_explicitly - test_invalid_default_mode_warns_and_falls_back (typo test) - test_guided_as_default_mode_is_rejected - test_non_string_default_mode_falls_back (bogus YAML types) - test_explicit_mode_argument_overrides_user_default - test_unset_mode_with_config_default_fast_runs_fast_path (e2e) 93/93 session_search + get_messages_around tests passing. This is thread 2 of the prompt-tuning / default-mode plan from the spike: thread 1 was the schema-description iteration (still in progress on the spike page); thread 2 lets users carry the experiment around in their own config while we converge on whether to flip the global default in the schema.

…ame as two starts + one follow-up Live-test conversation surfaced that the 'three modes (fast, summary, guided)' framing makes the modes sound like peers when they aren't. Guided literally cannot be a default — _resolve_user_default_mode() already rejects it and forces summary. The honest shape is two starting moves (fast, summary) plus one follow-up move (guided) that needs anchors from a prior call. Two cleanups follow from that: 1) Schema description rewritten with the 'two starts + one follow-up' framing. Old MODES 1/2/3 list replaced with a structured 'Starting moves' / 'Follow-up move' block. Recommended flows section folded in (the per-question heuristics are now under each move's bullet). 2) Single-anchor schema parameters (session_id, around_message_id) REMOVED from the LLM-facing schema. After multi-anchor shipped, one-element anchors=[{...}] handles the single-anchor case identically. Keeping both shapes in the schema was confusing — the LLM occasionally tried to pair them or asked which to use. The Python session_search() function still accepts session_id / around_message_id kwargs for direct callers and test fixtures (back-compat); only the LLM-facing schema lost them. Parameter surface dropped from 6 LLM-visible knobs to 4 (query, role_filter, limit, mode + anchors, window). The mode parameter's description also got tightened — short summary of each mode, points to the top-level description for when-to-use guidance. The old description was duplicating the top-level mode explanation in a more verbose form. Updated test_schema_advertises_guided_mode: - Asserts match_message_id pairing guidance now lives on the anchors parameter, not the top-level description. - Explicitly asserts session_id / around_message_id are NOT in the schema (regression-proof against re-adding them). 93/93 session_search + get_messages_around tests passing. This is the param-surface cleanup discussed yesterday alongside the default_mode config commit. Closes the schema-surface side of the 'fast vs guided is confusing' user feedback; the spike doc §6.7 / §7 get matching updates in a separate commit on the architecture branch.

…-guided refusal Two schema description tweaks driven by smoke-test findings (PLAN.md v1.8): 1. S09 (search-fidelity FAIL) — agent skipped session_search entirely when asked 'what's the status of the commons-messaging PR on yoniebans.github.io?' and went straight to gh pr list. Technically correct that no PR existed, but missed two prior sessions and today's planning doc that referenced the branch. Fix: lead the USE THIS PROACTIVELY list with an explicit instruction to call session_search BEFORE external tools (gh, GitHub API, web, file inspection) when the question references prior work. The session DB carries what was DISCUSSED and DECIDED; external tools only show current world state. Use session_search to find context, external tools to verify reality. 2. S08 (schema-teaching weak case) — agent was asked to drill cold with multi-anchor guided. Did NOT refuse. Improvised recent → fast → fast → guided in one turn. Functionally correct (self-fed anchors from its own preceding fast calls), but the schema's 'cannot be a starting move' framing was followed in spirit, not articulated. The agent should EITHER refuse and ask, OR explicitly call fast first as a prerequisite — not silently improvise. Fix: reword 'Cannot be a starting move on its own' to a directive 'REQUIRES anchors from a prior fast or summary call. If you have no prior fast hit, call fast FIRST and use its match_message_id values as anchors. Never invent anchors or guess session_ids.' Same change echoed in the per-parameter mode description for the second-read reinforcement. Other 12 scenarios were clean. Schema base is good; these are surgical fixes for the two cases where the framing didn't land hard enough. 93/93 session_search + get_messages_around tests still pass.

…ribution Summary mode invokes an auxiliary LLM (same Opus-tier model in default 'auto' routing) once per session summarised, with up to ~28K input tokens (MAX_SESSION_CHARS=100K chars) and up to 10K output tokens (MAX_SUMMARY_TOKENS) per call. That cost was being silently discarded: _summarize_session() consumed response.usage only for the content string and threw the usage data away. Smoke-test cost reporting showed summary-mode scenarios at a fraction of their real spend because of it. This patch: - Changes _summarize_session() to return (content, usage) where usage is a normalised dict {model, input_tokens, output_tokens, cache_read_tokens, cache_creation_tokens} or None when the provider didn't surface usage. - Adds _extract_aux_usage() that handles both OpenAI-style (prompt_tokens/completion_tokens, prompt_tokens_details.cached_tokens) and Anthropic-style (input_tokens/output_tokens, cache_read_input_tokens, cache_creation_input_tokens) usage shapes. - The summary-mode caller aggregates per-session usage into both an entry-level 'aux_usage' field and a top-level 'aux_usage_total' carrying a call_count. The aggregate is omitted from the payload entirely when no usage data was captured (test mocks, providers that don't report it) so consumers can distinguish 'no data' from 'all zero'. Note: this surfaces aux cost in the tool RESPONSE, where downstream metrics extraction can pick it up. It does NOT yet attribute the cost back to the parent session row (sessions.input_tokens / output_tokens / estimated_cost_usd) — that's a wider fix to async_call_llm and the session DB, out of scope here. Aggregator scripts (smoke-test extractor, dashboards) get the data they need from the tool payload without that wider change.

The registry handler hardcoded mode=args.get("mode", "summary") and the function signature defaulted to "summary", which together made the tools.session_search.default_mode config knob structurally unreachable from real tool calls — _resolve_user_default_mode() only fires when mode is None/empty, but neither path ever delivered None. Drop both "summary" fallbacks so an omitted mode flows through as None and the config-resolution branch can run. Adds two tests: a static guard on the registry handler source pattern (mirroring the existing run_agent.py one) and an end-to-end regression that dispatches through the registry with default_mode='fast' configured and asserts result["mode"] == "fast".

The previous fix wired _resolve_user_default_mode() to look up tools.session_search.default_mode, but the config schema has no top-level 'tools' section. The closest analogue is auxiliary.<tool>, which already groups per-tool config by tool name (auxiliary.vision has download_timeout, auxiliary.session_search has max_concurrency — neither is strictly aux-LLM routing). This moves the lookup to auxiliary.session_search.default_mode so the knob lives next to max_concurrency and the existing session_search config block. Adds default_mode to the default config scaffold so it shows up in fresh installs. Updates docstring, tool description string, warning messages, and all 7 mock-config tests to the new path. 88/88 tests passing.

…t→guided The prior tool description routed 'catch me up on X' / 'what did we decide' questions to summary mode by default, which was the failure mode the fast/guided rework was meant to fix. Summary stays available and is honoured when users configure it explicitly; the description now teaches fast→guided as the default recall path and calls out summary as opt-in synthesis. Schema mode.default flipped summary → fast. Resolver/scaffold fallback unchanged (still 'summary') for backward compatibility. No logic changes, no test updates needed; 88/88 passing.

…l noise Three coordinated changes to make guided mode actually answer 'catch me up on X' questions without needing summary: 1. New SessionDB.get_anchored_view() helper: returns the anchored window plus the first/last N user+assistant messages of the session as 'bookend_start' / 'bookend_end'. Bookends are skipped when the window already overlaps the session head or tail, so the response stays tight. Default bookend=3, keep_roles=('user','assistant'). Tool messages are dropped from the window EXCEPT the anchor itself (which may legitimately be a tool message — dropping it would break the contract). 2. session_search mode='guided' switched to get_anchored_view (both primary path and the child-session rebind fallback). Response shape gains bookend_start + bookend_end alongside the existing messages array; single-anchor response mirrors them at the top level for back-compat. 3. session_search mode='fast' now defaults role_filter to 'user,assistant' when the caller doesn't pass one. Tool messages are mostly noise for FTS5 (large outputs, serialised tool calls). Callers can opt back in via role_filter='user,assistant,tool' for debugging or 'tool' for tool output only. Schema description updated to document bookends + tool filtering, and the role_filter param description spells out the new default. Test coverage: - tests/hermes_state/test_get_anchored_view.py (12 tests): window/bookend contract, role filtering, anchor-as-tool preservation, session isolation - tests/tools/test_session_search.py: existing _make_db fixtures bridged get_anchored_view → get_messages_around so the old guided tests still pass; new TestGuidedBookendsInResponse asserts response shape; new TestFastModeRoleFilterDefault pins the role_filter default. 122/122 passing across tests/hermes_state/ + tests/tools/test_session_search.py. Single-commit revert-friendly.

Bookends were eating slots with tool-call-only assistant turns (content='' with tool_calls populated). On long sessions whose tail is dominated by orchestration heartbeats — poll, terminal, pgrep, etc. — bookend_end was returning 3 empty rows instead of the actual prose closer. Fix: add 'length(content) > 0' to both bookend SQL queries. Tool-call-only assistants are skipped at the DB level; the closing prose ('Gateway replaced...', 'Committed and pushed', etc.) survives into bookend_end. User messages are never affected — the column is always populated for user-role rows (verified against the live DB: 22 NULL-content rows total, zero of them user-role). Test: tests/hermes_state/test_get_anchored_view.py adds test_bookends_skip_empty_content_assistant_turns — seeds a session with the heartbeat pattern that exposed the bug and asserts the actual opener/closer survive into bookend_start/bookend_end. 106/106 passing.

…ineage awareness Three additions to the tool description so the LLM uses the machinery that already exists: 1. MULTI-SESSION CATCH-UP: explicit instruction that when a topic spans multiple sessions, drill the top 2-3 fast hits as a single multi-anchor guided call — not just the top one. The multi-anchor shape was already supported but agents were anchoring on the top hit only and missing work in adjacent sessions. 2. READING GUIDED RESPONSES: explicit callout that every guided window carries three slices (bookend_start, messages, bookend_end) and the resolution lives in bookend_end. Reduces the risk of the LLM glossing the new bookend fields. 3. LINEAGE AWARENESS: notes that a child session's first messages are a post-compaction handoff, not the original arc opener — spot via parent_session_id. Tells the LLM how to recover the real opener when it matters (rare, but free to teach). anchors param description updated to reinforce multi-anchor catch-up at the point-of-use. No behavioural change — schema description only. 106/106 tests passing.

When fast returns hits whose snippets all look like the same keywords echoing (because the searched topic IS the subject of those sessions — e.g. searching 'session_search' in sessions about session_search), the snippets are decorative, not signal. The temptation is to pivot to find/grep/raw SQL — same shape failure as reflexive summary, just with manual archaeology instead of LLM telephone. New schema section instructs: don't pivot, drill. bookend_end carries the session's prose resolution that the snippets routinely miss. Observed failure that motivated this: an assistant asked to find a recently-drafted PR body got fast results with the right session in the top 5, but the snippets were wall-to-wall '>>>session_search<<<' markers, so it pivoted to find/sqlite3 and burned ~10 minutes. The right session's bookend_end contained 'Draft written to <path>' — exactly the artefact being searched for. No behavioural change; schema-only. 106/106 passing.

Fast mode currently orders results by FTS5 BM25 rank only. That's correct when the user's question is exploratory ('what do we know about X') — relevance leads, time is neutral — but it actively hurts two other common question shapes: 1. Recency-shaped: 'where did we leave X', 'latest status of Y'. Same-rank matches from years ago and yesterday are tied; FTS5 picks arbitrarily. A reactivated old session can outrank a fresh one with no signal. 2. Origin-shaped: 'how did X start', 'first time we discussed Y'. The originating session is usually short and gets out-scored by later sessions that revisit the topic with more context — the origin hides under its own descendants. Adding a temporal tie-breaker by default would silently bias every query toward 'latest', breaking the origin-shaped case. So sort is opt-in and bidirectional, matching the existing 'agent picks the mode that fits the question shape' pattern. What this adds: - session_search() gains a sort parameter accepting 'newest', 'oldest', or None (default = current FTS5 rank-only behaviour preserved). - db.search_messages() honours sort across all three SQL paths: main FTS5 (timestamp DESC/ASC primary, rank tiebreaker), trigram CJK (same), LIKE fallback (timestamp direction flip; no rank to combine). - Tool layer normalises sort case-insensitively, falls back to None on garbage values rather than failing the search, and silently strips sort outside fast mode (with a debug log). Summary's session selection deliberately stays time-neutral — agents wanting temporal narrative drive fast with sort, then drill anchors with guided. - Schema description gains a TEMPORAL DIRECTION section with concrete question-shape examples, and a sort property on the parameters block enumerating the valid values. Tests: - 6 new tool-layer tests covering default behaviour, both directions, case-insensitivity, garbage fallback, and silent-ignore in summary. - 4 new SQL-layer tests against the real DB exercising 'newest' / 'oldest' / unset (BM25 rank preserved) / invalid (rank fallback). - 95→102 passing on tools/test_session_search.py before this commit; 108 passing after.

…in schema Smoke-test v2 surfaced that S13 (auxiliary.session_search.default_mode: summary) went fast→guided 5/5 iterations instead of respecting the user's configured summary default. The agent passed mode='fast' explicitly on every first call, ignoring the config. Root cause: the 'respect the configured default' guidance lived at the very bottom of the schema description, after all the 'fast → guided is best' teaching. The general guidance was louder than the user-preference clause. Fix: hoist USER-CONFIGURED DEFAULT to the top of the description, framed as something the agent should check FIRST. Strengthen the language: honour the user's configured default on the first call unless the question shape categorically requires a different mode. Don't override the user just because the general guidance says fast→guided is best. Replace the redundant bottom paragraph with a brief pointer to the top. No code changes — schema description only. Tests still 99/99.

…rst call Previous patch (71558e7) hoisted USER-CONFIGURED DEFAULT to the top of the schema with 'honour unless question shape categorically requires'. Re-running S13 with default_mode: summary still went fast→guided 5/5 — the agent rationalised that synthesis questions categorically require fast→guided. The schema teaching needs the escape clause removed. The user paying for the call has the better context on which trade they want; the agent shouldn't override based on its read of the question shape. After the first call, the agent can chain freely (e.g. guided drill into fast results), but the first mode comes from the configured default. Still no resolver-level hard lock. If schema teaching at this strength still fails to make the agent respect the user's preference, that's a separate follow-up — but at minimum the user's preference is now loud in the prompt. 99/99 tests still passing.

Teach the agent to use session_search effectively. Covers the three modes (fast/guided/summary), levers for tuning each call, composition patterns including multi-anchor catch-up, worked examples for named- artefact lookup and multi-session arc recall, and pitfalls.

The tool-description prose had accumulated playbook-style guidance over the course of development (pre-flight rules, mode-picking policy, multi-anchor recipe, anti-pattern teaching, reading-order advice). That material now lives in the session-recall skill where it can be loaded on demand rather than shipping in every system prompt. Schema description now covers only what the tool IS: what each mode returns, default-mode resolution, anchor contract, FTS5 syntax, and a one-paragraph 'when to use'. Mode enum description shrunk to three one-line entries. Cost claims generalised — no fixed dollar figures since aux-LLM cost depends on the user's configured aux model. Net: ~9.5 KB -> ~3 KB of description prose. One schema-content assertion in tests updated to match the new phrasing while keeping the same intent (cross-session language exists; no current-session nudge).

…odes # Conflicts: # scripts/release.py

…a + PR body) The schema description and the JSON-schema `mode.default` advertise `fast` as the default mode. The implementation was advertising one default and running another: DEFAULT_CONFIG shipped `default_mode: summary`, the resolver's six fallback paths all returned `summary`, and the invalid-mode coercion at the dispatch site hard-coded `summary` too. Net effect was the model being told 'default is fast' while the server ran summary — exactly the cost behaviour this work is meant to avoid. Changes: - hermes_cli/config.py: DEFAULT_CONFIG default_mode `summary` → `fast`. - tools/session_search_tool.py: every `return "summary"` fallback in _resolve_user_default_mode() now returns "fast" (six paths: ImportError, general Exception, raw is None, non-string raw, invalid value, and the function-level fallback). Warning log strings updated to match. - tools/session_search_tool.py: invalid-`mode=` arg at the dispatch site now falls back to _resolve_user_default_mode() instead of hard-coding "summary". Silent coercion of typos now still respects the user's configured default. - tests: 11 tests updated to match the new default (six in the resolver fallback class, three test methods renamed, plus the parametrised invalid-mode test and the positional-db backward-compat test). The new test names reflect what's being verified rather than the old default value.

…invalid-mode path Three small follow-ups from the default-mode fix review: 1. Extract the literal 'fast' fallback into a module-level _FALLBACK_DEFAULT_MODE constant. Six call sites in _resolve_user_default_mode() now reference the constant, removing the drift risk of changing the default in some paths but not others. 2. New integration test: bogus mode= string at the dispatch site with no config falls back to the resolver-resolved default ('fast'). Proves the dispatch site calls the resolver rather than hardcoding a literal. 3. New integration test: bogus mode= string with default_mode=summary in config lands on summary. Proves the dispatch-site coercion honours the user's configured default for unknown modes too — not just for unset modes.

The DEFAULT_CONFIG entry was added in this PR but the example config file wasn't kept in sync. Per CONTRIBUTING.md, config changes need to mirror into cli-config.yaml.example so users can see the knob and its documented values.

github-actions · 2026-05-15T15:14:03Z

🔎 Lint report: `feat/session_search_modes` vs `origin/main`

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 8324 on HEAD, 8278 on base (🆕 +46)

🆕 New issues (29):

Rule	Count
`invalid-argument-type`	17
`invalid-return-type`	2
`unresolved-attribute`	2
`unresolved-import`	2
`invalid-parameter-default`	2
`unsupported-operator`	2
`invalid-assignment`	2

First entries

tests/tools/test_session_search.py:1941: [invalid-argument-type] invalid-argument-type: Argument to function `session_search` is incorrect: Expected `str`, found `None`
tests/tools/test_session_search.py:754: [invalid-argument-type] invalid-argument-type: Argument to function `session_search` is incorrect: Expected `str`, found `Literal["garbage", "", "RANDOM", 42] | None`
tests/tools/test_session_search.py:1747: [invalid-argument-type] invalid-argument-type: Method `__getitem__` of type `Overload[(i: SupportsIndex, /) -> Unknown, (s: slice[SupportsIndex | None, SupportsIndex | None, SupportsIndex | None], /) -> list[Unknown]]` cannot be called with key of type `Literal["anchors"]` on object of type `list[Unknown]`
tools/session_search_tool.py:1048: [invalid-return-type] invalid-return-type: Return type does not match returned value: expected `list[tuple[Unknown, ...] | Exception]`, found `list[Unknown | BaseException]`
tests/tools/test_session_search.py:1739: [invalid-argument-type] invalid-argument-type: Method `__getitem__` of type `Overload[(i: SupportsIndex, /) -> Unknown, (s: slice[SupportsIndex | None, SupportsIndex | None, SupportsIndex | None], /) -> list[Unknown]]` cannot be called with key of type `Literal["mode"]` on object of type `list[Unknown]`
run_agent.py:10694: [invalid-argument-type] invalid-argument-type: Argument to function `session_search` is incorrect: Expected `int`, found `Unknown | None`
tools/session_search_tool.py:549: [invalid-argument-type] invalid-argument-type: Argument to constructor `int.__new__` is incorrect: Expected `str | Buffer | SupportsInt | SupportsIndex | SupportsTrunc`, found `Any | None | str`
tests/tools/test_session_search.py:911: [unresolved-attribute] unresolved-attribute: Function `_resolve_user_default_mode` has no attribute `cache_clear`
tests/tools/test_session_search.py:1957: [invalid-argument-type] invalid-argument-type: Argument to function `session_search` is incorrect: Expected `list[Unknown]`, found `Literal["not_a_list"]`
tests/hermes_state/test_get_messages_around.py:11: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`
tests/tools/test_session_search.py:1746: [invalid-argument-type] invalid-argument-type: Method `__getitem__` of type `Overload[(key: SupportsIndex | slice[SupportsIndex | None, SupportsIndex | None, SupportsIndex | None], /) -> LiteralString, (key: SupportsIndex | slice[SupportsIndex | None, SupportsIndex | None, SupportsIndex | None], /) -> str]` cannot be called with key of type `Literal["properties"]` on object of type `str`
tests/tools/test_session_search.py:1743: [unresolved-attribute] unresolved-attribute: Attribute `lower` is not defined on `dict[str, str | dict[str, dict[str, str] | dict[str, str | int] | dict[str, str | list[str]] | dict[str, str | dict[str, str | dict[str, dict[str, str]] | list[str]]]] | list[Unknown]]` in union `str | dict[str, str | dict[str, dict[str, str] | dict[str, str | int] | dict[str, str | list[str]] | dict[str, str | dict[str, str | dict[str, dict[str, str]] | list[str]]]] | list[Unknown]]`
run_agent.py:11333: [invalid-argument-type] invalid-argument-type: Argument to function `session_search` is incorrect: Expected `list[Unknown]`, found `Any | None`
tools/session_search_tool.py:784: [invalid-parameter-default] invalid-parameter-default: Default value of type `None` is not assignable to annotated parameter type `list[Unknown]`
tests/tools/test_session_search.py:1747: [invalid-argument-type] invalid-argument-type: Method `__getitem__` of type `Overload[(key: SupportsIndex | slice[SupportsIndex | None, SupportsIndex | None, SupportsIndex | None], /) -> LiteralString, (key: SupportsIndex | slice[SupportsIndex | None, SupportsIndex | None, SupportsIndex | None], /) -> str]` cannot be called with key of type `Literal["anchors"]` on object of type `str`
tools/session_search_tool.py:894: [invalid-argument-type] invalid-argument-type: Argument to bound method `SessionDB.search_messages` is incorrect: Expected `str`, found `None | str`
tests/tools/test_session_search.py:1740: [unsupported-operator] unsupported-operator: Operator `in` is not supported between objects of type `Literal["guided"]` and `Unknown | str | int | list[str] | dict[str, str | dict[str, dict[str, str]] | list[str]]`
tests/hermes_state/test_get_anchored_view.py:14: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`
tests/tools/test_session_search.py:1739: [invalid-argument-type] invalid-argument-type: Method `__getitem__` of type `Overload[(key: SupportsIndex | slice[SupportsIndex | None, SupportsIndex | None, SupportsIndex | None], /) -> LiteralString, (key: SupportsIndex | slice[SupportsIndex | None, SupportsIndex | None, SupportsIndex | None], /) -> str]` cannot be called with key of type `Literal["mode"]` on object of type `str`
tools/session_search_tool.py:253: [invalid-return-type] invalid-return-type: Function can implicitly return `None`, which is not assignable to return type `tuple[str | None, dict[str, Any] | None]`
run_agent.py:13774: [invalid-argument-type] invalid-argument-type: Argument to function `_is_oauth_token` is incorrect: Expected `str`, found `str | dict[Unknown | str, Unknown | str | dict[str, str]] | Any | ... omitted 3 union elements`
tests/tools/test_session_search.py:1747: [unsupported-operator] unsupported-operator: Operator `in` is not supported between objects of type `Literal["match_message_id"]` and `Unknown | str | int | list[str] | dict[str, str | dict[str, dict[str, str]] | list[str]]`
tools/session_search_tool.py:1137: [invalid-assignment] invalid-assignment: Invalid subscript assignment with key of type `Literal["aux_usage_total"]` and value of type `dict[str, None | int]` on object of type `dict[str, int | str | list[Unknown]]`
run_agent.py:11331: [invalid-argument-type] invalid-argument-type: Argument to function `session_search` is incorrect: Expected `int`, found `Any | None`
run_agent.py:10696: [invalid-argument-type] invalid-argument-type: Argument to function `session_search` is incorrect: Expected `list[Unknown]`, found `Unknown | None`
... and 4 more

✅ Fixed issues (5):

Rule	Count
`invalid-argument-type`	4
`invalid-return-type`	1

First entries

run_agent.py:7482: [invalid-argument-type] invalid-argument-type: Argument to function `build_anthropic_client` is incorrect: Expected `str`, found `str | dict[Unknown, Unknown] | Any | ... omitted 3 union elements`
run_agent.py:13767: [invalid-argument-type] invalid-argument-type: Argument to function `len` is incorrect: Expected `Sized`, found `(str & ~AlwaysFalsy) | (dict[Unknown, Unknown] & ~AlwaysFalsy) | (Any & ~AlwaysFalsy) | ... omitted 3 union elements`
run_agent.py:13764: [invalid-argument-type] invalid-argument-type: Argument to function `_is_oauth_token` is incorrect: Expected `str`, found `str | dict[Unknown, Unknown] | Any | ... omitted 3 union elements`
tools/session_search_tool.py:375: [invalid-argument-type] invalid-argument-type: Argument to bound method `SessionDB.search_messages` is incorrect: Expected `list[str]`, found `None | list[str]`
tools/session_search_tool.py:474: [invalid-return-type] invalid-return-type: Return type does not match returned value: expected `list[str | Exception]`, found `list[str | None | BaseException]`

Unchanged: 4319 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.

Pass over comments added during the iterative development of this PR, trimming where they restated the code, repeated themselves, or read as journal-style narration. Net -22 comment lines; behaviour unchanged, 123 tests still passing. Notable trims: - DEFAULT_CONFIG module header: 9 lines → 4. Dropped the 'auxiliary started as aux-LLM routing but in practice groups per-tool config' digression — irrelevant to readers of this module. - get_anchored_view bookend-SQL filter block: 8 lines → 5. The 'let me check…-shaped assistant messages' over-narration is gone; the SQL filter rationale survives. - Fast-mode lineage-grouping IMPORTANT block: 12 lines → 8. The '#regression introduced by the original match_message_id rollout' meta-note removed (the comment now states the contract directly). - Fast-mode result-emission comment: 8 lines → 3. The 'lineage_root is the dict key…' explanation was restating the variables; the load-bearing one-liner (emit raw_sid + match_message_id) stays. - sort normalisation comment: 4 lines → 3. - role_filter parse comment: 5 lines → 3. - ORDER BY comment in search_messages: 3 lines → 2. - LIKE fallback ordering comment: 4 lines → 2.

…e — no LLM (#27590) * feat(session_search): single-shape tool with discovery, scroll, browse — no LLM Replaces the LLM-summarized session_search with a single-shape tool that returns actual messages from the DB. Three calling shapes inferred from args (no mode parameter): 1. Discovery — pass query. FTS5 + anchored ±5 window + bookends per hit, all in one call. ~20ms on a real DB instead of ~90s for the previous three aux-LLM calls. 2. Scroll — pass session_id + around_message_id. Returns a window centered on the anchor. To paginate, re-anchor on the first/last id of the returned window. Boundary message appears in both windows as the orientation marker. ~1ms per scroll call. 3. Browse — no args. Recent sessions chronologically. Bookend_start (first 3 user+assistant msgs) and bookend_end (last 3) give the agent goal + resolution on every discovery hit, so a single tool call reconstructs a long session's arc without loading the whole transcript. The aux-LLM summary path is gone: it cost ~$0.30/call, took ~30s, and laundered FTS5 hits through a model that could confabulate when the right session wasn't in the hit list. The merged shape returns byte-for-byte content from SQLite. History: - PR #20238 (JabberELF) seeded the fast/summary dual-mode split. - PR #26419 (yoniebans) expanded to fast/guided/summary with bookends, multi-anchor drill-down, default-mode config, and a teaching skill. This PR collapses that toolkit into one shape with explicit scroll support, drops the summary path, drops the mode parameter, drops the config knob, drops the skill. JabberELF's seed work is acknowledged via the AUTHOR_MAP entry. Validation: - 38/38 tool tests pass (tests/tools/test_session_search.py) - 12/12 get_messages_around tests pass (tests/hermes_state/) - 11/11 get_anchored_view tests pass (tests/hermes_state/) - Full tests/tools/ run: 5168 passing, 2 failures pre-exist on main (test ordering in test_delegate.py, unrelated) - E2E against live state DB: discovery 20ms, scroll 1ms, browse 280ms; pagination forward+backward works with boundary-message orientation; error paths return clean tool_error responses Co-authored-by: JabberELF <abcdjmm970703@gmail.com> Co-authored-by: yoniebans <jonny@nousresearch.com> * chore(session_search): prune dead LLM-summary config and docs Companion to the single-shape rewrite. The auxiliary.session_search config block, max_concurrency / extra_body tunables, and matching docs sections all referenced the removed LLM summarization path. Removing them so users don't try to tune knobs that nothing reads. - hermes_cli/config.py: drop dead auxiliary.session_search block from DEFAULT_CONFIG. Leftover keys in user config.yaml are harmless and ignored. - hermes_cli/tips.py: drop two tips referencing the removed max_concurrency / extra_body knobs. - website/docs/user-guide/configuration.md: drop 'Session Search Tuning' section and the auxiliary.session_search block from the example. - website/docs/user-guide/features/fallback-providers.md: drop session_search rows from the auxiliary-tasks tables and the dedicated tuning subsection. - website/docs/reference/tools-reference.md: rewrite the session_search entry to describe the new three-shape behaviour. - CONTRIBUTING.md: update the file-tree description. - tests/tools/test_llm_content_none_guard.py: remove TestSessionSearchContentNone class and test_session_search_tool_guarded — both guard against an unguarded .content.strip() call site in _summarize_session() that no longer exists. Validation: 97/97 targeted tests still pass (hermes_state + session_search + llm_content_none_guard). Config tests 55/55. --------- Co-authored-by: JabberELF <abcdjmm970703@gmail.com> Co-authored-by: yoniebans <jonny@nousresearch.com>

…e — no LLM (NousResearch#27590) * feat(session_search): single-shape tool with discovery, scroll, browse — no LLM Replaces the LLM-summarized session_search with a single-shape tool that returns actual messages from the DB. Three calling shapes inferred from args (no mode parameter): 1. Discovery — pass query. FTS5 + anchored ±5 window + bookends per hit, all in one call. ~20ms on a real DB instead of ~90s for the previous three aux-LLM calls. 2. Scroll — pass session_id + around_message_id. Returns a window centered on the anchor. To paginate, re-anchor on the first/last id of the returned window. Boundary message appears in both windows as the orientation marker. ~1ms per scroll call. 3. Browse — no args. Recent sessions chronologically. Bookend_start (first 3 user+assistant msgs) and bookend_end (last 3) give the agent goal + resolution on every discovery hit, so a single tool call reconstructs a long session's arc without loading the whole transcript. The aux-LLM summary path is gone: it cost ~$0.30/call, took ~30s, and laundered FTS5 hits through a model that could confabulate when the right session wasn't in the hit list. The merged shape returns byte-for-byte content from SQLite. History: - PR NousResearch#20238 (JabberELF) seeded the fast/summary dual-mode split. - PR NousResearch#26419 (yoniebans) expanded to fast/guided/summary with bookends, multi-anchor drill-down, default-mode config, and a teaching skill. This PR collapses that toolkit into one shape with explicit scroll support, drops the summary path, drops the mode parameter, drops the config knob, drops the skill. JabberELF's seed work is acknowledged via the AUTHOR_MAP entry. Validation: - 38/38 tool tests pass (tests/tools/test_session_search.py) - 12/12 get_messages_around tests pass (tests/hermes_state/) - 11/11 get_anchored_view tests pass (tests/hermes_state/) - Full tests/tools/ run: 5168 passing, 2 failures pre-exist on main (test ordering in test_delegate.py, unrelated) - E2E against live state DB: discovery 20ms, scroll 1ms, browse 280ms; pagination forward+backward works with boundary-message orientation; error paths return clean tool_error responses Co-authored-by: JabberELF <abcdjmm970703@gmail.com> Co-authored-by: yoniebans <jonny@nousresearch.com> * chore(session_search): prune dead LLM-summary config and docs Companion to the single-shape rewrite. The auxiliary.session_search config block, max_concurrency / extra_body tunables, and matching docs sections all referenced the removed LLM summarization path. Removing them so users don't try to tune knobs that nothing reads. - hermes_cli/config.py: drop dead auxiliary.session_search block from DEFAULT_CONFIG. Leftover keys in user config.yaml are harmless and ignored. - hermes_cli/tips.py: drop two tips referencing the removed max_concurrency / extra_body knobs. - website/docs/user-guide/configuration.md: drop 'Session Search Tuning' section and the auxiliary.session_search block from the example. - website/docs/user-guide/features/fallback-providers.md: drop session_search rows from the auxiliary-tasks tables and the dedicated tuning subsection. - website/docs/reference/tools-reference.md: rewrite the session_search entry to describe the new three-shape behaviour. - CONTRIBUTING.md: update the file-tree description. - tests/tools/test_llm_content_none_guard.py: remove TestSessionSearchContentNone class and test_session_search_tool_guarded — both guard against an unguarded .content.strip() call site in _summarize_session() that no longer exists. Validation: 97/97 targeted tests still pass (hermes_state + session_search + llm_content_none_guard). Config tests 55/55. --------- Co-authored-by: JabberELF <abcdjmm970703@gmail.com> Co-authored-by: yoniebans <jonny@nousresearch.com>

JabberELF and others added 29 commits May 11, 2026 17:52

chore: AUTHOR_MAP entry for JabberELF (PR #20238 salvage)

aa2d3e2

abcdjmm970703@gmail.com → JabberELF for the session_search fast/summary dual-mode salvage.

Merge remote-tracking branch 'origin/main' into feat/session_search_m…

b5996b6

…odes # Conflicts: # scripts/release.py

alt-glitch added type/feature New feature or request P2 Medium — degraded but workaround exists comp/agent Core agent loop, run_agent.py, prompt builder tool/memory Memory tool and memory providers labels May 15, 2026

yoniebans mentioned this pull request May 16, 2026

spike(session-search-modes): three-mode session_search investigation yoniebans/hermes-architecture#9

Merged

teknium1 mentioned this pull request May 17, 2026

feat(session_search): single-shape tool with discovery, scroll, browse — no LLM #27590

Merged

2 tasks

teknium1 closed this in #27590 May 18, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(session_search): toolkit of fast / guided / summary modes for low-cost, high-fidelity recall#26419

feat(session_search): toolkit of fast / guided / summary modes for low-cost, high-fidelity recall#26419
yoniebans wants to merge 30 commits into
mainfrom
feat/session_search_modes

yoniebans commented May 15, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented May 15, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

yoniebans commented May 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do?

Motivation

The three modes

Related Issue

Type of Change

Changes Made

Core feature work

Reliability + correctness

Configurability

Bookends + tool-noise filtering on guided (b54b24607)

Cost attribution

Schema description

Shipped skill: session-recall (af1ea1f4e)

File paths touched

Behaviour change summary

How to Test

Checklist

Code

Documentation & Housekeeping

For New Skills

Uh oh!

github-actions Bot commented May 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔎 Lint report: feat/session_search_modes vs origin/main

ruff

ty (type checker)

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

yoniebans commented May 15, 2026 •

edited

Loading

Bookends + tool-noise filtering on guided (`b54b24607`)

Shipped skill: `session-recall` (`af1ea1f4e`)

github-actions Bot commented May 15, 2026 •

edited

Loading

🔎 Lint report: `feat/session_search_modes` vs `origin/main`