Skip to content

feat(session_search): toolkit of fast / guided / summary modes for low-cost, high-fidelity recall#26419

Closed
yoniebans wants to merge 30 commits into
mainfrom
feat/session_search_modes
Closed

feat(session_search): toolkit of fast / guided / summary modes for low-cost, high-fidelity recall#26419
yoniebans wants to merge 30 commits into
mainfrom
feat/session_search_modes

Conversation

@yoniebans

@yoniebans yoniebans commented May 15, 2026

Copy link
Copy Markdown
Collaborator

What does this PR do?

Turns session_search from a single-mode summary-only tool into a three-mode toolkit so the model can match the recall question to the right cost/fidelity trade. Plus the supporting bits needed for that to actually work in practice: a real-time anchored drill-down, session bookends that surface goal + resolution on long sessions, opinionated tool-noise filtering, multi-anchor catch-up across same-topic sessions, a config knob for the per-user default, and a shipped skill that teaches composition.

Built on top of the JabberELF salvage (PR #20238 / commits 7d628eaa3 + aa2d3e2ee) which seeded the fast/summary split. Their two commits are preserved at the head of this branch; everything else is additive.

Spike notes: session-search-modes spike — results — background, design, and measurements.

Motivation

The pre-existing session_search had one mode: load up to N matched sessions, truncate each to 100K chars, hand them to an aux LLM for synthesis. Three problems:

  1. Cost. Every recall call — even "did we touch X?" — paid the configured aux model's per-call rate for ~30s of generation across hundreds of thousands of input tokens.
  2. Retrieval gap. The 100K-char truncation routinely covered only a fraction of long work sessions. Everything outside was discarded. The summary still came back confident — three layers of LLM telephone laundering a partial-retrieval failure into authoritative prose.
  3. Modes-as-hierarchy framing. The "fast / summary" split treated summary as the recall you usually want and fast as a discovery shortcut. In practice fast → guided is the right composition for state recall, and summary should be the opt-in synthesis tool. The schema needs to teach that.

The three modes

Mode Latency Cost Use for
fast (default) ~10 ms $0 Any recall question — discovery AND state reconstruction. FTS5 snippets + 1 message of context. Returns session_id + match_message_id anchors. Optional sort='newest' / 'oldest' for temporal bias.
guided (follow-up) ~ms $0 Drill into one or more anchors from a prior fast call. Returns the window around each anchor + session bookends (first/last user+assistant prose). No LLM, no truncation.
summary (opt-in) ~30 s aux-LLM call per hit Cross-session prose synthesis when a fast → guided walk would be too many round-trips, or when the user explicitly asks for synthesis.

Default mode is fast; user can override per-process via auxiliary.session_search.default_mode: summary in ~/.hermes/config.yaml.

Related Issue

N/A — JabberELF salvage (PR #20238) seeded the original fast/summary split; this PR is the follow-on toolkit. No tracking issue.

Type of Change

  • ✨ New feature (non-breaking change that adds functionality)
  • 🎯 New skill (bundled or hub)

Changes Made

Core feature work

  • mode='fast' — FTS5 snippets without aux LLM (7d628eaa3). Returns session_id, match_message_id, context window. Default since 1a00d730e.
  • mode='guided' — anchored drill-down (1e29fa886, 41c13ba71). Single or multi-anchor (anchors=[{session_id, around_message_id}, ...]). Returns raw message window per anchor. Built on new SessionDB.get_messages_around primitive (2b606d20e).
  • mode='summary' — retained as the original synthesis path, now opt-in.
  • match_message_id in fast output (e74a682b0) — so the LLM can pair (session_id, match_message_id) and feed it straight into guided.
  • Multi-anchor guided (41c13ba71) — catch-up across same-topic sessions in one call.
  • Limit ceiling 5 → 10 (36c5b188b) — fast is cheap; let the model widen the net when picking anchors for a multi-anchor drill.
  • sort param on fast (4f7e64c84) — sort='newest' for recency-shaped questions, sort='oldest' for origin-shaped questions, omit for relevance-led exploratory recall. Silently ignored in summary / guided / recent modes.

Reliability + correctness

  • Fast (session_id, match_message_id) pairing fix (8a31985e4) — pre-fix, fast returned the lineage-root session_id paired with a message_id that lived in a descendant session, so the agent's follow-up guided call failed. Now fast emits the raw owning sid; guided rebinds transparently for callers that still pass the parent.
  • Empty-content filter on bookends (f4c43f088) — tool-call-only assistant turns no longer eat bookend slots. The session's prose closer survives into bookend_end even after long orchestration-heartbeat tails.
  • Cold-guided refusal (54d817f88) — guided with no prior fast call is rejected with a schema-level hint to call fast first.

Configurability

  • auxiliary.session_search.default_mode in ~/.hermes/config.yaml (02a54e01c76f40e644) — per-user default for the resolved mode. Resolver wired so explicit mode= arg always wins; unset / unknown modes flow to the resolved default. Documented in cli-config.yaml.example (2ecad4911).
  • role_filter defaults to user,assistant on fast (b54b24607) — tool messages are usually noise. Caller can pass role_filter='user,assistant,tool' for tool-output debugging or role_filter='tool' for tool output only.

Bookends + tool-noise filtering on guided (b54b24607)

  • Guided windows now strip tool-role messages but always preserve the anchor (even if anchor is role='tool').
  • New SessionDB.get_anchored_view returns {window, bookend_start, bookend_end}:
    • bookend_start: first 3 user+assistant prose messages of the session (empty if window already overlaps the session head).
    • bookend_end: last 3 user+assistant prose messages (empty if window covers the tail).
  • Both bookend slices skip empty-content rows, so tool-call-only "let me check..."-shaped assistant turns don't crowd out actual openings/closings.

Cost attribution

  • Summary aux-usage surfacing (8709e1ebe) — summary mode now reports the aux LLM input/output/cache tokens used per call in the response.

Schema description

The tool description shipped with session_search is a compact specification: what each mode returns, default-mode resolution, anchor contract, FTS5 syntax, and a one-paragraph "when to use". Long-form playbook prose (mode-picker policy, multi-anchor recipe, reading-order advice, anti-pattern teaching) lives in the session-recall skill rather than in the schema, so it loads on demand rather than shipping in every system prompt.

Shipped skill: session-recall (af1ea1f4e)

New shipped skill at skills/memory/session-recall/SKILL.md (with a new skills/memory/ category and DESCRIPTION.md). Teaches the agent to use session_search effectively: pre-flight rules, mode picker table, levers table, composition patterns, three worked examples (named artefact lookup, multi-session arc catch-up, drilling a known session), guidance on reading guided responses, pitfalls, and a closing note acknowledging the prose-teaching ceiling.

File paths touched

  • tools/session_search_tool.py — three modes, dispatch, resolver, schema
  • hermes_state.pyget_messages_around, get_anchored_view primitives
  • hermes_cli/config.pyauxiliary.session_search.default_mode knob
  • cli-config.yaml.example — documents the new knob
  • skills/memory/DESCRIPTION.md — new category
  • skills/memory/session-recall/SKILL.md — bundled skill
  • tests/tools/test_session_search.py — three-mode contract, resolver, multi-anchor, lineage rebind, role-filter, sort, bookend shape
  • tests/hermes_state/test_get_anchored_view.py — anchored view + bookend overlap, role filtering, empty-content skip, session isolation
  • tests/hermes_state/test_get_messages_around.py — anchored window primitive
  • scripts/release.pyAUTHOR_MAP entry for JabberELF

Behaviour change summary

main has only summary mode today. This PR adds fast and guided, makes fast the default for new calls, and changes a few aspects of how summary behaves.

Area Before this PR After this PR
Available modes summary only fast, guided, summary
Default mode (no explicit mode=, no config) summary fast
User-configurable default none auxiliary.session_search.default_mode in ~/.hermes/config.yaml; explicit mode= always wins; unset / unknown flows to resolved default
Per-call limit cap 5 10
role_filter default on fast / summary n/a (no fast); summary all roles user,assistant (caller can override to include or restrict to tool)
Cost reporting on summary none aux_usage per call + aux_usage_total per batch
Anchored drill-down (guided) not available new — single or multi-anchor; returns window + bookend_start + bookend_end; tool-role messages filtered (anchor preserved)
Fast result fields n/a session_id (raw owning), match_message_id, parent_session_id when different from session_id

How to Test

  1. Run the targeted suite:

    source venv/bin/activate
    pytest tests/hermes_state/test_get_anchored_view.py \
           tests/hermes_state/test_get_messages_around.py \
           tests/tools/test_session_search.py -q

    Expected: 123 passing.

  2. Try the default in a TUI session:

    hermes gateway run --replace &
    # New TUI session — ask a recall-shaped prompt and inspect the tool calls.
    # Default behaviour should be a fast call (no LLM); follow-ups should
    # drill via guided.
  3. Verify the shipped skill loads:

    • Start a new session; the "Available Skills" banner should include memory: session-recall.
    • Ask a recall-shaped prompt — the agent should skill_view('session-recall') and follow the fast → guided composition described in the worked examples.
  4. Opt into summary as default (legacy behaviour):

    # Add to ~/.hermes/config.yaml:
    # auxiliary:
    #   session_search:
    #     default_mode: summary
    hermes gateway run --replace &
    # New session; default recall now runs through summary.

Tested on Linux.

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
  • I searched for existing PRs to make sure this isn't a duplicate
  • My PR contains only changes related to this fix/feature (no unrelated commits)
  • I've run the targeted test suite and all tests pass (123 in the focused surface)
  • I've added tests for my changes
  • I've tested on my platform: Linux

Documentation & Housekeeping

  • I've updated relevant documentation (skill SKILL.md + new category DESCRIPTION.md, schema description, docstrings)
  • I've updated cli-config.yaml.example for the new auxiliary.session_search.default_mode knob
  • I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — N/A
  • I've considered cross-platform impact (Windows, macOS) per the compatibility guide — N/A (no file-I/O, process, or terminal changes; SQLite + pure-Python only)
  • I've updated tool descriptions/schemas (rewrote SESSION_SEARCH_SCHEMA description as a tight spec, updated mode enum description)

For New Skills

  • This skill is broadly useful to most users — session-recall teaches every user how to use session_search effectively; recall is core to long-running agent work
  • SKILL.md follows the standard format (frontmatter with name, description, metadata.hermes.category: memory; pre-flight, mode picker, levers, composition patterns, worked examples, pitfalls, closing note)
  • No external dependencies — pure prose + tool-call examples against session_search (already a Hermes core tool)
  • I've tested the skill end-to-end: in a fresh TUI session, asked a multi-facet recall question, agent loaded the skill, executed the fast → multi-anchor guided composition per Example B, and self-narrated the design choices

JabberELF and others added 29 commits May 11, 2026 17:52
Add mode parameter to session_search tool supporting two modes:
- fast (default): returns FTS5 snippets + context immediately (~0.02s),
  no LLM call — ideal for quick recall lookups
- summary: preserves original behavior with LLM-generated session
  summaries (~10-30s) — use when fast mode is insufficient

Changes:
- tools/session_search_tool.py: implement fast mode path that returns
  FTS hits with snippets/context without calling auxiliary model;
  add mode parameter to schema (enum: fast|summary); apply parent
  session source/metadata resolution in fast mode (same pattern
  as upstream fix 6b4ccb9 in summary mode)
- run_agent.py: pass mode argument from function_args in two call sites
  (direct tool call + subagent path)
- tests/tools/test_session_search.py: add test coverage for fast mode
  output format, summary mode preservation, backwards compatibility,
  and run_agent.py mode forwarding verification

The tool schema description is updated to recommend fast-first usage.
abcdjmm970703@gmail.com → JabberELF for the session_search fast/summary dual-mode salvage.
…pt-in

Reverses the default introduced by the salvaged dual-mode commit.

Why: profiled four representative queries against a real 280-session
state.db (workspace harness, not committed). Summary mode is 1,299x-6,293x
slower than fast (median ~30s vs ~10ms; 99%+ in the auxiliary LLM call) and
produces 2.9x-3.9x larger result blobs, but it answers a materially different
question. The user's typical 'what did we work on for X?' is the summary
question — fast surfaces only what FTS5 directly matched while summary
surfaces cross-session synthesis (e.g. work sessions referenced inside
the matched cron jobs). Backwards-compatible default; fast remains
opt-in for cheap discovery via mode='fast'.

Changes:
- tools/session_search_tool.py: default parameter, defensive coercion
  fallbacks, and registry handler all default to 'summary'. Schema
  description rewritten with measured trade-offs and the 'use fast for
  discovery, summary for recall' framing.
- run_agent.py: both direct call sites mirror the new default.
- tests/tools/test_session_search.py: split the old default-test into
  test_default_search_returns_summary_mode_recap (asserts new default)
  and test_explicit_fast_mode_returns_snippets... (covers fast path
  without mocking the default away). Invalid-mode test now asserts
  fallback to summary. Source-grep test updated.
Adds SessionDB.get_messages_around(session_id, around_message_id, window)
which returns up to 'window' messages before the anchor, the anchor
itself, and up to 'window' after — all from the same session, ordered by
id ascending.

Used by the upcoming session_search mode='guided' (anchored drill-down)
to surface a focused conversation window without summarisation cost or
the 100k-char truncation gamble of mode='summary'.

Boundaries are honoured (fewer messages at session start/end), the
anchor is verified to exist in the named session before fetching (cheap
guard against cross-session id confusion), and content/tool_calls
decoding mirrors get_messages() so callers can swap between the two
without surprises.

Tested: 9 new cases in tests/hermes_state/test_get_messages_around.py
(middle-of-session, first-message, last-message, anchor-not-in-session,
no cross-session leakage, window > session, window=0, window negative,
content decoding parity with get_messages). 62/62 passing including the
existing 53 session_search tests.
Adds 'match_message_id' to each fast-mode result entry, carrying through
the FTS5 message id (already populated in the underlying search_messages
result; just unsurfaced until now).

This is the composition handle for the upcoming mode='guided' drill-down:
the calling agent reads a fast hit, picks a promising session, and passes
session_id + match_message_id back as around_message_id for an anchored
window.

Lossless for non-guided callers (additive field, no schema changes).
One new test (test_fast_mode_includes_match_message_id_for_guided_drilldown).
63/63 passing.
Adds a third mode to session_search: guided returns a window of messages
around a specific message id in a specific session. No FTS5, no
auxiliary LLM, no 100k-char truncation — one DB query (~ms latency).

Designed to compose with mode=fast: the calling agent does cheap FTS5
discovery, picks a promising hit, then calls back with mode='guided',
session_id from the result, and around_message_id=match_message_id from
the same result. The agent gets the actual conversation around the
anchor — the back-and-forth that fast's snippet teases but doesn't
deliver, and that summary distils into prose at 30s+ wall-clock cost.

Mechanics:
- New _guided_drill_down() helper handles the guided dispatch path
- Mode aliases ('drill', 'drilldown', 'drill-down', 'anchor', 'around')
  normalise to 'guided'
- Validates required args (session_id + around_message_id), session
  existence, and anchor-in-session, returning specific tool_error
  messages for each failure mode
- Window clamped silently to [1, 20] (matches existing limit-clamp pattern)
- Rejects drill-down into the calling session's lineage — those messages
  are already in the agent's active context (same convention as fast/
  summary's _resolve_to_parent skip)
- Anchor row carries 'anchor': true so the agent can locate it in the
  ordered window without re-checking ids
- Returns messages_before/messages_after counts so the agent sees boundary
  effects ('this is the first 3, no more available before') without a
  follow-up call

Schema:
- mode enum extended to ['fast', 'summary', 'guided']
- Three new optional parameters: session_id, around_message_id, window
- Description rewritten to teach the discover→drill flow with example
  question shapes per mode

Dispatch:
- run_agent.py's two session_search dispatch sites updated to forward
  the new optional kwargs
- Brittle source-grep test in test_session_search.py updated for the
  new dispatch shape and now also pins the guided-mode kwargs

Tests:
- 11 new cases in TestGuidedMode covering happy path, missing-arg errors,
  window clamps (low + high), session-not-found, anchor-not-in-session,
  session-boundary partial windows, current-lineage rejection, mode
  aliases, schema advertising, and metadata propagation
- 74/74 passing including the existing 53 + 9 hermes_state unit tests

End-to-end verified against a real DB snapshot: fast → read
match_message_id+session_id off the top hit → guided returns 7 messages
(3 before + anchor + 3 after) at ~40 KB payload, vs summary's ~220 KB
auxiliary-LLM input for the same query.
The original ceiling of 5 was sized for summary mode where each result
costs a parallel auxiliary LLM call (~30s wall total). With the steering
reframing of guided mode (see investigation page §6), fast becomes the
'discover and let the user pick' surface, and the user benefits from
seeing more candidates before committing to a drill-down.

Bumping the ceiling to 10 lets callers ask for a wider hit list when
that's the goal. Default stays at 3 (one-shot recall is unchanged).

Schema description updated to teach the LLM when to bump higher: 'when
the user wants to be in the retrieval loop and pick the right anchor for
a guided drill-down'.

For summary mode this means up to 10 parallel aux calls instead of 5;
the existing concurrency semaphore already bounds the actual wall time,
and most users won't hit the higher cap unless they're using fast.

65/65 passing.
Extends mode='guided' to accept a list of anchors instead of a single
session+message pair. The agent calls fast with a wider limit, picks
the most promising K hits from the result list, and drills into all of
them in a single guided call — one window per anchor in the response.

This is the steering improvement flagged in the investigation page §6:
'5 results, pick top 3, strip tools' (strip-tools is a separate later
follow-up). Letting the agent inspect multiple windows in one turn
reduces the back-and-forth between fast and guided when the user
genuinely wants to look at several candidate sessions before committing.

Two input shapes (use one):
  * Single anchor (back-compat): session_id + around_message_id
  * Multi-anchor: anchors=[{session_id, around_message_id}, ...]

Single-anchor calls (the back-compat path) continue to work unchanged
and the response mirrors legacy fields at the top level when there's
exactly one window. Multi-anchor responses carry only 'windows' as the
authoritative list. Per-anchor failures (missing session, anchor not
in session, current-lineage rejection) become inline error entries
inside 'windows' rather than aborting the whole call — the agent can
still use successful drills if one anchor was malformed.

Window is shared across all anchors and clamped once to [1, 20].

Schema description updated to teach when to bump fast's limit higher
(5–10 for steering use cases) and how to compose anchors=[...] from
those results.

Tests:
- 7 new cases in TestGuidedModeMultiAnchor covering: two anchors both
  succeed, one-fails-one-succeeds doesn't abort, single anchor via
  anchors list normalises to legacy shape, empty/non-list anchors
  return tool_error, window clamp shared across anchors, per-anchor
  current-lineage rejection
- Brittle source-grep test updated to also pin the new anchors=
  forwarding in run_agent.py
- 81/81 passing including the existing 65 + 7 new + brittle update + 9
  hermes_state unit tests

End-to-end verified against real DB snapshot: 5 fast hits → top 3 as
anchors → 3 windows of 7 messages each (~100 kB total).
Live-test surfaced a real bug: fast-mode results paired the resolved
lineage-root session_id with the raw FTS5 row's message_id. The (sid,
match_message_id) handle was self-inconsistent because the message
lives in the child (delegation/compression) session, not the parent —
so the agent's follow-up mode='guided' call hit
'around_message_id N not in session_id ROOT' and the drill failed.

Repro: ask the TUI to fast-search a topic that appears in a compressed
child session of the current lineage, then ask it to drill in. Today's
session is exactly that shape — message 18425 lives in
20260512_102257_d5048c (child) but fast returned its parent
20260511_101921_a7dd34 paired with id=18425.

Fix has two layers:

1) Fast-mode output now pairs session_id (raw FTS5 sid) with
   match_message_id consistently. The lineage root is exposed as a
   separate parent_session_id field (omitted when there's no
   delegation/compression above). Dedup grouping still happens by
   lineage root, so the user still sees one entry per conversation,
   but the per-entry handle is now a valid pair the agent can hand
   straight to mode='guided'.
   - #15909 source-from-parent invariant preserved: source/model/title
     still promote from the resolved parent for display.

2) Defensive rebind in mode='guided': if (a_sid, a_msg_id) doesn't
   resolve, look up the actual owning session for a_msg_id. If it's a
   descendant in the same lineage as a_sid, transparently rebind and
   refetch. Records the rebind in a warning field on the returned
   window (also flattened to top level for single-anchor responses).
   Cross-lineage rebinds are refused — that path stays an error.
   This keeps the tool forgiving for legacy callers, memory snippets,
   or any other source that still emits the old (parent_sid, child_id)
   shape.

3) Schema description tightened: explicit note that the agent must
   pass (session_id, match_message_id) verbatim from a single fast
   result — do NOT substitute parent_session_id (it's display-only).

Tests: updated the existing #15909 regression to assert the new pair
shape, plus four new tests:
  - test_fast_pair_session_id_with_match_message_id (positive)
  - test_fast_no_parent_session_id_field_when_session_is_already_root
    (tidy output for non-delegation case)
  - test_guided_rebinds_anchor_when_message_lives_in_descendant_session
    (safety net fires correctly within a lineage)
  - test_guided_does_not_rebind_across_lineages (refuses cross-lineage
    rebind — no silent drill into unrelated session)

85/85 session_search + get_messages_around tests passing. Live-DB
smoke test against /tmp/state-smoke.db (snapshot of ~/.hermes/state.db)
confirms the user's failing case now rebinds:
  success: True
  top-level warning: 'around_message_id 18425 lives in
    20260512_102257_d5048c (child of 20260511_101921_a7dd34);
    rebound transparently'
  returned session_id: 20260512_102257_d5048c
  window before/after: 5 / 5
The default mode is normally 'summary' (LLM recap of matched sessions).
This commit lets a user override that via:

    # ~/.hermes/config.yaml
    tools:
      session_search:
        default_mode: fast

Useful for power users who want to live with fast-as-default for a few
days and see how it feels — without having to pass mode='fast' on every
call. The summary path is still one explicit kwarg away.

Resolution order at call time:
  1. Explicit mode= argument from the LLM (always wins)
  2. tools.session_search.default_mode in ~/.hermes/config.yaml
  3. 'summary' (final fallback)

Implementation:

  - New helper _resolve_user_default_mode() in tools/session_search_tool.py
    reads the value via hermes_cli.config.load_config(). Wrapped in
    functools.lru_cache so the YAML read happens at most once per process
    (config changes need a CLI / TUI restart, which is the existing
    convention).
  - Validates: must be a string, must be 'fast' or 'summary'. Anything
    else (including 'guided', which needs anchors and can't stand alone)
    logs a warning and falls back to 'summary'. The user gets feedback
    when they typo their config.
  - session_search()'s mode normaliser checks for None/empty/non-string
    first and resolves the user default before applying alias mapping.
    Explicit modes still take precedence over config.
  - Both dispatch sites in run_agent.py changed from
    mode=function_args.get('mode', 'summary') → mode=function_args.get('mode').
    Hardcoding 'summary' at dispatch would shadow the new config-default
    layer. Added a guard assert in test_run_agent_special_session_search_paths_forward_mode
    so a regression to the old shape fails loudly.
  - Schema description gets one extra sentence acknowledging the
    user-configurable default so the LLM's own description of the tool
    reflects reality.

Tests (+8):
  - test_unset_mode_falls_back_to_summary_when_config_missing
  - test_user_can_configure_fast_as_default
  - test_user_can_configure_summary_as_default_explicitly
  - test_invalid_default_mode_warns_and_falls_back  (typo test)
  - test_guided_as_default_mode_is_rejected
  - test_non_string_default_mode_falls_back  (bogus YAML types)
  - test_explicit_mode_argument_overrides_user_default
  - test_unset_mode_with_config_default_fast_runs_fast_path  (e2e)

93/93 session_search + get_messages_around tests passing.

This is thread 2 of the prompt-tuning / default-mode plan from the
spike: thread 1 was the schema-description iteration (still in progress
on the spike page); thread 2 lets users carry the experiment around in
their own config while we converge on whether to flip the global default
in the schema.
…ame as two starts + one follow-up

Live-test conversation surfaced that the 'three modes (fast, summary,
guided)' framing makes the modes sound like peers when they aren't.
Guided literally cannot be a default — _resolve_user_default_mode()
already rejects it and forces summary. The honest shape is two
starting moves (fast, summary) plus one follow-up move (guided) that
needs anchors from a prior call.

Two cleanups follow from that:

1) Schema description rewritten with the 'two starts + one follow-up'
   framing. Old MODES 1/2/3 list replaced with a structured 'Starting
   moves' / 'Follow-up move' block. Recommended flows section folded
   in (the per-question heuristics are now under each move's bullet).

2) Single-anchor schema parameters (session_id, around_message_id)
   REMOVED from the LLM-facing schema. After multi-anchor shipped,
   one-element anchors=[{...}] handles the single-anchor case
   identically. Keeping both shapes in the schema was confusing — the
   LLM occasionally tried to pair them or asked which to use.
   The Python session_search() function still accepts session_id /
   around_message_id kwargs for direct callers and test fixtures
   (back-compat); only the LLM-facing schema lost them. Parameter
   surface dropped from 6 LLM-visible knobs to 4 (query, role_filter,
   limit, mode + anchors, window).

The mode parameter's description also got tightened — short summary of
each mode, points to the top-level description for when-to-use
guidance. The old description was duplicating the top-level mode
explanation in a more verbose form.

Updated test_schema_advertises_guided_mode:
  - Asserts match_message_id pairing guidance now lives on the
    anchors parameter, not the top-level description.
  - Explicitly asserts session_id / around_message_id are NOT in the
    schema (regression-proof against re-adding them).

93/93 session_search + get_messages_around tests passing.

This is the param-surface cleanup discussed yesterday alongside the
default_mode config commit. Closes the schema-surface side of the
'fast vs guided is confusing' user feedback; the spike doc §6.7 / §7
get matching updates in a separate commit on the architecture branch.
…-guided refusal

Two schema description tweaks driven by smoke-test findings (PLAN.md v1.8):

1. S09 (search-fidelity FAIL) — agent skipped session_search entirely
   when asked 'what's the status of the commons-messaging PR on
   yoniebans.github.io?' and went straight to gh pr list. Technically
   correct that no PR existed, but missed two prior sessions and today's
   planning doc that referenced the branch.

   Fix: lead the USE THIS PROACTIVELY list with an explicit instruction
   to call session_search BEFORE external tools (gh, GitHub API, web,
   file inspection) when the question references prior work. The session
   DB carries what was DISCUSSED and DECIDED; external tools only show
   current world state. Use session_search to find context, external
   tools to verify reality.

2. S08 (schema-teaching weak case) — agent was asked to drill cold with
   multi-anchor guided. Did NOT refuse. Improvised recent → fast → fast
   → guided in one turn. Functionally correct (self-fed anchors from its
   own preceding fast calls), but the schema's 'cannot be a starting
   move' framing was followed in spirit, not articulated. The agent
   should EITHER refuse and ask, OR explicitly call fast first as a
   prerequisite — not silently improvise.

   Fix: reword 'Cannot be a starting move on its own' to a directive
   'REQUIRES anchors from a prior fast or summary call. If you have no
   prior fast hit, call fast FIRST and use its match_message_id values
   as anchors. Never invent anchors or guess session_ids.' Same change
   echoed in the per-parameter mode description for the second-read
   reinforcement.

Other 12 scenarios were clean. Schema base is good; these are surgical
fixes for the two cases where the framing didn't land hard enough.

93/93 session_search + get_messages_around tests still pass.
…ribution

Summary mode invokes an auxiliary LLM (same Opus-tier model in default
'auto' routing) once per session summarised, with up to ~28K input
tokens (MAX_SESSION_CHARS=100K chars) and up to 10K output tokens
(MAX_SUMMARY_TOKENS) per call. That cost was being silently discarded:
_summarize_session() consumed response.usage only for the content string
and threw the usage data away. Smoke-test cost reporting showed
summary-mode scenarios at a fraction of their real spend because of it.

This patch:
- Changes _summarize_session() to return (content, usage) where usage
  is a normalised dict {model, input_tokens, output_tokens,
  cache_read_tokens, cache_creation_tokens} or None when the provider
  didn't surface usage.
- Adds _extract_aux_usage() that handles both OpenAI-style
  (prompt_tokens/completion_tokens, prompt_tokens_details.cached_tokens)
  and Anthropic-style (input_tokens/output_tokens,
  cache_read_input_tokens, cache_creation_input_tokens) usage shapes.
- The summary-mode caller aggregates per-session usage into both an
  entry-level 'aux_usage' field and a top-level 'aux_usage_total'
  carrying a call_count. The aggregate is omitted from the payload
  entirely when no usage data was captured (test mocks, providers that
  don't report it) so consumers can distinguish 'no data' from
  'all zero'.

Note: this surfaces aux cost in the tool RESPONSE, where downstream
metrics extraction can pick it up. It does NOT yet attribute the cost
back to the parent session row (sessions.input_tokens / output_tokens /
estimated_cost_usd) — that's a wider fix to async_call_llm and the
session DB, out of scope here. Aggregator scripts (smoke-test
extractor, dashboards) get the data they need from the tool payload
without that wider change.
The registry handler hardcoded mode=args.get("mode", "summary") and the
function signature defaulted to "summary", which together made the
tools.session_search.default_mode config knob structurally unreachable
from real tool calls — _resolve_user_default_mode() only fires when
mode is None/empty, but neither path ever delivered None.

Drop both "summary" fallbacks so an omitted mode flows through as None
and the config-resolution branch can run.

Adds two tests: a static guard on the registry handler source pattern
(mirroring the existing run_agent.py one) and an end-to-end regression
that dispatches through the registry with default_mode='fast' configured
and asserts result["mode"] == "fast".
The previous fix wired _resolve_user_default_mode() to look up
tools.session_search.default_mode, but the config schema has no
top-level 'tools' section. The closest analogue is auxiliary.<tool>,
which already groups per-tool config by tool name (auxiliary.vision
has download_timeout, auxiliary.session_search has max_concurrency —
neither is strictly aux-LLM routing).

This moves the lookup to auxiliary.session_search.default_mode so the
knob lives next to max_concurrency and the existing session_search
config block. Adds default_mode to the default config scaffold so it
shows up in fresh installs.

Updates docstring, tool description string, warning messages, and all
7 mock-config tests to the new path. 88/88 tests passing.
…t→guided

The prior tool description routed 'catch me up on X' / 'what did we decide'
questions to summary mode by default, which was the failure mode the
fast/guided rework was meant to fix. Summary stays available and is honoured
when users configure it explicitly; the description now teaches fast→guided
as the default recall path and calls out summary as opt-in synthesis.

Schema mode.default flipped summary → fast. Resolver/scaffold fallback
unchanged (still 'summary') for backward compatibility.

No logic changes, no test updates needed; 88/88 passing.
…l noise

Three coordinated changes to make guided mode actually answer 'catch me up
on X' questions without needing summary:

1. New SessionDB.get_anchored_view() helper: returns the anchored window
   plus the first/last N user+assistant messages of the session as
   'bookend_start' / 'bookend_end'. Bookends are skipped when the window
   already overlaps the session head or tail, so the response stays tight.
   Default bookend=3, keep_roles=('user','assistant'). Tool messages are
   dropped from the window EXCEPT the anchor itself (which may legitimately
   be a tool message — dropping it would break the contract).

2. session_search mode='guided' switched to get_anchored_view (both primary
   path and the child-session rebind fallback). Response shape gains
   bookend_start + bookend_end alongside the existing messages array;
   single-anchor response mirrors them at the top level for back-compat.

3. session_search mode='fast' now defaults role_filter to 'user,assistant'
   when the caller doesn't pass one. Tool messages are mostly noise for
   FTS5 (large outputs, serialised tool calls). Callers can opt back in
   via role_filter='user,assistant,tool' for debugging or 'tool' for tool
   output only.

Schema description updated to document bookends + tool filtering, and the
role_filter param description spells out the new default.

Test coverage:
- tests/hermes_state/test_get_anchored_view.py (12 tests): window/bookend
  contract, role filtering, anchor-as-tool preservation, session isolation
- tests/tools/test_session_search.py: existing _make_db fixtures bridged
  get_anchored_view → get_messages_around so the old guided tests still
  pass; new TestGuidedBookendsInResponse asserts response shape; new
  TestFastModeRoleFilterDefault pins the role_filter default.

122/122 passing across tests/hermes_state/ + tests/tools/test_session_search.py.
Single-commit revert-friendly.
Bookends were eating slots with tool-call-only assistant turns (content=''
with tool_calls populated). On long sessions whose tail is dominated by
orchestration heartbeats — poll, terminal, pgrep, etc. — bookend_end was
returning 3 empty rows instead of the actual prose closer.

Fix: add 'length(content) > 0' to both bookend SQL queries. Tool-call-only
assistants are skipped at the DB level; the closing prose ('Gateway
replaced...', 'Committed and pushed', etc.) survives into bookend_end.

User messages are never affected — the column is always populated for
user-role rows (verified against the live DB: 22 NULL-content rows total,
zero of them user-role).

Test: tests/hermes_state/test_get_anchored_view.py adds
test_bookends_skip_empty_content_assistant_turns — seeds a session with
the heartbeat pattern that exposed the bug and asserts the actual
opener/closer survive into bookend_start/bookend_end.

106/106 passing.
…ineage awareness

Three additions to the tool description so the LLM uses the machinery
that already exists:

1. MULTI-SESSION CATCH-UP: explicit instruction that when a topic spans
   multiple sessions, drill the top 2-3 fast hits as a single multi-anchor
   guided call — not just the top one. The multi-anchor shape was already
   supported but agents were anchoring on the top hit only and missing
   work in adjacent sessions.

2. READING GUIDED RESPONSES: explicit callout that every guided window
   carries three slices (bookend_start, messages, bookend_end) and the
   resolution lives in bookend_end. Reduces the risk of the LLM glossing
   the new bookend fields.

3. LINEAGE AWARENESS: notes that a child session's first messages are a
   post-compaction handoff, not the original arc opener — spot via
   parent_session_id. Tells the LLM how to recover the real opener when
   it matters (rare, but free to teach).

anchors param description updated to reinforce multi-anchor catch-up at
the point-of-use.

No behavioural change — schema description only. 106/106 tests passing.
When fast returns hits whose snippets all look like the same keywords
echoing (because the searched topic IS the subject of those sessions —
e.g. searching 'session_search' in sessions about session_search),
the snippets are decorative, not signal. The temptation is to pivot to
find/grep/raw SQL — same shape failure as reflexive summary, just with
manual archaeology instead of LLM telephone.

New schema section instructs: don't pivot, drill. bookend_end carries
the session's prose resolution that the snippets routinely miss.

Observed failure that motivated this: an assistant asked to find a
recently-drafted PR body got fast results with the right session in the
top 5, but the snippets were wall-to-wall '>>>session_search<<<' markers,
so it pivoted to find/sqlite3 and burned ~10 minutes. The right session's
bookend_end contained 'Draft written to <path>' — exactly the artefact
being searched for.

No behavioural change; schema-only. 106/106 passing.
Fast mode currently orders results by FTS5 BM25 rank only. That's correct
when the user's question is exploratory ('what do we know about X') —
relevance leads, time is neutral — but it actively hurts two other common
question shapes:

1. Recency-shaped: 'where did we leave X', 'latest status of Y'. Same-rank
   matches from years ago and yesterday are tied; FTS5 picks arbitrarily.
   A reactivated old session can outrank a fresh one with no signal.
2. Origin-shaped: 'how did X start', 'first time we discussed Y'. The
   originating session is usually short and gets out-scored by later
   sessions that revisit the topic with more context — the origin hides
   under its own descendants.

Adding a temporal tie-breaker by default would silently bias every query
toward 'latest', breaking the origin-shaped case. So sort is opt-in and
bidirectional, matching the existing 'agent picks the mode that fits the
question shape' pattern.

What this adds:
- session_search() gains a sort parameter accepting 'newest', 'oldest',
  or None (default = current FTS5 rank-only behaviour preserved).
- db.search_messages() honours sort across all three SQL paths: main
  FTS5 (timestamp DESC/ASC primary, rank tiebreaker), trigram CJK
  (same), LIKE fallback (timestamp direction flip; no rank to combine).
- Tool layer normalises sort case-insensitively, falls back to None on
  garbage values rather than failing the search, and silently strips
  sort outside fast mode (with a debug log). Summary's session
  selection deliberately stays time-neutral — agents wanting temporal
  narrative drive fast with sort, then drill anchors with guided.
- Schema description gains a TEMPORAL DIRECTION section with concrete
  question-shape examples, and a sort property on the parameters
  block enumerating the valid values.

Tests:
- 6 new tool-layer tests covering default behaviour, both directions,
  case-insensitivity, garbage fallback, and silent-ignore in summary.
- 4 new SQL-layer tests against the real DB exercising 'newest' /
  'oldest' / unset (BM25 rank preserved) / invalid (rank fallback).
- 95→102 passing on tools/test_session_search.py before this commit;
  108 passing after.
…in schema

Smoke-test v2 surfaced that S13 (auxiliary.session_search.default_mode: summary)
went fast→guided 5/5 iterations instead of respecting the user's configured
summary default. The agent passed mode='fast' explicitly on every first call,
ignoring the config.

Root cause: the 'respect the configured default' guidance lived at the very
bottom of the schema description, after all the 'fast → guided is best' teaching.
The general guidance was louder than the user-preference clause.

Fix: hoist USER-CONFIGURED DEFAULT to the top of the description, framed as
something the agent should check FIRST. Strengthen the language: honour the
user's configured default on the first call unless the question shape
categorically requires a different mode. Don't override the user just because
the general guidance says fast→guided is best.

Replace the redundant bottom paragraph with a brief pointer to the top.

No code changes — schema description only. Tests still 99/99.
…rst call

Previous patch (71558e7) hoisted USER-CONFIGURED DEFAULT to the top of the
schema with 'honour unless question shape categorically requires'. Re-running
S13 with default_mode: summary still went fast→guided 5/5 — the agent
rationalised that synthesis questions categorically require fast→guided.

The schema teaching needs the escape clause removed. The user paying for the
call has the better context on which trade they want; the agent shouldn't
override based on its read of the question shape. After the first call, the
agent can chain freely (e.g. guided drill into fast results), but the first
mode comes from the configured default.

Still no resolver-level hard lock. If schema teaching at this strength still
fails to make the agent respect the user's preference, that's a separate
follow-up — but at minimum the user's preference is now loud in the prompt.

99/99 tests still passing.
Teach the agent to use session_search effectively. Covers the three
modes (fast/guided/summary), levers for tuning each call, composition
patterns including multi-anchor catch-up, worked examples for named-
artefact lookup and multi-session arc recall, and pitfalls.
The tool-description prose had accumulated playbook-style guidance over
the course of development (pre-flight rules, mode-picking policy,
multi-anchor recipe, anti-pattern teaching, reading-order advice).
That material now lives in the session-recall skill where it can be
loaded on demand rather than shipping in every system prompt.

Schema description now covers only what the tool IS: what each mode
returns, default-mode resolution, anchor contract, FTS5 syntax, and a
one-paragraph 'when to use'. Mode enum description shrunk to three
one-line entries. Cost claims generalised — no fixed dollar figures
since aux-LLM cost depends on the user's configured aux model.

Net: ~9.5 KB -> ~3 KB of description prose. One schema-content
assertion in tests updated to match the new phrasing while keeping the
same intent (cross-session language exists; no current-session nudge).
…a + PR body)

The schema description and the JSON-schema `mode.default` advertise `fast`
as the default mode. The implementation was advertising one default and
running another: DEFAULT_CONFIG shipped `default_mode: summary`, the
resolver's six fallback paths all returned `summary`, and the
invalid-mode coercion at the dispatch site hard-coded `summary` too.

Net effect was the model being told 'default is fast' while the server
ran summary — exactly the cost behaviour this work is meant to avoid.

Changes:
- hermes_cli/config.py: DEFAULT_CONFIG default_mode `summary` → `fast`.
- tools/session_search_tool.py: every `return "summary"` fallback in
  _resolve_user_default_mode() now returns "fast" (six paths: ImportError,
  general Exception, raw is None, non-string raw, invalid value, and the
  function-level fallback). Warning log strings updated to match.
- tools/session_search_tool.py: invalid-`mode=` arg at the dispatch site
  now falls back to _resolve_user_default_mode() instead of hard-coding
  "summary". Silent coercion of typos now still respects the user's
  configured default.
- tests: 11 tests updated to match the new default (six in the resolver
  fallback class, three test methods renamed, plus the parametrised
  invalid-mode test and the positional-db backward-compat test). The
  new test names reflect what's being verified rather than the old
  default value.
…invalid-mode path

Three small follow-ups from the default-mode fix review:

1. Extract the literal 'fast' fallback into a module-level
   _FALLBACK_DEFAULT_MODE constant. Six call sites in
   _resolve_user_default_mode() now reference the constant, removing
   the drift risk of changing the default in some paths but not
   others.

2. New integration test: bogus mode= string at the dispatch site
   with no config falls back to the resolver-resolved default ('fast').
   Proves the dispatch site calls the resolver rather than hardcoding
   a literal.

3. New integration test: bogus mode= string with default_mode=summary
   in config lands on summary. Proves the dispatch-site coercion
   honours the user's configured default for unknown modes too — not
   just for unset modes.
The DEFAULT_CONFIG entry was added in this PR but the example config
file wasn't kept in sync. Per CONTRIBUTING.md, config changes need to
mirror into cli-config.yaml.example so users can see the knob and its
documented values.
@github-actions

github-actions Bot commented May 15, 2026

Copy link
Copy Markdown
Contributor

🔎 Lint report: feat/session_search_modes vs origin/main

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 8324 on HEAD, 8278 on base (🆕 +46)

🆕 New issues (29):

Rule Count
invalid-argument-type 17
invalid-return-type 2
unresolved-attribute 2
unresolved-import 2
invalid-parameter-default 2
unsupported-operator 2
invalid-assignment 2
First entries
tests/tools/test_session_search.py:1941: [invalid-argument-type] invalid-argument-type: Argument to function `session_search` is incorrect: Expected `str`, found `None`
tests/tools/test_session_search.py:754: [invalid-argument-type] invalid-argument-type: Argument to function `session_search` is incorrect: Expected `str`, found `Literal["garbage", "", "RANDOM", 42] | None`
tests/tools/test_session_search.py:1747: [invalid-argument-type] invalid-argument-type: Method `__getitem__` of type `Overload[(i: SupportsIndex, /) -> Unknown, (s: slice[SupportsIndex | None, SupportsIndex | None, SupportsIndex | None], /) -> list[Unknown]]` cannot be called with key of type `Literal["anchors"]` on object of type `list[Unknown]`
tools/session_search_tool.py:1048: [invalid-return-type] invalid-return-type: Return type does not match returned value: expected `list[tuple[Unknown, ...] | Exception]`, found `list[Unknown | BaseException]`
tests/tools/test_session_search.py:1739: [invalid-argument-type] invalid-argument-type: Method `__getitem__` of type `Overload[(i: SupportsIndex, /) -> Unknown, (s: slice[SupportsIndex | None, SupportsIndex | None, SupportsIndex | None], /) -> list[Unknown]]` cannot be called with key of type `Literal["mode"]` on object of type `list[Unknown]`
run_agent.py:10694: [invalid-argument-type] invalid-argument-type: Argument to function `session_search` is incorrect: Expected `int`, found `Unknown | None`
tools/session_search_tool.py:549: [invalid-argument-type] invalid-argument-type: Argument to constructor `int.__new__` is incorrect: Expected `str | Buffer | SupportsInt | SupportsIndex | SupportsTrunc`, found `Any | None | str`
tests/tools/test_session_search.py:911: [unresolved-attribute] unresolved-attribute: Function `_resolve_user_default_mode` has no attribute `cache_clear`
tests/tools/test_session_search.py:1957: [invalid-argument-type] invalid-argument-type: Argument to function `session_search` is incorrect: Expected `list[Unknown]`, found `Literal["not_a_list"]`
tests/hermes_state/test_get_messages_around.py:11: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`
tests/tools/test_session_search.py:1746: [invalid-argument-type] invalid-argument-type: Method `__getitem__` of type `Overload[(key: SupportsIndex | slice[SupportsIndex | None, SupportsIndex | None, SupportsIndex | None], /) -> LiteralString, (key: SupportsIndex | slice[SupportsIndex | None, SupportsIndex | None, SupportsIndex | None], /) -> str]` cannot be called with key of type `Literal["properties"]` on object of type `str`
tests/tools/test_session_search.py:1743: [unresolved-attribute] unresolved-attribute: Attribute `lower` is not defined on `dict[str, str | dict[str, dict[str, str] | dict[str, str | int] | dict[str, str | list[str]] | dict[str, str | dict[str, str | dict[str, dict[str, str]] | list[str]]]] | list[Unknown]]` in union `str | dict[str, str | dict[str, dict[str, str] | dict[str, str | int] | dict[str, str | list[str]] | dict[str, str | dict[str, str | dict[str, dict[str, str]] | list[str]]]] | list[Unknown]]`
run_agent.py:11333: [invalid-argument-type] invalid-argument-type: Argument to function `session_search` is incorrect: Expected `list[Unknown]`, found `Any | None`
tools/session_search_tool.py:784: [invalid-parameter-default] invalid-parameter-default: Default value of type `None` is not assignable to annotated parameter type `list[Unknown]`
tests/tools/test_session_search.py:1747: [invalid-argument-type] invalid-argument-type: Method `__getitem__` of type `Overload[(key: SupportsIndex | slice[SupportsIndex | None, SupportsIndex | None, SupportsIndex | None], /) -> LiteralString, (key: SupportsIndex | slice[SupportsIndex | None, SupportsIndex | None, SupportsIndex | None], /) -> str]` cannot be called with key of type `Literal["anchors"]` on object of type `str`
tools/session_search_tool.py:894: [invalid-argument-type] invalid-argument-type: Argument to bound method `SessionDB.search_messages` is incorrect: Expected `str`, found `None | str`
tests/tools/test_session_search.py:1740: [unsupported-operator] unsupported-operator: Operator `in` is not supported between objects of type `Literal["guided"]` and `Unknown | str | int | list[str] | dict[str, str | dict[str, dict[str, str]] | list[str]]`
tests/hermes_state/test_get_anchored_view.py:14: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`
tests/tools/test_session_search.py:1739: [invalid-argument-type] invalid-argument-type: Method `__getitem__` of type `Overload[(key: SupportsIndex | slice[SupportsIndex | None, SupportsIndex | None, SupportsIndex | None], /) -> LiteralString, (key: SupportsIndex | slice[SupportsIndex | None, SupportsIndex | None, SupportsIndex | None], /) -> str]` cannot be called with key of type `Literal["mode"]` on object of type `str`
tools/session_search_tool.py:253: [invalid-return-type] invalid-return-type: Function can implicitly return `None`, which is not assignable to return type `tuple[str | None, dict[str, Any] | None]`
run_agent.py:13774: [invalid-argument-type] invalid-argument-type: Argument to function `_is_oauth_token` is incorrect: Expected `str`, found `str | dict[Unknown | str, Unknown | str | dict[str, str]] | Any | ... omitted 3 union elements`
tests/tools/test_session_search.py:1747: [unsupported-operator] unsupported-operator: Operator `in` is not supported between objects of type `Literal["match_message_id"]` and `Unknown | str | int | list[str] | dict[str, str | dict[str, dict[str, str]] | list[str]]`
tools/session_search_tool.py:1137: [invalid-assignment] invalid-assignment: Invalid subscript assignment with key of type `Literal["aux_usage_total"]` and value of type `dict[str, None | int]` on object of type `dict[str, int | str | list[Unknown]]`
run_agent.py:11331: [invalid-argument-type] invalid-argument-type: Argument to function `session_search` is incorrect: Expected `int`, found `Any | None`
run_agent.py:10696: [invalid-argument-type] invalid-argument-type: Argument to function `session_search` is incorrect: Expected `list[Unknown]`, found `Unknown | None`
... and 4 more

✅ Fixed issues (5):

Rule Count
invalid-argument-type 4
invalid-return-type 1
First entries
run_agent.py:7482: [invalid-argument-type] invalid-argument-type: Argument to function `build_anthropic_client` is incorrect: Expected `str`, found `str | dict[Unknown, Unknown] | Any | ... omitted 3 union elements`
run_agent.py:13767: [invalid-argument-type] invalid-argument-type: Argument to function `len` is incorrect: Expected `Sized`, found `(str & ~AlwaysFalsy) | (dict[Unknown, Unknown] & ~AlwaysFalsy) | (Any & ~AlwaysFalsy) | ... omitted 3 union elements`
run_agent.py:13764: [invalid-argument-type] invalid-argument-type: Argument to function `_is_oauth_token` is incorrect: Expected `str`, found `str | dict[Unknown, Unknown] | Any | ... omitted 3 union elements`
tools/session_search_tool.py:375: [invalid-argument-type] invalid-argument-type: Argument to bound method `SessionDB.search_messages` is incorrect: Expected `list[str]`, found `None | list[str]`
tools/session_search_tool.py:474: [invalid-return-type] invalid-return-type: Return type does not match returned value: expected `list[str | Exception]`, found `list[str | None | BaseException]`

Unchanged: 4319 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.

@alt-glitch alt-glitch added type/feature New feature or request P2 Medium — degraded but workaround exists comp/agent Core agent loop, run_agent.py, prompt builder tool/memory Memory tool and memory providers labels May 15, 2026
Pass over comments added during the iterative development of this PR,
trimming where they restated the code, repeated themselves, or read
as journal-style narration. Net -22 comment lines; behaviour
unchanged, 123 tests still passing.

Notable trims:
- DEFAULT_CONFIG module header: 9 lines → 4. Dropped the 'auxiliary
  started as aux-LLM routing but in practice groups per-tool config'
  digression — irrelevant to readers of this module.
- get_anchored_view bookend-SQL filter block: 8 lines → 5. The
  'let me check…-shaped assistant messages' over-narration is gone;
  the SQL filter rationale survives.
- Fast-mode lineage-grouping IMPORTANT block: 12 lines → 8. The
  '#regression introduced by the original match_message_id rollout'
  meta-note removed (the comment now states the contract directly).
- Fast-mode result-emission comment: 8 lines → 3. The 'lineage_root
  is the dict key…' explanation was restating the variables; the
  load-bearing one-liner (emit raw_sid + match_message_id) stays.
- sort normalisation comment: 4 lines → 3.
- role_filter parse comment: 5 lines → 3.
- ORDER BY comment in search_messages: 3 lines → 2.
- LIKE fallback ordering comment: 4 lines → 2.
teknium1 added a commit that referenced this pull request May 18, 2026
…e — no LLM (#27590)

* feat(session_search): single-shape tool with discovery, scroll, browse — no LLM

Replaces the LLM-summarized session_search with a single-shape tool that
returns actual messages from the DB. Three calling shapes inferred from
args (no mode parameter):

  1. Discovery — pass query. FTS5 + anchored ±5 window + bookends per hit,
     all in one call. ~20ms on a real DB instead of ~90s for the previous
     three aux-LLM calls.
  2. Scroll — pass session_id + around_message_id. Returns a window
     centered on the anchor. To paginate, re-anchor on the first/last id
     of the returned window. Boundary message appears in both windows
     as the orientation marker. ~1ms per scroll call.
  3. Browse — no args. Recent sessions chronologically.

Bookend_start (first 3 user+assistant msgs) and bookend_end (last 3) give
the agent goal + resolution on every discovery hit, so a single tool call
reconstructs a long session's arc without loading the whole transcript.

The aux-LLM summary path is gone: it cost ~$0.30/call, took ~30s, and
laundered FTS5 hits through a model that could confabulate when the right
session wasn't in the hit list. The merged shape returns byte-for-byte
content from SQLite.

History:
- PR #20238 (JabberELF) seeded the fast/summary dual-mode split.
- PR #26419 (yoniebans) expanded to fast/guided/summary with bookends,
  multi-anchor drill-down, default-mode config, and a teaching skill.

This PR collapses that toolkit into one shape with explicit scroll
support, drops the summary path, drops the mode parameter, drops the
config knob, drops the skill. JabberELF's seed work is acknowledged via
the AUTHOR_MAP entry.

Validation:
- 38/38 tool tests pass (tests/tools/test_session_search.py)
- 12/12 get_messages_around tests pass (tests/hermes_state/)
- 11/11 get_anchored_view tests pass (tests/hermes_state/)
- Full tests/tools/ run: 5168 passing, 2 failures pre-exist on main
  (test ordering in test_delegate.py, unrelated)
- E2E against live state DB: discovery 20ms, scroll 1ms, browse 280ms;
  pagination forward+backward works with boundary-message orientation;
  error paths return clean tool_error responses

Co-authored-by: JabberELF <abcdjmm970703@gmail.com>
Co-authored-by: yoniebans <jonny@nousresearch.com>

* chore(session_search): prune dead LLM-summary config and docs

Companion to the single-shape rewrite. The auxiliary.session_search config
block, max_concurrency / extra_body tunables, and matching docs sections
all referenced the removed LLM summarization path. Removing them so users
don't try to tune knobs that nothing reads.

- hermes_cli/config.py: drop dead auxiliary.session_search block from
  DEFAULT_CONFIG. Leftover keys in user config.yaml are harmless and
  ignored.
- hermes_cli/tips.py: drop two tips referencing the removed
  max_concurrency / extra_body knobs.
- website/docs/user-guide/configuration.md: drop 'Session Search Tuning'
  section and the auxiliary.session_search block from the example.
- website/docs/user-guide/features/fallback-providers.md: drop session_search
  rows from the auxiliary-tasks tables and the dedicated tuning subsection.
- website/docs/reference/tools-reference.md: rewrite the session_search
  entry to describe the new three-shape behaviour.
- CONTRIBUTING.md: update the file-tree description.
- tests/tools/test_llm_content_none_guard.py: remove TestSessionSearchContentNone
  class and test_session_search_tool_guarded — both guard against an
  unguarded .content.strip() call site in _summarize_session() that no
  longer exists.

Validation: 97/97 targeted tests still pass (hermes_state + session_search +
llm_content_none_guard). Config tests 55/55.

---------

Co-authored-by: JabberELF <abcdjmm970703@gmail.com>
Co-authored-by: yoniebans <jonny@nousresearch.com>
Lillard01 pushed a commit to Lillard01/hermes-agent that referenced this pull request May 21, 2026
…e — no LLM (NousResearch#27590)

* feat(session_search): single-shape tool with discovery, scroll, browse — no LLM

Replaces the LLM-summarized session_search with a single-shape tool that
returns actual messages from the DB. Three calling shapes inferred from
args (no mode parameter):

  1. Discovery — pass query. FTS5 + anchored ±5 window + bookends per hit,
     all in one call. ~20ms on a real DB instead of ~90s for the previous
     three aux-LLM calls.
  2. Scroll — pass session_id + around_message_id. Returns a window
     centered on the anchor. To paginate, re-anchor on the first/last id
     of the returned window. Boundary message appears in both windows
     as the orientation marker. ~1ms per scroll call.
  3. Browse — no args. Recent sessions chronologically.

Bookend_start (first 3 user+assistant msgs) and bookend_end (last 3) give
the agent goal + resolution on every discovery hit, so a single tool call
reconstructs a long session's arc without loading the whole transcript.

The aux-LLM summary path is gone: it cost ~$0.30/call, took ~30s, and
laundered FTS5 hits through a model that could confabulate when the right
session wasn't in the hit list. The merged shape returns byte-for-byte
content from SQLite.

History:
- PR NousResearch#20238 (JabberELF) seeded the fast/summary dual-mode split.
- PR NousResearch#26419 (yoniebans) expanded to fast/guided/summary with bookends,
  multi-anchor drill-down, default-mode config, and a teaching skill.

This PR collapses that toolkit into one shape with explicit scroll
support, drops the summary path, drops the mode parameter, drops the
config knob, drops the skill. JabberELF's seed work is acknowledged via
the AUTHOR_MAP entry.

Validation:
- 38/38 tool tests pass (tests/tools/test_session_search.py)
- 12/12 get_messages_around tests pass (tests/hermes_state/)
- 11/11 get_anchored_view tests pass (tests/hermes_state/)
- Full tests/tools/ run: 5168 passing, 2 failures pre-exist on main
  (test ordering in test_delegate.py, unrelated)
- E2E against live state DB: discovery 20ms, scroll 1ms, browse 280ms;
  pagination forward+backward works with boundary-message orientation;
  error paths return clean tool_error responses

Co-authored-by: JabberELF <abcdjmm970703@gmail.com>
Co-authored-by: yoniebans <jonny@nousresearch.com>

* chore(session_search): prune dead LLM-summary config and docs

Companion to the single-shape rewrite. The auxiliary.session_search config
block, max_concurrency / extra_body tunables, and matching docs sections
all referenced the removed LLM summarization path. Removing them so users
don't try to tune knobs that nothing reads.

- hermes_cli/config.py: drop dead auxiliary.session_search block from
  DEFAULT_CONFIG. Leftover keys in user config.yaml are harmless and
  ignored.
- hermes_cli/tips.py: drop two tips referencing the removed
  max_concurrency / extra_body knobs.
- website/docs/user-guide/configuration.md: drop 'Session Search Tuning'
  section and the auxiliary.session_search block from the example.
- website/docs/user-guide/features/fallback-providers.md: drop session_search
  rows from the auxiliary-tasks tables and the dedicated tuning subsection.
- website/docs/reference/tools-reference.md: rewrite the session_search
  entry to describe the new three-shape behaviour.
- CONTRIBUTING.md: update the file-tree description.
- tests/tools/test_llm_content_none_guard.py: remove TestSessionSearchContentNone
  class and test_session_search_tool_guarded — both guard against an
  unguarded .content.strip() call site in _summarize_session() that no
  longer exists.

Validation: 97/97 targeted tests still pass (hermes_state + session_search +
llm_content_none_guard). Config tests 55/55.

---------

Co-authored-by: JabberELF <abcdjmm970703@gmail.com>
Co-authored-by: yoniebans <jonny@nousresearch.com>
gweeteve pushed a commit to gweeteve/hermes-agent that referenced this pull request Jun 2, 2026
…e — no LLM (NousResearch#27590)

* feat(session_search): single-shape tool with discovery, scroll, browse — no LLM

Replaces the LLM-summarized session_search with a single-shape tool that
returns actual messages from the DB. Three calling shapes inferred from
args (no mode parameter):

  1. Discovery — pass query. FTS5 + anchored ±5 window + bookends per hit,
     all in one call. ~20ms on a real DB instead of ~90s for the previous
     three aux-LLM calls.
  2. Scroll — pass session_id + around_message_id. Returns a window
     centered on the anchor. To paginate, re-anchor on the first/last id
     of the returned window. Boundary message appears in both windows
     as the orientation marker. ~1ms per scroll call.
  3. Browse — no args. Recent sessions chronologically.

Bookend_start (first 3 user+assistant msgs) and bookend_end (last 3) give
the agent goal + resolution on every discovery hit, so a single tool call
reconstructs a long session's arc without loading the whole transcript.

The aux-LLM summary path is gone: it cost ~$0.30/call, took ~30s, and
laundered FTS5 hits through a model that could confabulate when the right
session wasn't in the hit list. The merged shape returns byte-for-byte
content from SQLite.

History:
- PR NousResearch#20238 (JabberELF) seeded the fast/summary dual-mode split.
- PR NousResearch#26419 (yoniebans) expanded to fast/guided/summary with bookends,
  multi-anchor drill-down, default-mode config, and a teaching skill.

This PR collapses that toolkit into one shape with explicit scroll
support, drops the summary path, drops the mode parameter, drops the
config knob, drops the skill. JabberELF's seed work is acknowledged via
the AUTHOR_MAP entry.

Validation:
- 38/38 tool tests pass (tests/tools/test_session_search.py)
- 12/12 get_messages_around tests pass (tests/hermes_state/)
- 11/11 get_anchored_view tests pass (tests/hermes_state/)
- Full tests/tools/ run: 5168 passing, 2 failures pre-exist on main
  (test ordering in test_delegate.py, unrelated)
- E2E against live state DB: discovery 20ms, scroll 1ms, browse 280ms;
  pagination forward+backward works with boundary-message orientation;
  error paths return clean tool_error responses

Co-authored-by: JabberELF <abcdjmm970703@gmail.com>
Co-authored-by: yoniebans <jonny@nousresearch.com>

* chore(session_search): prune dead LLM-summary config and docs

Companion to the single-shape rewrite. The auxiliary.session_search config
block, max_concurrency / extra_body tunables, and matching docs sections
all referenced the removed LLM summarization path. Removing them so users
don't try to tune knobs that nothing reads.

- hermes_cli/config.py: drop dead auxiliary.session_search block from
  DEFAULT_CONFIG. Leftover keys in user config.yaml are harmless and
  ignored.
- hermes_cli/tips.py: drop two tips referencing the removed
  max_concurrency / extra_body knobs.
- website/docs/user-guide/configuration.md: drop 'Session Search Tuning'
  section and the auxiliary.session_search block from the example.
- website/docs/user-guide/features/fallback-providers.md: drop session_search
  rows from the auxiliary-tasks tables and the dedicated tuning subsection.
- website/docs/reference/tools-reference.md: rewrite the session_search
  entry to describe the new three-shape behaviour.
- CONTRIBUTING.md: update the file-tree description.
- tests/tools/test_llm_content_none_guard.py: remove TestSessionSearchContentNone
  class and test_session_search_tool_guarded — both guard against an
  unguarded .content.strip() call site in _summarize_session() that no
  longer exists.

Validation: 97/97 targeted tests still pass (hermes_state + session_search +
llm_content_none_guard). Config tests 55/55.

---------

Co-authored-by: JabberELF <abcdjmm970703@gmail.com>
Co-authored-by: yoniebans <jonny@nousresearch.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/agent Core agent loop, run_agent.py, prompt builder P2 Medium — degraded but workaround exists tool/memory Memory tool and memory providers type/feature New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants