fix(session-search): index tool_calls and tool_name columns in FTS5 (#16751) by briandevans · Pull Request #16770 · NousResearch/hermes-agent

briandevans · 2026-04-28T02:16:08Z

Summary

Extend messages_fts and messages_fts_trigram with the tool_calls and tool_name columns from messages so session_search finds tokens that only appear in serialized tool-call args or tool names.
Add v11 migration that drops + recreates both FTS tables and backfills from messages.
Update triggers and the snippet column index (iCol = -1) to keep the snippet centered on the actual hit.
Mirror the broadened column set in the short-CJK LIKE fallback so all query paths share the same searchable surface.

The bug (#16751)

messages_fts is an external-content FTS5 table with a single content column. The triggers only insert new.content, so any token that lives in messages.tool_calls (TEXT JSON of function name + arguments) or messages.tool_name is never indexed. messages_fts_trigram (added in v10 for CJK substring search) has the same single-column shape.

The reporter showed that even with ASCII tokens, db.search_messages("FUNCNAMEMARKER") returns 0 hits when the marker only appears inside a tool call — although the row is in the DB. This is not the CJK tokenizer issue tracked in #14829 / #15500; the gap is at the schema/trigger layer.

The fix

External-content FTS5 columns are fixed at CREATE time, so this requires drop-and-recreate plus a backfill, not an ALTER TABLE.

FTS_SQL and FTS_TRIGRAM_SQL now declare three columns (content, tool_calls, tool_name) matching the source columns in messages. INSERT/DELETE/UPDATE triggers populate all three.
_init_schema gains a v11 step that drops the old triggers + tables, recreates them via the updated SQL, and backfills with INSERT INTO messages_fts(rowid, content, tool_calls, tool_name) SELECT id, content, tool_calls, tool_name FROM messages (no content IS NOT NULL filter so tool-only assistant messages with empty content are also indexed).
snippet(messages_fts, 0, …) → snippet(messages_fts, -1, …) (and the same for the trigram path). Per FTS5 docs, iCol = -1 lets the function pick the column with the highest score, so a hit inside tool_calls produces a snippet of the JSON args instead of an empty/wrong slice of content. Content-only hits still pick column 0 (no behavior change for the common path).
The 1–2 char CJK LIKE fallback now ORs the same three columns so the short-query path stays consistent with the FTS path.

Test plan

New TestFTS5SearchToolCallsAndToolName regression suite (tests/test_hermes_state.py):
- token only in tool-call function name → found
- token only in tool-call arguments → found
- token only in tool_name → found
- content-only match still works (invariant preserved)
- role_filter still applies to tool-call matches
- v11 migration: hand-built v10 DB with tool_calls/tool_name rows is searchable after SessionDB(...) opens it
Full tests/test_hermes_state.py (230 tests): 230 passed
Adjacent suites that consume SessionDB: tests/tools/test_session_search.py, tests/hermes_state/, tests/gateway/test_resume_command.py, tests/plugins/memory/test_hindsight_provider.py — 328 passed total
Regression guard: stashed hermes_state.py, reran the new tests, 5/6 fail with the expected "0 hits" symptom; restored, 6/6 pass

Contract protected

Invariant: every token persisted in messages.content, messages.tool_calls, or messages.tool_name is reachable through SessionDB.search_messages(...) for the matching row.
Known-bad inputs (now covered): assistant messages whose content is empty but whose tool-call function name or argument JSON contains the search token; tool-result messages whose tool_name is the only place the token appears.
Future-input coverage: the column list is keyed off the messages schema, so adding new tool-related text columns in the future only requires extending the FTS column list + triggers (the migration pattern is already in place).
Negative case: test_role_filter_applies_to_tool_call_matches — even with the broader index, role_filter=["user"] still excludes assistant tool-call rows. (Existing test_search_special_chars_do_not_crash, test_search_quoted_phrase_preserved, etc. still pass — the FTS5 query syntax surface is unchanged.)

Fixes session_search does not index tool_calls or tool_name #16751
Out of scope (and unchanged): FTS5 unicode61 tokenizer silently drops CJK characters, LIKE fallback only triggers on zero results #14829 / [BUG] session_search returns zero/partial results despite session files containing matching content #15500 (CJK tokenizer behavior on content)

Copilot

Pull request overview

Extends SessionDB.search_messages() coverage so session search can find tokens that appear only in messages.tool_calls (serialized tool-call JSON) or messages.tool_name, by widening both FTS5 tables and aligning the CJK short-query fallback.

Changes:

Bump state DB schema to v11 and recreate/backfill messages_fts + messages_fts_trigram with content, tool_calls, and tool_name columns.
Update FTS triggers to keep the widened index in sync and adjust snippet selection to center on the matched column (iCol = -1).
Add regression tests covering tool-call/tool-name indexing plus an in-place v10→v11 upgrade scenario; update schema-version assertions to 11.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File	Description
`hermes_state.py`	Implements schema v11 migration, expands FTS schemas/triggers to index `tool_calls`/`tool_name`, updates snippet column selection, and broadens the short-CJK LIKE fallback to search the same surface.
`tests/test_hermes_state.py`	Adds regression tests for tool-call/tool-name search and v11 migration backfill; updates expected schema version to 11.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-28T02:19:47Z

+                like_pattern = f"%{escaped}%"
+                like_where = [
+                    "(m.content LIKE ? ESCAPE '\\' "
+                    "OR m.tool_calls LIKE ? ESCAPE '\\' "
+                    "OR m.tool_name LIKE ? ESCAPE '\\')"
+                ]
+                like_params: list = [like_pattern, like_pattern, like_pattern]


The short-CJK LIKE fallback now matches against m.tool_calls/m.tool_name, but the selected snippet is still derived solely from m.content (and uses instr(m.content, ...)). For tool-only matches (common for assistant tool-call rows where content may be NULL/empty), this yields an empty/NULL snippet even though the row matched, and can propagate NULL into the returned snippet field. Consider deriving the snippet from the column that matched (or at least COALESCE(m.content,'') in instr/substr, and/or a COALESCE(m.content, m.tool_calls, m.tool_name, '') search text) so tool-call matches produce a meaningful, non-NULL snippet in this code path.

briandevans · 2026-04-28T02:37:49Z

@copilot Finding addressed in commit a3497552a:

Finding (hermes_state.py:1562 — LIKE-fallback snippet derived from m.content only):
Replaced the single-column substr(m.content, instr(m.content, ?), …) with a CASE expression that picks the snippet from whichever of content / tool_calls / tool_name actually contains the match — keeping the cascade order matched to the WHERE clause and falling through to ' for degenerate input. Six positional binds (one per instr condition + matching instr inside substr) feed the case so the existing like_params = [raw_query] * 6 + like_params construction stays self-contained.

Added test_cjk_like_fallback_snippet_pulls_from_tool_name_when_content_empty covering the realistic manifestation: a 1-char CJK tool_name on a row with empty content. (tool_calls JSON goes through json.dumps(ensure_ascii=True) by default, so CJK lands as \uXXXX escapes and never matches a raw-CJK query — the only realistic CJK + LIKE-fallback path is tool_name.) Regression guard verified locally: the new test fails with assert "" against the prior code and passes with the fix; all 23 tests in TestCJKSearchFallback + TestFTS5SearchToolCallsAndToolName continue to pass.

…ousResearch#16751) `messages_fts` and `messages_fts_trigram` had a single `content` column, so tokens that only appeared in `messages.tool_calls` (serialized JSON args) or `messages.tool_name` were never indexed. Searching for a function name or tool argument returned 0 hits even though the row was in the DB — this is not the CJK tokenizer issue tracked in NousResearch#14829, the gap is at the schema and trigger layer. Why the existing fix didn't already cover this: * External-content FTS5 columns are fixed at CREATE time. With `content=messages, content_rowid=id`, the FTS table's column list has to match column names in `messages` — adding indexed columns means a drop-and-recreate plus backfill, not an ALTER. * The triggers only inserted `new.content`, so even rows with non-empty `tool_calls` / `tool_name` produced empty FTS entries. Fix: * Extend both FTS5 tables (unicode61 and trigram) with `tool_calls` and `tool_name` columns. * Update insert/delete/update triggers to populate all three columns. * Add v11 migration that drops + recreates both tables, then backfills every message (no `WHERE content IS NOT NULL` filter so tool-only assistant messages with empty content are also indexed). * Auto-pick the matching column for `snippet()` via `iCol = -1` so the snippet centers on the actual hit instead of always showing `content`. * For the short-CJK (1–2 char) LIKE fallback, broaden the `WHERE` to match content / tool_calls / tool_name so the short-query path stays consistent with the FTS path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…l_name Address Copilot review on PR NousResearch#16770: The short-CJK LIKE fallback (1–2 char queries that the trigram FTS5 table can't match) was extended in NousResearch#16751 to also match ``m.tool_calls`` / ``m.tool_name``, but the ``snippet`` column was still derived solely from ``substr(m.content, instr(m.content, ?), …)``. For tool-only matches — common on assistant tool-call rows where ``content`` is NULL/empty and only ``tool_name`` carries the query token — the snippet came back empty, leaving session_search results without any usable preview even though the row matched. Replace the single-column ``substr`` with a ``CASE`` expression that picks the snippet from whichever of content / tool_calls / tool_name actually contains the match. The condition order mirrors how the existing match-source preference cascades (content first, then tool_calls, then tool_name, with an empty-string fallback that mirrors the existing FTS path's behavior on degenerate input). Add ``test_cjk_like_fallback_snippet_pulls_from_tool_name_when_content_empty`` to ``TestCJKSearchFallback`` covering the realistic scenario: a 1-char CJK ``tool_name`` (``tool_name`` is a plain TEXT column, not JSON- escaped, so CJK survives the round-trip unlike serialized ``tool_calls`` JSON which goes through ``json.dumps(ensure_ascii=True)``). Regression guard verified: without the snippet fix the new test fails with ``assert ''``; with the fix it returns a snippet containing the matched character. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

briandevans · 2026-04-28T10:08:50Z

Closing — superseded by @teknium1's #16914 (salvaging @yes999zc's #16866), which landed first and is the better fix here.

Coverage comparison:

Root cause: Both fixes switch the FTS5 tables from external-content to inline mode and index content || tool_name || tool_calls in the insert/update triggers. Same root-cause coverage.
Migration / repair: fix(state): index tool_name and tool_calls in FTS5, with repair + migration (salvages #16866) #16914 includes a v11 migration plus a repair path for FTS rows that were already written under the old single-column schema, plus the delete-trigger regression fix. This PR did not include either, so fix(state): index tool_name and tool_calls in FTS5, with repair + migration (salvages #16866) #16914 is strictly more complete on existing-DB upgrades.
LIKE-fallback coverage: Both widen the short-CJK LIKE fallback to cover tool_calls and tool_name.

No remaining gap that warrants a follow-up. Thanks @yes999zc and @teknium1.

Copilot AI review requested due to automatic review settings April 28, 2026 02:16

Copilot started reviewing on behalf of briandevans April 28, 2026 02:16 View session

Copilot AI reviewed Apr 28, 2026

View reviewed changes

alt-glitch added type/bug Something isn't working P2 Medium — degraded but workaround exists comp/agent Core agent loop, run_agent.py, prompt builder labels Apr 28, 2026

briandevans and others added 2 commits April 27, 2026 21:07

briandevans force-pushed the fix/session-search-index-tool-calls-16751 branch from a349755 to 0006781 Compare April 28, 2026 04:08

alt-glitch mentioned this pull request Apr 28, 2026

fix(state): index tool_calls and tool_name in FTS5 for session_search #16866

Closed

This was referenced Apr 28, 2026

fix(copilot): fall back to credential_pool OAuth access_token for /model picker (#16708) #16868

Closed

fix(credential-pool): pool rotation should not count toward api_max_retries (#16830) #16902

Closed

briandevans closed this Apr 28, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(session-search): index tool_calls and tool_name columns in FTS5 (#16751)#16770

fix(session-search): index tool_calls and tool_name columns in FTS5 (#16751)#16770
briandevans wants to merge 2 commits into
NousResearch:mainfrom
briandevans:fix/session-search-index-tool-calls-16751

briandevans commented Apr 28, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 28, 2026

Uh oh!

briandevans commented Apr 28, 2026

Uh oh!

briandevans commented Apr 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

briandevans commented Apr 28, 2026

Summary

The bug (#16751)

The fix

Test plan

Contract protected

Related

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

briandevans commented Apr 28, 2026

Uh oh!

briandevans commented Apr 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants