fix(state): index tool_name and tool_calls in FTS5, with repair + migration (salvages #16866)#16914
Merged
Conversation
The FTS5 virtual tables (messages_fts, messages_fts_trigram) previously only indexed the content column via external content mode. Tool calls and tool names stored in the tool_calls (JSON) and tool_name columns were invisible to FTS5 search. Root cause: FTS5 triggers only INSERTed new.content into the index. Changes: - Switch FTS5 tables from external content (content=messages) to inline mode so that trigger-inserted content is both indexed and stored - Update all 6 FTS5 triggers to concatenate content, tool_name, and tool_calls when indexing new messages - Extend the short-CJK LIKE fallback to also search tool_name and tool_calls columns Closes: #16751
…-call indexing Follow-up on top of the cherry-picked contributor commit for #16751: 1. Delete triggers: the original PR switched FTS5 from external to inline content mode and concatenated content || tool_name || tool_calls in the insert/update triggers, but left the delete triggers passing old.content to the FTS5 delete-command. FTS5 inline delete requires the content to match what was stored, so every DELETE on messages raised 'SQL logic error'. Replaced with plain DELETE FROM ... WHERE rowid = old.id on all four delete paths (normal + trigram, delete + update-delete). 2. v11 migration: existing DBs have the old external-content FTS tables and triggers. Because CREATE VIRTUAL TABLE IF NOT EXISTS / CREATE TRIGGER IF NOT EXISTS skip when the objects already exist, upgraders would have kept the broken behavior forever. Bumped SCHEMA_VERSION to 11 and added a migration that drops both FTS tables + all 6 old triggers, recreates them via FTS_SQL / FTS_TRIGRAM_SQL, and backfills from messages using the same concatenation expression. 3. Regression tests: 6 new tests cover INSERT / UPDATE / DELETE paths for tool_name + tool_calls indexing plus the full v10 -> v11 upgrade path on a hand-built legacy DB.
4 tasks
4 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Salvages @yes999zc's work on #16866 and fixes two gaps that blocked merging.
Summary
session_searchnow seestool_nameandtool_calls. Closes #16751.Changes
messages_fts,messages_fts_trigram) from external-content to inline mode and concatenatescontent || tool_name || tool_callsin the insert/update triggers; widens the short-CJK LIKE fallback to covertool_nameandtool_calls.old.contentto FTS5's delete-command. With inline mode, FTS5 requires the content to match what was actually stored (the concatenation), so everyDELETE FROM messagesraisedsqlite3.OperationalError: SQL logic error. Replaced with plainDELETE FROM messages_fts WHERE rowid = old.idon all four paths (regular + trigram, delete + update-delete).CREATE VIRTUAL TABLE IF NOT EXISTSandCREATE TRIGGER IF NOT EXISTSskip when the objects already exist, upgraders would have kept the broken behavior forever. BumpedSCHEMA_VERSIONto 11 and added a migration that drops both FTS tables + all 6 old triggers, recreates them, and backfills frommessageswith the same concatenation expression.Validation
web_searchon session with that tool_calltool_callsJSON argstool_namecolumnDELETE FROM messagesscripts/run_tests.sh tests/test_hermes_state.py tests/tools/test_session_search.py tests/acp/test_session.py: 264/264 passing.E2E script simulated a v10 DB with rows containing only tool-call tokens, opened via
SessionDB(migration runs), confirmed the previously-invisible tokens are now found, new inserts index correctly, and DELETE no longer raises.