`session_search` does not index `tool_calls` or `tool_name`

**Summary:** `messages_fts` is a virtual FTS5 table with a single `content` column whose external-content source is `messages.content`. The triggers populating it read only `new.content` / `old.content`. The `tool_calls` (TEXT, serialized JSON args) and `tool_name` columns on `messages` are not reached by the `search_messages` path, so tokens that only appear in those columns aren't found by `session_search` even when the row is in the DB. On `main`, `messages_fts_trigram` (added in schema v10) has the same single-column, content-only-trigger pattern.

This is not the CJK tokenizer issue tracked in #14829 (and #15500, closed as a duplicate of #14829). The repro below uses ASCII tokens, so the gap is at the schema and trigger layer, not in tokenization.

**Reproduction (clean clone of `main` HEAD `46b4cf8d`, schema v10; same behavior reproduces on `v0.11.0` `bf196a3fc`, schema v8):**

```python
import sys, tempfile
from pathlib import Path
sys.path.insert(0, "/path/to/hermes-agent")  # repo root of clean clone
import hermes_state

with tempfile.TemporaryDirectory() as td:
    db = hermes_state.SessionDB(db_path=Path(td) / "state.db")
    db.create_session(session_id="s1", source="cli")
    db.create_session(session_id="s2", source="cli")
    db.append_message("s1", role="assistant",
                      content="Uploading to BUCKETMARKER_CONTENT.")
    db.append_message("s2", role="assistant", content="",
        tool_calls=[{"id": "c1", "type": "function",
                     "function": {"name": "FUNCNAMEMARKER",
                                  "arguments": '{"cmd": "aws s3 cp x s3://BUCKETMARKER_TOOLCALL/"}'}}],
        tool_name="TOOLNAMEMARKER")

    print("via search_messages:")
    for tok in ("BUCKETMARKER_CONTENT", "BUCKETMARKER_TOOLCALL",
                "FUNCNAMEMARKER", "TOOLNAMEMARKER"):
        print(f"  {tok}: {len(db.search_messages(tok))} hit(s)")

    # Direct SQL check of persisted columns, not a public API path.
    print("via direct LIKE on messages table:")
    for tok in ("BUCKETMARKER_CONTENT", "BUCKETMARKER_TOOLCALL",
                "FUNCNAMEMARKER", "TOOLNAMEMARKER"):
        n = db._conn.execute(
            "SELECT COUNT(*) FROM messages "
            "WHERE content LIKE ? OR tool_calls LIKE ? OR tool_name LIKE ?",
            (f"%{tok}%", f"%{tok}%", f"%{tok}%"),
        ).fetchone()[0]
        print(f"  {tok}: {n} hit(s)")
```

**Expected** (assuming `session_search` is meant to cover searchable message context stored in the `messages` table): all four tokens find the relevant row.

**Actual** (identical on `v0.11.0` `bf196a3fc` and `main` HEAD `46b4cf8d`):

```
via search_messages:
  BUCKETMARKER_CONTENT:  1 hit(s)
  BUCKETMARKER_TOOLCALL: 0 hit(s)
  FUNCNAMEMARKER:        0 hit(s)
  TOOLNAMEMARKER:        0 hit(s)
via direct LIKE on messages table:
  BUCKETMARKER_CONTENT:  1 hit(s)
  BUCKETMARKER_TOOLCALL: 1 hit(s)
  FUNCNAMEMARKER:        1 hit(s)
  TOOLNAMEMARKER:        1 hit(s)
```

On `main` the trigram table was probed separately with `SELECT COUNT(*) FROM messages_fts_trigram WHERE messages_fts_trigram MATCH ?` for each token, with the same result `(1, 0, 0, 0)`. `search_messages` routes to the trigram table only for CJK queries with `cjk_count >= 3`, so non-CJK tokens like the ASCII markers above never hit either FTS table.

**Where the gap is** (`hermes_state.py`, identical on both refs; `main` also has `messages_fts_trigram` with the same single-column, content-only-trigger pattern):

```sql
CREATE VIRTUAL TABLE IF NOT EXISTS messages_fts USING fts5(
    content,
    content=messages,
    content_rowid=id
);

CREATE TRIGGER IF NOT EXISTS messages_fts_insert AFTER INSERT ON messages BEGIN
    INSERT INTO messages_fts(rowid, content) VALUES (new.id, new.content);
END;
-- _delete and _update triggers also read only old.content / new.content.
```

**User-facing symptom:** Incomplete recall. `session_search` returns nothing for a token the user knows was in a prior session, because the token only ever appeared in `tool_calls` or `tool_name`.

**Possible implementation directions.** A naive "concat `tool_calls` and `tool_name` into the existing FTS `content` column" approach breaks the external-content invariant: `content=messages` ties the FTS column to `messages.content`, so any divergence between indexed text and the source column desyncs FTS `rebuild` and `snippet()` / `highlight()`. Three sound alternatives:

- Add `tool_calls` and `tool_name` as additional FTS5 columns alongside `content`, with matching source columns so the external-content contract holds. Enables column-scoped queries.
- Add a denormalized `messages.search_text` column maintained from `content`, `tool_calls`, and `tool_name`, and have FTS mirror that column. Requires a backfill plus `INSERT INTO messages_fts(messages_fts) VALUES('rebuild')`.
- Switch to an internal-content or contentless FTS5 table whose indexed text the triggers manage directly. Trades off some `snippet()` / `highlight()` ergonomics.

I searched for prior work and didn't find an existing report on `tool_calls` / `tool_name` indexing. #15500 (closed as duplicate of open #14829) is about CJK tokenization and is unrelated to this gap.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`session_search` does not index `tool_calls` or `tool_name` #16751

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

session_search does not index tool_calls or tool_name #16751

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

`session_search` does not index `tool_calls` or `tool_name` #16751