fix(session-search): strip FTS5 operators from truncation query terms #18692

Closed
liuhao1024 wants to merge 1 commit into NousResearch:main from liuhao1024:fix/session-search-fts5-operators

liuhao1024 commented May 2, 2026

Contributor

What does this PR do?

_truncate_around_matches() treats FTS5 boolean operators (AND, OR, NOT) as content terms when splitting the query. Since common English words like "and" appear in virtually every conversation, this produces false match positions and mis-centered truncation windows.

Similarly, FTS5 syntax like NEAR(...), column filters (role:user), and special characters (+, *, ^) pollute the search terms.
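
The failure mode described above can be reproduced with a minimal sketch. The helpers `naive_terms` and `match_positions` are hypothetical stand-ins written for illustration; they are not the actual code from `_truncate_around_matches()`, which is not shown in this thread. The sketch assumes the pre-fix behavior amounted to lowercasing the query and splitting it on whitespace.

```python
import re


def naive_terms(query: str) -> list[str]:
    # Hypothetical stand-in for the pre-fix splitting: every token,
    # including the FTS5 operator AND, becomes a search term.
    return query.lower().split()


def match_positions(text: str, terms: list[str]) -> list[int]:
    # Collect the start offset of every occurrence of every term.
    lowered = text.lower()
    positions: list[int] = []
    for term in terms:
        positions.extend(m.start() for m in re.finditer(re.escape(term), lowered))
    return sorted(positions)


text = "We met and talked and later reviewed the docker setup and the deploy plan."
terms = naive_terms("docker AND deploy")
positions = match_positions(text, terms)

# Count how many of the match positions come from the operator word "and"
# rather than from the real content terms.
noise = sum(1 for p in positions if text[p:p + 3].lower() == "and")
print(terms, len(positions), noise)
```

In this sample, more than half of the match positions come from the word "and", so a window centered on the densest cluster of matches can drift away from the text the user actually searched for.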

Related Issue

Fixes #4238

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)

Changes Made

  • See commit messages for detailed changes

How to Test

  1. Run pytest tests/ -q — all tests should pass
  2. Verify the specific scenario described above is resolved

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
  • I searched for existing PRs to make sure this isn't a duplicate
  • My PR contains only changes related to this fix/feature (no unrelated commits)
  • I've run pytest tests/ -q and all tests pass
  • I've added tests for my changes (required for bug fixes, strongly encouraged for features)
  • I've tested on my platform: macOS 26.4.1

Documentation & Housekeeping

  • I've updated relevant documentation (README, docs/, docstrings) — or N/A
  • I've updated cli-config.yaml.example if I added/changed config keys — or N/A
  • I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture and workflows — or N/A
  • I've considered cross-platform impact (Windows, macOS) per the compatibility guide — or N/A

When _truncate_around_matches received an FTS5 query with boolean
operators (AND, OR, NOT), those operators were split into individual
terms and searched for in the conversation text.  Since common English
words like 'and' appear everywhere, this produced noisy match positions
and mis-centered truncation windows.

Add _strip_fts5_operators() helper that removes:
- Boolean operators (AND, OR, NOT) — case-insensitive
- NEAR(...) clauses
- Column filters (e.g. role:user)
- FTS5 special characters (+, {}, (), ^, ~, *)
- Quoted-phrase delimiters (content preserved)

The cleaned query is used for all three matching strategies (phrase,
proximity co-occurrence, individual terms).

Fixes NousResearch#4238, NousResearch#4239
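
A rough sketch of what a helper along these lines might look like follows. This is not the code from the commit, which is not reproduced in this thread. The function name `strip_fts5_operators`, the regular expressions, and the order of the passes are all assumptions. In particular, the sketch assumes a NEAR(...) clause and a column filter such as role:user are dropped whole (including the terms inside them), which is one possible reading of the list above.

```python
import re

# All patterns below are illustrative assumptions, not the PR's actual regexes.
_NEAR_CLAUSE = re.compile(r"\bNEAR\s*\([^)]*\)", re.IGNORECASE)
_COLUMN_FILTER = re.compile(r"\b\w+:\S+")
_SPECIAL_CHARS = re.compile(r'[+{}()^~*"]')
_BOOLEAN_OPS = re.compile(r"\b(?:AND|OR|NOT)\b", re.IGNORECASE)


def strip_fts5_operators(query: str) -> str:
    """Return the query with FTS5 syntax removed, leaving plain content terms."""
    # Remove NEAR(...) first, while its parentheses are still intact.
    cleaned = _NEAR_CLAUSE.sub(" ", query)
    # Drop column filters such as role:user (assumed: the whole token goes).
    cleaned = _COLUMN_FILTER.sub(" ", cleaned)
    # Replace special characters and quote delimiters; quoted content survives.
    cleaned = _SPECIAL_CHARS.sub(" ", cleaned)
    # Remove boolean operators, matched case-insensitively as whole words.
    cleaned = _BOOLEAN_OPS.sub(" ", cleaned)
    # Collapse the leftover whitespace.
    return " ".join(cleaned.split())


print(strip_fts5_operators('"exact phrase" OR role:user NOT foo*'))
```

Matching the boolean operators case-insensitively, as the commit message describes, also removes ordinary lowercase "and", "or", and "not" from the term list. That is the desired effect for centering the truncation window, though it means a query that searches for those words literally would lose them.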
@alt-glitch added the type/bug (Something isn't working), P2 (Medium: degraded but workaround exists), and comp/tools (Tool registry, model_tools, toolsets) labels May 2, 2026
@alt-glitch

Collaborator

Note: This PR appears to be a superset of #18690 (same fix scope + more). Both target #4238; this one also covers #4239 with NEAR(), column filters, and special char stripping.

Cyrene963 pushed a commit to Cyrene963/hermes-agent that referenced this pull request May 3, 2026
Community PRs applied:
- NousResearch#18596: Enable secret redaction by default (SECURITY)
- NousResearch#18650: Sanitize malformed tool messages + auto-recover on API 400
- NousResearch#18607: Emergency compression before max_iterations exhaustion
- NousResearch#18603: Compression fallback to main model on 413 rate limit
- NousResearch#18638: Pass threshold_percent on model switch
- NousResearch#18663: Strip extra_content from tool_calls for strict APIs
- NousResearch#18618: Forward explicit_api_key to OpenRouter
- NousResearch#18632: Show cache tokens in /insights breakdown
- NousResearch#18614: Add idempotency guard for patch duplicate loops
- NousResearch#18600: Raise ValueError when HERMES_HOME unset in profile mode
- NousResearch#18616: Allow ZWJ emoji in context files
- NousResearch#18582: Reload .env on /restart
- NousResearch#18547: Stabilize system prompt prefix for KV cache reuse
- NousResearch#18692: Strip FTS5 operators from session search truncation terms

Fix: Add order_by_last_active=True to list_sessions_rich call
(pre-existing commit 142b4bf code sync)
@teknium1

Contributor

This looks implemented on current main by the session_search redesign. Automated hermes-sweeper review.

Evidence:

  • The old _truncate_around_matches() / summary-truncation path that PR #18692 patched is no longer present in tools/session_search_tool.py on main.
  • tools/session_search_tool.py:21 documents the current DB-backed behavior, and tools/session_search_tool.py:23 says session_search returns actual messages with no LLM calls.
  • Discovery now runs FTS5 via db.search_messages(...) at tools/session_search_tool.py:405, keeps the matched row id as msg_id, and calls db.get_anchored_view(hit_sid, msg_id, ...) at tools/session_search_tool.py:453.
  • The returned payload exposes match_message_id at tools/session_search_tool.py:472 and marks that id as the anchor in the returned message window at tools/session_search_tool.py:475.
  • hermes_state.py:2448 implements get_messages_around(...) as an anchored message-id window, and its docstring notes it is used by session_search discovery anchored on the FTS5 match.
  • tests/tools/test_session_search.py:169 covers the invariant that every discovery result's match_message_id appears in the returned window.
  • Best proving commit: abf1af540193c30047ff3e7e759c330faf3a880f (feat(session_search): single-shape tool with discovery, scroll, browse — no LLM (#27590)), contained in v2026.5.28.

Thanks for the original fix; the later redesign removed the code path it targeted while preserving the intended behavior.
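
The anchored-window idea in the evidence list can be illustrated with a small sketch. The function `anchored_window` and its `before`/`after` parameters are hypothetical; the real signatures of `db.get_anchored_view(...)` and `get_messages_around(...)` are not shown in this thread and may differ. The point is only that a window built around the matched message id contains that id by construction, so no query-term parsing is needed to center it.

```python
def anchored_window(
    message_ids: list[int],
    anchor_id: int,
    before: int = 2,
    after: int = 2,
) -> list[dict]:
    # Hypothetical illustration: slice a fixed number of messages on each
    # side of the anchor, clamped to the bounds of the session.
    idx = message_ids.index(anchor_id)
    start = max(0, idx - before)
    end = min(len(message_ids), idx + after + 1)
    return [
        {"id": mid, "is_anchor": mid == anchor_id}
        for mid in message_ids[start:end]
    ]


session = [10, 11, 12, 13, 14, 15]
window = anchored_window(session, anchor_id=13)
print([m["id"] for m in window])
```

Because the window is derived from the row id that FTS5 itself returned, operator words in the query never influence where the window is placed.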

@teknium1 closed this Jun 10, 2026
@teknium1 added the sweeper:implemented-on-main label (Sweeper: behavior already present on current main) Jun 10, 2026

Labels

  • comp/tools: Tool registry, model_tools, toolsets
  • P2: Medium, degraded but workaround exists
  • sweeper:implemented-on-main: Sweeper: behavior already present on current main
  • type/bug: Something isn't working

Development

Successfully merging this pull request may close these issues.

Session search truncation treats FTS5 boolean operators as search terms

3 participants