fix(state): honour HERMES_DISABLE_FTS_TRIGRAM to skip CJK index #42190
Open
bilawalriaz wants to merge 1 commit into
The CJK trigram FTS5 index is currently created unconditionally when FTS5 is available, and on large histories it accounts for the majority of state.db size (~70% per NousResearch#22478). The docstring on optimize_fts already referenced an env-var opt-out, but no code path checked it.

Introduce _trigram_fts_disabled() reading HERMES_DISABLE_FTS_TRIGRAM (accepts 1/true/yes/on, case-insensitive). Gate the v10 / v11 schema migrations, the normal-startup schema creation, _rebuild_fts_indexes, and _fts_trigger_count on the helper. Split _FTS_TRIGGERS into the porter and trigram halves so _drop_fts_triggers still cleans up stale trigram triggers for users who flip the var back on.

search_messages() now falls through to the LIKE-based CJK path when the trigram table is absent, so non-CJK and short-CJK queries still work after disabling the index.

Fixes NousResearch#22478
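The fallback described for search_messages() can be illustrated with a minimal sketch. It is not the PR's implementation: the messages(id, content) schema, the function names, the three-character threshold, and the assumption that the trigram table's rowid matches messages.id are all hypothetical and chosen only for illustration.

```python
import sqlite3

# Table name as reported in the linked issue; everything else is assumed.
TRIGRAM_TABLE = "messages_fts_trigram"


def _table_exists(conn: sqlite3.Connection, name: str) -> bool:
    """Return True if a table (regular or virtual) with this name exists."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (name,),
    ).fetchone()
    return row is not None


def search_substring(conn: sqlite3.Connection, query: str) -> list:
    """Use the trigram index when it exists, otherwise fall back to LIKE."""
    # A trigram index cannot serve queries shorter than three characters,
    # so short queries take the LIKE path even when the table is present.
    if len(query) >= 3 and _table_exists(conn, TRIGRAM_TABLE):
        phrase = '"' + query.replace('"', '""') + '"'  # quote as an FTS5 phrase
        return conn.execute(
            f"SELECT id, content FROM messages WHERE id IN "
            f"(SELECT rowid FROM {TRIGRAM_TABLE} WHERE {TRIGRAM_TABLE} MATCH ?)",
            (phrase,),
        ).fetchall()

    # LIKE fallback: escape the wildcard characters so they match literally.
    escaped = (
        query.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    )
    return conn.execute(
        "SELECT id, content FROM messages WHERE content LIKE ? ESCAPE '\\'",
        (f"%{escaped}%",),
    ).fetchall()
```

With the trigram table absent, every query takes the LIKE branch, which is slower on large histories but needs no extra index on disk.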
What & why
Closes #22478.
The CJK trigram FTS5 index is currently created unconditionally when FTS5 is available. On large histories it accounts for the majority of state.db size: the user-reported numbers in #22478 show messages_fts_trigram alone at 247 MB (49% of a 505 MB DB), and the trigram index is 2.2× larger than the porter-stemmer index that indexes the same data.

The docstring on optimize_fts() already promised an HERMES_DISABLE_FTS_TRIGRAM opt-out, but no code path actually read the variable; it was dead documentation. This PR wires the gate up everywhere it matters.

Changes
- Introduce a _trigram_fts_disabled() helper reading HERMES_DISABLE_FTS_TRIGRAM (accepts 1/true/yes/on, case-insensitive, whitespace-trimmed).
- Split _FTS_TRIGGERS into _PORTER_FTS_TRIGGERS and _TRIGRAM_FTS_TRIGGERS. _drop_fts_triggers still iterates the full union so a user who flips the var back on doesn't leave stale triggers on disk.
- Gate the v10 / v11 schema migrations, the normal-startup schema creation, _rebuild_fts_indexes, and _fts_trigger_count on the helper.
- search_messages() now falls through to the LIKE-based CJK path when the trigram table is absent, so English / short-CJK / mixed queries still work after disabling the index.
- TestTrigramFtsDisabled class with 7 tests covering: env-var truthy parsing, default behaviour, table non-creation, trigger non-creation, English search still functional, and optimize_fts() returning 1 (porter only) instead of 2.

How to test
To verify end-to-end space reclamation on a real DB:
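One way to measure the effect, as a minimal sketch rather than the author's exact commands: it assumes the index is removed by dropping the messages_fts_trigram table by hand and then running VACUUM, which may differ from how the PR reclaims space. The function names are hypothetical, and the sketch leaves triggers untouched, so it should only be pointed at a throwaway copy of state.db.

```python
import sqlite3
from typing import Tuple


def db_size_bytes(conn: sqlite3.Connection) -> int:
    """Database size computed from SQLite's own page accounting."""
    page_count = conn.execute("PRAGMA page_count").fetchone()[0]
    page_size = conn.execute("PRAGMA page_size").fetchone()[0]
    return page_count * page_size


def reclaim_after_drop(
    db_path: str, table: str = "messages_fts_trigram"
) -> Tuple[int, int]:
    """Drop `table`, VACUUM, and return (size_before, size_after) in bytes.

    Destructive: run this against a COPY of the database only.
    """
    # Autocommit mode, because VACUUM cannot run inside a transaction.
    conn = sqlite3.connect(db_path, isolation_level=None)
    try:
        before = db_size_bytes(conn)
        quoted = '"' + table.replace('"', '""') + '"'  # quote the identifier
        conn.execute(f"DROP TABLE IF EXISTS {quoted}")
        conn.execute("VACUUM")  # rewrites the file and releases free pages
        after = db_size_bytes(conn)
    finally:
        conn.close()
    return before, after
```

Calling reclaim_after_drop("copy-of-state.db") and comparing the two numbers gives the space freed by removing the trigram table; the file name here is only a placeholder.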
Platforms tested
Notes
- tests/agent/test_auxiliary_client.py (8 tests) is unrelated and reproduces on main before this branch.
- import os added to hermes_state.py.