Problem
SystemSkillSyncService.cs:39 includes "netclaw" in the GenericKeywords blacklist. This strips the token "netclaw" from every skill's keyword index, so identity-related user queries cannot use it to trigger skill auto-loading.
Impact
User message: "What version of Netclaw are we on right now"
Keyword matching against netclaw-manual:
"version" = 1.0 (keyword hit)
"netclaw" = 0 (blacklisted!)
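Phrase "netclaw version" exists in the index, but the user's message reads "version of Netclaw" (reversed order, with "of" in between), so its bigrams are "version of" and "of netclaw" and neither matches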
Total score: 1.0 < threshold 2.5 → NOT LOADED
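If "netclaw" weren't blacklisted, TF-IDF weighting would give it ~0.5 (it appears in 3+ skills). Combined with "version" (1.0), the total would be ~1.5. That is still below 2.5, so removing the blacklist entry alone does not load the skill for this query, but together with phrase matching improvements it could push the score over the threshold.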
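The arithmetic above can be reproduced with a small sketch. Everything here is hypothetical and written only to illustrate the numbers in this issue: the class name, the tokenizer, and the way the index is built are assumptions, not the real SkillRegistry or SystemSkillSyncService code. The weights (1.0 and 0.5) and the threshold (2.5) are taken from the text above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative sketch only. Names, tokenization, and structure are assumptions,
// not the actual SkillRegistry / SystemSkillSyncService code.
public static class KeywordScoreSketch
{
    // Threshold quoted in the issue.
    public const double LoadThreshold = 2.5;

    // Lowercases the message and splits it into word tokens.
    public static List<string> Tokenize(string message) =>
        message.ToLowerInvariant()
               .Split(new[] { ' ', '?', '.', ',', '!' }, StringSplitOptions.RemoveEmptyEntries)
               .ToList();

    // Builds a skill's keyword index, dropping every blacklisted token.
    public static Dictionary<string, double> BuildIndex(
        IReadOnlyDictionary<string, double> rawWeights, ISet<string> blacklist) =>
        rawWeights.Where(kv => !blacklist.Contains(kv.Key))
                  .ToDictionary(kv => kv.Key, kv => kv.Value);

    // Sums the index weight of each distinct message token (0 if not indexed).
    public static double Score(string message, IReadOnlyDictionary<string, double> index) =>
        Tokenize(message).Distinct()
                         .Sum(t => index.TryGetValue(t, out var w) ? w : 0.0);

    // Adjacent word pairs, as a stand-in for phrase matching.
    public static IEnumerable<string> Bigrams(IReadOnlyList<string> tokens) =>
        tokens.Zip(tokens.Skip(1), (a, b) => a + " " + b);
}
```

With raw weights of 1.0 for "version" and 0.5 for "netclaw", scoring the example message against an index built with "netclaw" blacklisted gives 1.0, and against an index built with an empty blacklist gives 1.5. Both are below LoadThreshold, and the message's bigrams include "of netclaw" but not "netclaw version".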
Root Cause
The GenericKeywords set was designed to filter out low-discrimination tokens that appear everywhere. But "netclaw" IS the discriminating token for identity queries — and the TF-IDF weighting (GetTokenWeight) already handles common tokens by reducing their weight.
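To make the redundancy concrete, here is a minimal sketch of an inverse-frequency weight. The formula is an assumption chosen only because it lines up with the numbers quoted in this issue (1.0 for a token found in one skill, roughly 0.5 for a token found in three). It is not the formula used by GetTokenWeight, and the class and parameter names are hypothetical.

```csharp
using System;

// Illustrative sketch only. This is NOT the GetTokenWeight implementation;
// the formula below is an assumed stand-in for a TF-IDF style weight.
public static class TokenWeightSketch
{
    // Returns 1.0 for a token unique to one skill and a smaller weight
    // as more skills share the token.
    public static double Weight(int skillsContainingToken)
    {
        if (skillsContainingToken < 1)
            throw new ArgumentOutOfRangeException(nameof(skillsContainingToken));

        return 1.0 / (1.0 + Math.Log(skillsContainingToken));
    }
}
```

Under this assumed formula, Weight(1) is 1.0 and Weight(3) is roughly 0.48. A token shared by many skills is discounted but never forced to zero, which is the behavior that makes a hard blacklist entry for "netclaw" unnecessary.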
Compounding Issue: Stale Keyword Cache
The cached keyword file is for version 0.6.0 but the current skill is 0.8.2. The content hash won't match → cache miss → enrichment must re-run from LLM → race condition (#316) → no keywords available during the gap.
Old cache files are never cleaned up — they become orphans when skill versions change.
Fix
Remove "netclaw" from GenericKeywords — the TF-IDF weighting already handles it
Clean up stale keyword cache files during RescanAndUpdateIndex() — delete files whose version doesn't match current skill version
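The cache cleanup could look roughly like the sketch below. It assumes a cache file naming scheme of "<skillName>@<version>.keywords.json", which is hypothetical, since the real layout lives in the keyword cache I/O code and is not shown in this issue. The class and method names are also invented for illustration, and how this would be called from RescanAndUpdateIndex() is left open.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Illustrative sketch only. The file naming scheme and all names here are
// assumptions, not the actual SystemSkillSyncService cache layout.
public static class KeywordCacheCleanupSketch
{
    private const string Suffix = ".keywords.json";

    // Deletes cache files whose version differs from the skill's current version.
    // Returns the names of the deleted files.
    public static List<string> DeleteStaleCacheFiles(
        string cacheDirectory,
        IReadOnlyDictionary<string, string> currentVersions)
    {
        var deleted = new List<string>();
        if (!Directory.Exists(cacheDirectory)) return deleted;

        // GetFiles returns a snapshot, so deleting inside the loop is safe.
        foreach (var path in Directory.GetFiles(cacheDirectory, "*" + Suffix))
        {
            var fileName = Path.GetFileName(path);
            var stem = fileName.Substring(0, fileName.Length - Suffix.Length);
            var at = stem.LastIndexOf('@');
            if (at <= 0) continue; // Not in the assumed format; leave it alone.

            var skill = stem.Substring(0, at);
            var version = stem.Substring(at + 1);

            // Only delete when the skill is known and the version is outdated.
            if (currentVersions.TryGetValue(skill, out var current) && current != version)
            {
                File.Delete(path);
                deleted.Add(fileName);
            }
        }
        return deleted;
    }
}
```

In this sketch, files for skills that are not in currentVersions are left untouched. Whether caches for removed skills should also be deleted is a separate decision.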
Relevant Code
SystemSkillSyncService.cs:34-40: GenericKeywords blacklist
SystemSkillSyncService.cs:558-605: keyword cache I/O
SystemSkillSyncService.cs:316-351: RescanAndUpdateIndex() (no cache cleanup)
SkillRegistry.cs:168-177: GetTokenWeight TF-IDF weighting