bug: "netclaw" blacklisted as generic keyword — prevents skill auto-loading on identity queries #328

@Aaronontheweb

Problem

SystemSkillSyncService.cs:39 includes "netclaw" in the GenericKeywords blacklist. This strips the token "netclaw" from ALL skill keyword indexes, making it impossible for identity-related user queries to trigger skill auto-loading.

Impact

User message: "What version of Netclaw are we on right now"

Keyword matching against netclaw-manual:

  • "version" = 1.0 (keyword hit)
  • "netclaw" = 0 (blacklisted!)
  • Phrase "netclaw version" exists in the index, but the message reads "version of Netclaw", so its bigrams ("version of", "of netclaw", "netclaw are") never match it
  • Total score: 1.0 < threshold 2.5 → NOT LOADED

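If "netclaw" weren't blacklisted, TF-IDF weighting would give it ~0.5 (it appears in 3+ skills). Combined with "version" (1.0), the total would be ~1.5. That is still below 2.5, so removing the blacklist entry alone does not load the skill for this query, but together with phrase matching improvements it could push the score over the threshold.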
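
The arithmetic above can be reproduced with a small sketch. It is written in Python for brevity and is not the project's C# code. The weights (1.0 for "version", ~0.5 for "netclaw"), the 2.5 threshold, and the example message come from this issue; the function name score_query, the tokenizer, and the phrase bonus value are assumptions made for illustration.

```python
import re

THRESHOLD = 2.5      # auto-load threshold quoted in this issue
PHRASE_BONUS = 2.0   # hypothetical value; the real phrase weight is not stated here

def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def score_query(message, keyword_weights, phrases, blacklist):
    tokens = tokenize(message)
    # The real service strips blacklisted tokens from the index;
    # skipping them at query time has the same effect on the score.
    score = sum(
        keyword_weights.get(tok, 0.0)
        for tok in set(tokens)
        if tok not in blacklist
    )
    # Phrase matching compares adjacent token pairs against indexed phrases.
    bigrams = {" ".join(pair) for pair in zip(tokens, tokens[1:])}
    score += PHRASE_BONUS * len(bigrams & phrases)
    return score

message = "What version of Netclaw are we on right now"
weights = {"version": 1.0, "netclaw": 0.5}   # weights quoted in this issue
phrases = {"netclaw version"}

with_blacklist = score_query(message, weights, phrases, blacklist={"netclaw"})
without_blacklist = score_query(message, weights, phrases, blacklist=set())

print(with_blacklist, with_blacklist >= THRESHOLD)        # 1.0 False
print(without_blacklist, without_blacklist >= THRESHOLD)  # 1.5 False
```

Neither run reaches the threshold, because no bigram in the message equals "netclaw version", which matches the scores listed above.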

Root Cause

The GenericKeywords set was designed to filter out low-discrimination tokens that appear everywhere. But "netclaw" IS the discriminating token for identity queries — and the TF-IDF weighting (GetTokenWeight) already handles common tokens by reducing their weight.
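
As a rough illustration of why the blacklist entry is redundant, here is a sketch of a document-frequency-based weight, in Python rather than the project's C#. The actual formula inside GetTokenWeight is not shown in this issue, so token_weight and its 1 / (1 + ln(df)) shape are assumptions chosen only because they yield 1.0 for a unique token and roughly 0.5 for a token shared by three skills. Only netclaw-manual is a skill named here; the other skill names are made up.

```python
import math

def token_weight(token, skill_keywords):
    """Hypothetical IDF-style weight: 1.0 for a token unique to one skill,
    shrinking as more skills share it, but never reaching zero."""
    df = sum(1 for keywords in skill_keywords.values() if token in keywords)
    if df == 0:
        return 0.0
    return 1.0 / (1.0 + math.log(df))

# Only "netclaw-manual" is named in this issue; the other skills are invented.
skill_keywords = {
    "netclaw-manual": {"netclaw", "version", "manual"},
    "example-skill-a": {"netclaw", "deploy"},
    "example-skill-b": {"netclaw", "logs"},
}

print(round(token_weight("version", skill_keywords), 2))  # 1.0
print(round(token_weight("netclaw", skill_keywords), 2))  # 0.48
```

A common token is down-weighted but still contributes, whereas a blacklist entry forces its contribution to zero.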

Compounding Issue: Stale Keyword Cache

The cached keyword file was generated for skill version 0.6.0, but the current skill is 0.8.2. The content hash no longer matches, so the lookup is a cache miss, enrichment must re-run through the LLM, and the race condition in #316 leaves no keywords available until it completes.

Old cache files are never cleaned up — they become orphans when skill versions change.
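
The miss-and-orphan behavior can be sketched as follows, in Python rather than the project's C#. The cache entry layout, the function names, and the placeholder skill text are assumptions; the issue only states that the lookup depends on a content hash and that old files are left behind.

```python
import hashlib

def content_hash(skill_text):
    return hashlib.sha256(skill_text.encode("utf-8")).hexdigest()

def lookup_keywords(cache, skill_name, skill_text):
    """Return cached keywords only if an entry's hash matches the current content."""
    wanted = content_hash(skill_text)
    for entry in cache:
        if entry["skill"] == skill_name and entry["hash"] == wanted:
            return entry["keywords"]
    return None  # cache miss: enrichment has to run again

old_text = "netclaw manual, version 0.6.0"   # placeholder skill content
new_text = "netclaw manual, version 0.8.2"

cache = [{"skill": "netclaw-manual", "version": "0.6.0",
          "hash": content_hash(old_text), "keywords": ["version", "manual"]}]

result = lookup_keywords(cache, "netclaw-manual", new_text)
print(result)       # None
print(len(cache))   # 1: the 0.6.0 entry is still there, now orphaned
```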

Fix

  1. Remove "netclaw" from GenericKeywords — the TF-IDF weighting already handles it
  2. Clean up stale keyword cache files during RescanAndUpdateIndex() — delete files whose version doesn't match current skill version
  3. Apply fallback keywords immediately when a cache miss occurs, then upgrade to LLM-enriched keywords when ready (addresses #316, the skill enrichment race condition in which early sessions miss keyword-based auto-loading)
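
Fixes 2 and 3 could look roughly like the sketch below, written in Python rather than the project's C#. Everything named here is hypothetical: the "<skill>@<version>.json" file naming, the helper functions, and the name-splitting fallback are stand-ins, since the issue does not describe the real cache layout or fallback source.

```python
import tempfile
from pathlib import Path

def cleanup_stale_cache(cache_dir, current_versions):
    """Delete cache files whose version differs from the skill's current version.

    Assumes files are named "<skill>@<version>.json" (a made-up convention).
    """
    removed = []
    for path in sorted(Path(cache_dir).glob("*@*.json")):
        skill, _, version = path.stem.rpartition("@")
        if current_versions.get(skill) != version:
            path.unlink()
            removed.append(path.name)
    return removed

def fallback_keywords(skill):
    # Hypothetical fallback: split the skill name into tokens.
    return skill.split("-")

def keywords_for(skill, enriched, fallback):
    """Use enriched keywords when available; otherwise serve fallback keywords now."""
    return enriched.get(skill) or fallback(skill)

with tempfile.TemporaryDirectory() as cache_dir:
    for name in ("netclaw-manual@0.6.0.json", "netclaw-manual@0.8.2.json"):
        (Path(cache_dir) / name).write_text("{}")
    removed = cleanup_stale_cache(cache_dir, {"netclaw-manual": "0.8.2"})
    remaining = sorted(p.name for p in Path(cache_dir).iterdir())

print(removed)    # ['netclaw-manual@0.6.0.json']
print(remaining)  # ['netclaw-manual@0.8.2.json']

enriched = {}  # cache miss: nothing enriched yet
before = keywords_for("netclaw-manual", enriched, fallback_keywords)
enriched["netclaw-manual"] = ["netclaw", "version", "manual"]  # LLM result arrives
after = keywords_for("netclaw-manual", enriched, fallback_keywords)
print(before)  # ['netclaw', 'manual']
print(after)   # ['netclaw', 'version', 'manual']
```

In this sketch a cache file for a skill that no longer exists is also deleted, since its name has no current version to match; whether that is desirable in the real service is a separate decision.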

Relevant Code

  • SystemSkillSyncService.cs:34-40 (GenericKeywords blacklist)
  • SystemSkillSyncService.cs:558-605 (keyword cache I/O)
  • SystemSkillSyncService.cs:316-351 (RescanAndUpdateIndex(), no cache cleanup)
  • SkillRegistry.cs:168-177 (GetTokenWeight TF-IDF weighting)
