feat: semantic skill retrieval with FTS5 + hybrid selector + skill enforcement by Cyrene963 · Pull Request #18316 · NousResearch/hermes-agent

Cyrene963 · 2026-05-01T09:09:29Z

Problem

Agents don't load relevant skills before acting. For example, the user says "帮我写12章小说" ("write me a 12-chapter novel") and the agent starts writing without loading verified-long-form-writing. Previous attempts:

Prompt instructions alone → agent ignores them
Plugin-only enforcement → agent can bypass plugins
Keyword matching → language-specific, fragile, user explicitly rejected

Solution: 3-Layer Architecture

Layer 1: Hybrid Skill Selector (`agent/hybrid_skill_selector.py`)

Zero token cost, <50ms

Fast Rules (greetings) → skip
Task Patterns (regex)  → quick match for 17 task categories
FTS5 Semantic Search   → fallback for complex/unusual tasks

17 task categories: debugging, GitHub, research, writing, deployment, data analysis, image/media, email, PPT, notes, smart home, testing, AI/model, cron, skill management, system admin, code review.

Languages: Chinese + English + mixed. Example: "帮我写一篇12章的小说" ("write me a 12-chapter novel") → verified-long-form-writing ✅
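The rules → patterns → FTS5 flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in `agent/hybrid_skill_selector.py`: the `GREETING` rule, the two-entry `TASK_PATTERNS` table, and the `fts_search` callback are assumptions made for the example, and only the two skill mappings mentioned in this description are included.

```python
import re
from typing import Callable, List

# Hypothetical greeting rule and pattern table; the PR's real rules are not shown here.
GREETING = re.compile(r"^\s*(hi|hello|hey|thanks|你好|谢谢)\W*$", re.IGNORECASE)
TASK_PATTERNS = [
    ("research-workflows", re.compile(r"research|调研", re.IGNORECASE)),
    ("verified-long-form-writing", re.compile(r"novel|chapter|小说", re.IGNORECASE)),
]


def select_skills(message: str, fts_search: Callable[[str], List[str]]) -> List[str]:
    """Return skill names for a message using three stages, cheapest first."""
    # Stage 1: fast rules. A bare greeting needs no skills.
    if GREETING.match(message):
        return []
    # Stage 2: task patterns. The first matching category wins.
    for skill, pattern in TASK_PATTERNS:
        if pattern.search(message):
            return [skill]
    # Stage 3: nothing matched, so defer to the full-text search fallback.
    return fts_search(message)
```

Passing the FTS5 search in as a callback keeps the first two stages free of any database dependency, which is one way to keep them cheap enough to run on every message.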

Layer 2: Skill Evaluation Gate (`agent/skill_eval_gate.py`)

Code-level enforcement, zero keyword matching

Before the agent's first action tool call:

System prompt includes ALL skill names + descriptions (existing build_skills_system_prompt)
A mandatory instruction is injected: "You MUST evaluate the skill index and call skill_view() for relevant skills"
A pre_tool_call hook in run_agent.py BLOCKS any non-read tool until skill_view() is called
After skill_view() is called once, the gate opens for the session

Read-only tools (read_file, search_files, hindsight_recall, session_search, etc.) are NOT blocked — the agent can freely research before acting.
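A minimal sketch of the gate logic described above, assuming a single boolean flag per session. The class name `SkillEvalGate`, its method names, and the exact block message wording are illustrative; only the `skill_eval_done` flag, the `skill_view` tool, the read-only tool names, and the "SKILL EVALUATION REQUIRED" text come from this PR.

```python
from dataclasses import dataclass
from typing import Optional

# Read-only tools named in the description; the real allow-list may be longer.
READ_ONLY_TOOLS = {"read_file", "search_files", "hindsight_recall", "session_search"}


@dataclass
class SkillEvalGate:
    """Blocks action tools until skill_view() has been called once."""

    skill_eval_done: bool = False

    def pre_tool_call(self, tool_name: str) -> Optional[str]:
        """Return a block message, or None if the call may proceed."""
        if tool_name == "skill_view":
            # Loading any skill opens the gate for the rest of the session.
            self.skill_eval_done = True
            return None
        if self.skill_eval_done or tool_name in READ_ONLY_TOOLS:
            return None
        return "SKILL EVALUATION REQUIRED: call skill_view() for relevant skills first."

    def reset(self) -> None:
        """Close the gate again, e.g. when a new session starts."""
        self.skill_eval_done = False
```

In this sketch the gate never inspects the user message, which is what makes it language-independent: the choice of skill is left to the model, and the code only checks that a choice was made.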

Layer 3: MiMo Execution Guidance

Anti-hallucination rules injected for MiMo models: verify before stating, no fabrication, multi-source cross-validation.

How It Works in Practice

User says: "帮我模拟未来十年人生，先调研我的背景信息" ("simulate the next ten years of my life; first research my background")

1. Hybrid selector: "调研" ("research") matches research → research-workflows
2. System prompt: skill index + mandatory instruction injected
3. Agent researches: hindsight_recall, session_search, read_file → ALL work (read-only, no gate)
4. Agent tries write_file → GATE BLOCKS: "SKILL EVALUATION REQUIRED"
5. Agent evaluates skill index → loads verified-long-form-writing, deep-work
6. Gate opens → agent writes following skill rules (phases, quality gates, per-chapter files)

Cron job (every 3h):

New session → gate resets → agent must evaluate skills again → loads relevant skills → writes

Changes (6 commits, 7 files, +1097 lines)

| File | What |
|------|------|
| `agent/skill_db.py` (new) | SQLite FTS5 database for semantic skill search |
| `agent/hybrid_skill_selector.py` (new) | 3-layer hybrid selector (rules → patterns → FTS5) |
| `agent/skill_eval_gate.py` (new) | Gate logic + mandatory instruction text |
| `agent/prompt_builder.py` | Semantic retrieval mode (`build_skills_system_prompt_semantic`) |
| `run_agent.py` | Gate integration: state tracking, instruction injection, tool dispatch enforcement, per-conversation reset |
| `tools/skills_sync.py` | FTS5 index sync trigger |
| `tools/skills_tool.py` | Skill tool registration |

Test Results

Gate enforcement:     16/16 ✅
Task matching:        25/25 ✅ (ZH + EN)
Edge cases:            5/5  ✅

Token Impact

| Mode | Skills injected | Tokens/turn |
|------|-----------------|-------------|
| Broadcast (current) | ~121 (all) | ~4,500 |
| Semantic (FTS5) | ~15 (relevant) | ~200 |
| Hybrid (this PR) | ~1-3 (matched) | ~100 |
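As a worked check, the relative savings implied by the rounded per-turn figures in this table can be computed directly. The helper name `savings` is illustrative. Percentages quoted elsewhere in this PR are reported as measured on test conversations, so they need not equal the values derived from these rounded numbers.

```python
def savings(before: float, after: float) -> float:
    """Percentage reduction when going from `before` tokens to `after` tokens."""
    return 100.0 * (before - after) / before


# Rounded tokens-per-turn figures taken from the table above.
broadcast, semantic, hybrid = 4500, 200, 100

semantic_vs_broadcast = round(savings(broadcast, semantic), 1)  # 95.6
hybrid_vs_broadcast = round(savings(broadcast, hybrid), 1)      # 97.8
hybrid_vs_semantic = round(savings(semantic, hybrid), 1)        # 50.0
```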

Related

Replaces feat(plugins): skill-router — auto-load skills on first action tool call #19492 (keyword-based plugin, closed)
Replaces feat: Skill Evaluation Gate — universal skill loading (no keywords) #19524 (standalone gate, merged into this PR)
Closes Feature: Semantic Skill Retrieval with SQLite FTS5 — Replace 4500-token broadcast with on-demand search #17649 (FTS5 skill retrieval feature request)


…kpoints

Addresses the 'having rules != following rules' problem, where the agent has skills and memory loaded but fails to follow their rules during execution. The plugin hooks into pre_tool_call and triggers compliance checkpoints every N action tool calls.

How it works:
- Tracks 'action tool' calls per session (terminal, write_file, patch, browser_*, delegate_task, cronjob, execute_code, etc.)
- Every 8 action calls, blocks with a COMPLIANCE CHECKPOINT message
- Agent must call skill_view/hindsight_recall/session_search to acknowledge
- Counter resets after acknowledgment
- Non-action tools (read_file, search_files, web_search) don't count
- Per-session isolation (different sessions tracked independently)

Complements PR NousResearch#18316 (hybrid skill selector):
- PR NousResearch#18316 = which skills to inject into system prompt
- This plugin = periodic verification that injected rules are followed

Configurable via _CHECKPOINT_INTERVAL constant (default: 8).

Tested scenarios:
- First 7 action tools pass, 8th triggers checkpoint
- Acknowledgment resets counter, next 8 pass
- read_file/search_files don't increment counter
- Sessions are independently tracked
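The checkpoint counter described in this commit message could look roughly like the sketch below. The function and variable names other than `_CHECKPOINT_INTERVAL` are assumptions, the tool sets list only the names given above, and the sketch follows the "8th action call triggers the checkpoint" reading: once the counter reaches the interval, action calls stay blocked until an acknowledging tool is called.

```python
from collections import defaultdict
from typing import DefaultDict, Optional

_CHECKPOINT_INTERVAL = 8

# Tool names taken from the commit message; the real lists may be longer.
ACTION_TOOLS = {"terminal", "write_file", "patch", "delegate_task", "cronjob", "execute_code"}
ACK_TOOLS = {"skill_view", "hindsight_recall", "session_search"}

# One counter per session id, so sessions never affect each other.
_action_counts: DefaultDict[str, int] = defaultdict(int)


def is_action_tool(tool_name: str) -> bool:
    """browser_* tools count as actions alongside the explicit list."""
    return tool_name in ACTION_TOOLS or tool_name.startswith("browser_")


def pre_tool_call(session_id: str, tool_name: str) -> Optional[str]:
    """Return a block message when a checkpoint is due, else None."""
    if tool_name in ACK_TOOLS:
        # Acknowledgment resets the counter for this session.
        _action_counts[session_id] = 0
        return None
    if not is_action_tool(tool_name):
        # Non-action tools such as read_file never count.
        return None
    _action_counts[session_id] += 1
    if _action_counts[session_id] >= _CHECKPOINT_INTERVAL:
        return "COMPLIANCE CHECKPOINT: re-read the loaded skills before continuing."
    return None
```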

complete

- agent/skill_db.py: SQLite FTS5 skill index (353 lines)
- agent/skill_eval_gate.py: code-enforced skill evaluation gate
- run_agent.py: skill_eval_done flag + gate instruction injection
- Resolves conflict by keeping hybrid selector on top of FTS5 base


Replace the broadcast-everything approach (injecting all ~140 skills every turn, ~4500 tokens) with FTS5 semantic search that injects only relevant skills (~15 skills, ~200 tokens).

## Changes

1. **agent/skill_db.py** (new): SQLite FTS5 skill index module
   - SkillDB singleton with thread-safe connections
   - FTS5 virtual table for full-text search
   - Usage tracking (use_count, last_used) for popularity boosting
   - sync_skills() to index skills from filesystem
   - search() with FTS5 query + usage boosting
2. **agent/prompt_builder.py**: Added build_skills_system_prompt_semantic()
   - Searches FTS5 index for relevant skills based on user message
   - Combines top-K by usage (proven useful) + semantic matches
   - Falls back to broadcast if SkillDB is empty
   - Config: skills.retrieval = 'semantic' to enable
3. **tools/skills_sync.py**: Sync to SkillDB after file sync
   - Calls SkillDB.sync_skills() after updating skill files
   - Non-critical: failures logged but don't block sync
4. **tools/skills_tool.py**: Record usage in SkillDB
   - skill_view() now calls SkillDB.record_usage() for semantic ranking
5. **run_agent.py**: Config-driven retrieval mode
   - Reads skills.retrieval from config.yaml
   - Options: 'broadcast' (default, legacy) or 'semantic'

## Configuration

```yaml
skills:
  retrieval: semantic  # or 'broadcast' (default)
  top_k: 15            # max skills per turn (default 15)
```

## Impact

| Metric | Before (broadcast) | After (semantic) | Savings |
|--------|-------------------|------------------|---------|
| Skills injected | ~140 | ~10-15 | ~90% |
| Tokens per turn | ~4500 | ~200 | ~95% |
| Monthly cost (3k turns) | ~0 | ~ | ~8 |

## Related

- Issue NousResearch#17649: Feature request for semantic skill retrieval
- Implements the FTS5 approach described in the issue
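The SkillDB behaviour listed in this commit (an FTS5 virtual table, usage counts, `sync_skills()`, `search()` with usage boosting) can be approximated with Python's standard `sqlite3` module. This is a sketch under stated assumptions, not the contents of `agent/skill_db.py`: the table names, the free-function layout, the `boost` weight, and the sample skill descriptions are invented for illustration, and it assumes the local SQLite build includes the FTS5 extension.

```python
import re
import sqlite3
from typing import Dict, List


def open_index() -> sqlite3.Connection:
    """Create an in-memory index: one FTS5 table plus a usage-count table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE skills_fts USING fts5(name, description)")
    conn.execute(
        "CREATE TABLE skill_usage (name TEXT PRIMARY KEY, use_count INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def sync_skills(conn: sqlite3.Connection, skills: Dict[str, str]) -> None:
    """Rebuild the full-text index from a {name: description} mapping."""
    conn.execute("DELETE FROM skills_fts")
    conn.executemany(
        "INSERT INTO skills_fts (name, description) VALUES (?, ?)", list(skills.items())
    )
    conn.executemany(
        "INSERT OR IGNORE INTO skill_usage (name) VALUES (?)", [(n,) for n in skills]
    )


def record_usage(conn: sqlite3.Connection, name: str) -> None:
    """Bump the usage counter that feeds the ranking boost."""
    conn.execute("UPDATE skill_usage SET use_count = use_count + 1 WHERE name = ?", (name,))


def search(conn: sqlite3.Connection, query: str, top_k: int = 15, boost: float = 0.1) -> List[str]:
    """Full-text search; bm25() is lower for better matches, so usage is subtracted."""
    terms = re.findall(r"\w+", query)
    if not terms:
        return []
    # Quote every term so user text is never parsed as FTS5 query syntax.
    match_expr = " OR ".join(f'"{t}"' for t in terms)
    rows = conn.execute(
        "SELECT name, bm25(skills_fts) FROM skills_fts WHERE skills_fts MATCH ?",
        (match_expr,),
    ).fetchall()
    usage = dict(conn.execute("SELECT name, use_count FROM skill_usage"))
    ranked = sorted(rows, key=lambda row: row[1] - boost * usage.get(row[0], 0))
    return [name for name, _score in ranked[:top_k]]


conn = open_index()
sync_skills(conn, {
    "systematic-debugging": "Debug failing tests and trace errors",
    "research-workflows": "Research topics across multiple sources",
    "verified-long-form-writing": "Write a long novel chapter by chapter",
})
results = search(conn, "write a novel")
```

Applying the usage boost in Python rather than in SQL keeps the FTS5 query simple; a production index would more likely fold it into the `ORDER BY` clause.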

Implements three-layer hybrid skill selection:
- Layer 1: Fast rules (0 tokens): greetings, simple questions
- Layer 2: Task patterns (0 tokens): debug/github/system/research/etc.
- Layer 3: AI inference (future): complex tasks

Integrates with prompt_builder.py build_skills_system_prompt_semantic(). Falls back to FTS5 when hybrid selection has no match.

Token savings: 99.2% vs broadcast, 93.2% vs FTS5-only. Based on measured data from 39 test conversations.

## Problem

- MiMo models don't receive tool-use enforcement guidance (not in TOOL_USE_ENFORCEMENT_MODELS)
- Skill system is 'advisory' not 'enforcement'
- LLM can ignore loaded skills
- No mechanism to verify response compliance with skill rules

## Solution

1. Add 'mimo' to TOOL_USE_ENFORCEMENT_MODELS
2. Create MIMO_MODEL_EXECUTION_GUIDANCE for MiMo-specific enforcement
3. Add mandatory_skills config (skills that MUST be loaded before factual responses)
4. Add skill_enforcement config (verify responses against loaded skills)
5. Add _verify_skill_compliance() method for runtime verification

## Configuration

## Files Changed

- agent/prompt_builder.py: Add MiMo to enforcement models, create MIMO guidance
- run_agent.py: Add config support, mandatory skills injection, compliance verification

## Impact

- MiMo models now receive enforcement guidance (same as GPT/Gemini)
- Factual responses must verify from official sources
- Mandatory skills are loaded before any factual response
- Compliance verification catches violations before sending

## Related

- PR NousResearch#18316 (semantic skill retrieval)
- PR NousResearch#17380 (memory authority preservation)
- Issue: MiMo hallucination without verification
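One way the first two solution steps might fit together is sketched below. Everything beyond the two constant names is an assumption: the tuple contents are guessed from the "same as GPT/Gemini" remark, the substring match on the model name is a hypothetical lookup strategy, and the guidance text is a paraphrase of the three rules named in the PR description, not the actual prompt.

```python
# Hypothetical contents; the source only says 'mimo' is added to the existing collection.
TOOL_USE_ENFORCEMENT_MODELS = ("gpt", "gemini", "mimo")

# Paraphrase of the rules listed in the PR description, not the real prompt text.
MIMO_MODEL_EXECUTION_GUIDANCE = (
    "Verify before stating facts. Do not fabricate. "
    "Cross-validate claims against multiple sources."
)


def needs_enforcement(model_name: str) -> bool:
    """True if the model name contains any enforcement tag (assumed substring match)."""
    name = model_name.lower()
    return any(tag in name for tag in TOOL_USE_ENFORCEMENT_MODELS)


def extra_guidance(model_name: str) -> str:
    """Return the MiMo-specific guidance for MiMo models, else an empty string."""
    return MIMO_MODEL_EXECUTION_GUIDANCE if "mimo" in model_name.lower() else ""
```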

… keywords)

Replace keyword-based skill auto-injection with a code-enforced gate that leverages the LLM's own semantic understanding to select relevant skills.

How it works:
1. System prompt includes all skill names + descriptions (existing mechanism)
2. A mandatory instruction tells the agent to evaluate skills before acting
3. A pre_tool_call hook BLOCKS non-read tools until skill_view() is called
4. After skill_view() is called once, the gate opens for the rest of the session

This is universal (works for any language), lightweight (no extra API calls), and code-enforced (agent cannot skip it). No keyword matching.

Changes:
- agent/skill_eval_gate.py: New module with gate logic and instruction text
- run_agent.py: Integrate gate into system prompt + both tool dispatch paths

The hybrid selector's fast path (Layer 2) was missing common task types:
- Writing/content (小说 "novel", article, chapter, report)
- Deployment (deploy, docker, server)
- Data analysis (excel, csv, analysis)
- Image/media (image, video, generate)

This caused tasks like '帮我写一篇12章的小说' ("write me a 12-chapter novel") to fall through to FTS5 instead of immediately matching verified-long-form-writing.

Added missing task patterns:
- Email (邮件 "email", email, gmail, 写邮件 "write an email")
- Presentation/PPT (ppt, 演示 "presentation", 幻灯片 "slides")
- Note-taking (笔记 "notes", obsidian, notion)
- Smart home (智能家居 "smart home", hue)
- Code review (review, code review)
- Cron/scheduled tasks (cron, 定时 "scheduled", scheduled)

Fixes:
- Pattern ordering: email is checked before writing, so 写邮件 ("write an email") maps to email, not writing
- False positive: \btest\b instead of test (prevents matching 'search')
- Image/media split into specific tool patterns vs generic terms

Test results: 25/25 task matching + 5/5 edge cases
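The two regex fixes in this commit, ordering and word boundaries, can be illustrated with a small first-match table. The patterns below are simplified stand-ins rather than the ones in the PR; in particular, the bare `写` ("write") alternative in the writing pattern is an assumption added to show why email has to be checked first, and "latest" is used as an example of a word that contains "test" as a substring.

```python
import re
from typing import Optional

# Order matters: the first matching category wins.
TASK_PATTERNS = [
    ("email", re.compile(r"邮件|email|gmail", re.IGNORECASE)),
    ("writing", re.compile(r"写|小说|article|chapter|report", re.IGNORECASE)),
    # \b anchors the match to whole words, so "latest" or "contest" do not count.
    ("testing", re.compile(r"\btest\b", re.IGNORECASE)),
]


def match_category(message: str) -> Optional[str]:
    """Return the first category whose pattern matches, or None."""
    for category, pattern in TASK_PATTERNS:
        if pattern.search(message):
            return category
    return None
```

With this ordering, "帮我写邮件" resolves to email even though it also contains the writing pattern's `写`, and a message that merely contains "latest" no longer lands in the testing category.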

…ilder

Tracks actual skill injection count per message for token savings measurement:
- method=skip (simple messages, 0 skills)
- method=task_pattern (rule-matched skills)
- method=ai_inference (AI-selected skills)
- method=fts5 (FTS5 search results)
- method=broadcast (all skills fallback)

Each log line: SKILL_STATS: method=<method> count=<N>

Parse with: grep SKILL_STATS agent.log | awk -F'count=' '{print $2}'
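For a per-method breakdown rather than a raw list of counts, the same log lines can be aggregated in a few lines of Python. This sketch assumes only the `SKILL_STATS: method=<method> count=<N>` format given above; the function name `summarize` and the sample lines are made up for illustration.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# Matches the documented format: SKILL_STATS: method=<method> count=<N>
LINE_RE = re.compile(r"SKILL_STATS: method=(\w+) count=(\d+)")


def summarize(lines: Iterable[str]) -> Dict[str, int]:
    """Sum the injected-skill counts per selection method."""
    totals: Counter = Counter()
    for line in lines:
        found = LINE_RE.search(line)
        if found:
            totals[found.group(1)] += int(found.group(2))
    return dict(totals)


# Made-up sample lines in the documented format.
sample = [
    "INFO SKILL_STATS: method=skip count=0",
    "INFO SKILL_STATS: method=task_pattern count=2",
    "INFO SKILL_STATS: method=task_pattern count=1",
    "INFO unrelated log line",
    "INFO SKILL_STATS: method=fts5 count=15",
]
summary = summarize(sample)
```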

Cyrene963 · 2026-05-15T11:09:32Z

Closing this PR — the functionality has already been implemented locally in our fork's patch stack (local patch applied). The local implementation covers the same scope and has been running in production.

Thanks for the contribution! If upstream merges similar functionality, we'll rebase our patches accordingly.

alt-glitch added the labels type/feature (New feature or request), P3 (Low: cosmetic, nice to have), comp/agent (Core agent loop, run_agent.py, prompt builder), and tool/skills (Skills system: list, view, manage) on May 1, 2026

Cyrene963 changed the title ~~feat: Semantic skill retrieval with SQLite FTS5~~ feat: Hybrid skill retrieval with 99.2% token savings May 1, 2026

Cyrene963 mentioned this pull request May 2, 2026

feat(plugins): add skill-enforcer plugin for periodic compliance checkpoints #18849

Closed

Cyrene963 changed the title ~~feat: Hybrid skill retrieval with 99.2% token savings~~ feat: semantic skill retrieval with FTS5 + hybrid selector + skill enforcement May 2, 2026

Cyrene963 mentioned this pull request May 3, 2026

feat: auto-context retrieval — automatic session search + system message injection #19200

Closed

Cyrene963 force-pushed the feature/semantic-skill-retrieval branch from 2f392ed to cd8ef34 Compare May 4, 2026 04:56

This was referenced May 4, 2026

feat: Skill Evaluation Gate — universal skill loading (no keywords) #19524

Closed

feat(plugins): skill-router — auto-load skills on first action tool call #19492

Closed

Cyrene963 force-pushed the feature/semantic-skill-retrieval branch from 97a355e to 0250641 Compare May 7, 2026 13:37

Nitrogen and others added 7 commits May 7, 2026 22:54

Cyrene963 force-pushed the feature/semantic-skill-retrieval branch from 0250641 to b546348 Compare May 7, 2026 14:54

Cyrene963 mentioned this pull request May 7, 2026

fix: restore skill_eval_gate + pre_select_skills universal injection #21363

Closed

Cyrene963 closed this May 15, 2026

alt-glitch mentioned this pull request May 29, 2026

Feature Request: Semantic / Per-Message Skill Retrieval #34823

Open

Cyrene963 wants to merge 7 commits into NousResearch:main from Cyrene963:feature/semantic-skill-retrieval

2 participants
