feat: semantic skill retrieval with FTS5 + hybrid selector + skill enforcement #18316
Closed
Cyrene963 wants to merge 7 commits into
Cyrene963 pushed a commit to Cyrene963/hermes-agent that referenced this pull request May 3, 2026
## Problem
- MiMo models don't receive tool-use enforcement guidance (not in TOOL_USE_ENFORCEMENT_MODELS)
- Skill system is 'advisory' not 'enforcement'
- LLM can ignore loaded skills
- No mechanism to verify response compliance with skill rules
## Solution
1. Add 'mimo' to TOOL_USE_ENFORCEMENT_MODELS
2. Create MIMO_MODEL_EXECUTION_GUIDANCE for MiMo-specific enforcement
3. Add mandatory_skills config (skills that MUST be loaded before factual responses)
4. Add skill_enforcement config (verify responses against loaded skills)
5. Add _verify_skill_compliance() method for runtime verification
## Files Changed
- agent/prompt_builder.py: Add MiMo to enforcement models, create MIMO guidance
- run_agent.py: Add config support, mandatory skills injection, compliance verification
## Impact
- MiMo models now receive enforcement guidance (same as GPT/Gemini)
- Factual responses must verify from official sources
- Mandatory skills are loaded before any factual response
- Compliance verification catches violations before sending
## Related
- PR NousResearch#18316 (semantic skill retrieval)
- PR NousResearch#17380 (memory authority preservation)
- Issue: MiMo hallucination without verification
Cyrene963 pushed a commit to Cyrene963/hermes-agent that referenced this pull request May 3, 2026
…kpoints
Addresses the 'having rules != following rules' problem, where the agent has skills and memory loaded but fails to follow their rules during execution. The plugin hooks into pre_tool_call and triggers a compliance checkpoint every N action tool calls.
How it works:
- Tracks 'action tool' calls per session (terminal, write_file, patch, browser_*, delegate_task, cronjob, execute_code, etc.)
- Every 8 action calls, blocks with a COMPLIANCE CHECKPOINT message
- Agent must call skill_view/hindsight_recall/session_search to acknowledge
- Counter resets after acknowledgment
- Non-action tools (read_file, search_files, web_search) don't count
- Per-session isolation (different sessions tracked independently)
Complements PR NousResearch#18316 (hybrid skill selector):
- PR NousResearch#18316 = which skills to inject into system prompt
- This plugin = periodic verification that injected rules are followed
Configurable via the _CHECKPOINT_INTERVAL constant (default: 8).
Tested scenarios:
- First 7 action tools pass, 8th triggers checkpoint
- Acknowledgment resets counter, next 8 pass
- read_file/search_files don't increment counter
- Sessions are independently tracked
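The counting behaviour described in this commit message can be sketched as below. This is a minimal illustration, not the plugin's code: the class name `ComplianceCheckpoint`, the method signature, and the message text are assumptions. Only the interval of 8, the listed tool names, and the reset-on-acknowledgment rule come from the commit message.

```python
from collections import defaultdict
from typing import Optional

# Interval and tool names are taken from the commit message; everything else is hypothetical.
CHECKPOINT_INTERVAL = 8
ACTION_TOOLS = {"terminal", "write_file", "patch", "delegate_task", "cronjob", "execute_code"}
ACK_TOOLS = {"skill_view", "hindsight_recall", "session_search"}


class ComplianceCheckpoint:
    """Counts action-tool calls per session and blocks at every Nth call."""

    def __init__(self, interval: int = CHECKPOINT_INTERVAL) -> None:
        self.interval = interval
        self._counts = defaultdict(int)  # session_id -> action calls since last acknowledgment

    @staticmethod
    def _is_action(tool: str) -> bool:
        # browser_* tools are treated as a prefix family, as in the commit message.
        return tool in ACTION_TOOLS or tool.startswith("browser_")

    def pre_tool_call(self, session_id: str, tool: str) -> Optional[str]:
        """Return a block message, or None to let the call through."""
        if tool in ACK_TOOLS:
            self._counts[session_id] = 0  # acknowledgment resets the counter
            return None
        if not self._is_action(tool):
            return None  # read_file, search_files, web_search, ... never count
        self._counts[session_id] += 1
        if self._counts[session_id] >= self.interval:
            return "COMPLIANCE CHECKPOINT: review loaded skills before continuing."
        return None
```

In this sketch a blocked session stays blocked until one of the acknowledgment tools is called. Whether the blocked call itself counts toward the next interval is an assumption here; the commit message does not pin down that off-by-one.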
2f392ed to cd8ef34
This was referenced May 4, 2026
97a355e to 0250641
Cyrene963 pushed a commit to Cyrene963/hermes-agent that referenced this pull request May 7, 2026
…kpoints (commit message identical to the May 3, 2026 checkpoint commit above)
Replace the broadcast-everything approach (injecting all ~140 skills
every turn, ~4500 tokens) with FTS5 semantic search that injects only
relevant skills (~15 skills, ~200 tokens).
## Changes
1. **agent/skill_db.py** (new) — SQLite FTS5 skill index module
- SkillDB singleton with thread-safe connections
- FTS5 virtual table for full-text search
- Usage tracking (use_count, last_used) for popularity boosting
- sync_skills() to index skills from filesystem
- search() with FTS5 query + usage boosting
2. **agent/prompt_builder.py** — Added build_skills_system_prompt_semantic()
- Searches FTS5 index for relevant skills based on user message
- Combines top-K by usage (proven useful) + semantic matches
- Falls back to broadcast if SkillDB is empty
- Config: skills.retrieval = 'semantic' to enable
3. **tools/skills_sync.py** — Sync to SkillDB after file sync
- Calls SkillDB.sync_skills() after updating skill files
- Non-critical: failures logged but don't block sync
4. **tools/skills_tool.py** — Record usage in SkillDB
- skill_view() now calls SkillDB.record_usage() for semantic ranking
5. **run_agent.py** — Config-driven retrieval mode
- Reads skills.retrieval from config.yaml
- Options: 'broadcast' (default, legacy) or 'semantic'
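As a rough illustration of item 1, the sketch below builds an FTS5 index with a separate usage table and boosts search results by use count. It is not the `SkillDB` module from this PR: the class name `SkillIndex`, the table names, and the 0.1 boost weight are assumptions, the singleton and thread-safety aspects are omitted, and it requires a Python build whose SQLite includes FTS5.

```python
import re
import sqlite3
from typing import Dict, List


class SkillIndex:
    """Hypothetical stand-in for SkillDB: FTS5 index plus usage counters."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE VIRTUAL TABLE IF NOT EXISTS skills_fts USING fts5(name, description)"
        )
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS skill_usage ("
            "name TEXT PRIMARY KEY, use_count INTEGER NOT NULL DEFAULT 0, last_used TEXT)"
        )

    def sync_skills(self, skills: Dict[str, str]) -> None:
        """Rebuild the index from a {name: description} mapping."""
        with self.conn:
            self.conn.execute("DELETE FROM skills_fts")
            self.conn.executemany(
                "INSERT INTO skills_fts (name, description) VALUES (?, ?)",
                list(skills.items()),
            )

    def record_usage(self, name: str) -> None:
        with self.conn:
            self.conn.execute("INSERT OR IGNORE INTO skill_usage (name) VALUES (?)", (name,))
            self.conn.execute(
                "UPDATE skill_usage SET use_count = use_count + 1, "
                "last_used = datetime('now') WHERE name = ?",
                (name,),
            )

    def search(self, query: str, top_k: int = 15) -> List[str]:
        # Quote each word so user punctuation is never parsed as FTS5 query syntax.
        terms = re.findall(r"\w+", query)
        if not terms:
            return []
        match = " OR ".join(f'"{t}"' for t in terms)
        hits = self.conn.execute(
            "SELECT name, bm25(skills_fts) FROM skills_fts WHERE skills_fts MATCH ?",
            (match,),
        ).fetchall()
        usage = dict(self.conn.execute("SELECT name, use_count FROM skill_usage"))
        # bm25() is lower-is-better, so the usage boost is subtracted (weight is arbitrary).
        ranked = sorted(hits, key=lambda h: h[1] - 0.1 * usage.get(h[0], 0))
        return [name for name, _ in ranked[:top_k]]
```

Keeping usage counters in an ordinary table rather than in the FTS5 table means a re-sync from the filesystem does not wipe popularity data.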
## Configuration
```yaml
skills:
  retrieval: semantic  # or 'broadcast' (default)
  top_k: 15            # max skills per turn (default 15)
```
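A minimal sketch of how these two keys might be resolved once `config.yaml` has been parsed into a dict. The function name `resolve_skill_retrieval` is hypothetical, and falling back to the defaults on invalid values is an assumption; only the key names and the defaults (`broadcast`, 15) come from this description.

```python
from typing import Any, Dict, Tuple

VALID_MODES = ("broadcast", "semantic")


def resolve_skill_retrieval(config: Dict[str, Any]) -> Tuple[str, int]:
    """Return (retrieval_mode, top_k) from an already-parsed config dict."""
    skills = config.get("skills") or {}  # tolerate a missing or empty 'skills:' section
    mode = str(skills.get("retrieval", "broadcast")).lower()
    if mode not in VALID_MODES:
        mode = "broadcast"  # assumption: unknown values fall back to the legacy mode
    try:
        top_k = int(skills.get("top_k", 15))
    except (TypeError, ValueError):
        top_k = 15
    return mode, max(1, top_k)
```

For example, `resolve_skill_retrieval({"skills": {"retrieval": "semantic"}})` selects semantic retrieval with the default cap of 15 skills.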
## Impact
| Metric | Before (broadcast) | After (semantic) | Savings |
|--------|-------------------|------------------|---------|
| Skills injected | ~140 | ~10-15 | ~90% |
| Tokens per turn | ~4500 | ~200 | ~95% |
| Monthly cost (3k turns) | ~0 | ~ | ~8 |
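The percentages in the first two rows follow directly from the approximate counts. A small sketch of that arithmetic, using only the figures in the table (the helper name `savings_pct` is illustrative):

```python
def savings_pct(before: float, after: float) -> float:
    """Percentage reduction going from `before` to `after`."""
    return 100.0 * (before - after) / before


TURNS_PER_MONTH = 3_000  # the "3k turns" figure from the table

skills_saved = savings_pct(140, 15)    # upper end of the ~10-15 range
tokens_saved = savings_pct(4500, 200)

# Skill-injection tokens per month under each mode.
broadcast_tokens = 4500 * TURNS_PER_MONTH
semantic_tokens = 200 * TURNS_PER_MONTH
```

With these inputs the skill count drops by roughly 89% and the per-turn tokens by roughly 96%, which matches the rounded values in the table; monthly skill-injection volume goes from 13.5M to 0.6M tokens.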
## Related
- Issue NousResearch#17649 — Feature request for semantic skill retrieval
- Implements the FTS5 approach described in the issue
Implements three-layer hybrid skill selection:
- Layer 1: Fast rules (0 tokens): greetings, simple questions
- Layer 2: Task patterns (0 tokens): debug/github/system/research/etc.
- Layer 3: AI inference (future): complex tasks
Integrates with build_skills_system_prompt_semantic() in prompt_builder.py. Falls back to FTS5 when hybrid selection has no match.
Token savings: 99.2% vs. broadcast, 93.2% vs. FTS5-only. Based on measured data from 39 test conversations.
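The layering can be sketched as a short dispatch function. This is an illustration under assumptions, not the selector from `agent/hybrid_skill_selector.py`: the regexes, the skill names `systematic-debugging` and `github-workflow`, and the injected `fts_search` callable are all hypothetical. The method labels (`skip`, `task_pattern`, `fts5`) reuse the ones logged by the SKILL_STATS commit in this PR, and Layer 3 is left out because the commit marks it as future work.

```python
import re
from typing import Callable, List, Tuple

# Layer 1: messages that need no skills at all (hypothetical pattern).
GREETING = re.compile(
    r"^\s*(hi|hello|hey|thanks|thank you|你好|谢谢)\s*[!.?!。?]*\s*$", re.IGNORECASE
)

# Layer 2: task patterns mapped to skill names (both hypothetical).
TASK_PATTERNS = [
    (re.compile(r"\b(debug|traceback|stack trace)\b", re.IGNORECASE), ["systematic-debugging"]),
    (re.compile(r"\b(github|pull request)\b", re.IGNORECASE), ["github-workflow"]),
]


def select_skills(
    message: str, fts_search: Callable[[str], List[str]]
) -> Tuple[str, List[str]]:
    """Return (method, skills); earlier layers cost no tokens and win if they match."""
    if GREETING.match(message):
        return "skip", []
    for pattern, skills in TASK_PATTERNS:
        if pattern.search(message):
            return "task_pattern", skills
    # No rule matched: fall back to the FTS5 index.
    return "fts5", fts_search(message)
```

Passing the FTS5 search in as a callable keeps the rule layers testable without a database.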
## Problem
- MiMo models don't receive tool-use enforcement guidance (not in TOOL_USE_ENFORCEMENT_MODELS)
- Skill system is 'advisory' not 'enforcement'
- LLM can ignore loaded skills
- No mechanism to verify response compliance with skill rules
## Solution
1. Add 'mimo' to TOOL_USE_ENFORCEMENT_MODELS
2. Create MIMO_MODEL_EXECUTION_GUIDANCE for MiMo-specific enforcement
3. Add mandatory_skills config (skills that MUST be loaded before factual responses)
4. Add skill_enforcement config (verify responses against loaded skills)
5. Add _verify_skill_compliance() method for runtime verification
## Configuration
## Files Changed
- agent/prompt_builder.py: Add MiMo to enforcement models, create MIMO guidance
- run_agent.py: Add config support, mandatory skills injection, compliance verification
## Impact
- MiMo models now receive enforcement guidance (same as GPT/Gemini)
- Factual responses must verify from official sources
- Mandatory skills are loaded before any factual response
- Compliance verification catches violations before sending
## Related
- PR NousResearch#18316 (semantic skill retrieval)
- PR NousResearch#17380 (memory authority preservation)
- Issue: MiMo hallucination without verification
… keywords)
Replace keyword-based skill auto-injection with a code-enforced gate that leverages the LLM's own semantic understanding to select relevant skills.
How it works:
1. System prompt includes all skill names + descriptions (existing mechanism)
2. A mandatory instruction tells the agent to evaluate skills before acting
3. A pre_tool_call hook BLOCKS non-read tools until skill_view() is called
4. After skill_view() is called once, the gate opens for the rest of the session
This is universal (works for any language), lightweight (no extra API calls), and code-enforced (agent cannot skip it). No keyword matching.
Changes:
- agent/skill_eval_gate.py: New module with gate logic and instruction text
- run_agent.py: Integrate gate into system prompt + both tool dispatch paths
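Steps 3 and 4 amount to a per-session latch. The sketch below shows one way to express it; the class name `SkillEvalGate`, the method signature, and the block message are assumptions, and the read-only set lists only the tools this PR names explicitly.

```python
from typing import Optional, Set

# Read-only tools named in this PR; the real allowlist may be longer.
READ_ONLY_TOOLS = frozenset(
    {"read_file", "search_files", "hindsight_recall", "session_search"}
)


class SkillEvalGate:
    """Blocks non-read tools in a session until skill_view() has been called once."""

    def __init__(self) -> None:
        self._opened: Set[str] = set()  # session ids whose gate is already open

    def pre_tool_call(self, session_id: str, tool: str) -> Optional[str]:
        """Return a block message, or None to allow the call."""
        if tool == "skill_view":
            self._opened.add(session_id)  # one call opens the gate for the session
            return None
        if session_id in self._opened or tool in READ_ONLY_TOOLS:
            return None
        return "BLOCKED: evaluate the available skills and call skill_view() before acting."
```

Because the gate only checks whether `skill_view()` was called, not which skill was viewed, the choice of skill is left entirely to the model, which is what makes the approach language-independent.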
The hybrid selector's fast path (Layer 2) was missing common task types:
- Writing/content (小说 "novel", article, chapter, report)
- Deployment (deploy, docker, server)
- Data analysis (excel, csv, analysis)
- Image/media (image, video, generate)
This caused tasks like '帮我写一篇12章的小说' ("Help me write a 12-chapter novel") to fall through to FTS5 instead of immediately matching verified-long-form-writing.
Added missing task patterns:
- Email (邮件 "email", email, gmail, 写邮件 "write an email")
- Presentation/PPT (ppt, 演示 "presentation", 幻灯片 "slides")
- Note-taking (笔记 "notes", obsidian, notion)
- Smart home (智能家居 "smart home", hue)
- Code review (review, code review)
- Cron/scheduled tasks (cron, 定时 "scheduled", scheduled)
Fixes:
- Pattern ordering: email is now checked before writing (写邮件 → email, not writing)
- False positive: \btest\b instead of test (prevents matching 'search')
- Image/media split into specific tool patterns vs. generic terms
Test results: 25/25 task matching + 5/5 edge cases
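Both fixes, ordering and word boundaries, can be seen in a small first-match-wins table. The regexes below are illustrative assumptions rather than the patterns from this commit (in particular, the bare `写` in the writing pattern is a guess at why ordering mattered), and `latest` is used as the substring example because it demonstrably contains `test`.

```python
import re
from typing import Optional

# First match wins, so more specific categories must come before broader ones.
ORDERED_PATTERNS = [
    ("email", re.compile(r"写邮件|邮件|\bemail\b|\bgmail\b", re.IGNORECASE)),
    ("writing", re.compile(r"写|小说|\barticle\b|\bchapter\b|\breport\b", re.IGNORECASE)),
    ("testing", re.compile(r"\btest\b", re.IGNORECASE)),
]


def first_category(message: str) -> Optional[str]:
    """Return the first category whose pattern matches, or None."""
    for category, pattern in ORDERED_PATTERNS:
        if pattern.search(message):
            return category
    return None


# Without \b, a bare 'test' matches inside other words.
unanchored_hit = re.search(r"test", "latest") is not None
anchored_hit = re.search(r"\btest\b", "latest") is not None
```

The Chinese keywords carry no `\b` anchors: Python treats CJK characters as word characters, so a boundary assertion between two of them would never succeed.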
…ilder
Tracks actual skill injection count per message for token savings measurement:
- method=skip (simple messages, 0 skills)
- method=task_pattern (rule-matched skills)
- method=ai_inference (AI-selected skills)
- method=fts5 (FTS5 search results)
- method=broadcast (all skills fallback)
Each log line: SKILL_STATS: method=<method> count=<N>
Parse with: grep SKILL_STATS agent.log | awk -F'count=' '{print $2}'
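For per-method totals rather than a raw column of counts, the same log lines can be aggregated in Python. This is a sketch that assumes only the `SKILL_STATS: method=<method> count=<N>` format given above; the function name `summarize` and the shape of the returned dict are illustrative.

```python
import re
from typing import Dict, Iterable

STAT_RE = re.compile(r"SKILL_STATS: method=(\w+) count=(\d+)")


def summarize(lines: Iterable[str]) -> Dict[str, Dict[str, float]]:
    """Aggregate message count, total skills and mean skills per selection method."""
    totals: Dict[str, tuple] = {}
    for line in lines:
        m = STAT_RE.search(line)
        if not m:
            continue  # skip log lines that are not SKILL_STATS entries
        messages, skills = totals.get(m.group(1), (0, 0))
        totals[m.group(1)] = (messages + 1, skills + int(m.group(2)))
    return {
        method: {"messages": n, "skills": s, "mean": s / n}
        for method, (n, s) in totals.items()
    }
```

Since a file object iterates line by line, `summarize(open("agent.log"))` works directly on the log.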
0250641 to b546348
Author
Closing this PR: the functionality has already been implemented locally in our fork's patch stack (local patch applied). The local implementation covers the same scope and has been running in production. Thanks for the contribution! If upstream merges similar functionality, we'll rebase our patches accordingly.
Problem
Agents don't load relevant skills before acting. When a user says "帮我写12章小说" ("Help me write a 12-chapter novel"), the agent writes without loading verified-long-form-writing.
Previous attempts:
Solution: 3-Layer Architecture
Layer 1: Hybrid Skill Selector (agent/hybrid_skill_selector.py)
Zero token cost, <50ms
17 task categories: debugging, GitHub, research, writing, deployment, data analysis, image/media, email, PPT, notes, smart home, testing, AI/model, cron, skill management, system admin, code review.
Languages: Chinese + English + mixed. Example: "帮我写一篇12章的小说" ("Help me write a 12-chapter novel") → verified-long-form-writing ✅
Layer 2: Skill Evaluation Gate (agent/skill_eval_gate.py)
Code-level enforcement, zero keyword matching
Before the agent's first action tool call:
- … build_skills_system_prompt)
- A pre_tool_call hook in run_agent.py BLOCKS any non-read tool until skill_view() is called
- After skill_view() is called once, the gate opens for the session
Read-only tools (read_file, search_files, hindsight_recall, session_search, etc.) are NOT blocked, so the agent can freely research before acting.
Layer 3: MiMo Execution Guidance
Anti-hallucination rules injected for MiMo models: verify before stating, no fabrication, multi-source cross-validation.
How It Works in Practice
User says: "帮我模拟未来十年人生,先调研我的背景信息" ("Help me simulate the next ten years of my life; first research my background information")
Cron job (every 3h):
Changes (6 commits, 7 files, +1097 lines)
- agent/skill_db.py (new)
- agent/hybrid_skill_selector.py (new)
- agent/skill_eval_gate.py (new)
- agent/prompt_builder.py (… build_skills_system_prompt_semantic)
- run_agent.py
- tools/skills_sync.py
- tools/skills_tool.py
Test Results
Token Impact
Related