feat: add hermes-memory MCP integration (structured persistent memory) #2692
Mibayy wants to merge 2 commits into
hermes-memory is a structured memory layer for LLM agents that survives context compression. It ships as a PyPI package (pip install hermes-memory) exposing 7 MCP tools.

Changes:
- Add docs/hermes-memory-integration.md with full integration guide
- Add hermes-memory section to the MCP usage guide
- Document relationship to existing memory tool (complementary)

PyPI: https://pypi.org/project/hermes-memory/
Related: NousResearch#2662
fix: memory_tick type validation (0.1.2)
Bugfix pushed to PyPI as Root cause: Fix: Changed
Live test audit: 2026-03-24
Running ongoing validation against a live instance (1-2 days). Updated as we go.
hermes-memory test suite
v0.1.3: 42/42 passing, no regressions.
Live session audit: functional pass (in-process)
Every tool validated against the running MCP instance:
Gauge pressure relief — all 3 mechanisms verified
Note: Bug found and fixed during audit
Documentation
Still pending
Relationship to Honcho
This came up in discussion, so it is worth documenting here. hermes-memory and Honcho solve adjacent problems with different philosophies.
Honcho reasons over conversation history automatically: it infers conclusions, patterns, and user models without the agent explicitly deciding what to store. It is powerful for cross-session personalization, costs $0.001-$0.50 per query, and requires a hosted API.
hermes-memory is explicit: the agent decides what matters and stores it in typed notation, with zero external calls and zero cost, and it works offline. The value proposition depends on staying zero-infra; introducing Honcho as a dependency would break that.
They're complementary rather than competing. In the 5-type taxonomy:
Both can run simultaneously without knowing about each other. Gaël is already working on the Honcho side in #2150 (startup context cache for
Real-world integration test results
Ran 4 end-to-end tests in a live CLI session after the hermes-memory MCP server was installed and configured. No context was given to Hermes between tests: each one started from a cold session.
Observations
Overall: integration is solid. The memory layer behaves correctly in a real multi-session context.
hermes-memory 0.2.0: plugin integration following hermes-agent v0.4.0
Took advantage of the new plugin system in v0.4.0 to extend hermes-memory. Published on PyPI: https://pypi.org/project/hermes-memory/0.2.0/
What was added
Plugin
Native integration with the v0.4.0 lifecycle:
Slash commands via
Implemented
MCP standalone toolset
- Compression guidelines: abbreviate first (40-60% reduction), then migrate structured facts to hermes-memory DB, then remove duplicates
- Before/after example showing C/D/V migration pattern
- Automated pressure relief via cron job (2x/day, threshold-based)
- Establishes hermes-memory as a relief valve for MEMORY.md overflow
Update: MEMORY.md pressure management section
Added a new section "Managing MEMORY.md pressure" to
Why this matters
MEMORY.md is injected into every single turn, so every byte in there costs tokens on every call. As projects grow, it fills up fast. Until now the doc explained what hermes-memory is, but not how to use it as a relief valve for MEMORY.md overflow.
This came out of real usage: after a long session, MEMORY.md hit 85% capacity. The fix wasn't obvious: abbreviate first, then migrate structured facts to the DB, then remove duplicates. That workflow deserves to be documented.
What was added
Three rules in priority order:
Plus a cron automation pattern: run 2x/day, check thresholds (55%), compress + migrate if needed, do nothing if already under. This keeps injection cost stable across long-running projects without manual intervention.
Also pushed to hermes-memory
The same section lives in the hermes-memory README (the standalone PyPI package), where it arguably belongs as the primary source of truth. The doc in hermes-agent serves as the integration guide perspective.
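For illustration, the three-rule workflow and the threshold check described in this comment could be sketched roughly as follows. This is not the code shipped in hermes-memory: the character budget, the abbreviation table, and the names `pressure` and `relieve` are assumptions made for the example; only the 55% threshold and the rule order come from the description above.

```python
import re
from typing import List, Tuple

RELIEF_THRESHOLD = 0.55        # the 55% threshold mentioned above
ASSUMED_BUDGET_CHARS = 2200    # hypothetical capacity; the real limit is not stated here

# Hypothetical abbreviation table (the real one is not shown in this PR).
ABBREVIATIONS = {"mandatory": "mndtry", "never": "nvr"}

# Lines such as "C[db.id]: ..." are treated as structured facts to migrate.
STRUCTURED_LINE = re.compile(r"^[CDV]\[[^\]]+\]:")


def pressure(text: str, budget: int = ASSUMED_BUDGET_CHARS) -> float:
    """Fraction of the assumed budget currently used by the file content."""
    return len(text) / budget


def relieve(text: str, budget: int = ASSUMED_BUDGET_CHARS) -> Tuple[str, List[str]]:
    """Return (remaining flat-file text, structured lines to migrate to the DB)."""
    if pressure(text, budget) < RELIEF_THRESHOLD:
        return text, []  # already under the threshold: do nothing

    # Rule 1: abbreviate first.
    for long_form, short_form in ABBREVIATIONS.items():
        text = text.replace(long_form, short_form)

    # Rule 2: pull structured C/D/V facts out of the flat file.
    kept: List[str] = []
    migrated: List[str] = []
    for line in text.splitlines():
        (migrated if STRUCTURED_LINE.match(line) else kept).append(line)

    # Rule 3: drop exact duplicate lines, preserving order.
    deduped = list(dict.fromkeys(kept))
    return "\n".join(deduped), migrated
```

A cron job following the pattern above would call something like `relieve()` twice a day, write the remaining text back to the flat file, and store the migrated lines in the fact store.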
v0.3.0 —
Ready for review
Validation period complete (initiated 2026-03-24, 1-2 days as noted above). Status summary:
What the PR adds:
No code changes to hermes-agent core. hermes-memory remains an optional zero-dependency MCP server: two lines of YAML to configure,
Happy to address any feedback. @teknium1
… FTS5

Closes NousResearch#2692 (supersedes the MCP server prototype).

Adds a typed, searchable fact store directly into hermes-agent with no external process, no MCP transport, and zero user configuration beyond enabling the toolset.

## Background

PR NousResearch#2692 implemented this feature as a standalone MCP server (hermes-memory on PyPI). After review feedback, the MCP boundary was dropped in favour of a tighter native integration: same core logic, same schema, same 52-test suite, just without the subprocess overhead and configuration friction.

## What is structured memory

A SQLite-backed typed fact store using MEMORY_SPEC notation:

    C[db.id]: UUID mndtry, nvr autoincrement   ← Constraint
    D[auth]: JWT 7d refresh 6d                 ← Decision
    V[srv.prod]: api.example.com:3005          ← Value
    ?[deploy]: rolling or blue-green?          ← Unknown
    ✓[auth]: deployed to prod                  ← Done
    ~[db.id]: old autoincrement scheme         ← Obsolete

Facts are stored in state.db (sm_facts / sm_scopes / sm_sessions tables) with an FTS5 virtual table for sub-millisecond keyword search.
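To make the notation concrete, here is a minimal sketch of how one MEMORY_SPEC line could be parsed into a typed fact. The names `TYPE_MAP`, `FACT_RE`, and `parse_notation` echo identifiers mentioned in this commit message, but their definitions below are assumptions written for this example, not the actual contents of constants.py or facts.py.

```python
import re
from typing import NamedTuple, Optional

# Assumed mapping from notation symbol to fact type, following the list above.
TYPE_MAP = {
    "C": "constraint",
    "D": "decision",
    "V": "value",
    "?": "unknown",
    "✓": "done",
    "~": "obsolete",
}

# Assumed pattern: SYMBOL[target]: content
FACT_RE = re.compile(r"^\s*([CDV?✓~])\[([^\]]+)\]:\s*(.+?)\s*$")


class Fact(NamedTuple):
    kind: str
    target: str
    content: str


def parse_notation(line: str) -> Optional[Fact]:
    """Parse one notation line; return None for lines that are not facts."""
    match = FACT_RE.match(line)
    if match is None:
        return None
    symbol, target, content = match.groups()
    return Fact(TYPE_MAP[symbol], target.strip(), content)
```

For example, `parse_notation("D[auth]: JWT 7d refresh 6d")` yields a fact of kind `decision` with target `auth`, while free-text lines return `None` and can stay in the flat file.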
## New files

tools/structured_memory/
- constants.py: gauge thresholds, ABBREV_DICT, COMPRESS_MAP, TYPE_MAP, FACT_RE
- db.py: schema SQL, get_sm_connection(), sm_now(); tables co-located in state.db
- facts.py: write(), search(), get_hot(), purge(), parse_notation()
- gauge.py: read(), check_and_act(), _merge_duplicates(), _archive_cold_scopes()
- scopes.py: get_or_create(), tick(), touch(), close(), auto-cooling logic
- optimize.py: compress MEMORY.md/USER.md + migrate MEMORY_SPEC lines to store

tools/structured_memory_tool.py

7 tools registered in the structured_memory toolset:
- mcp_memory_write: store a typed fact (gauge check before every write)
- mcp_memory_search: FTS5 keyword search (default limit 5, max 20)
- mcp_memory_reflect: synthesize facts by topic, grouped by type
- mcp_memory_export: dump all facts as MEMORY_SPEC notation
- mcp_memory_purge: hard-delete superseded/archived facts
- mcp_memory_optimize: compress flat-file memory + migrate to structured store
- mcp_memory_gauge: return current pressure state

Also exports:
- get_structured_memory_injection(session_id): gauge + hot facts for system prompt
- tick_structured_memory(turn, message_text, session_id): silent tick hook

## Wiring changes

run_agent.py
- Automatic memory_tick on every user message (no tool-call turn consumed)
- get_structured_memory_injection() called at system prompt build time (gauge + hot facts injected before session starts, zero tool calls)

toolsets.py
- New structured_memory toolset with all 7 tools

model_tools.py
- tools.structured_memory_tool added to the module load list

## Automatic pressure management

At each write, gauge.check_and_act() fires automatically:
- ≥70%: merge duplicate facts (same target + scope)
- ≥80%: warning in tool response
- ≥85%: archive facts from closed scopes to cold
- ≥95%: push oldest active facts to cold storage

## Tests

52 tests ported from hermes-memory test suite, adapted for native imports and sm_* table names. All pass with isolated tmp_path fixtures.
- tests/structured_memory/test_facts.py (9 tests)
- tests/structured_memory/test_gauge.py (4 tests)
- tests/structured_memory/test_scopes.py (6 tests)
- tests/structured_memory/test_status.py (5 tests)
- tests/structured_memory/test_reflect.py (4 tests)
- tests/structured_memory/test_export_archive.py (5 tests)
- tests/structured_memory/test_current_turn.py (4 tests)
- tests/structured_memory/test_optimize.py (15 tests)

## Documentation

- website/docs/user-guide/features/structured-memory.md: full feature doc
- website/docs/user-guide/features/memory.md: cross-reference added
- website/docs/user-guide/configuration.md: toolset config example
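As a rough illustration of the automatic pressure management thresholds listed in this commit message, the mapping from gauge pressure to actions could look like the sketch below. The action names and the `actions_for` helper are hypothetical, and treating the thresholds as cumulative (a higher pressure also triggers the lower-threshold actions) is an assumption; the commit message only states the four thresholds.

```python
from typing import List, Tuple

# Thresholds taken from the list above; action names are illustrative only.
GAUGE_ACTIONS: List[Tuple[float, str]] = [
    (0.70, "merge_duplicates"),
    (0.80, "warn_in_tool_response"),
    (0.85, "archive_closed_scopes"),
    (0.95, "push_oldest_to_cold"),
]


def actions_for(pressure: float) -> List[str]:
    """Return every action whose threshold the current pressure has reached.

    Assumes cumulative behaviour: at 0.85, merging and warning also apply.
    """
    return [name for threshold, name in GAUGE_ACTIONS if pressure >= threshold]
```

Under these assumptions a write at 82% pressure would merge duplicates and attach a warning, while nothing happens below 70%.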
The problem, and why existing solutions miss it
Every LLM agent forgets. After ~30 turns, context compression kicks in and silently removes older messages. A constraint decided at turn 5 ("always use UUID, never autoincrement") vanishes by turn 50. The agent contradicts itself, re-asks questions, and the user has to repeat things.
The current memory tool (MEMORY.md) helps for user preferences, but it is unstructured free-text with no search, no lifecycle, and no pressure management. It cannot answer "what did we decide about auth?" without injecting everything.
Existing external solutions (Mem0, Zep, MemGPT) add cloud infra, embedding models, and vector stores. They treat memory as a retrieval problem. But the real problem is what to remember, when to forget, and how to keep the context small.
Why MCP, and why not a core integration
The short answer: MCP is the right boundary for this feature. But if the team prefers a native integration, the code is ready for it.
Why MCP makes sense here:
hermes-memory touches three concerns that are hard to integrate cleanly into a monolith: persistent storage, session lifecycle, and pressure-based memory management. As an MCP server it:
What hermes-memory provides
A structured fact store with 8 MCP tools, typed notation, FTS5 search, scoped lifecycle, and automatic gauge-based pressure management:
C[target] (constraints), D[target] (decisions), V[target] (values), ?[target] (unknowns), ✓[target] (resolved), ~[target] (obsolete)
The 8 MCP tools
- memory_write
- memory_search
- memory_tick
- memory_status
- memory_reflect
- memory_export
- memory_purge
- memory_optimize
memory_optimize: automatic MEMORY.md pressure relief (v0.3.0)
MEMORY.md is injected into every turn — every byte costs tokens on every call.
memory_optimize acts as a relief valve:
- MEMORY.md and USER.md from ~/.hermes/memories/
- C[...]/D[...]/V[...] lines → migrates them to the hermes-memory DB, removes them from the flat file
Relationship to existing memory tool
Complementary, not a replacement.
memory_optimize migrates + compresses automatically
Both run simultaneously. The memory tool handles "who is the user". hermes-memory handles "what did we decide".
Changes in this PR
docs/hermes-memory-integration.md: full integration guide including MEMORY.md pressure management strategy and cron automation pattern
Installation
    pip install hermes-memory  # v0.3.0
Set HERMES_MEMORY_DB to override the default storage path (~/.hermes/memory.db).
Technical details
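As a small sketch of the two mechanics mentioned in this description, the storage path override and FTS5 keyword search, something along these lines would work with Python's standard sqlite3 module. The table name `facts_fts` and the functions `resolve_db_path`, `open_store`, `write_fact`, and `search` are hypothetical names for this example, not the package's actual API; the limit clamp (default 5, max 20) follows the figures quoted for the native search tool. It also assumes the local SQLite build includes FTS5.

```python
import os
import sqlite3
from typing import List, Tuple


def resolve_db_path() -> str:
    """Use HERMES_MEMORY_DB if set, otherwise the default path given above."""
    return os.path.expanduser(os.environ.get("HERMES_MEMORY_DB", "~/.hermes/memory.db"))


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open a connection and create a hypothetical FTS5 table for facts."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS facts_fts USING fts5(kind, target, content)"
    )
    return conn


def write_fact(conn: sqlite3.Connection, kind: str, target: str, content: str) -> None:
    conn.execute(
        "INSERT INTO facts_fts (kind, target, content) VALUES (?, ?, ?)",
        (kind, target, content),
    )


def search(conn: sqlite3.Connection, query: str, limit: int = 5) -> List[Tuple[str, str, str]]:
    """Keyword search ranked by FTS5 relevance; limit is clamped to 1..20."""
    limit = max(1, min(limit, 20))
    rows = conn.execute(
        "SELECT kind, target, content FROM facts_fts "
        "WHERE facts_fts MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return rows.fetchall()
```

The query string is passed straight to FTS5, so plain keywords work as-is, whereas input containing FTS5 operator characters would need quoting first.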