feat: add hermes-memory MCP integration (structured persistent memory)#2692

Closed
Mibayy wants to merge 2 commits into NousResearch:main from Mibayy:feat/hermes-memory-integration

Mibayy commented Mar 23, 2026

The problem, and why existing solutions miss it

Every LLM agent forgets. After ~30 turns, context compression kicks in and silently removes older messages. A constraint decided at turn 5 ("always use UUID, never autoincrement") vanishes by turn 50. The agent contradicts itself, re-asks questions, and the user has to repeat things.

The current memory tool (MEMORY.md) helps for user preferences, but it is unstructured free-text with no search, no lifecycle, and no pressure management. It cannot answer "what did we decide about auth?" without injecting everything.

Existing external solutions (Mem0, Zep, MemGPT) add cloud infra, embedding models, and vector stores. They treat memory as a retrieval problem. But the real problem is what to remember, when to forget, and how to keep the context small.

Why MCP, and why not a core integration

The short answer: MCP is the right boundary for this feature. But if the team prefers a native integration, the code is ready for it.

Why MCP makes sense here:

hermes-memory touches three concerns that are hard to integrate cleanly into a monolith: persistent storage, session lifecycle, and pressure-based memory management. As an MCP server it:

  • ships independently, versioned separately
  • works with any MCP-compatible agent, not just Hermes
  • can be replaced or extended without touching the agent core
  • keeps the agent codebase clean

What hermes-memory provides

A structured fact store with 8 MCP tools, typed notation, FTS5 search, scoped lifecycle, and automatic gauge-based pressure management:

  • Typed facts: C[target] (constraints), D[target] (decisions), V[target] (values), ?[target] (unknowns), ✓[target] (resolved), ~[target] (obsolete)
  • Scope lifecycle: auto-cooling after 6 turns of silence, topic shift detection
  • Gauge pressure: automatic dedup at 70%, archival at 85%, synthesis at 95%
  • Zero infra: SQLite + FTS5, no cloud, no embedding model, no API keys
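Because every fact carries its type in a one-character prefix, the notation can be parsed with a single regular expression. The following is a minimal sketch. It assumes one fact per line in the TYPE[target]: text form used in the native-integration commit further down. The regex, the Fact dataclass, and parse_fact are hypothetical stand-ins, not the package's actual FACT_RE or parse_notation().

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical names for the six notation symbols listed above.
TYPE_NAMES = {
    "C": "constraint",
    "D": "decision",
    "V": "value",
    "?": "unknown",
    "✓": "resolved",
    "~": "obsolete",
}

# Hypothetical pattern: one fact per line, symbol + [target] + colon + free text.
FACT_LINE = re.compile(r"^\s*([CDV?✓~])\[([^\]]+)\]:\s*(.+?)\s*$")


@dataclass(frozen=True)
class Fact:
    kind: str
    target: str
    text: str


def parse_fact(line: str) -> Optional[Fact]:
    """Return a Fact for a notation line, or None for free-form text."""
    m = FACT_LINE.match(line)
    if m is None:
        return None
    symbol, target, text = m.groups()
    return Fact(TYPE_NAMES[symbol], target.strip(), text)


if __name__ == "__main__":
    for line in [
        "C[db.id]: UUID mndtry, nvr autoincrement",
        "?[deploy]: rolling or blue-green?",
        "just a free-form note",
    ]:
        print(parse_fact(line))
```

Lines that do not match return None. A parser like this can therefore also separate typed facts from ordinary prose in a flat file.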

The 8 MCP tools

Tool             Description
memory_write     Store a typed fact in MEMORY_SPEC notation
memory_search    FTS5 search across hot + cold facts
memory_tick      Advance turn counter, trigger scope cooling
memory_status    Session injection payload (gauge + hot facts)
memory_reflect   On-demand synthesis grouped by fact type
memory_export    Dump all facts as plain notation
memory_purge     Hard-delete superseded / archived facts
memory_optimize  [v0.3.0] Compress MEMORY.md/USER.md + migrate facts to DB

memory_optimize — automatic MEMORY.md pressure relief (v0.3.0)

MEMORY.md is injected into every turn — every byte costs tokens on every call. memory_optimize acts as a relief valve:

  1. Reads MEMORY.md and USER.md from ~/.hermes/memories/
  2. Scans for C[...] / D[...] / V[...] lines → migrates them to the hermes-memory DB, removes them from the flat file
  3. Applies abbreviation compression on remaining entries (~40-60% reduction)
  4. No-op if both files are below threshold (default 55%) — safe to call on a schedule
# output when action taken:
optimized:
  MEMORY: 85.0% → 38.2%  (3 facts migrated)
  USER:   62.0% → 41.5%
  3 fact(s) moved to hermes-memory DB

# output when healthy:
no action needed
MEMORY: 36.0%  USER: 37.0%
(both below 55% threshold)

Relationship to existing memory tool

Complementary, not a replacement.

                 memory tool (MEMORY.md)   hermes-memory
Storage          flat text file            SQLite + FTS5
Search           substring match           full-text search with prefix matching
Structure        free-form entries         typed notation (C/D/V/?/✓/~)
Scoping          none                      auto-scoped lifecycle with 3 closing triggers
Pressure         manual char limit         automatic gauge (merge → archive → synthesis)
Overflow relief  none                      memory_optimize migrates + compresses automatically
Best for         user prefs, env facts     project constraints, decisions, values

Both run simultaneously. The memory tool handles "who is the user". hermes-memory handles "what did we decide".

Changes in this PR

  • docs/hermes-memory-integration.md: full integration guide including MEMORY.md pressure management strategy and cron automation pattern
  • Updated MCP usage guide with hermes-memory section and config example

Installation

pip install hermes-memory  # v0.3.0

Then register the server under mcp_servers in config.yaml:

mcp_servers:
  hermes-memory:
    command: "hermes-memory"

Set HERMES_MEMORY_DB to override default storage path (~/.hermes/memory.db).

Technical details

hermes-memory is a structured memory layer for LLM agents that
survives context compression. Ships as a PyPI package (pip install
hermes-memory) exposing 7 MCP tools (8 as of v0.3.0, which adds memory_optimize).

Changes:
- Add docs/hermes-memory-integration.md with full integration guide
- Add hermes-memory section to the MCP usage guide
- Document relationship to existing memory tool (complementary)

PyPI: https://pypi.org/project/hermes-memory/
Related: NousResearch#2662

Mibayy commented Mar 24, 2026

fix: memory_tick type validation (0.1.2)

Bugfix pushed to PyPI as 0.1.2.

Root cause: memory_tick(turn) was rejecting valid integer values with '1' is not of type 'integer'. The MCP JSON Schema validator runs before the handler, and some client contexts serialize turn as a JSON string rather than a number. The handler already coerced with int() but the schema blocked it first.

Fix: Changed turn schema type from "integer" to ["integer", "string"]. Coercion in the handler stays, so behavior is unchanged for well-typed clients.

pip install --upgrade hermes-memory  # 0.1.2
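The ordering problem is easiest to see in isolation. Validation runs before the handler, so the int() coercion never got a chance to run. The following is a minimal sketch that uses a hand-rolled subset of JSON Schema "type" checking in place of the MCP SDK's real validator. MEMORY_TICK_SCHEMA, validate, and handle_memory_tick are hypothetical names.

```python
from typing import Any

# Hypothetical input schema for memory_tick after the 0.1.2 fix:
# "turn" may arrive as a JSON number or as a JSON string.
MEMORY_TICK_SCHEMA = {
    "type": "object",
    "properties": {"turn": {"type": ["integer", "string"]}},
    "required": ["turn"],
}


def matches_type(value: Any, allowed: list[str]) -> bool:
    """Tiny subset of JSON Schema 'type' checking (integer and string only)."""
    checks = {
        # bool subclasses int in Python, but true/false are not JSON integers.
        "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
        "string": lambda v: isinstance(v, str),
    }
    return any(checks[t](value) for t in allowed)


def validate(args: dict, schema: dict) -> None:
    """Reject arguments before the handler runs, like the MCP schema layer."""
    for name in schema.get("required", []):
        if name not in args:
            raise ValueError(f"missing required argument {name!r}")
    for name, spec in schema["properties"].items():
        if name not in args:
            continue
        allowed = spec["type"] if isinstance(spec["type"], list) else [spec["type"]]
        if not matches_type(args[name], allowed):
            raise ValueError(f"{args[name]!r} is not of type {allowed}")


def handle_memory_tick(args: dict) -> int:
    """Hypothetical handler: validate first, then coerce as before the fix."""
    validate(args, MEMORY_TICK_SCHEMA)
    return int(args["turn"])
```

Under the old schema ({"type": "integer"}), the same validate call rejects "1" before int() is ever reached. That matches the error quoted above.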

Mibayy commented Mar 24, 2026

Live test audit — 2026-03-24

Running ongoing validation against a live instance (1-2 days). Updated as we go.

hermes-memory test suite

v0.1.3 — 42/42 passing, no regressions.

tests/test_current_turn.py     4/4   scope lifecycle, silence cooling, dedup
tests/test_export_archive.py   5/5   notation export, cold/archived, atomic close
tests/test_facts.py            9/9   write, dedup, contradiction, truncation, MemoryFullError
tests/test_gauge.py            4/4   threshold detection, cold push
tests/test_reflect.py          4/4   grouping, cold facts, empty LLM response guard
tests/test_scopes.py           6/6   get/create, lifecycle, tick, silence cooling
tests/test_status.py           5/5   type display, notation symbols, hot/cold filter

Live session audit — functional pass (in-process)

Every tool validated against the running MCP instance:

  • memory_status() — gauge %, hot facts, notation block correct
  • memory_write() — hash + gauge returned on each write
  • contradiction/supersession — rewriting C[audit.test] superseded previous entry (a89d0710 -> 5115b4fd); superseded fact excluded from search
  • memory_search() — FTS retrieves correct fact immediately
  • memory_reflect() — groups by type, correct output
  • memory_export() — clean MEMORY_SPEC notation, superseded facts excluded
  • memory_purge() — removes facts cleanly

Gauge pressure relief — all 3 mechanisms verified

  • MERGE (70%): 5 active duplicates on same target+scope → 4 superseded, 1 kept (PASS)
  • ARCHIVE (85%): 10 cold facts in closed scope, >24h old → 10 archived, grace window respected (PASS)
  • PUSH TO COLD (95%, no LLM): 92 active facts at 95.6% → 12 oldest pushed to cold, gauge 95.6% -> 83.2% (PASS)

Note: an empty actions: [] result when there is nothing to merge or archive is correct behavior, not a bug.
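The three tiers amount to a threshold dispatch on the gauge percentage, plus a dedup rule for the merge tier. The following is a minimal sketch under stated assumptions. The gauge is taken as used/capacity. Tiers are treated as cumulative. The merge keeps the newest fact per target+scope. plan_actions, merge_duplicates, and the action names are hypothetical, and the real gauge.check_and_act() may weigh facts differently.

```python
from dataclasses import dataclass

# Tier thresholds from the results above, as fractions of gauge capacity.
MERGE_AT = 0.70
ARCHIVE_AT = 0.85
PUSH_COLD_AT = 0.95


def plan_actions(used: int, capacity: int) -> list[str]:
    """Hypothetical planner: which relief actions fire at this pressure.

    Tiers are treated as cumulative (a store at 96% merges, archives and
    pushes to cold); that ordering is an assumption of this sketch.
    """
    ratio = used / capacity
    actions: list[str] = []
    if ratio >= MERGE_AT:
        actions.append("merge_duplicates")
    if ratio >= ARCHIVE_AT:
        actions.append("archive_closed_scopes")
    if ratio >= PUSH_COLD_AT:
        actions.append("push_oldest_to_cold")
    return actions


@dataclass
class Fact:
    id: int
    target: str
    scope: str
    created: int  # hypothetical monotonically increasing write counter
    superseded: bool = False


def merge_duplicates(facts: list[Fact]) -> int:
    """Supersede all but the newest active fact per (target, scope)."""
    newest: dict[tuple[str, str], Fact] = {}
    for f in facts:
        if not f.superseded:
            key = (f.target, f.scope)
            if key not in newest or f.created > newest[key].created:
                newest[key] = f
    count = 0
    for f in facts:
        if not f.superseded and newest[(f.target, f.scope)] is not f:
            f.superseded = True
            count += 1
    return count
```

Five active duplicates on one target+scope leave one survivor and four superseded facts, the same counts the MERGE row reports. Which fact survives in the real package is not stated here.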

Bug found and fixed during audit

memory_search(limit=N) and memory_reflect(limit=N) raised '10' is not of type 'integer', the same root cause as the memory_tick fix in 0.1.2. The schema type was widened to ["integer", "string"] on both params. Released as 0.1.3: https://pypi.org/project/hermes-memory/0.1.3/

Documentation

  • CHANGELOG.md added covering 0.1.1 -> 0.1.2 -> 0.1.3 with root cause notes
  • Inline comments in run_agent.py explain the [AGENT INSTRUCTION] injection pattern and why it exists (see lines 5103-5105)

Still pending

  • 1-2 days live validation (automated healthcheck every 12h)
  • Request review once validation period is clean

Mibayy commented Mar 24, 2026

Relationship to Honcho

Came up in discussion so worth documenting here.

hermes-memory and Honcho solve adjacent problems with different philosophies.

Honcho reasons over conversation history automatically — it infers conclusions, patterns, and user models without the agent explicitly deciding what to store. Powerful for cross-session personalization, costs $0.001-$0.50 per query, requires a hosted API.

hermes-memory is explicit: the agent decides what matters, stores it in typed notation, zero external calls, zero cost, works offline. The value proposition depends on staying zero-infra — introducing Honcho as a dependency would break that.

They're complementary rather than competing. In the 5-type taxonomy:

  • hermes-memory covers semantic memory (structured facts, decisions, constraints)
  • Honcho covers episodic + semantic via automated inference (user modeling, conversation patterns)

Both can run simultaneously without knowing about each other. Gaël is already working on the Honcho side in #2150 (startup context cache for recallMode=tools). That's the right place for Honcho integration — hermes-agent level, not hermes-memory.

Mibayy commented Mar 24, 2026

Real-world integration test results

Ran 4 end-to-end tests in a live CLI session after the hermes-memory MCP server was installed and configured. No context given to Hermes between tests — cold session each time.

  • "what's the port for [project X]?"
    Expected: memory_search("project port") → find stored V[] fact
    Result: ✔ Found correct port immediately
  • "how do I build [project Y]?"
    Expected: memory_search → no result → fallback to skill
    Result: ✔ Correctly fell back to skill_view() (procedural knowledge lives in skills, not memory, by design)
  • "what's the API token for [service Z]?"
    Expected: memory_search("service token") → find stored V[] fact
    Result: ✔ Found correct token immediately (value redacted)
  • "how do I open a GitHub PR?"
    Expected: load github-pr-workflow skill
    Result: ✔ Skill loaded correctly (minor: skipped memory_search before loading skill; acceptable)

Observations

  • hermes-memory correctly stores and retrieves structured values (ports, tokens, API keys, IDs)
  • The skill/memory split works as intended: volatile values → memory_write(), reusable procedures → skills
  • FTS5 retrieval is fast and accurate on short keyword queries
  • One minor behavioral note: for the GitHub PR question, Hermes jumped straight to the skill without calling memory_search first. Not a bug, but a behavior worth tracking — ideally memory_search should always be called first per the system prompt instructions.

Overall: integration is solid. The memory layer behaves correctly in a real multi-session context.

Mibayy commented Mar 24, 2026

hermes-memory 0.2.0 — plugin integration following hermes-agent v0.4.0

Took advantage of the new plugin system in v0.4.0 to extend hermes-memory. Published on PyPI: https://pypi.org/project/hermes-memory/0.2.0/

What was added

Plugin ~/.hermes/plugins/hermes-memory/

Native integration with the v0.4.0 lifecycle:

  • on_session_end hook — auto-cools all active scopes when a session closes. Previously, facts would stay hot indefinitely if the session ended without an explicit memory_tick call. The hook fixes this cleanly.
  • on_session_start hook — foundation for a future warm-cache.
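Scope cooling combines two triggers described in this thread. The first is silence: a scope cools after 6 turns without being mentioned, per the feature list at the top. The second is session end, handled by the hook above. The following is a minimal sketch. Scope, ScopeTracker, and on_session_end are hypothetical. The rule that a scope counts as "touched" when its name appears in the user message is an assumption for illustration, not how hermes-memory detects topics.

```python
from dataclasses import dataclass, field

SILENCE_TURNS = 6  # "auto-cooling after 6 turns of silence"


@dataclass
class Scope:
    name: str
    last_touched: int
    hot: bool = True


@dataclass
class ScopeTracker:
    """Hypothetical tracker; real scope state lives in the SQLite store."""

    scopes: dict[str, Scope] = field(default_factory=dict)

    def touch(self, name: str, turn: int) -> None:
        """Create or re-heat a scope when it is mentioned."""
        scope = self.scopes.setdefault(name, Scope(name, turn))
        scope.last_touched = turn
        scope.hot = True

    def tick(self, turn: int, message_text: str) -> list[str]:
        """Advance the turn; cool scopes silent for SILENCE_TURNS turns."""
        for name in self.scopes:
            if name in message_text:  # naive mention check (assumption)
                self.touch(name, turn)
        cooled = []
        for scope in self.scopes.values():
            if scope.hot and turn - scope.last_touched >= SILENCE_TURNS:
                scope.hot = False
                cooled.append(scope.name)
        return cooled

    def on_session_end(self) -> list[str]:
        """Cool every active scope so facts do not stay hot indefinitely."""
        cooled = [s.name for s in self.scopes.values() if s.hot]
        for s in self.scopes.values():
            s.hot = False
        return cooled
```

Without the session-end hook, a scope touched on the last turn of a session would stay hot until enough later ticks accumulated. That is the gap the hook closes.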

Slash commands via /memory (alias /mem):

  • /memory status — gauge %, active/cold fact counts, active scopes
  • /memory search <query> — FTS5 search across all facts
  • /memory reflect <topic> — grouped synthesis by fact type (C/D/V/✓/?)
  • /memory purge — hard-delete cold/superseded facts

Implemented register_command() in hermes-agent

PluginContext.register_command() and get_plugin_command_handler() were documented in the v0.4.0 plugin guide but never implemented (test_plugins.py line 370 confirms this). Both are now functional — handlers stored in PluginManager._plugin_commands, dispatched by cli.py and gateway/run.py. Any future plugin wanting to register slash commands benefits from this.
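Slash-command dispatch of this kind reduces to a name-to-handler map with alias entries. The following is a minimal sketch. The CommandRegistry class, the register_command(name, handler, aliases=...) signature, and the string return values are assumptions, not the actual PluginContext/PluginManager API in hermes-agent.

```python
from typing import Callable, Optional

Handler = Callable[[str], str]


class CommandRegistry:
    """Hypothetical stand-in for a plugin command table."""

    def __init__(self) -> None:
        self._commands: dict[str, Handler] = {}

    def register_command(self, name: str, handler: Handler,
                         aliases: tuple[str, ...] = ()) -> None:
        # Every alias points at the same handler object.
        for key in (name, *aliases):
            if key in self._commands:
                raise ValueError(f"/{key} is already registered")
            self._commands[key] = handler

    def get_plugin_command_handler(self, name: str) -> Optional[Handler]:
        return self._commands.get(name)

    def dispatch(self, line: str) -> Optional[str]:
        """Route '/memory search foo' to the handler for 'memory'."""
        if not line.startswith("/"):
            return None
        name, _, rest = line[1:].partition(" ")
        handler = self.get_plugin_command_handler(name)
        return handler(rest) if handler else None


def memory_command(args: str) -> str:
    """Hypothetical /memory handler; only 'search' is sketched."""
    sub, _, query = args.partition(" ")
    if sub == "search":
        return f"searching facts for {query!r}"
    return f"unknown subcommand {sub!r}"


registry = CommandRegistry()
registry.register_command("memory", memory_command, aliases=("mem",))
```

The CLI and the gateway can share one lookup, so a plugin registers /memory once and both frontends see it.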

${ENV_VAR} substitution in config.yaml

The HERMES_MEMORY_DB value in config.yaml is now written as ${HERMES_MEMORY_DB} and resolved from the environment, so the same config is portable across VPS installs without manual edits.
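Substitution of this kind is typically a regex pass over the parsed config values. The following is a minimal sketch. It assumes an unset variable should raise rather than silently expand to an empty string. substitute_env is a hypothetical name, and hermes-agent's loader may handle missing variables differently. For comparison, Python's os.path.expandvars also expands ${VAR} but leaves unknown variables untouched.

```python
import os
import re
from typing import Any, Mapping

ENV_REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def substitute_env(value: Any, env: Mapping[str, str] = os.environ) -> Any:
    """Hypothetical helper: replace ${NAME} in strings nested in dicts/lists."""
    if isinstance(value, str):
        def lookup(m: re.Match) -> str:
            name = m.group(1)
            if name not in env:
                raise KeyError(f"config references unset variable ${{{name}}}")
            return env[name]
        return ENV_REF.sub(lookup, value)
    if isinstance(value, dict):
        return {k: substitute_env(v, env) for k, v in value.items()}
    if isinstance(value, list):
        return [substitute_env(v, env) for v in value]
    return value  # numbers, booleans and None pass through unchanged
```

Failing loudly on an unset variable surfaces a missing export at startup, instead of silently pointing the server at an empty path.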

MCP standalone toolset

hermes-memory now appears as mcp-hermes-memory in hermes tools, togglable per platform — via v0.4.0 feature #1907, no extra code required.

- Compression guidelines: abbreviate first (40-60% reduction), then migrate
  structured facts to hermes-memory DB, then remove duplicates
- Before/after example showing C/D/V migration pattern
- Automated pressure relief via cron job (2x/day, threshold-based)
- Establishes hermes-memory as a relief valve for MEMORY.md overflow

Mibayy commented Mar 25, 2026

Update: MEMORY.md pressure management section

Added a new section "Managing MEMORY.md pressure" to docs/hermes-memory-integration.md (commit c8afb44).

Why this matters

MEMORY.md is injected into every single turn — every byte in there costs tokens on every call. As projects grow, it fills up fast. Until now the doc explained what hermes-memory is, but not how to use it as a relief valve for MEMORY.md overflow.

This came out of real usage: after a long session, MEMORY.md hit 85% capacity. The fix wasn't obvious — abbreviate first, then migrate structured facts to the DB, then remove duplicates. That workflow deserves to be documented.

What was added

Three rules in priority order:

  1. Abbreviate first — 40-60% reduction with standard shorthands before touching anything else
  2. Migrate structured facts — any C/D/V entry not needed every turn belongs in hermes-memory, not MEMORY.md. Includes a before/after example showing the pattern.
  3. Remove duplicates — facts already in the DB don't need to live in MEMORY.md too

Plus a cron automation pattern: run 2x/day, check thresholds (55%), compress + migrate if needed, do nothing if already under. Keeps injection cost stable across long-running projects without manual intervention.

Also pushed to hermes-memory

The same section lives in the hermes-memory README (the standalone PyPI package) where it arguably belongs as the primary source of truth. The doc in hermes-agent serves as the integration guide perspective.

Mibayy commented Mar 25, 2026

v0.3.0 — memory_optimize shipped to PyPI

Following up on the MEMORY.md pressure management section added to the docs, the feature is now fully implemented and published in the package itself.

What's new in v0.3.0

memory_optimize — 8th MCP tool. Anyone running pip install hermes-memory gets it automatically.

What it does:

  1. Reads MEMORY.md and USER.md from ~/.hermes/memories/
  2. Scans for any C[...], D[...], V[...], ?[...] lines → migrates them to the hermes-memory DB, removes them from the flat file
  3. Applies abbreviation compression on remaining entries (FR/EN, ~40-60% reduction)
  4. Only acts if either file exceeds the threshold (default 55%). If both are healthy, returns immediately with no changes.
  5. dry_run=true for preview without touching files

Output example:

optimized:
  MEMORY: 85.0% → 38.2%  (3 facts migrated)
  USER:   62.0% → 41.5%
  3 fact(s) moved to hermes-memory DB

or if nothing to do:

no action needed
MEMORY: 36.0%  USER: 37.0%
(both below 55% threshold)
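The optimize pass can be sketched on in-memory text. It has four parts: a threshold check, migration of typed lines, abbreviation of the rest, and an optional dry run. The following is a minimal sketch. The character capacity, the two-entry abbreviation map (taken from the nvr/mndtry shorthands in the notation examples), and optimize_text are hypothetical. The real tool also reads the files from ~/.hermes/memories/ and writes migrated facts to the DB.

```python
import re

THRESHOLD = 0.55  # default trigger described above
# Typed lines that get migrated to the DB: C[...], D[...], V[...], ?[...]
TYPED_LINE = re.compile(r"^\s*[CDV?]\[[^\]]+\]")
# Hypothetical two-entry abbreviation map; a real one would be larger.
ABBREV = {"never": "nvr", "mandatory": "mndtry"}


def usage(text: str, capacity: int) -> float:
    """Fill ratio of a flat memory file with a character capacity."""
    return len(text) / capacity


def compress(line: str) -> str:
    for word, short in ABBREV.items():
        line = re.sub(rf"\b{re.escape(word)}\b", short, line)
    return line


def optimize_text(text: str, capacity: int, dry_run: bool = False):
    """Hypothetical optimize pass over one file's contents.

    Returns (text_to_write, migrated_lines, usage_before, usage_after).
    Below THRESHOLD nothing changes; with dry_run the original text is
    returned alongside the projected result.
    """
    before = usage(text, capacity)
    if before < THRESHOLD:
        return text, [], before, before
    kept, migrated = [], []
    for line in text.splitlines():
        if TYPED_LINE.match(line):
            migrated.append(line)  # the real tool writes these to the DB
        else:
            kept.append(compress(line))
    new_text = "\n".join(kept)
    after = usage(new_text, capacity)
    return (text if dry_run else new_text), migrated, before, after
```

The early return below the threshold is what makes scheduled runs safe. A healthy file is returned byte-for-byte unchanged.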

Safe to run on a schedule — the cron pattern from the docs works out of the box:

# config.yaml or cron prompt
memory_optimize()  # 2x/day, no-op if healthy

52 tests passing. PyPI: https://pypi.org/project/hermes-memory/0.3.0/

Mibayy commented Mar 25, 2026

Ready for review

Validation period complete (initiated 2026-03-24, 1-2 days as noted above).

Status summary:

  • 52/52 tests passing (up from 42 in 0.1.3 — coverage expanded for 0.2.0 plugin integration and 0.3.0 memory_optimize)
  • All 3 gauge pressure relief mechanisms validated on a live instance
  • End-to-end integration tests passed (ports, tokens, skills, cross-session recall)
  • CI: 3/3 passing (supply chain scan, docs, test suite)
  • Automated healthcheck running 2x/day since 2026-03-24, no regressions

What the PR adds:

  • docs/hermes-memory-integration.md — full MCP integration guide
  • website/docs/guides/use-mcp-with-hermes.md — hermes-memory section added

No code changes to hermes-agent core. hermes-memory remains an optional zero-dependency MCP server — two lines of YAML to configure, pip install hermes-memory to install.

Happy to address any feedback. @teknium1

Mibayy pushed a commit to Mibayy/hermes-agent that referenced this pull request Mar 26, 2026
… FTS5

Closes NousResearch#2692 (supersedes the MCP server prototype).

Adds a typed, searchable fact store directly into hermes-agent with no
external process, no MCP transport, and zero user configuration beyond
enabling the toolset.

## Background

PR NousResearch#2692 implemented this feature as a standalone MCP server (hermes-memory
on PyPI). After review feedback, the MCP boundary was dropped in favour of
a tighter native integration: same core logic, same schema, same 52-test
suite — just without the subprocess overhead and configuration friction.

## What is structured memory

A SQLite-backed typed fact store using MEMORY_SPEC notation:

  C[db.id]: UUID mndtry, nvr autoincrement   ← Constraint
  D[auth]: JWT 7d refresh 6d                  ← Decision
  V[srv.prod]: api.example.com:3005           ← Value
  ?[deploy]: rolling or blue-green?           ← Unknown
  ✓[auth]: deployed to prod                   ← Done
  ~[db.id]: old autoincrement scheme          ← Obsolete

Facts are stored in state.db (sm_facts / sm_scopes / sm_sessions tables)
with an FTS5 virtual table for sub-millisecond keyword search.
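The storage layout can be sketched with the standard library's sqlite3 module. The following is a minimal sketch that assumes the local SQLite build has FTS5 compiled in (most do). The column names, the rule that a write supersedes the active fact with the same kind/target/scope, and the newest-first ordering are assumptions, not a copy of the real sm_facts schema. The search clamps its limit to the default 5 / max 20 listed further down.

```python
import sqlite3

# Hypothetical schema; the real sm_facts table has more columns.
SCHEMA = """
CREATE TABLE IF NOT EXISTS sm_facts (
    id INTEGER PRIMARY KEY,
    kind TEXT NOT NULL,
    target TEXT NOT NULL,
    scope TEXT NOT NULL,
    body TEXT NOT NULL,
    superseded INTEGER NOT NULL DEFAULT 0
);
CREATE VIRTUAL TABLE IF NOT EXISTS sm_facts_fts USING fts5(target, body);
"""


def connect(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)  # raises OperationalError if FTS5 is missing
    return conn


def write(conn: sqlite3.Connection, kind: str, target: str,
          scope: str, body: str) -> int:
    """Store a fact, superseding the active one with the same kind/target/scope."""
    with conn:  # single transaction, committed on exit
        conn.execute(
            "UPDATE sm_facts SET superseded = 1 "
            "WHERE kind = ? AND target = ? AND scope = ? AND superseded = 0",
            (kind, target, scope),
        )
        cur = conn.execute(
            "INSERT INTO sm_facts (kind, target, scope, body) VALUES (?, ?, ?, ?)",
            (kind, target, scope, body),
        )
        fact_id = cur.lastrowid
        # Keep the FTS rowid equal to the fact id so the two tables line up.
        conn.execute(
            "INSERT INTO sm_facts_fts (rowid, target, body) VALUES (?, ?, ?)",
            (fact_id, target, body),
        )
    return fact_id


def search(conn: sqlite3.Connection, text: str, limit=5) -> list:
    """Prefix keyword search over active facts, newest first."""
    limit = max(1, min(int(limit), 20))  # default 5, max 20; accepts "10"
    # Quote each word so punctuation cannot break FTS5 syntax; '*' = prefix.
    words = [w.replace('"', '""') for w in text.split()]
    if not words:
        return []
    query = " ".join(f'"{w}"*' for w in words)
    return conn.execute(
        "SELECT kind, target, body FROM sm_facts "
        "WHERE superseded = 0 AND id IN "
        "(SELECT rowid FROM sm_facts_fts WHERE sm_facts_fts MATCH ?) "
        "ORDER BY id DESC LIMIT ?",
        (query, limit),
    ).fetchall()
```

The superseded flag is filtered outside the FTS index, so a rewritten fact disappears from search without deleting its history. Ordering by recency rather than FTS5 rank is a simplification of this sketch.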

## New files

tools/structured_memory/
  constants.py   — gauge thresholds, ABBREV_DICT, COMPRESS_MAP, TYPE_MAP, FACT_RE
  db.py          — schema SQL, get_sm_connection(), sm_now(); tables co-located in state.db
  facts.py       — write(), search(), get_hot(), purge(), parse_notation()
  gauge.py       — read(), check_and_act(), _merge_duplicates(), _archive_cold_scopes()
  scopes.py      — get_or_create(), tick(), touch(), close(), auto-cooling logic
  optimize.py    — compress MEMORY.md/USER.md + migrate MEMORY_SPEC lines to store

tools/structured_memory_tool.py
  7 tools registered in the structured_memory toolset:
    mcp_memory_write    — store a typed fact (gauge check before every write)
    mcp_memory_search   — FTS5 keyword search (default limit 5, max 20)
    mcp_memory_reflect  — synthesize facts by topic, grouped by type
    mcp_memory_export   — dump all facts as MEMORY_SPEC notation
    mcp_memory_purge    — hard-delete superseded/archived facts
    mcp_memory_optimize — compress flat-file memory + migrate to structured store
    mcp_memory_gauge    — return current pressure state

  Also exports:
    get_structured_memory_injection(session_id) — gauge + hot facts for system prompt
    tick_structured_memory(turn, message_text, session_id) — silent tick hook

## Wiring changes

run_agent.py
  - Automatic memory_tick on every user message (no tool-call turn consumed)
  - get_structured_memory_injection() called at system prompt build time
    (gauge + hot facts injected before session starts, zero tool calls)

toolsets.py
  - New structured_memory toolset with all 7 tools

model_tools.py
  - tools.structured_memory_tool added to the module load list

## Automatic pressure management

At each write, gauge.check_and_act() fires automatically:
  ≥70%  merge duplicate facts (same target + scope)
  ≥80%  warning in tool response
  ≥85%  archive facts from closed scopes to cold
  ≥95%  push oldest active facts to cold storage

## Tests

52 tests ported from hermes-memory test suite, adapted for native imports
and sm_* table names. All pass with isolated tmp_path fixtures.

  tests/structured_memory/test_facts.py           (9 tests)
  tests/structured_memory/test_gauge.py           (4 tests)
  tests/structured_memory/test_scopes.py          (6 tests)
  tests/structured_memory/test_status.py          (5 tests)
  tests/structured_memory/test_reflect.py         (4 tests)
  tests/structured_memory/test_export_archive.py  (5 tests)
  tests/structured_memory/test_current_turn.py    (4 tests)
  tests/structured_memory/test_optimize.py        (15 tests)

## Documentation

website/docs/user-guide/features/structured-memory.md  — full feature doc
website/docs/user-guide/features/memory.md             — cross-reference added
website/docs/user-guide/configuration.md               — toolset config example

Mibayy closed this Mar 27, 2026