holographic memory auto_extract saves raw user messages verbatim instead of extracting preferences #22907

@akamel001

Summary

The auto_extract feature on the holographic memory plugin matches user messages against simple "I prefer / I like / I want" regex patterns at session end and writes the entire matched message (truncated to 400 chars) into fact_store as a fact. There's no extraction, summarization, or synthesis — just a pattern match followed by a raw dump of conversational text.

The result is that fact_store accumulates entries that are user messages verbatim, not facts. Subsequent holographic recall surfaces these conversational snippets as if they were learned preferences, polluting downstream context with chat fragments.

Repro

  1. Enable plugins.hermes-memory-store.auto_extract: true in config.yaml with the holographic memory provider configured.
  2. Send any user message containing a phrase that matches one of the auto-extractor's regexes — e.g. "I like the new cleanup approach better, can we just write to /tmp instead?"
  3. Let the session end (so on_session_end fires).
  4. Inspect memory_store.db:
    sqlite3 ~/.hermes/memory_store.db "SELECT fact_id, category, content FROM facts ORDER BY fact_id DESC LIMIT 5"

Expected

A fact entry should reflect a synthesized preference, e.g. prefers systemd-tmpfiles over alternative cleanup approaches, or no fact should be saved if the matched phrase is conversational filler rather than a preference statement.

Actual

The entire user message body is stored verbatim as category=user_pref:

fact #N | user_pref | I like the new cleanup approach better, can we just write to /tmp instead?

These contaminating entries have empty tags and helpful_count=0, but holographic recall still surfaces them as semantically-related "facts" in subsequent sessions.

Entries like the following appeared in one test session after auto_extract: true was enabled (these are synthetic examples, representative of the failure mode):

  • I like that, sounds good
  • I want you to add tests for the new endpoint
  • i like the approach, would you set it up on the staging server?
  • I always check git status before committing

None of these are facts. They're conversational replies that happen to contain the literal substring "I like" / "I want" / "I always."

Suspected cause

plugins/memory/holographic/__init__.py::_auto_extract_facts:

for pattern in _PREF_PATTERNS:
    if pattern.search(content):
        try:
            self._store.add_fact(content[:400], category="user_pref")
            extracted += 1
        except Exception:
            pass
        break

content[:400] is the unmodified user message. The regex (.+) capture group is matched, but the match object returned by pattern.search(content) is discarded, so the captured span is never used and the whole message body (up to 400 characters) is written. There is no extraction step (LLM call, span extraction, or even a simple match.group(1) substitution) between the pattern match and add_fact.

The patterns themselves are also too permissive for a verbatim-dump approach. \bI\s+like\s+(.+) matches every conversational "I like that idea, let's…" reply, which makes every back-and-forth turn a candidate for ingestion.
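
To illustrate the over-matching, here is a minimal sketch that mimics the reported behavior outside the plugin. Only the \bI\s+like\s+(.+) pattern is quoted from the code above; the other two patterns, the PREF_PATTERNS list, and the would_store helper are hypothetical stand-ins, and the real _PREF_PATTERNS may differ.

```python
import re
from typing import Optional

# Hypothetical stand-ins for the plugin's _PREF_PATTERNS. Only the "I like"
# pattern is quoted in this issue; the real list may contain others.
PREF_PATTERNS = [
    re.compile(r"\bI\s+like\s+(.+)", re.IGNORECASE),
    re.compile(r"\bI\s+want\s+(.+)", re.IGNORECASE),
    re.compile(r"\bI\s+always\s+(.+)", re.IGNORECASE),
]


def would_store(content: str) -> Optional[str]:
    """Return what the reported logic would write to fact_store, or None."""
    for pattern in PREF_PATTERNS:
        if pattern.search(content):
            # Mirrors the snippet above: the match object is discarded and
            # the whole message (first 400 characters) is kept.
            return content[:400]
    return None


messages = [
    "I like that, sounds good",
    "I want you to add tests for the new endpoint",
    "Thanks, that fixed it",
]
stored = [would_store(m) for m in messages]
```

Under these assumed patterns, the first two messages come back unchanged and only the third is skipped, which matches the kind of entries listed under Actual.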

Possible fixes

In rough order of effort:

  1. Use the regex capture group: keep the match object and store match.group(1) instead of content[:400], so that at minimum only the captured remainder is stored. This is a very small change, but it does nothing to reduce the false-positive rate.
  2. Tighten the patterns to match clean preference statements only — e.g. require the message to BE a preference statement (start-of-string anchored, no trailing question marks, length cap on the captured span). Reduces noise but still verbatim.
  3. Replace the regex extractor with a small LLM summarization pass (e.g. via the auxiliary compression slot) that produces a synthesized fact like User prefers X for Y from the matched message. Highest cost, highest signal.

Option (1) is the smallest fix and would be a clear improvement; (3) is what the feature was likely intended to be.
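
A rough sketch of what options (1) and (2) could look like together, written as a standalone function rather than a patch to the plugin. The names STRICT_PATTERNS, MAX_SPAN and extract_preference are hypothetical, and the specific pattern and the 120-character cap are arbitrary choices for illustration.

```python
import re
from typing import Optional

# Hypothetical anchored pattern: the message must START with the preference
# phrase, and the captured span must run to the end of a single line.
STRICT_PATTERNS = [
    re.compile(r"^\s*I\s+(?:prefer|like|always)\s+(.+)$", re.IGNORECASE),
]
MAX_SPAN = 120  # arbitrary cap on the captured span, in characters


def extract_preference(content: str) -> Optional[str]:
    """Return the captured span for a clean preference statement, else None."""
    text = content.strip()
    if "?" in text:
        # Questions are treated as conversation, not as preferences.
        return None
    for pattern in STRICT_PATTERNS:
        match = pattern.match(text)
        if match:
            # Option (1): store the capture group, not the whole message.
            span = match.group(1).strip().rstrip(".")
            if 0 < len(span) <= MAX_SPAN:
                return span
    return None


kept = extract_preference("I prefer tabs over spaces.")
dropped = extract_preference(
    "I like the new cleanup approach better, can we just write to /tmp instead?"
)
```

Here kept is the span "tabs over spaces" and dropped is None. A reply such as "I like that, sounds good" would still pass this filter, which is the limitation noted for option (2): less noise, but still verbatim text.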

Workaround

auto_extract: false (the default) disables the behavior entirely. Manual fact_store add calls from the agent still work and produce clean entries. Existing contamination can be cleaned up with a SQL filter on category IN ('user_pref','project') AND tags='' AND helpful_count=0 plus a regex check for conversational openers.
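
The cleanup described above might be sketched as follows. The facts table and the fact_id, category, content, tags and helpful_count columns are the ones named in this issue; the rest of the schema, the OPENER regex and the find_suspect_facts helper are assumptions. The demo runs against an in-memory database with made-up rows, and it only lists candidate IDs so they can be reviewed before anything is deleted.

```python
import re
import sqlite3

# Hypothetical check for conversational openers; adjust to the real patterns.
OPENER = re.compile(r"^\s*I\s+(?:like|want|always)\b", re.IGNORECASE)


def find_suspect_facts(conn: sqlite3.Connection) -> list:
    """Return fact_ids that look like auto-extracted chat fragments."""
    rows = conn.execute(
        "SELECT fact_id, content FROM facts "
        "WHERE category IN ('user_pref', 'project') "
        "AND tags = '' AND helpful_count = 0 "
        "ORDER BY fact_id"
    ).fetchall()
    return [fact_id for fact_id, content in rows if OPENER.search(content)]


# Demo on an in-memory database. The schema is assumed: it contains only the
# columns mentioned in this issue, and empty tags are assumed to be ''.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts (fact_id INTEGER PRIMARY KEY, category TEXT, "
    "content TEXT, tags TEXT, helpful_count INTEGER)"
)
conn.executemany(
    "INSERT INTO facts (category, content, tags, helpful_count) "
    "VALUES (?, ?, ?, ?)",
    [
        ("user_pref", "I like that, sounds good", "", 0),
        ("user_pref", "Prefers pytest over unittest", "testing", 2),
        ("project", "I want you to add tests for the new endpoint", "", 0),
    ],
)
suspects = find_suspect_facts(conn)
```

To use this against a real store, the in-memory connection would be replaced with one to ~/.hermes/memory_store.db, ideally on a backup copy first. If the real store records empty tags as NULL rather than an empty string, the tags condition would need to change accordingly.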

Environment

  • Hermes Agent v0.13.0 (v2026.5.7)
  • holographic memory provider, default config except auto_extract: true
  • Reproducible with the in-tree code as of main; no out-of-tree patches required.

Metadata

    Labels

      • P3 (Low: cosmetic, nice to have)
      • comp/plugins (Plugin system and bundled plugins)
      • tool/memory (Memory tool and memory providers)
      • type/bug (Something isn't working)
