[Feature]: Improving Multilingual Memory Extraction in Hermes #9135

@andrea9292

Problem or Use Case

Hermes already has a strong memory architecture, including built-in curated memory (MEMORY.md / USER.md), SQLite-backed session recall with FTS5, and external providers like Holographic.

The problem is that the current auto-extraction path appears to rely heavily on English-oriented heuristics and regex patterns. That works as a lightweight baseline, but it does not generalize well to Korean and other non-English languages.

For multilingual users, the hard part is not storage but judging which candidates are durable: deciding whether something is a stable preference, a project fact, or just session-local context. In Korean, these signals are often expressed indirectly through discourse context, sentence endings, and paraphrased procedural language rather than through explicit patterns like “I prefer ...” or “I always ...”.

As a result, Hermes may miss durable facts in Korean or mixed-language chats, and expanding regex rules per language would increase maintenance cost and false positives.
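
To illustrate the gap, here is a minimal sketch of how an English-oriented pattern set can miss an equivalent Korean request. The patterns and the `looks_like_preference` helper are hypothetical and written only for this example; they are not Hermes's actual rules.

```python
import re

# Hypothetical English-oriented patterns, for illustration only.
# These are NOT taken from the Hermes codebase.
ENGLISH_PREFERENCE_PATTERNS = [
    re.compile(r"\bI (?:prefer|always|usually)\b", re.IGNORECASE),
    re.compile(r"\bmy favou?rite\b", re.IGNORECASE),
]


def looks_like_preference(text: str) -> bool:
    """Return True if any English-oriented pattern matches the text."""
    return any(p.search(text) for p in ENGLISH_PREFERENCE_PATTERNS)


english = "I prefer commit messages written in English."
# Roughly: "From now on, please write commit messages in Korean."
korean = "커밋 메시지는 앞으로 한국어로 써 주세요."

print(looks_like_preference(english))  # matched by the first pattern
print(looks_like_preference(korean))   # no pattern matches
```

The Korean sentence states a durable preference just as clearly as the English one, but it does so through a polite request ending rather than a first-person "I prefer" construction, so a pattern list of this shape has nothing to match on.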

Proposed Solution

Instead of letting heuristics directly decide persistence, Hermes could add an optional LLM-guided candidate extraction stage before persistence.

Suggested pipeline:

conversation history
→ LLM extracts structured memory candidates
→ policy filter checks durability / sensitivity / confidence / scope
→ route to the right store:

  • built-in memory for compact, high-value stable facts
  • Holographic for deeper structured facts

The key principle would be: “The LLM proposes. Hermes decides.”

This keeps extraction separate from persistence and should improve multilingual memory quality without requiring large language-specific rule sets.
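
The policy-filter and routing stages of the pipeline above could be sketched roughly as follows. The `MemoryCandidate` fields, the thresholds, and the `route` function are assumptions made for this example; only the four destination names come from this proposal, and the mapping of scopes to stores is a guess at how they might be used.

```python
from dataclasses import dataclass

# Assumed thresholds; real values would need tuning.
MIN_CONFIDENCE = 0.7
COMPACT_MAX_CHARS = 200


@dataclass(frozen=True)
class MemoryCandidate:
    """Hypothetical shape of one LLM-proposed memory candidate."""
    text: str
    scope: str         # assumed values: "user", "project", "session"
    durable: bool      # LLM's judgment that the fact outlives the session
    sensitive: bool    # e.g. credentials or private personal data
    confidence: float  # 0.0 to 1.0


def route(candidate: MemoryCandidate) -> str:
    """The LLM proposes; this policy decides where (or whether) to persist."""
    if candidate.sensitive or not candidate.durable:
        return "discard"
    if candidate.confidence < MIN_CONFIDENCE:
        return "discard"
    if candidate.scope == "session":
        return "discard"
    # Longer, more detailed facts go to the deeper store.
    if len(candidate.text) > COMPACT_MAX_CHARS:
        return "holographic"
    # Compact, stable facts go to built-in memory.
    if candidate.scope == "user":
        return "built_in_user"
    return "built_in_memory"
```

Because `route` is a plain function over a structured candidate, the policy stays language-independent: the LLM handles the Korean or mixed-language text, and the persistence decision only sees fields.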

A small MVP could be:

  1. Add optional LLM candidate extraction at session end
  2. Require strict JSON output
  3. Add policy-gated persistence
  4. Route candidates into built_in_user, built_in_memory, holographic, or discard
  5. Keep the current heuristic path as fallback
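
Steps 1, 2, and 5 of the MVP could look something like the sketch below: parse the LLM output strictly and fall back to the existing heuristic path whenever the output is not valid. The function names, the required keys, and the callable parameters are hypothetical and stand in for whatever Hermes actually exposes.

```python
import json
from typing import Callable

# Assumed minimal schema for one candidate; a real schema would likely be richer.
REQUIRED_KEYS = {"text", "kind", "confidence"}


def parse_candidates(raw: str) -> list[dict]:
    """Strictly parse LLM output; raise ValueError on anything off-schema."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of candidates")
    for item in data:
        if not isinstance(item, dict) or set(item) != REQUIRED_KEYS:
            raise ValueError("candidate has missing or unexpected keys")
        if not isinstance(item["text"], str) or not isinstance(item["kind"], str):
            raise ValueError("text and kind must be strings")
        conf = item["confidence"]
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(conf, bool) or not isinstance(conf, (int, float)):
            raise ValueError("confidence must be a number")
        if not 0.0 <= conf <= 1.0:
            raise ValueError("confidence must be between 0 and 1")
    return data


def extract_at_session_end(
    history: list[str],
    llm_extract: Callable[[list[str]], str],
    heuristic_extract: Callable[[list[str]], list[dict]],
) -> list[dict]:
    """Try the LLM path first; keep the heuristic path as the fallback."""
    try:
        return parse_candidates(llm_extract(history))
    except ValueError:
        return heuristic_extract(history)
```

Rejecting the whole response on any schema violation is a deliberately conservative choice for this sketch; a more lenient variant could drop only the malformed candidates and keep the rest.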

Additional Context

I originally planned to open this as a GitHub Discussion because the repository documentation mentions Discussions for design proposals and architecture discussions. However, the Discussions route currently appears unavailable from the repository UI, so I am opening it as an Issue instead.

If useful, I can also provide:

  • Korean test conversations
  • expected candidate extraction examples
  • a longer RFC-style version

Alternatives Considered

I considered extending the current regex / heuristic extraction path, but that seems likely to increase maintenance cost and still perform poorly for Korean and other multilingual cases. I also considered keeping everything provider-specific, but the candidate-extraction pattern seems broadly useful across Hermes memory backends.

Feature Type

Performance / reliability

Scope

Medium (few files, < 300 lines)

Contribution

  • I'd like to implement this myself and submit a PR
