feat(memory): raise/auto-scale memory_char_limit defaults and surface usage pressure #5320

@trevorgordon981

Problem

The curated memory store defaults at hermes_cli/config.py:429-430 are small and are reached quickly in long-running sessions:

"memory_char_limit": 2200,   # ~800 tokens at 2.75 chars/token
"user_char_limit": 1375,     # ~500 tokens at 2.75 chars/token

When a user says "remember X" after the limit is reached, MemoryStore.add() in tools/memory_tool.py:224-235 rejects the write with:

Memory at 2094/2200 chars. Adding this entry (X chars) would exceed the limit. Replace or remove existing entries first.

This surfaces to the user as "the agent can't remember this", with no path forward except manually culling older entries.
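To make the failure mode concrete, here is a minimal sketch of the kind of check that produces the message above. MemoryStoreSketch is a hypothetical stand-in, not the real MemoryStore in tools/memory_tool.py, and it assumes usage is simply the sum of entry lengths.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryStoreSketch:
    """Hypothetical stand-in for MemoryStore; not the real class."""

    char_limit: int = 2200
    entries: list[str] = field(default_factory=list)

    def used_chars(self) -> int:
        # Assumption: usage is the plain sum of entry lengths (no separators).
        return sum(len(e) for e in self.entries)

    def add(self, entry: str) -> tuple[bool, str]:
        used = self.used_chars()
        if used + len(entry) > self.char_limit:
            # Hard reject: nothing is written and the caller only gets the message.
            return False, (
                f"Memory at {used}/{self.char_limit} chars. "
                f"Adding this entry ({len(entry)} chars) would exceed the limit. "
                "Replace or remove existing entries first."
            )
        self.entries.append(entry)
        return True, "ok"
```

With 2094 chars already stored, any entry longer than 106 chars is refused, which is the situation described above.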

Context

  • The defaults (~800 tokens for MEMORY, ~500 for USER) were sized for small context windows. Current models routinely have 125k+ context (see model.context_length in config.yaml), so allocating ~1300 tokens total to curated memory uses only about 1% of the available budget.
  • The memory config section already supports overriding memory_char_limit / user_char_limit per-install, but most users never discover this until they hit the ceiling.
  • MEMORY.md is injected into the system prompt, so raising the cap trades prefix-cache budget for recall capacity. That trade-off is fine on local/self-hosted inference where prompt caching is a knob the operator controls.
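The budget arithmetic in the first bullet can be checked with a few lines. This is only a sketch: the helper names are made up for illustration, and 2.75 chars/token is the approximation taken from the config comments, not a measured tokenizer ratio.

```python
CHARS_PER_TOKEN = 2.75  # approximation used in the config comments


def chars_to_tokens(chars: int) -> int:
    """Convert a char budget to an approximate token count."""
    return round(chars / CHARS_PER_TOKEN)


def share_of_context(chars: int, context_tokens: int) -> float:
    """Fraction of the context window that a char budget occupies."""
    return chars_to_tokens(chars) / context_tokens


memory_tokens = chars_to_tokens(2200)   # current memory_char_limit
user_tokens = chars_to_tokens(1375)     # current user_char_limit
total_share = share_of_context(2200 + 1375, 125_000)

print(memory_tokens, user_tokens, f"{total_share:.2%}")  # 800 500 1.04%
```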

Proposal

1. Raise the defaults to something closer to 4x current (e.g. 8800 / 5500 chars, ~3200 / 2000 tokens). 800 tokens was reasonable for 8k-context models; it's conservative for 32k+ windows.

2. Add context-aware auto-sizing. When model.context_length is set in config, derive the memory limits as configurable fractions of the context window (e.g. 2% for memory_char_limit, 1% for user_char_limit) instead of using hardcoded char counts. Something like:

memory_char_limit = int(model_ctx_length * 0.02 * 2.75)  # 2% of ctx, ~2.75 chars/token
user_char_limit   = int(model_ctx_length * 0.01 * 2.75)  # 1% of ctx

On a 125k-context model this produces ~6875 / ~3437 chars (2500 / 1250 tokens).
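A slightly fuller sketch of the derivation, written as a standalone function. The function name and parameters are hypothetical. One addition that is not part of the proposal text: the derived values are floored at the current defaults, because 2% of a small window (for example 8k tokens, about 440 chars) would otherwise fall below today's limits.

```python
from typing import Optional

CHARS_PER_TOKEN = 2.75
DEFAULT_MEMORY_CHAR_LIMIT = 2200
DEFAULT_USER_CHAR_LIMIT = 1375


def derive_limits(
    context_length: Optional[int],
    memory_fraction: float = 0.02,
    user_fraction: float = 0.01,
) -> tuple[int, int]:
    """Return (memory_char_limit, user_char_limit) for a given context size."""
    if not context_length or context_length <= 0:
        # No context length configured: fall back to the hardcoded defaults.
        return DEFAULT_MEMORY_CHAR_LIMIT, DEFAULT_USER_CHAR_LIMIT
    memory = int(context_length * memory_fraction * CHARS_PER_TOKEN)
    user = int(context_length * user_fraction * CHARS_PER_TOKEN)
    # Assumption (not in the proposal): never shrink below the current defaults.
    return (
        max(memory, DEFAULT_MEMORY_CHAR_LIMIT),
        max(user, DEFAULT_USER_CHAR_LIMIT),
    )


print(derive_limits(125_000))  # (6875, 3437)
print(derive_limits(8_000))    # (2200, 1375)
```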

3. Surface usage pressure proactively. In format_for_system_prompt() or in the tool response, append a one-line hint when the store is above 80% capacity:

[memory 1821/2200 chars (83%) — consider consolidating older entries]

This nudges the agent/user to compact before the hard reject fires.
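The hint could be produced by a small helper along these lines. usage_hint is a hypothetical name, the 80% threshold comes from the text above, and the exact wording and punctuation of the hint are placeholders.

```python
from typing import Optional


def usage_hint(
    used: int, limit: int, threshold: float = 0.80, label: str = "memory"
) -> Optional[str]:
    """Return a one-line pressure hint, or None while usage is at or below the threshold."""
    if limit <= 0:
        return None
    ratio = used / limit
    if ratio <= threshold:
        return None
    pct = round(ratio * 100)
    return f"[{label} {used}/{limit} chars ({pct}%) - consider consolidating older entries]"


print(usage_hint(1821, 2200))
print(usage_hint(1000, 2200))  # None
```

Returning None below the threshold lets the caller append the hint unconditionally when it is present, so the system prompt stays unchanged for stores with plenty of headroom.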

4. Expose a CLI command hermes memory stats that prints current utilization for MEMORY / USER, top entries by size, and resolved limits with their source (default / config override / context-derived).
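A rough sketch of what the stats command might compute. Everything here is an assumption for illustration: the function names, the output layout, and in particular the precedence order (explicit config override, then context-derived, then default) and the floor at the current defaults.

```python
from typing import Optional

DEFAULTS = {"memory_char_limit": 2200, "user_char_limit": 1375}
FRACTIONS = {"memory_char_limit": 0.02, "user_char_limit": 0.01}
CHARS_PER_TOKEN = 2.75


def resolve_limit(
    key: str, memory_cfg: dict, context_length: Optional[int]
) -> tuple[int, str]:
    """Return (limit, source) for one limit key."""
    if key in memory_cfg:
        return int(memory_cfg[key]), "config override"
    if context_length:
        derived = int(context_length * FRACTIONS[key] * CHARS_PER_TOKEN)
        # Assumption: derived limits are floored at the current defaults.
        return max(derived, DEFAULTS[key]), "context-derived"
    return DEFAULTS[key], "default"


def format_stats(entries: list[str], limit: int, source: str, top_n: int = 3) -> str:
    """Summarize utilization and list the largest entries first."""
    used = sum(len(e) for e in entries)
    pct = round(100 * used / limit) if limit else 0
    lines = [f"{used}/{limit} chars ({pct}%), limit source: {source}"]
    for entry in sorted(entries, key=len, reverse=True)[:top_n]:
        preview = entry if len(entry) <= 40 else entry[:37] + "..."
        lines.append(f"  {len(entry):>5}  {preview}")
    return "\n".join(lines)


limit, source = resolve_limit("memory_char_limit", {}, 125_000)
print(format_stats(["prefers tabs over spaces", "deploys via launchd"], limit, source))
```

Reporting the source next to each limit is what makes the override discoverable: a user who sees "default" knows there is something to configure.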

Current workaround

Users can already override the limits via ~/.hermes/config.yaml:

memory:
  memory_char_limit: 10000
  user_char_limit: 5000

The feature request is to make this discoverable-by-default and auto-scaled rather than requiring users to find out from an error message.

Why this matters

Multi-day / multi-session users (like me, running Hermes as a persistent agent via launchd) accumulate curated memory steadily. Once the 2200-char cap is hit, every memory add becomes a failed memory add for the rest of the session, degrading the agent's usefulness at exactly the point where more history would help.

Willing to submit a PR

Happy to implement the auto-scaling + pressure hint + stats command if the proposal sounds directionally right. Raising defaults alone is a one-line change and could go in first.


Labels: P3 (Low: cosmetic, nice to have), tool/memory (Memory tool and memory providers), type/feature (New feature or request)
