Problem
The curated memory store defaults at `hermes_cli/config.py:429-430` are small and tend to fill up quickly in long-running sessions:

```python
"memory_char_limit": 2200,  # ~800 tokens at 2.75 chars/token
"user_char_limit": 1375,    # ~500 tokens at 2.75 chars/token
```
When a user says "remember X" after the limit is reached, `MemoryStore.add()` in `tools/memory_tool.py:224-235` rejects the write with:

```
Memory at 2094/2200 chars. Adding this entry (X chars) would exceed the limit. Replace or remove existing entries first.
```
This surfaces to the user as "the agent can't remember this", with no path forward except manually culling older entries.
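For readers who haven't looked at `tools/memory_tool.py`, the failure mode boils down to a hard size check. The sketch below is a hypothetical stand-in: `SketchMemoryStore` is not the real class, and it assumes usage is simply the sum of entry lengths. It only illustrates why a write that would cross the cap is rejected outright, leaving the store unchanged.

```python
from dataclasses import dataclass, field


@dataclass
class SketchMemoryStore:
    """Hypothetical stand-in for MemoryStore; not the real implementation."""

    char_limit: int
    entries: list[str] = field(default_factory=list)

    def used_chars(self) -> int:
        # Assumption: usage is the sum of entry lengths (separators not counted).
        return sum(len(e) for e in self.entries)

    def add(self, entry: str) -> tuple[bool, str]:
        used = self.used_chars()
        if used + len(entry) > self.char_limit:
            # Hard reject: nothing is written, the caller only gets an error string.
            return False, (
                f"Memory at {used}/{self.char_limit} chars. "
                f"Adding this entry ({len(entry)} chars) would exceed the limit. "
                "Replace or remove existing entries first."
            )
        self.entries.append(entry)
        return True, "ok"


# A store sitting at 2094/2200 chars, as in the error message above.
store = SketchMemoryStore(char_limit=2200, entries=["x" * 2094])
ok, message = store.add("y" * 120)
print(ok)
print(message)
```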
Context
- The defaults (~800 tokens for MEMORY) were sized for small context windows. Current models routinely have 125k+ tokens of context (see `model.context_length` in `config.yaml`), so allocating ~1300 tokens in total to curated memory is a tiny fraction of the available budget.
- The `memory` config section already supports overriding `memory_char_limit` / `user_char_limit` per install, but most users never discover this until they hit the ceiling.
- `MEMORY.md` is injected into the system prompt, so raising the cap trades prefix-cache budget for recall capacity. That trade-off is fine on local/self-hosted inference, where prompt caching is a knob the operator controls.
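The budget claim can be checked with the 2.75 chars/token ratio from the `config.py` comments. A quick back-of-the-envelope calculation, assuming a 125k-token window as the example context length:

```python
CHARS_PER_TOKEN = 2.75    # ratio used in the config.py comments
CONTEXT_LENGTH = 125_000  # example value for model.context_length

defaults = {"memory_char_limit": 2200, "user_char_limit": 1375}

# Convert each character limit to an approximate token count.
tokens = {key: chars / CHARS_PER_TOKEN for key, chars in defaults.items()}
total_tokens = sum(tokens.values())
share = total_tokens / CONTEXT_LENGTH

for key, approx in tokens.items():
    print(f"{key}: {defaults[key]} chars ~ {approx:.0f} tokens")
print(f"total: {total_tokens:.0f} tokens, {share:.2%} of a {CONTEXT_LENGTH:,}-token window")
```

That works out to 800 + 500 = 1300 tokens, or roughly 1% of a 125k window.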
Proposal
1. Raise the defaults to roughly 4x the current values (e.g. 8800 / 5500 chars, ~3200 / 2000 tokens). 800 tokens was reasonable for 8k-context models; it's conservative for 32k+ windows.
2. Add context-aware auto-sizing. When `model.context_length` is set in config, derive the memory limits as a configurable fraction of the context (default 2% for MEMORY, 1% for USER) instead of using hardcoded character counts. Something like:

```python
memory_char_limit = int(model_ctx_length * 0.02 * 2.75)  # 2% of ctx, ~2.75 chars/token
user_char_limit = int(model_ctx_length * 0.01 * 2.75)    # 1% of ctx
```

On a 125k-context model this produces ~6875 / ~3437 chars (2500 / 1250 tokens).
3. Surface usage pressure proactively. In `format_for_system_prompt()` or in the tool response, append a one-line hint when the store is above 80% capacity:

```
[memory 1821/2200 chars (83%) — consider consolidating older entries]
```

This nudges the agent/user to compact before the hard reject fires.
4. Expose a CLI command `hermes memory stats` that prints current utilization for MEMORY / USER, the largest entries by size, and the resolved limits with their source (default / config override / context-derived).
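Items 2 and 3 could look roughly like the following. This is a minimal sketch, not a proposed patch: `context_derived_limits`, `pressure_hint` and `PRESSURE_THRESHOLD` are hypothetical names, the 2% / 1% fractions and the 2.75 chars/token ratio are the ones quoted above, and the hint text is in the spirit of item 3 rather than a fixed format.

```python
from typing import Optional

CHARS_PER_TOKEN = 2.75     # ratio used in the config.py comments
PRESSURE_THRESHOLD = 0.80  # hint once usage is above 80% of the limit


def context_derived_limits(context_length: int,
                           memory_fraction: float = 0.02,
                           user_fraction: float = 0.01) -> dict[str, int]:
    """Hypothetical helper: derive char limits as a fraction of model.context_length."""
    return {
        "memory_char_limit": int(context_length * memory_fraction * CHARS_PER_TOKEN),
        "user_char_limit": int(context_length * user_fraction * CHARS_PER_TOKEN),
    }


def pressure_hint(label: str, used: int, limit: int) -> Optional[str]:
    """Hypothetical helper: one-line hint when usage is above the threshold, else None."""
    ratio = used / limit
    if ratio <= PRESSURE_THRESHOLD:
        return None
    return f"[{label} {used}/{limit} chars ({ratio:.0%}): consider consolidating older entries]"


print(context_derived_limits(125_000))    # {'memory_char_limit': 6875, 'user_char_limit': 3437}
print(pressure_hint("memory", 1821, 2200))
print(pressure_hint("memory", 1000, 2200))  # None: 45% is below the threshold
```

Note that at 125k the derived limits (6875 / 3437) come out below the 8800 / 5500 defaults suggested in item 1, so an implementation would need to decide how fixed defaults and context-derived values interact.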
Current workaround
Users can already override the limits via `~/.hermes/config.yaml`:

```yaml
memory:
  memory_char_limit: 10000
  user_char_limit: 5000
```
The feature request is to make this discoverable-by-default and auto-scaled rather than requiring users to find out from an error message.
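One way to make the limits discoverable by default is to resolve each one with an explicit precedence and record where it came from, which is also what the proposed `hermes memory stats` command would report. The sketch below assumes the precedence config override > context-derived > default and takes an already-parsed `config.yaml` as a plain dict; `resolve_limit` and `stats_report` are hypothetical helpers, not existing Hermes functions, and the top-entries listing is left out.

```python
from typing import Any, Optional

CHARS_PER_TOKEN = 2.75
# Current hardcoded defaults from hermes_cli/config.py.
DEFAULTS = {"memory_char_limit": 2200, "user_char_limit": 1375}
# Hypothetical per-store fractions of model.context_length (item 2 of the proposal).
FRACTIONS = {"memory_char_limit": 0.02, "user_char_limit": 0.01}


def resolve_limit(key: str, config: dict[str, Any]) -> tuple[int, str]:
    """Return (limit, source): config override first, then context-derived, then default."""
    # `or {}` guards against a section that is present but empty (parsed as None).
    override = (config.get("memory") or {}).get(key)
    if override is not None:
        return int(override), "config override"
    ctx: Optional[int] = (config.get("model") or {}).get("context_length")
    if ctx:
        return int(ctx * FRACTIONS[key] * CHARS_PER_TOKEN), "context-derived"
    return DEFAULTS[key], "default"


def stats_report(config: dict[str, Any], usage: dict[str, int]) -> list[str]:
    """Build one utilization line per store, with the source of each resolved limit."""
    lines = []
    for label, key in (("MEMORY", "memory_char_limit"), ("USER", "user_char_limit")):
        limit, source = resolve_limit(key, config)
        used = usage[label]
        lines.append(f"{label}: {used}/{limit} chars ({used / limit:.0%}) [limit: {source}]")
    return lines


# Example: context length set, only the USER limit overridden.
config = {
    "model": {"context_length": 125_000},
    "memory": {"user_char_limit": 5000},
}
for line in stats_report(config, {"MEMORY": 2094, "USER": 900}):
    print(line)
```

Showing the source next to each limit is what turns the silent ceiling into something a user can find before the hard reject.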
Why this matters
Multi-day / multi-session users (like me, running Hermes as a persistent agent via launchd) accumulate curated memory steadily. Hitting the 2200-char cap, with no prior warning, turns every subsequent `memory add` into `memory add failed` for the rest of the session, degrading the agent's usefulness at exactly the point where more history would help.
Willing to submit a PR
Happy to implement the auto-scaling + pressure hint + stats command if the proposal sounds directionally right. Raising defaults alone is a one-line change and could go in first.