Memory provider tools auto-injected regardless of platform_toolsets config
Problem
When platform_toolsets is configured with an empty list for a platform (e.g. telegram: []), the memory provider's tool schemas (fact_store) are still injected into the agent's tool surface via run_agent.py lines 1092–1094. This bypasses the platform toolset configuration and has severe performance implications for local model deployments.
# config.yaml — user expects zero tools on Telegram
platform_toolsets:
cli:
- terminal
- file
- memory
- web
telegram: [] # should mean NO tools
Expected: Telegram sessions have no tools. The tools parameter is omitted from API calls (per #3820).
Actual: Memory provider tools (fact_store with actions: search, probe, add, update, reason, contradict, related) are injected into every session regardless of platform, because the injection happens unconditionally in AIAgent.__init__:
# run_agent.py ~line 1092
if self._memory_manager and self.tools is not None:
for _schema in self._memory_manager.get_all_tool_schemas():
# ...injected regardless of platform_toolsets
Impact
1. Tool schema token overhead kills local model performance
On local models served via llama.cpp (e.g. Qwen3-30B-A3B Q4_K_M on RTX 3090), tool-formatted prompts process at 134 tok/s vs 1,230 tok/s for plain text — a 10x slowdown. The Qwen3 chat template wraps each tool in verbose XML with special tokens that are disproportionately expensive for KV cache computation.
Benchmarks on the same hardware, same model, same prompt:
| Scenario |
Prompt tokens |
Time |
Prompt tok/s |
| No tools |
358 |
291ms |
1,230 |
| 1 tool |
482 |
497ms |
970 |
| 8 tools (Hermes default) |
3,033 |
22,570ms |
134 |
This means a simple "hello" on Telegram takes 42 seconds with the default tool configuration, vs 1.7 seconds with tools fully stripped.
2. Small models go into tool-call loops
When the only available tools are memory-related (fact_store), smaller local models (Qwen3-30B-A3B) interpret the tool availability as an obligation to use them. A simple "hi" message triggered 9 consecutive fact_store calls (search, probe, reason, add, update) before producing a response — taking 78 seconds total.
3. Cloud API users are unaffected (masking the issue)
On cloud providers (Claude, GPT-4, OpenRouter), prompt processing is near-instant regardless of tool count, so this overhead is invisible. This likely explains why it hasn't been reported.
Reproduction
# config.yaml
platform_toolsets:
telegram: []
memory:
memory_enabled: true
provider: holographic
model:
provider: custom
base_url: http://localhost:4000/v1 # local llama.cpp
default: my-local-model
- Start gateway:
hermes gateway run
- Send any message via Telegram
- Observe tool schemas in the API call payload —
fact_store is present despite telegram: []
- On local models, observe 10-40+ second response times for trivial messages
Proposed Fix
The memory tool injection in AIAgent.__init__ should respect the platform toolset configuration. Three options:
Option A: Check platform toolsets before injecting (minimal change)
# run_agent.py ~line 1092
if self._memory_manager and self.tools is not None:
# Only inject memory tools if 'memory' is in the platform's enabled toolsets
if "memory" in (self._enabled_toolsets or []):
for _schema in self._memory_manager.get_all_tool_schemas():
...
Option B: Honour skip_memory from gateway based on platform config
The AIAgent.__init__ constructor already accepts skip_memory: bool. The gateway could pass skip_memory=True when the platform's toolset list doesn't include memory:
# gateway/run.py — when constructing AIAgent for a platform session
platform_tools = config.get("platform_toolsets", {}).get(platform_name, [])
agent = AIAgent(
skip_memory="memory" not in platform_tools,
...
)
Option C: Per-platform memory config (most flexible)
memory:
memory_enabled: true
provider: holographic
platforms:
cli: true
telegram: false # no memory tools on Telegram
Option A is the smallest change. Option B uses existing infrastructure. Option C is the most user-friendly.
Environment
- Hermes Agent: HEAD
6f1cb46d (post v0.7.0, 2026-04-06)
- Memory provider: holographic (local SQLite, FTS5+HRR)
- Model: Qwen3-30B-A3B Q4_K_M via llama-server (llama.cpp)
- Hardware: RTX 3090 24GB, single slot (
-np 1)
- Platform: Telegram (gateway mode)
Related Issues
Memory provider tools auto-injected regardless of
platform_toolsetsconfigProblem
When
platform_toolsetsis configured with an empty list for a platform (e.g.telegram: []), the memory provider's tool schemas (fact_store) are still injected into the agent's tool surface viarun_agent.pylines 1092–1094. This bypasses the platform toolset configuration and has severe performance implications for local model deployments.Expected: Telegram sessions have no tools. The
toolsparameter is omitted from API calls (per #3820).Actual: Memory provider tools (
fact_storewith actions: search, probe, add, update, reason, contradict, related) are injected into every session regardless of platform, because the injection happens unconditionally inAIAgent.__init__:Impact
1. Tool schema token overhead kills local model performance
On local models served via llama.cpp (e.g. Qwen3-30B-A3B Q4_K_M on RTX 3090), tool-formatted prompts process at 134 tok/s vs 1,230 tok/s for plain text — a 10x slowdown. The Qwen3 chat template wraps each tool in verbose XML with special tokens that are disproportionately expensive for KV cache computation.
Benchmarks on the same hardware, same model, same prompt:
This means a simple "hello" on Telegram takes 42 seconds with the default tool configuration, vs 1.7 seconds with tools fully stripped.
2. Small models go into tool-call loops
When the only available tools are memory-related (
fact_store), smaller local models (Qwen3-30B-A3B) interpret the tool availability as an obligation to use them. A simple "hi" message triggered 9 consecutivefact_storecalls (search, probe, reason, add, update) before producing a response — taking 78 seconds total.3. Cloud API users are unaffected (masking the issue)
On cloud providers (Claude, GPT-4, OpenRouter), prompt processing is near-instant regardless of tool count, so this overhead is invisible. This likely explains why it hasn't been reported.
Reproduction
hermes gateway runfact_storeis present despitetelegram: []Proposed Fix
The memory tool injection in
AIAgent.__init__should respect the platform toolset configuration. Three options:Option A: Check platform toolsets before injecting (minimal change)
Option B: Honour
skip_memoryfrom gateway based on platform configThe
AIAgent.__init__constructor already acceptsskip_memory: bool. The gateway could passskip_memory=Truewhen the platform's toolset list doesn't includememory:Option C: Per-platform memory config (most flexible)
Option A is the smallest change. Option B uses existing infrastructure. Option C is the most user-friendly.
Environment
6f1cb46d(post v0.7.0, 2026-04-06)-np 1)Related Issues