Skip to content

bug + feature request: Memory provider tools auto-injected regardless of platform_toolsets config — 10x latency penalty on local models #5544

@thundercat49

Description

@thundercat49

Memory provider tools auto-injected regardless of platform_toolsets config

Problem

When platform_toolsets is configured with an empty list for a platform (e.g. telegram: []), the memory provider's tool schemas (fact_store) are still injected into the agent's tool surface via run_agent.py lines 1092–1094. This bypasses the platform toolset configuration and has severe performance implications for local model deployments.

# config.yaml — user expects zero tools on Telegram
platform_toolsets:
  cli:
    - terminal
    - file
    - memory
    - web
  telegram: []  # should mean NO tools

Expected: Telegram sessions have no tools. The tools parameter is omitted from API calls (per #3820).

Actual: Memory provider tools (fact_store with actions: search, probe, add, update, reason, contradict, related) are injected into every session regardless of platform, because the injection happens unconditionally in AIAgent.__init__:

# run_agent.py ~line 1092
if self._memory_manager and self.tools is not None:
    for _schema in self._memory_manager.get_all_tool_schemas():
        # ...injected regardless of platform_toolsets

Impact

1. Tool schema token overhead kills local model performance

On local models served via llama.cpp (e.g. Qwen3-30B-A3B Q4_K_M on RTX 3090), tool-formatted prompts process at 134 tok/s vs 1,230 tok/s for plain text — a 10x slowdown. The Qwen3 chat template wraps each tool in verbose XML with special tokens that are disproportionately expensive for KV cache computation.

Benchmarks on the same hardware, same model, same prompt:

Scenario Prompt tokens Time Prompt tok/s
No tools 358 291ms 1,230
1 tool 482 497ms 970
8 tools (Hermes default) 3,033 22,570ms 134

This means a simple "hello" on Telegram takes 42 seconds with the default tool configuration, vs 1.7 seconds with tools fully stripped.

2. Small models go into tool-call loops

When the only available tools are memory-related (fact_store), smaller local models (Qwen3-30B-A3B) interpret the tool availability as an obligation to use them. A simple "hi" message triggered 9 consecutive fact_store calls (search, probe, reason, add, update) before producing a response — taking 78 seconds total.

3. Cloud API users are unaffected (masking the issue)

On cloud providers (Claude, GPT-4, OpenRouter), prompt processing is near-instant regardless of tool count, so this overhead is invisible. This likely explains why it hasn't been reported.

Reproduction

# config.yaml
platform_toolsets:
  telegram: []
memory:
  memory_enabled: true
  provider: holographic
model:
  provider: custom
  base_url: http://localhost:4000/v1  # local llama.cpp
  default: my-local-model
  1. Start gateway: hermes gateway run
  2. Send any message via Telegram
  3. Observe tool schemas in the API call payload — fact_store is present despite telegram: []
  4. On local models, observe 10-40+ second response times for trivial messages

Proposed Fix

The memory tool injection in AIAgent.__init__ should respect the platform toolset configuration. Three options:

Option A: Check platform toolsets before injecting (minimal change)

# run_agent.py ~line 1092
if self._memory_manager and self.tools is not None:
    # Only inject memory tools if 'memory' is in the platform's enabled toolsets
    if "memory" in (self._enabled_toolsets or []):
        for _schema in self._memory_manager.get_all_tool_schemas():
            ...

Option B: Honour skip_memory from gateway based on platform config

The AIAgent.__init__ constructor already accepts skip_memory: bool. The gateway could pass skip_memory=True when the platform's toolset list doesn't include memory:

# gateway/run.py — when constructing AIAgent for a platform session
platform_tools = config.get("platform_toolsets", {}).get(platform_name, [])
agent = AIAgent(
    skip_memory="memory" not in platform_tools,
    ...
)

Option C: Per-platform memory config (most flexible)

memory:
  memory_enabled: true
  provider: holographic
  platforms:
    cli: true
    telegram: false  # no memory tools on Telegram

Option A is the smallest change. Option B uses existing infrastructure. Option C is the most user-friendly.

Environment

  • Hermes Agent: HEAD 6f1cb46d (post v0.7.0, 2026-04-06)
  • Memory provider: holographic (local SQLite, FTS5+HRR)
  • Model: Qwen3-30B-A3B Q4_K_M via llama-server (llama.cpp)
  • Hardware: RTX 3090 24GB, single slot (-np 1)
  • Platform: Telegram (gateway mode)

Related Issues

Metadata

Metadata

Assignees

No one assigned

    Labels

    P2Medium — degraded but workaround existscomp/agentCore agent loop, run_agent.py, prompt buildertool/memoryMemory tool and memory providerstype/bugSomething isn't working

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions