bug + feature request: Memory provider tools auto-injected regardless of platform_toolsets config — 10x latency penalty on local models

# Memory provider tools auto-injected regardless of `platform_toolsets` config

## Problem

When `platform_toolsets` is configured with an empty list for a platform (e.g. `telegram: []`), the memory provider's tool schemas (`fact_store`) are still injected into the agent's tool surface via `run_agent.py` lines 1092–1094. This bypasses the platform toolset configuration and has severe performance implications for local model deployments.

```yaml
# config.yaml — user expects zero tools on Telegram
platform_toolsets:
  cli:
    - terminal
    - file
    - memory
    - web
  telegram: []  # should mean NO tools
```

**Expected:** Telegram sessions have no tools. The `tools` parameter is omitted from API calls (per #3820).

**Actual:** Memory provider tools (`fact_store` with actions: search, probe, add, update, reason, contradict, related) are injected into every session regardless of platform, because the injection happens unconditionally in `AIAgent.__init__`:

```python
# run_agent.py ~line 1092
if self._memory_manager and self.tools is not None:
    for _schema in self._memory_manager.get_all_tool_schemas():
        # ...injected regardless of platform_toolsets
```

## Impact

### 1. Tool schema token overhead kills local model performance

On local models served via llama.cpp (e.g. Qwen3-30B-A3B Q4_K_M on RTX 3090), tool-formatted prompts process at **134 tok/s vs 1,230 tok/s for plain text** — a 10x slowdown. The Qwen3 chat template wraps each tool in verbose XML with special tokens that are disproportionately expensive for KV cache computation.

Benchmarks on the same hardware, same model, same prompt:

| Scenario | Prompt tokens | Time | Prompt tok/s |
|----------|--------------|------|-------------|
| No tools | 358 | 291ms | 1,230 |
| 1 tool | 482 | 497ms | 970 |
| 8 tools (Hermes default) | 3,033 | 22,570ms | 134 |

This means a simple "hello" on Telegram takes **42 seconds** with the default tool configuration, vs **1.7 seconds** with tools fully stripped.

### 2. Small models go into tool-call loops

When the only available tools are memory-related (`fact_store`), smaller local models (Qwen3-30B-A3B) interpret the tool availability as an obligation to use them. A simple "hi" message triggered **9 consecutive `fact_store` calls** (search, probe, reason, add, update) before producing a response — taking 78 seconds total.

### 3. Cloud API users are unaffected (masking the issue)

On cloud providers (Claude, GPT-4, OpenRouter), prompt processing is near-instant regardless of tool count, so this overhead is invisible. This likely explains why it hasn't been reported.

## Reproduction

```yaml
# config.yaml
platform_toolsets:
  telegram: []
memory:
  memory_enabled: true
  provider: holographic
model:
  provider: custom
  base_url: http://localhost:4000/v1  # local llama.cpp
  default: my-local-model
```

1. Start gateway: `hermes gateway run`
2. Send any message via Telegram
3. Observe tool schemas in the API call payload — `fact_store` is present despite `telegram: []`
4. On local models, observe 10-40+ second response times for trivial messages

## Proposed Fix

The memory tool injection in `AIAgent.__init__` should respect the platform toolset configuration. Three options:

### Option A: Check platform toolsets before injecting (minimal change)

```python
# run_agent.py ~line 1092
if self._memory_manager and self.tools is not None:
    # Only inject memory tools if 'memory' is in the platform's enabled toolsets
    if "memory" in (self._enabled_toolsets or []):
        for _schema in self._memory_manager.get_all_tool_schemas():
            ...
```

### Option B: Honour `skip_memory` from gateway based on platform config

The `AIAgent.__init__` constructor already accepts `skip_memory: bool`. The gateway could pass `skip_memory=True` when the platform's toolset list doesn't include `memory`:

```python
# gateway/run.py — when constructing AIAgent for a platform session
platform_tools = config.get("platform_toolsets", {}).get(platform_name, [])
agent = AIAgent(
    skip_memory="memory" not in platform_tools,
    ...
)
```

### Option C: Per-platform memory config (most flexible)

```yaml
memory:
  memory_enabled: true
  provider: holographic
  platforms:
    cli: true
    telegram: false  # no memory tools on Telegram
```

Option A is the smallest change. Option B uses existing infrastructure. Option C is the most user-friendly.

## Environment

- Hermes Agent: HEAD `6f1cb46d` (post v0.7.0, 2026-04-06)
- Memory provider: holographic (local SQLite, FTS5+HRR)
- Model: Qwen3-30B-A3B Q4_K_M via llama-server (llama.cpp)
- Hardware: RTX 3090 24GB, single slot (`-np 1`)
- Platform: Telegram (gateway mode)

## Related Issues

- #3820 — Omit empty tools param (fixed, but memory injection re-adds tools)
- #879 — Local model routing for auxiliary tasks
- #4815 — Gateway timeout kills long-running tasks (symptom of the same latency)


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

bug + feature request: Memory provider tools auto-injected regardless of platform_toolsets config — 10x latency penalty on local models #5544

Memory provider tools auto-injected regardless of `platform_toolsets` config

Problem

Impact

1. Tool schema token overhead kills local model performance

2. Small models go into tool-call loops

3. Cloud API users are unaffected (masking the issue)

Reproduction

Proposed Fix

Option A: Check platform toolsets before injecting (minimal change)

Option B: Honour `skip_memory` from gateway based on platform config

Option C: Per-platform memory config (most flexible)

Environment

Related Issues

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Scenario	Prompt tokens	Time	Prompt tok/s
No tools	358	291ms	1,230
1 tool	482	497ms	970
8 tools (Hermes default)	3,033	22,570ms	134

bug + feature request: Memory provider tools auto-injected regardless of platform_toolsets config — 10x latency penalty on local models #5544

Description

Memory provider tools auto-injected regardless of platform_toolsets config

Problem

Impact

1. Tool schema token overhead kills local model performance

2. Small models go into tool-call loops

3. Cloud API users are unaffected (masking the issue)

Reproduction

Proposed Fix

Option A: Check platform toolsets before injecting (minimal change)

Option B: Honour skip_memory from gateway based on platform config

Option C: Per-platform memory config (most flexible)

Environment

Related Issues

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

Memory provider tools auto-injected regardless of `platform_toolsets` config

Option B: Honour `skip_memory` from gateway based on platform config