Problem
Every API call injects full tool schemas for ALL enabled toolsets. With 50+ tools across terminal, file, web, browser, delegate, vision, memory, and more, this consumes ~3,500-5,000 tokens per call — regardless of whether the conversation needs those tools.
On local models, tool-formatted prompts are roughly 9x slower to process than plain text (benchmarked: 1,230 tok/s for plain text vs 134 tok/s with 8 tools; see #5544). Even on cloud providers, this is wasted tokens at scale.
For simple conversational turns ("hi", "what model are you using?"), the model doesn't need to know about browser_click or web_crawl or delegate_task. But it gets all of them anyway.
Proposed Solution
Two-pass lazy tool loading:
Pass 1 (every call): Send tool names + one-line descriptions only (~300-500 tokens vs ~4,000)
Pass 2 (on demand): When the model picks a tool, send the full schema in a follow-up call
Flow:
- User sends message
- Hermes sends system prompt + conversation history + ABBREVIATED tool list (name + 1-line description)
- Model either:
  a. Responds normally (no tools needed) → done in 1 API call, saving ~3,500 tokens
  b. Requests a tool by name → Hermes sends a second call with that tool's full schema injected
- After (b), the tool call executes, the result comes back, and the conversation continues normally
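The flow above can be sketched as a small loop. This is only an illustration under stated assumptions: `FULL_SCHEMAS`, `run_turn`, the reply dictionary shape, and `fake_model` are hypothetical names invented here, not existing Hermes code, and the stand-in model replaces a real API call.

```python
from typing import Callable

# Hypothetical full schemas keyed by tool name (illustrative only).
FULL_SCHEMAS = {
    "web_crawl": {
        "name": "web_crawl",
        "description": "Crawl a web page and return its text.",
        "parameters": {
            "type": "object",
            "properties": {"url": {"type": "string"}},
            "required": ["url"],
        },
    },
}


def abbreviated(schemas: dict) -> list[dict]:
    """Pass 1 payload: tool name plus the first line of its description."""
    return [
        {"name": name, "description": schema["description"].splitlines()[0]}
        for name, schema in schemas.items()
    ]


def run_turn(message: str, call_model: Callable[[str, list[dict]], dict]) -> tuple[dict, int]:
    """Run one user turn; return the final reply and the number of API calls."""
    reply = call_model(message, abbreviated(FULL_SCHEMAS))  # pass 1
    calls = 1
    requested = reply.get("request_tool")
    if requested in FULL_SCHEMAS:
        # Pass 2: re-submit with only the requested tool's full schema.
        reply = call_model(message, [FULL_SCHEMAS[requested]])
        calls += 1
    return reply, calls


def fake_model(message: str, tools: list[dict]) -> dict:
    """Stand-in for a real model: wants web_crawl whenever it sees a URL."""
    has_full_schema = any("parameters" in tool for tool in tools)
    if "http" in message and not has_full_schema:
        return {"request_tool": "web_crawl"}
    if has_full_schema:
        return {"tool_call": {"name": "web_crawl", "arguments": {"url": message}}}
    return {"text": "hello"}
```

With this stand-in, `run_turn("hi", fake_model)` finishes in one call, while a message containing a URL takes two: one to request the tool and one to call it with the full schema available.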
Config:
tools:
  loading: lazy  # "eager" (current, default) or "lazy"
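One possible way to read this setting is sketched below. It assumes the config has already been parsed into a plain dictionary; `tool_loading_mode` and `VALID_MODES` are hypothetical names, and the fallback to "eager" mirrors the backward-compatible default proposed here.

```python
VALID_MODES = ("eager", "lazy")


def tool_loading_mode(config: dict) -> str:
    """Return the tools.loading value, defaulting to "eager" when unset."""
    tools_section = config.get("tools") or {}  # tolerate a missing or empty section
    mode = tools_section.get("loading", "eager")
    if mode not in VALID_MODES:
        raise ValueError(f"tools.loading must be one of {VALID_MODES}, got {mode!r}")
    return mode
```

Rejecting unknown values up front, rather than silently falling back, keeps a typo such as `lasy` from quietly leaving eager loading switched on.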
Token savings estimate:
| Scenario | Current | Lazy | Savings |
|---|---|---|---|
| Simple chat (no tools) | ~5,000 tokens base | ~1,500 tokens base | ~70% |
| One tool call | ~5,000 + response | ~1,500 + 2,000 + response | ~30% |
| Multi-tool session | ~5,000 per call | ~1,500 first call, then ~2,000 after | 30-60% |
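The percentages follow from the token estimates in the table. The short check below reproduces them; the inputs are the rough figures quoted above, not measurements, and `savings_percent` is a helper name introduced only for this illustration.

```python
def savings_percent(current_tokens: int, lazy_tokens: int) -> int:
    """Percentage of prompt tokens saved by lazy loading, rounded to an integer."""
    return round(100 * (current_tokens - lazy_tokens) / current_tokens)


simple_chat = savings_percent(5000, 1500)        # no tool needed
one_tool_call = savings_percent(5000, 1500 + 2000)  # abbreviated list + one full schema
later_call = savings_percent(5000, 2000)         # a later call at ~2,000 tokens

print(simple_chat, one_tool_call, later_call)
```

The first two values match the ~70% and ~30% rows. The third gives 60%, the upper end of the multi-tool range, assuming a later call really does cost about 2,000 tokens.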
Implementation sketch:
- Add a new tool (e.g. request_tool) that accepts a tool name and returns confirmation + full schema injection
- In run_agent.py, when lazy loading is enabled:
  - Build abbreviated tool list: just {"name": ..., "description": "Call this to ..."} for each tool
  - On first pass, inject abbreviated list + request_tool as the only real tool
  - When model calls request_tool, inject the requested tool's full schema and re-submit
- Backward compatible: default is eager (current behavior)
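A rough sketch of the two helpers this implies is shown below. Everything here is an assumption made for illustration: the function names, the `tool_name` argument, and the JSON-Schema-style layout of the `request_tool` definition are not taken from run_agent.py, and the real schema format may differ.

```python
def first_line(text: str) -> str:
    """Return the first non-empty line of a description, or an empty string."""
    lines = text.strip().splitlines()
    return lines[0] if lines else ""


def build_lazy_payload(full_schemas: list[dict]) -> tuple[str, list[dict]]:
    """Return (catalog text for the prompt, tools list holding only request_tool)."""
    catalog = "\n".join(
        f"- {schema['name']}: {first_line(schema['description'])}"
        for schema in full_schemas
    )
    request_tool = {
        "name": "request_tool",
        "description": "Load the full schema for one tool listed in the catalog.",
        "parameters": {
            "type": "object",
            "properties": {
                # Restricting to known names lets the model pick only real tools.
                "tool_name": {
                    "type": "string",
                    "enum": [schema["name"] for schema in full_schemas],
                },
            },
            "required": ["tool_name"],
        },
    }
    return catalog, [request_tool]


def handle_request_tool(arguments: dict, full_schemas: list[dict]) -> list[dict]:
    """Return the tools list to send when re-submitting the call."""
    by_name = {schema["name"]: schema for schema in full_schemas}
    name = arguments.get("tool_name")
    if name not in by_name:
        raise KeyError(f"unknown tool requested: {name!r}")
    return [by_name[name]]
```

Whether the re-submitted call should carry only the requested schema, or that schema plus `request_tool` so the model can ask for another tool, is a design choice this sketch leaves open.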
Trade-offs:
- Pro: Massive token savings on conversational turns (the majority of messages)
- Pro: Faster on local models (less prompt processing)
- Pro: Lower API costs on cloud providers
- Con: +1 API round trip when tools ARE needed (adds ~1-2s latency)
- Con: Slightly more complex agent loop
Related Issues
Environment
- Hermes Agent: v0.8.0 (HEAD)
- Model: GLM-5 Turbo via Z.ai
- OS: macOS (Apple Silicon M1 Max)