Problem or Use Case
Tool-enabled Hermes sessions still pay a meaningful startup and prompt-size cost before the model can do any real work.
In practice, the expensive parts are:
- shipping a large default tool schema on every session start
- eagerly discovering plugin and MCP tools even when the task never uses them
- exposing low-frequency heavy tools by default even in common sessions
This especially hurts short chat turns and messaging-platform runs where first-response latency matters.
A local prototype showed that these costs are high enough to matter to user-perceived speed.
Sanitized prototype measurements:
- cold
model_tools import improved from about 2.75s to about 1.02s
- default tool schema payload improved from about
41,913 chars to about 22,996 chars initially, and then about 21,190 chars after follow-up compaction
- rough schema token estimate improved from about
10,479 to about 5,298
Proposed Solution
Add a first-class lean/default tool exposure mode in Hermes core, with a full-compatibility fallback.
Key pieces:
- Lean default tool exposure
- keep common tools visible by default
- hide heavy/low-frequency tools unless explicitly requested
- trim oversized descriptions/parameter help for the default profile
- preserve a
full profile for compatibility and debugging
- Lazy non-core discovery
- keep builtin discovery eager and reliable
- defer plugin discovery until tool resolution actually needs it
- defer MCP discovery until an MCP toolset is explicitly requested
- keep discovery idempotent and thread-safe
- Preserve dynamic schema correctness
- if runtime schema rewriting happens after profile compaction (for example
execute_code rebuilding its dynamic tool list), re-apply lean compaction so the verbose schema does not come back by accident
- keep cross-tool references accurate so filtered sessions do not hallucinate missing tools
- Ship a small benchmark harness with it
- one command should report import cost,
get_tool_definitions() duration, tool count, schema size, and biggest schemas
- this makes regressions obvious during future refactors
Redacted implementation sketch:
profile = normalize_tool_exposure_profile(requested_profile)
if explicit_mcp_toolset_requested(enabled_toolsets):
discover_mcp_tools()
filtered_tools = registry.get_definitions(selected_tool_names)
filtered_tools = apply_tool_exposure_profile(filtered_tools, profile)
filtered_tools = rebuild_dynamic_schemas(filtered_tools)
filtered_tools = apply_tool_exposure_profile(filtered_tools, profile) # second pass
Alternatives Considered
- Keep exposing the full tool bundle by default and only shorten descriptions.
- Helps a bit, but does not address eager discovery or low-frequency tool overhead.
- Require users to manually disable toolsets.
- Too much user friction and not a good default experience.
- Lazy-discover everything, including builtins.
- Riskier and less predictable than keeping builtin discovery simple and deferring only the heavier paths.
Feature Type
Performance / reliability
Scope
Large (new default behavior plus supporting tests/benchmarking)
Contribution
A redacted local prototype already exists and has been validated with targeted tests and benchmark measurements. I’d be happy to help upstream it or split it into smaller PR-sized pieces.
Notes
This is not a request for a new skill or niche integration. It is core agent/runtime behavior that affects first-turn latency across the product.
Problem or Use Case
Tool-enabled Hermes sessions still pay a meaningful startup and prompt-size cost before the model can do any real work.
In practice, the expensive parts are:
This especially hurts short chat turns and messaging-platform runs where first-response latency matters.
A local prototype showed that these costs are high enough to matter to user-perceived speed.
Sanitized prototype measurements:
model_toolsimport improved from about2.75sto about1.02s41,913chars to about22,996chars initially, and then about21,190chars after follow-up compaction10,479to about5,298Proposed Solution
Add a first-class lean/default tool exposure mode in Hermes core, with a full-compatibility fallback.
Key pieces:
fullprofile for compatibility and debuggingexecute_coderebuilding its dynamic tool list), re-apply lean compaction so the verbose schema does not come back by accidentget_tool_definitions()duration, tool count, schema size, and biggest schemasRedacted implementation sketch:
Alternatives Considered
Feature Type
Performance / reliability
Scope
Large (new default behavior plus supporting tests/benchmarking)
Contribution
A redacted local prototype already exists and has been validated with targeted tests and benchmark measurements. I’d be happy to help upstream it or split it into smaller PR-sized pieces.
Notes
This is not a request for a new skill or niche integration. It is core agent/runtime behavior that affects first-turn latency across the product.