Long-form research issue, not a fix-now item. Stack itself is fully wired post-#77/#79/#81/#83/#85/#87 — this is a model-quality boundary discovered while debugging the residual tool-paralysis symptom.
What we observed
With the full post-cascade stack deployed, certain models (coding-groq, coding-gpt54) still emit empty assistant responses when called via hermes with a tool attached. Symptom: model produces no tool_call AND no text — pure empty completion.
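The symptom can be checked mechanically. The following is a minimal sketch, assuming the provider returns an OpenAI-compatible chat-completions body (`choices[0].message` with optional `content` and `tool_calls`); the helper name `is_empty_completion` is hypothetical and not part of hermes.

```python
from typing import Any, Mapping


def is_empty_completion(response: Mapping[str, Any]) -> bool:
    """Return True when the first choice carries neither text nor tool calls.

    Assumption: `response` is an OpenAI-compatible chat-completions body.
    """
    choices = response.get("choices") or []
    if not choices:
        # No choices at all is treated as empty as well.
        return True
    message = choices[0].get("message") or {}
    # `content` may be None, missing, or whitespace-only on an empty turn.
    content = (message.get("content") or "").strip()
    tool_calls = message.get("tool_calls") or []
    return not content and not tool_calls
```

A check like this could be attached to a repro harness so that "empty" is decided the same way for every model and prompt variant.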
What we ruled out
- Not preamble bloat (G1): reproduces with NO HERMES_HOME profile → no vertical-preamble injection → no grafted-context index → empty preamble path.
- Not tool-count saturation: reproduces with a single tool attached (well under the 52-tool ceiling we'd previously hypothesized).
- Not MCP wiring: reproduces with just a built-in tool — autowire / mcp_serve / G2-G4 paths aren't in the loop for this repro.
- Not auth / transport: the same request succeeds when issued via raw API curl.
The actual variable
The system prompt. Replacing hermes' default system prompt (SOUL.md + identity block + capabilities block + behavior guidance + tool surface description, several thousand tokens at boot) with a minimal "You are Hermes Agent" produces correct tool_call emission, with the same model, same tool, same provider, and otherwise the same request shape.
So the default hermes system prompt is overwhelming the tool-emission attention path on these models. The model has the capacity (the raw API call works); it just can't route through hermes' prompt structure to the tool-call output.
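The comparison only isolates the system prompt if the two requests differ in nothing else. Here is a rough sketch of how such a pair could be built and verified, assuming an OpenAI-compatible request shape; `build_payload`, `ab_pair`, and `only_system_prompt_differs` are hypothetical helper names, and the full prompt text is supplied by the caller.

```python
import copy
from typing import Any, Dict, List, Tuple

MINIMAL_PROMPT = "You are Hermes Agent"


def build_payload(model: str, system_prompt: str, user_text: str,
                  tools: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Build a chat-completions request body (assumed OpenAI-compatible)."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_text},
        ],
        "tools": copy.deepcopy(tools),
    }


def ab_pair(model: str, full_prompt: str, user_text: str,
            tools: List[Dict[str, Any]]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return (control, treatment): full prompt vs. minimal prompt."""
    control = build_payload(model, full_prompt, user_text, tools)
    treatment = build_payload(model, MINIMAL_PROMPT, user_text, tools)
    return control, treatment


def only_system_prompt_differs(a: Dict[str, Any], b: Dict[str, Any]) -> bool:
    """True when the payloads are identical apart from the system message text."""
    a2, b2 = copy.deepcopy(a), copy.deepcopy(b)
    a2["messages"][0]["content"] = ""
    b2["messages"][0]["content"] = ""
    return a2 == b2
```

Asserting `only_system_prompt_differs(control, treatment)` before sending either request guards against an accidental second variable, such as a changed tool list.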
Why this isn't a fix-now
- The fusion stack is wired + verified. Workers can spawn, plugins load, MCP discovery works, HERMES_TOOLS_SUBSET narrows (built-in + MCP) — devagentic#203 G1–G4 + autowire are landed.
- The boundary is model-quality, not engineering: stronger models (Claude/GPT-5-class) don't hit this ceiling with the same prompt.
- Cutting the prompt is the wrong default — SOUL.md / identity / capabilities serve real purposes for the operator-facing UX.
Possible directions (research-grade, not committed)
Direction A — NousResearch#210 R1/R2 flow-router with per-intent narrowing
The premise of NousResearch#210 is dynamic per-turn classification of intent → narrowed tool surface + targeted prompt fragment. The same mechanism could narrow the system-prompt surface per intent: "this turn is a tool-call invocation" → strip identity/SOUL/behavior guidance for the model call, keep only tool-call-relevant context. Bridges nicely with the existing HERMES_TOOLS_SUBSET hook point (#75/#86 — same place in agent_init.py).
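As a rough illustration of per-intent prompt narrowing (not the NousResearch#210 design itself), the sketch below assumes the default prompt is available as named sections and that an intent label already exists for the turn. The section names, the intent labels, and `assemble_prompt` are all hypothetical; which sections count as tool-call-relevant is likewise an assumption.

```python
from typing import Dict, List

# Hypothetical section names, mirroring the prompt parts listed in this issue.
SECTION_ORDER: List[str] = ["soul", "identity", "capabilities", "behavior", "tool_surface"]

# Hypothetical intent -> sections mapping; "tool_call" keeps only the tool surface.
INTENT_SECTIONS: Dict[str, List[str]] = {
    "tool_call": ["tool_surface"],
    "chat": SECTION_ORDER,
}

MINIMAL_IDENTITY = "You are Hermes Agent"


def assemble_prompt(sections: Dict[str, str], intent: str) -> str:
    """Join the sections kept for `intent`, in their canonical order.

    Unknown intents fall back to the full prompt, so narrowing is opt-in.
    """
    keep = INTENT_SECTIONS.get(intent, SECTION_ORDER)
    parts = [sections[name] for name in SECTION_ORDER
             if name in keep and name in sections]
    if "identity" not in keep:
        # Keep a one-line identity when the full identity block is stripped.
        parts.insert(0, MINIMAL_IDENTITY)
    return "\n\n".join(parts)
```

Falling back to the full prompt for unrecognized intents keeps the operator-facing default unchanged, which matches the point above that cutting the prompt is the wrong default.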
Direction B — different upstream model
Confirmed: this is a model-quality boundary. coding-groq and coding-gpt54 hit it. Larger-context / instruction-tuned models (Sonnet / GPT-5-class) handle the full prompt without paralysis. The operator-facing deliverable would be a tier-classification matrix indicating which models can sustain the full hermes prompt + tools and which need narrowing.
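One possible shape for such a matrix is a plain lookup table. This is a sketch only: the `PromptTier` enum and `prompt_tier` function are hypothetical, the two entries are the models named in this issue, and defaulting unknown models to the full prompt is an assumption.

```python
from enum import Enum
from typing import Dict


class PromptTier(Enum):
    FULL = "full"          # sustains the full hermes prompt + tools
    NARROWED = "narrowed"  # needs a reduced system prompt to emit tool calls


# Only the models observed in this issue are listed; everything else is unknown.
MODEL_TIERS: Dict[str, PromptTier] = {
    "coding-groq": PromptTier.NARROWED,
    "coding-gpt54": PromptTier.NARROWED,
}


def prompt_tier(model: str, default: PromptTier = PromptTier.FULL) -> PromptTier:
    """Look up the tier for a model name, case-insensitively."""
    return MODEL_TIERS.get(model.strip().lower(), default)
```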
Direction C — prompt-section ablation study
Before committing to A or B, run a structured ablation: what's the smallest subset of hermes' default system prompt that the affected models can sustain with tools? Cross-product (prompt-fragment × model × tool-count) might surface a Pareto frontier — e.g., "strip behavior guidance, keep identity + capabilities" might be enough for tier-2 models.
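The cross-product can be enumerated up front so the study size is known before any calls are made. A minimal sketch follows, assuming five prompt sections (names hypothetical) and caller-chosen tool counts. `maximal_passing` is one way to read the frontier: it keeps the passing section subsets that are not strictly contained in another passing subset.

```python
from itertools import combinations, product
from typing import Any, Dict, FrozenSet, Iterable, Iterator, List, Sequence, Tuple

# Hypothetical section names, mirroring the prompt parts listed in this issue.
SECTIONS: Tuple[str, ...] = ("soul", "identity", "capabilities", "behavior", "tool_surface")


def section_subsets(sections: Sequence[str]) -> Iterator[Tuple[str, ...]]:
    """Yield every subset of `sections`, from empty to full."""
    for size in range(len(sections) + 1):
        for combo in combinations(sections, size):
            yield combo


def ablation_grid(models: Sequence[str], tool_counts: Sequence[int],
                  sections: Sequence[str] = SECTIONS) -> Iterator[Dict[str, Any]]:
    """Yield one cell per (model, tool count, section subset)."""
    subsets = list(section_subsets(sections))
    for model, n_tools, subset in product(models, tool_counts, subsets):
        yield {"model": model, "tool_count": n_tools, "sections": subset}


def maximal_passing(passing: Iterable[Tuple[str, ...]]) -> List[FrozenSet[str]]:
    """Keep passing subsets not strictly contained in another passing subset."""
    sets = {frozenset(p) for p in passing}
    return [s for s in sets if not any(s < other for other in sets)]
```

With five sections there are 32 subsets, so two models at three tool counts give 192 cells; that is small enough to run exhaustively, with repeats per cell if the failure turns out to be intermittent.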
Reproduction (for whoever picks this up)
# Fails (empty response, no tool_call)
HERMES_HOME=/tmp/empty-profile hermes --provider groq --model coding-groq \
  --enable-toolset core --enable-toolset kanban  # narrow surface, default prompt

# Same provider/model/tool, raw API with the minimal system prompt → works
curl https://api.groq.com/openai/v1/chat/completions \
  -H "Authorization: Bearer $GROQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"coding-groq","messages":[
        {"role":"system","content":"You are Hermes Agent"},
        {"role":"user","content":"..."}],"tools":[<single tool>]}'
Severity
Research / backlog. Not blocking poly-explorer end-to-end with appropriate model choice. Worth opening for:
Closes the cascade story
This is the boundary at the end of the post-#67 cascade:
... → #85 (autowire CLI path; closes #84) → #87 (HERMES_TOOLS_SUBSET extended to filter MCP tools; closes #86), all merged + deployed.