Research: default Hermes system prompt overwhelms tool-emission on mid-tier coding models #89

@PowerCreek

Long-form research issue, not a fix-now item. Stack itself is fully wired post-#77/#79/#81/#83/#85/#87 — this is a model-quality boundary discovered while debugging the residual tool-paralysis symptom.

What we observed

With the full post-cascade stack deployed, certain models (coding-groq, coding-gpt54) still emit empty assistant responses when called via hermes with a tool attached. Symptom: model produces no tool_call AND no text — pure empty completion.

What we ruled out

  • Not preamble bloat (G1): reproduces with NO HERMES_HOME profile → no vertical-preamble injection → no grafted-context index → empty preamble path.
  • Not tool-count saturation: reproduces with a single tool attached (well under the 52-tool ceiling we'd previously hypothesized).
  • Not MCP wiring: reproduces with just a built-in tool — autowire / mcp_serve / G2-G4 paths aren't in the loop for this repro.
  • Not auth / transport: the same request succeeds when issued via raw API curl.

The actual variable

The system prompt. Replacing hermes' default system prompt (SOUL.md + identity block + capabilities block + behavior guidance + tool surface description, several thousand tokens at boot) with a minimal "You are Hermes Agent" produces correct tool_call emission, with the same model, tool, provider, and request shape otherwise.

So the default hermes system prompt appears to overwhelm tool emission on these models. The model has the capacity (the raw API call works); it just can't get from hermes' prompt structure to a tool-call output.
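To make the comparison above repeatable, here is a minimal sketch of the two pieces an A/B harness would need: a payload builder where only the system prompt varies, and a classifier that separates "tool_call", "text", and "empty" completions. It assumes the OpenAI-style chat-completions response shape used by the endpoint in the reproduction below; `build_payload`, `classify`, and `EXAMPLE_TOOL` are hypothetical names, not part of hermes.

```python
def build_payload(model, system_prompt, user_text, tools):
    """Build an OpenAI-style chat-completions body.

    Only system_prompt should differ between the two arms of the test.
    """
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_text},
        ],
        "tools": tools,
    }


def classify(response):
    """Label a parsed response as 'tool_call', 'text', or 'empty'."""
    choices = response.get("choices") or []
    if not choices:
        return "empty"
    message = choices[0].get("message") or {}
    if message.get("tool_calls"):
        return "tool_call"
    if (message.get("content") or "").strip():
        return "text"
    # No tool_call and no text: the symptom described in this issue.
    return "empty"


# Hypothetical tool definition, for illustration only.
EXAMPLE_TOOL = {
    "type": "function",
    "function": {
        "name": "list_cards",
        "description": "List kanban cards.",
        "parameters": {"type": "object", "properties": {}},
    },
}

minimal_arm = build_payload(
    "coding-groq", "You are Hermes Agent", "List my cards.", [EXAMPLE_TOOL]
)
```

Sending each arm several times and counting `classify` labels would give an empty-completion rate per prompt rather than a single anecdote.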

Why this isn't a fix-now

  1. The fusion stack is wired + verified. Workers can spawn, plugins load, MCP discovery works, HERMES_TOOLS_SUBSET narrows (built-in + MCP); devagentic#203 G1–G4 + autowire have landed.
  2. The boundary is model-quality, not engineering: stronger models (Claude/GPT-5-class) don't hit this ceiling with the same prompt.
  3. Cutting the prompt is the wrong default — SOUL.md / identity / capabilities serve real purposes for the operator-facing UX.

Possible directions (research-grade, not committed)

Direction A — NousResearch#210 R1/R2 flow-router with per-intent narrowing

The premise of NousResearch#210 is dynamic per-turn classification of intent → narrowed tool surface + targeted prompt fragment. The same mechanism could narrow the system-prompt surface per intent: "this turn is a tool-call invocation" → strip identity/SOUL/behavior guidance for the model call, keep only tool-call-relevant context. Bridges nicely with the existing HERMES_TOOLS_SUBSET hook point (#75/#86 — same place in agent_init.py).
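A rough sketch of what per-intent prompt narrowing could look like, assuming the system prompt is already available as named sections. The section names follow the parts listed in this issue; the intent labels, `INTENT_SECTIONS`, and `narrow_prompt` are hypothetical and are not taken from NousResearch#210 or from agent_init.py.

```python
# Placeholder section bodies; real content would come from the hermes profile.
PROMPT_SECTIONS = {
    "minimal": "You are Hermes Agent",  # the known-good minimal prompt from this issue
    "soul": "<SOUL.md contents>",
    "identity": "<identity block>",
    "capabilities": "<capabilities block>",
    "behavior": "<behavior guidance>",
    "tools": "<tool surface description>",
}

# The full default prompt, in assembly order.
FULL_PROMPT = ["soul", "identity", "capabilities", "behavior", "tools"]

# Hypothetical intent -> sections kept for that turn.
INTENT_SECTIONS = {
    "tool_call": ["minimal", "tools"],
    "chat": FULL_PROMPT,
}


def narrow_prompt(intent, sections=PROMPT_SECTIONS, policy=INTENT_SECTIONS):
    """Return the system prompt for one turn.

    Unknown intents fall back to the full prompt, so the default
    operator-facing behavior is unchanged.
    """
    keep = policy.get(intent, FULL_PROMPT)
    return "\n\n".join(sections[name] for name in keep)


print(narrow_prompt("tool_call"))
```

Falling back to the full prompt for unclassified turns keeps the narrowing opt-in per intent, which matches the point above that cutting the prompt is the wrong default.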

Direction B — different upstream model

Confirmed: this is a model-quality boundary. coding-groq and coding-gpt54 hit it. Larger-context / instruction-tuned models (Sonnet / GPT-5-class) handle the full prompt without paralysis. Operator-facing deliverable: a tier-classification matrix indicating which models can sustain the full hermes prompt + tools and which need narrowing.
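One possible shape for the tier matrix, as a sketch only. The two NARROWED entries reflect the observations in this issue; the keys "sonnet-class" and "gpt-5-class" are placeholders for model families, not real model identifiers, and `PromptTier` / `tier_for` are hypothetical names.

```python
from enum import Enum


class PromptTier(Enum):
    FULL = "sustains full hermes prompt + tools"
    NARROWED = "needs a narrowed prompt"
    UNKNOWN = "not yet tested"


TIER_MATRIX = {
    "coding-groq": PromptTier.NARROWED,   # observed in this issue
    "coding-gpt54": PromptTier.NARROWED,  # observed in this issue
    "sonnet-class": PromptTier.FULL,      # placeholder family key
    "gpt-5-class": PromptTier.FULL,       # placeholder family key
}


def tier_for(model):
    """Look up a model's tier; untested models are reported as UNKNOWN."""
    return TIER_MATRIX.get(model, PromptTier.UNKNOWN)


print(tier_for("coding-groq").value)
```

Reporting untested models as UNKNOWN rather than guessing a tier keeps the matrix honest until the model has actually been run against the full prompt.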

Direction C — prompt-section ablation study

Before committing to A or B, run a structured ablation: what is the smallest cut to hermes' default system prompt (equivalently, the largest subset of it) that the affected models can sustain with tools? The cross-product (prompt-fragment × model × tool-count) might surface a Pareto frontier; e.g., "strip behavior guidance, keep identity + capabilities" might be enough for tier-2 models.
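The ablation grid can be sketched as below. This assumes the prompt splits into the five sections named earlier in this issue and that each cell yields a pass/fail (tool_call emitted or not); the tool counts 1 and 52 come from this issue, 8 is an arbitrary midpoint, and `ablation_grid` / `largest_passing` are hypothetical helpers. Running a cell against a real model is left out.

```python
from itertools import combinations, product

SECTIONS = ["soul", "identity", "capabilities", "behavior", "tools"]
MODELS = ["coding-groq", "coding-gpt54"]
TOOL_COUNTS = [1, 8, 52]  # 8 is an arbitrary midpoint, not from the issue


def ablation_grid(sections, models, tool_counts):
    """Yield every (kept_sections, model, tool_count) cell of the study."""
    subsets = [
        kept
        for size in range(len(sections) + 1)
        for kept in combinations(sections, size)
    ]
    yield from product(subsets, models, tool_counts)


def largest_passing(results):
    """Given {cell: passed}, return the passing subsets with the most
    sections kept for each (model, tool_count) pair."""
    best = {}
    for (kept, model, n_tools), passed in results.items():
        if not passed:
            continue
        key = (model, n_tools)
        if key not in best or len(kept) > len(best[key][0]):
            best[key] = [kept]
        elif len(kept) == len(best[key][0]):
            best[key].append(kept)
    return best


cells = list(ablation_grid(SECTIONS, MODELS, TOOL_COUNTS))
print(len(cells))  # 32 subsets x 2 models x 3 tool counts = 192
```

At 192 cells the full cross-product is small enough to run exhaustively; if sections are later split into finer fragments, the subset count doubles per fragment and a greedy removal order may be more practical.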

Reproduction (for whoever picks this up)

# Fails (empty response, no tool_call)
HERMES_HOME=/tmp/empty-profile hermes --provider groq --model coding-groq \
  --enable-toolset core --enable-toolset kanban  # narrow surface, default prompt

# Same provider/model/tool, raw API with the minimal system prompt → works
curl https://api.groq.com/openai/v1/chat/completions \
  -H "Authorization: Bearer $GROQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"coding-groq","messages":[
    {"role":"system","content":"You are Hermes Agent"},
    {"role":"user","content":"..."}],"tools":[<single tool>]}'

Severity

Research / backlog. Not blocking poly-explorer end-to-end with appropriate model choice. Worth opening for:

Closes the cascade story

This is the boundary at the end of the post-#67 cascade:
