Multi-Agent Council Skill — Solver-Critic Reasoning Engine
A lightweight Hermes skill that adds a second opinion from a different LLM model for debugging and complex analysis. For ~90% of queries it correctly does nothing. For the ~10% that matter, it catches blind spots a single model misses.
Repo: https://github.com/welliv/hermes-multi-agent-council
What Problem Does This Solve?
Single-model LLMs have blind spots. Different model architectures (Gemini vs Claude vs DeepSeek) have different blind spots — cognitive diversity, not role-playing. This skill adds a review from a different model, but only when the query warrants it.
Provider Support
Works with any OpenAI-compatible provider — not locked to OpenRouter:
| Provider |
Model Format |
Env Var |
| OpenRouter (default) |
google/gemini-2.0-flash-001 |
OPENROUTER_API_KEY |
| OpenAI |
openai/gpt-4o |
OPENAI_API_KEY |
| Anthropic |
anthropic/claude-sonnet-4 |
ANTHROPIC_API_KEY |
| Groq |
groq/llama-3.3-70b-versatile |
GROQ_API_KEY |
| Ollama |
ollama/llama3 |
OLLAMA_BASE_URL |
Mix providers freely — solver on Groq, critic on Anthropic, etc.
How It Works
User Query → Smart Router (free) → Solver (1 call) → Critic (0-1 calls) → Quality Gate → Done
Smart Router (free) — ensemble vote: keyword patterns (0.4) + ngram Naive Bayes (0.35) + feature rules (0.25). Falls back to keyword matching if uncertain.
Solver — one strong call. Model selected by query type: code→DeepSeek, math→Gemini, creative→Claude.
Critic — different model architecture reviews using Self-RAG reflection checkpoints. Returns JSON verdict.
Quality Gate — heuristic scoring rejects regressions. Cross-check verifies revision addresses each critic point.
Corrections Buffer — Reflexion-style memory prevents repeated mistakes within a session.
Circuit Breaker — 3 consecutive API failures → 60s cooldown.
When It Helps
- Debugging: "Why does my container crash?" → catches edge cases solver missed
- Complex analysis: "Compare X vs Y for production" → finds overlooked tradeoffs
- Security review: "Audit this auth flow" → different model catches different vulns
When It Doesn't
- Direct questions ("What is Docker?") → router skips critic, zero overhead
- Opinion questions ("Should I use React?") → critic can't be "wrong" on opinions
Installation
git clone https://github.com/welliv/hermes-multi-agent-council.git ~/.hermes/skills/multi-agent-council
Set at least one API key in ~/.hermes/.env and configure ~/.hermes/council/config.json:
{
"solver_model": "google/gemini-2.0-flash-001",
"critic_model": "anthropic/claude-sonnet-4"
}
Research Applied
- Du et al. 2023 — Multiagent debate (different models > same-model)
- Self-RAG 2023 — Reflection tokens improve factuality
- Reflexion 2023 — Verbal memory prevents repeated mistakes
- Constitutional AI 2022 — Quality gate prevents regressions
Multi-Agent Council Skill — Solver-Critic Reasoning Engine
A lightweight Hermes skill that adds a second opinion from a different LLM model for debugging and complex analysis. For ~90% of queries it correctly does nothing. For the ~10% that matter, it catches blind spots a single model misses.
Repo: https://github.com/welliv/hermes-multi-agent-council
What Problem Does This Solve?
Single-model LLMs have blind spots. Different model architectures (Gemini vs Claude vs DeepSeek) have different blind spots — cognitive diversity, not role-playing. This skill adds a review from a different model, but only when the query warrants it.
Provider Support
Works with any OpenAI-compatible provider — not locked to OpenRouter:
google/gemini-2.0-flash-001OPENROUTER_API_KEYopenai/gpt-4oOPENAI_API_KEYanthropic/claude-sonnet-4ANTHROPIC_API_KEYgroq/llama-3.3-70b-versatileGROQ_API_KEYollama/llama3OLLAMA_BASE_URLMix providers freely — solver on Groq, critic on Anthropic, etc.
How It Works
User Query → Smart Router (free) → Solver (1 call) → Critic (0-1 calls) → Quality Gate → Done
Smart Router (free) — ensemble vote: keyword patterns (0.4) + ngram Naive Bayes (0.35) + feature rules (0.25). Falls back to keyword matching if uncertain.
Solver — one strong call. Model selected by query type: code→DeepSeek, math→Gemini, creative→Claude.
Critic — different model architecture reviews using Self-RAG reflection checkpoints. Returns JSON verdict.
Quality Gate — heuristic scoring rejects regressions. Cross-check verifies revision addresses each critic point.
Corrections Buffer — Reflexion-style memory prevents repeated mistakes within a session.
Circuit Breaker — 3 consecutive API failures → 60s cooldown.
When It Helps
When It Doesn't
Installation
git clone https://github.com/welliv/hermes-multi-agent-council.git ~/.hermes/skills/multi-agent-councilSet at least one API key in
~/.hermes/.envand configure~/.hermes/council/config.json:{ "solver_model": "google/gemini-2.0-flash-001", "critic_model": "anthropic/claude-sonnet-4" }Research Applied