Multi-Agent Council Skill — 4-Persona Reasoning with Nested Multi-Agent #5876

@welliv

Multi-Agent Council Skill — Solver-Critic Reasoning Engine

A lightweight Hermes skill that gets a second opinion from a different LLM for debugging and complex analysis. For ~90% of queries the router skips the critic, so the skill correctly does nothing. For the ~10% that matter, the second model catches blind spots that a single model misses.

Repo: https://github.com/welliv/hermes-multi-agent-council

What Problem Does This Solve?

Any single LLM has blind spots. Different model architectures (Gemini, Claude, DeepSeek) have different blind spots, so the benefit comes from real cognitive diversity rather than role-played personas. This skill adds a review from a different model, but only when the query warrants it.

Provider Support

Works with any OpenAI-compatible provider — not locked to OpenRouter:

| Provider             | Model Format                  | Env Var            |
|----------------------|-------------------------------|--------------------|
| OpenRouter (default) | google/gemini-2.0-flash-001   | OPENROUTER_API_KEY |
| OpenAI               | openai/gpt-4o                 | OPENAI_API_KEY     |
| Anthropic            | anthropic/claude-sonnet-4     | ANTHROPIC_API_KEY  |
| Groq                 | groq/llama-3.3-70b-versatile  | GROQ_API_KEY       |
| Ollama               | ollama/llama3                 | OLLAMA_BASE_URL    |

Mix providers freely — solver on Groq, critic on Anthropic, etc.
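As a rough illustration of how mixed providers could be resolved, the sketch below maps a model string's prefix to the environment variable from the table. It assumes the prefix before the first `/` selects the provider and that unrecognized prefixes fall through to OpenRouter; the names `PROVIDER_ENV_VARS` and `env_var_for` are hypothetical, and the skill's actual lookup may work differently.

```python
# Hypothetical mapping copied from the provider table; not the skill's own code.
PROVIDER_ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "groq": "GROQ_API_KEY",
    "ollama": "OLLAMA_BASE_URL",
}
DEFAULT_ENV_VAR = "OPENROUTER_API_KEY"  # assumption: OpenRouter handles everything else


def env_var_for(model: str) -> str:
    """Return the env var a model string would need, judged by its provider prefix."""
    prefix = model.split("/", 1)[0]
    return PROVIDER_ENV_VARS.get(prefix, DEFAULT_ENV_VAR)


if __name__ == "__main__":
    # Solver on Groq, critic on Anthropic, as in the example above.
    for model in ("groq/llama-3.3-70b-versatile", "anthropic/claude-sonnet-4"):
        print(model, "needs", env_var_for(model))
```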

How It Works

User Query → Smart Router (free) → Solver (1 call) → Critic (0-1 calls) → Quality Gate → Done

Smart Router (free) — ensemble vote: keyword patterns (0.4) + ngram Naive Bayes (0.35) + feature rules (0.25). Falls back to keyword matching if uncertain.
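The weighted vote can be sketched as follows. The three weights come from the description above; everything else is an assumption made for illustration: each classifier is taken to emit a score in [0, 1], and the uncertainty band (`low`, `high`) and the 0.5 keyword cutoff are hypothetical values, not the skill's real thresholds.

```python
# Weights as stated in the router description.
WEIGHTS = {"keyword": 0.40, "ngram_nb": 0.35, "features": 0.25}


def ensemble_score(votes: dict[str, float]) -> float:
    """Weighted sum of per-classifier scores, each assumed to lie in [0, 1]."""
    return sum(WEIGHTS[name] * votes[name] for name in WEIGHTS)


def needs_critic(votes: dict[str, float], low: float = 0.4, high: float = 0.6) -> bool:
    """Decide whether the critic should run.

    `low` and `high` bound a hypothetical "uncertain" band; inside it the
    decision falls back to the keyword vote alone.
    """
    score = ensemble_score(votes)
    if low < score < high:
        return votes["keyword"] >= 0.5  # fallback: keyword matching only
    return score >= high


if __name__ == "__main__":
    print(needs_critic({"keyword": 1.0, "ngram_nb": 1.0, "features": 1.0}))
    print(needs_critic({"keyword": 0.0, "ngram_nb": 0.0, "features": 0.0}))
```

Because the router only combines local scores, it makes no API call, which is why the pipeline labels it as free.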

Solver — one strong call. Model selected by query type: code→DeepSeek, math→Gemini, creative→Claude.

Critic — different model architecture reviews using Self-RAG reflection checkpoints. Returns JSON verdict.
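The verdict schema is not given here, so the sketch below assumes a hypothetical shape with an `approved` boolean and an `issues` list. It shows one way the critic's JSON reply could be parsed defensively, treating malformed output as "no objection" so that a broken critic response never blocks the solver's answer. That fail-open choice is also an assumption.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Verdict:
    approved: bool
    issues: list[str] = field(default_factory=list)


def parse_verdict(raw: str) -> Verdict:
    """Parse a critic reply using the hypothetical keys `approved` and `issues`."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return Verdict(approved=True)  # assumption: fail open on malformed JSON
    if not isinstance(data, dict):
        return Verdict(approved=True)
    raw_issues = data.get("issues")
    issues = [str(item) for item in raw_issues] if isinstance(raw_issues, list) else []
    # If `approved` is absent, infer it from whether any issues were listed.
    return Verdict(approved=bool(data.get("approved", not issues)), issues=issues)
```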

Quality Gate — heuristic scoring rejects regressions. Cross-check verifies revision addresses each critic point.

Corrections Buffer — Reflexion-style memory prevents repeated mistakes within a session.
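A session-scoped corrections buffer could look roughly like this. The class name, the five-item cap, and the prompt wording are all hypothetical; the sketch only illustrates the Reflexion-style idea of feeding earlier corrections back into later prompts.

```python
from collections import deque


class CorrectionsBuffer:
    """Keeps the most recent correction notes for the current session."""

    def __init__(self, max_items: int = 5):
        # A bounded deque drops the oldest note once the cap is reached.
        self._notes: deque[str] = deque(maxlen=max_items)

    def add(self, note: str) -> None:
        if note not in self._notes:  # skip exact duplicates
            self._notes.append(note)

    def as_prompt_prefix(self) -> str:
        """Render the notes as text to prepend to the next solver prompt."""
        if not self._notes:
            return ""
        lines = "\n".join(f"- {note}" for note in self._notes)
        return f"Avoid repeating these earlier mistakes:\n{lines}\n\n"
```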

Circuit Breaker — 3 consecutive API failures → 60s cooldown.
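The breaker rule above (3 consecutive failures, then a 60s cooldown) can be sketched in a few lines. The class and method names are hypothetical, and the injectable `clock` parameter is added here only to make the behavior easy to test.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Opens after `max_failures` consecutive failures, closes after `cooldown_s`."""

    def __init__(self, max_failures: int = 3, cooldown_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.max_failures = max_failures
        self.cooldown_s = cooldown_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def allow(self) -> bool:
        """Return True if an API call may be attempted right now."""
        if self._opened_at is None:
            return True
        if self._clock() - self._opened_at >= self.cooldown_s:
            # Cooldown elapsed: close the breaker and start counting afresh.
            self._opened_at = None
            self._failures = 0
            return True
        return False

    def record_success(self) -> None:
        self._failures = 0  # only *consecutive* failures count

    def record_failure(self) -> None:
        self._failures += 1
        if self._failures >= self.max_failures:
            self._opened_at = self._clock()
```

A caller would check `allow()` before each solver or critic request and report the outcome with `record_success()` or `record_failure()`.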

When It Helps

  • Debugging: "Why does my container crash?" → catches edge cases solver missed
  • Complex analysis: "Compare X vs Y for production" → finds overlooked tradeoffs
  • Security review: "Audit this auth flow" → different model catches different vulns

When It Doesn't

  • Direct questions ("What is Docker?") → router skips critic, zero overhead
  • Opinion questions ("Should I use React?") → there is no objectively wrong answer for a critic to catch

Installation

git clone https://github.com/welliv/hermes-multi-agent-council.git ~/.hermes/skills/multi-agent-council

Set at least one API key in ~/.hermes/.env and configure ~/.hermes/council/config.json:

{
  "solver_model": "google/gemini-2.0-flash-001",
  "critic_model": "anthropic/claude-sonnet-4"
}
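Before the first run it can help to sanity-check the file. The sketch below assumes only the two keys shown above; `check_config` and `load_config` are hypothetical helpers, and flagging identical solver and critic models is an assumption based on the skill's emphasis on using a different model for review.

```python
import json
from pathlib import Path
from typing import Optional

REQUIRED_KEYS = ("solver_model", "critic_model")


def check_config(config: dict) -> list[str]:
    """Return a list of human-readable problems; empty means the config looks fine."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if not config.get(key)]
    if not problems and config["solver_model"] == config["critic_model"]:
        problems.append("solver_model and critic_model are identical")
    return problems


def load_config(path: Optional[Path] = None) -> dict:
    """Load and validate the config, defaulting to ~/.hermes/council/config.json."""
    if path is None:
        path = Path("~/.hermes/council/config.json").expanduser()
    config = json.loads(path.read_text(encoding="utf-8"))
    problems = check_config(config)
    if problems:
        raise ValueError("; ".join(problems))
    return config
```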

Research Applied

  • Du et al. 2023 — Multiagent debate (different models > same-model)
  • Self-RAG 2023 — Reflection tokens improve factuality
  • Reflexion 2023 — Verbal memory prevents repeated mistakes
  • Constitutional AI 2022 — Quality gate prevents regressions

Labels

  • P3: Low (cosmetic, nice to have)
  • tool/delegate: Subagent delegation
  • tool/skills: Skills system (list, view, manage)
  • type/feature: New feature or request
