Feature: Consensus & Voting Engine for Multi-Agent Decision Making (inspired by AgentWorkforce/relay) #412

@teknium1

Overview

When multiple agents work on the same problem — whether in a debate, a parallel fan-out, or a review loop — there's currently no structured mechanism for them to reach agreement. The parent agent reads all summaries and makes its own judgment, but the agents themselves can't vote, signal confidence, or formally converge on a decision.

AgentWorkforce/relay implements a client-side consensus engine (packages/sdk/src/consensus.ts) that provides structured voting primitives for multi-agent workflows. This is valuable independently of debate mode (#376): debate is about iterative refinement between agents, while consensus is about collective decision-making across N agents.

Inspired by: relay's consensus engine — supports majority, supermajority, unanimous, weighted, and quorum voting strategies with features like vote changing, early resolution, and expiry timers.

Related: #344 (Multi-Agent Architecture), #376 (Adversarial Debate Mode), #356 (Acceptance Criteria & Independent Judge)


Research Findings

How Relay's Consensus Engine Works

The consensus engine runs entirely client-side (no server extension needed). Key components:

Voting strategies:

  • Majority — >50% agreement
  • Supermajority — ≥2/3 agreement
  • Unanimous — 100% agreement
  • Weighted — Votes weighted by agent role/expertise (e.g. security reviewer's vote counts 2x on security questions)
  • Quorum — Minimum participation threshold before any strategy is evaluated
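As a rough illustration of the thresholds listed above, here is a minimal sketch of the three count-based strategies. The function name `threshold_met` and its signature are hypothetical and are not taken from relay's code; weighted and quorum handling are left out.

```python
from fractions import Fraction


# Hypothetical helper for illustration only; not relay's API.
def threshold_met(strategy: str, votes_for: int, total_voters: int) -> bool:
    """Return True when votes_for satisfies the named strategy."""
    if total_voters <= 0:
        return False
    # Exact rational arithmetic avoids float rounding at the 2/3 boundary.
    share = Fraction(votes_for, total_voters)
    if strategy == "majority":
        return share > Fraction(1, 2)       # strictly more than half
    if strategy == "supermajority":
        return share >= Fraction(2, 3)      # at least two thirds
    if strategy == "unanimous":
        return votes_for == total_voters    # every voter agrees
    raise ValueError(f"unknown strategy: {strategy}")


print(threshold_met("majority", 2, 4))       # False: exactly half is not a majority
print(threshold_met("supermajority", 2, 3))  # True: 2/3 meets the threshold
print(threshold_met("unanimous", 2, 3))      # False
```

Note that a 2-2 split fails the majority test, which is why tiebreaking needs its own rule.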

Features:

  • Vote changing — Agents can revise their vote as new information emerges
  • Early resolution — Auto-resolves as soon as a strategy threshold is met (don't wait for all votes)
  • Expiry timers — Votes expire after a configurable window; prevents stale votes from affecting outcomes
  • Configurable quorum — Minimum number of voters required before the result is considered valid

Relay's implementation is tightly coupled to their broker protocol, but the logic is simple and portable (~200 lines of core voting logic).
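To give a sense of how small the portable core could be, the sketch below combines vote changing, expiry, quorum, and early resolution in one class. It is an assumption-laden illustration, not a port of relay's consensus.ts: the class name `VotingSession`, its fields, and the choice of simple majority as the only strategy are all hypothetical.

```python
import time
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple


@dataclass
class VotingSession:
    """Hypothetical sketch of a majority vote with quorum and expiry."""
    voters: int                 # number of eligible voters
    quorum: int                 # minimum live votes before a result is valid
    ttl_seconds: float          # votes older than this are ignored
    clock: Callable[[], float] = time.monotonic  # injectable for testing
    _votes: Dict[str, Tuple[str, float]] = field(default_factory=dict, init=False)

    def cast(self, agent: str, choice: str) -> None:
        # Casting again overwrites the earlier vote (vote changing).
        self._votes[agent] = (choice, self.clock())

    def live_tally(self) -> Counter:
        # Count only votes that have not expired.
        now = self.clock()
        return Counter(
            choice for choice, cast_at in self._votes.values()
            if now - cast_at <= self.ttl_seconds
        )

    def result(self) -> Optional[str]:
        tally = self.live_tally()
        if not tally or sum(tally.values()) < self.quorum:
            return None  # quorum not reached
        choice, count = tally.most_common(1)[0]
        # Early resolution: a strict majority of all eligible voters means
        # the outstanding votes alone cannot overturn the leader.
        return choice if 2 * count > self.voters else None


now = [0.0]  # fake clock so the example is deterministic
session = VotingSession(voters=5, quorum=3, ttl_seconds=60, clock=lambda: now[0])
session.cast("agent_1", "A")
session.cast("agent_2", "B")
session.cast("agent_2", "A")   # agent_2 changes its vote
print(session.result())        # None: only 2 live votes, below quorum
session.cast("agent_3", "A")
print(session.result())        # A: 3 of 5 is a majority with 2 votes outstanding
now[0] = 120.0
print(session.result())        # None: every vote has expired
```

Injecting the clock is a design choice made here so that expiry can be tested deterministically, in line with the requirement that tallying behave predictably.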

Use Cases for Hermes

  1. Quality gating on fan-out results — When 3 subagents independently implement something, have them vote on which implementation is best before returning to the parent
  2. Go/no-go decisions — Multiple reviewers vote on whether code is ready to merge
  3. Research synthesis — Multiple research agents vote on the most relevant findings
  4. Conflict resolution — When debate (#376) doesn't converge, fall back to a structured vote
  5. Cascade routing — Agents vote on whether to escalate to a more expensive model
  6. Weighted expertise — In a security audit workflow, the security-focused agent's vote should carry more weight than the general-purpose agent

Current State in Hermes Agent

No voting or consensus mechanism exists. The closest things are:

  • mixture_of_agents — 4 models generate responses, an aggregator synthesizes. But this is pure synthesis (merge all perspectives), not voting (pick one or decide yes/no).
  • delegate_task batch mode — Returns all results to the parent, which manually picks the best one. No structured comparison.
  • #356 (Acceptance Criteria & Independent Judge) — An independent judge evaluates quality, but this is a single evaluator, not collective decision-making across agents.

Gap: No mechanism for N agents to express preferences and resolve them via a defined strategy.


Implementation Plan

Skill vs. Tool Classification

This should be a codebase change — a new utility module (agent/consensus.py) used by delegate_tool.py and future workflow orchestration. It needs precise execution logic for vote tallying, quorum checking, and early resolution that must work deterministically. Not a skill (requires exact logic), not a standalone tool (it's infrastructure for multi-agent coordination).

What We'd Need

  1. ConsensusEngine class — Manages a voting session with configurable strategy
  2. Vote collection mechanism — Way for subagents to cast votes (via their return value or a dedicated cast_vote tool)
  3. Strategy implementations — Majority, supermajority, unanimous, weighted, quorum
  4. Resolution logic — Early resolution, expiry, tiebreaking
  5. Integration with delegate_task — Optional consensus parameter on workflow steps

Phased Rollout

Phase 1: Simple Majority Voting on Batch Results (Depends on #344 Phase 1)

Add an optional consensus parameter to delegate_task:

delegate_task(
    tasks=[
        {"goal": "Implement auth module approach A", "context": "..."},
        {"goal": "Implement auth module approach B", "context": "..."},
        {"goal": "Implement auth module approach C", "context": "..."},
    ],
    consensus={
        "strategy": "majority",
        "question": "Which implementation is the most secure and maintainable?",
        "voter": "A security-focused code reviewer"
    }
)

When consensus is provided:

  1. All tasks run in parallel (existing behavior)
  2. After completion, spawn N judge agents (1 per result, or a panel) that evaluate all results
  3. Each judge casts a vote (A, B, or C) with reasoning
  4. Apply the strategy to determine the winner
  5. Return the winning result plus vote breakdown

  • Deliverable: Structured selection from parallel alternatives
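Steps 3 to 5 of the Phase 1 flow could look roughly like the sketch below. The function name `select_winner`, the vote shape `{"choice": ..., "reasoning": ...}`, and the returned dictionary are assumptions made for illustration; the issue does not fix any of them.

```python
from collections import Counter
from typing import Dict, List


def select_winner(results: Dict[str, str], judge_votes: List[dict]) -> dict:
    """Apply simple majority to judge votes over labelled results.

    Hypothetical sketch: the vote and return shapes are assumptions.
    """
    # Ignore votes for labels that do not correspond to a result.
    tally = Counter(v["choice"] for v in judge_votes if v["choice"] in results)
    winner = None
    if tally:
        choice, count = tally.most_common(1)[0]
        # Majority is measured against all judges, including invalid votes.
        if 2 * count > len(judge_votes):
            winner = choice
    return {
        "winner": winner,                      # None when no majority exists
        "result": results.get(winner),         # the winning subagent output
        "breakdown": dict(tally),              # votes per option
        "reasoning": [v["reasoning"] for v in judge_votes],
    }


results = {"A": "impl A summary", "B": "impl B summary", "C": "impl C summary"}
votes = [
    {"choice": "B", "reasoning": "parameterized queries throughout"},
    {"choice": "B", "reasoning": "smallest attack surface"},
    {"choice": "A", "reasoning": "simplest to maintain"},
]
outcome = select_winner(results, votes)
print(outcome["winner"], outcome["breakdown"])  # B {'B': 2, 'A': 1}
```

Returning the breakdown and reasoning alongside the winner lets the parent agent see how close the decision was instead of receiving only the selected result.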

Phase 2: In-Workflow Consensus (Depends on #344 Phase 1)

Consensus as a workflow step type:

delegate_task(
    workflow=[
        {"id": "impl_a", "goal": "Implement approach A"},
        {"id": "impl_b", "goal": "Implement approach B"},
        {"id": "decide", "type": "consensus",
         "needs": ["impl_a", "impl_b"],
         "strategy": "weighted",
         "weights": {"security_reviewer": 2, "default": 1},
         "question": "Which approach should we proceed with?"}
    ]
)

  • Deliverable: Decision gates within workflow DAGs
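The weights mapping in the workflow example above could be applied as in this sketch. It assumes each vote arrives as a (voter role, choice) pair and that the "default" key supplies the weight for unlisted roles; the function name `weighted_winner` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def weighted_winner(
    votes: List[Tuple[str, str]],   # (voter_role, choice) pairs; shape is an assumption
    weights: Dict[str, float],
) -> Tuple[Optional[str], Dict[str, float]]:
    """Return the choice holding more than half of the total weight, if any."""
    default = weights.get("default", 1)
    totals: Dict[str, float] = defaultdict(float)
    for role, choice in votes:
        # Roles without an explicit entry fall back to the default weight.
        totals[choice] += weights.get(role, default)
    total_weight = sum(totals.values())
    for choice, weight in totals.items():
        if 2 * weight > total_weight:
            return choice, dict(totals)
    return None, dict(totals)       # no weighted majority


weights = {"security_reviewer": 2, "default": 1}
votes = [("security_reviewer", "impl_b"), ("generalist", "impl_a")]
winner, totals = weighted_winner(votes, weights)
print(winner, totals)  # impl_b {'impl_b': 2.0, 'impl_a': 1.0}
```

By headcount this example is a 1-1 tie; the security reviewer's doubled weight is what decides it.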

Phase 3: Live Consensus During Agent Collaboration (Depends on #344 Phase 3)

Agents in a shared workflow can call cast_vote(topic, choice, confidence) during execution, not just at step boundaries:

  • Real-time vote tracking during collaborative work
  • Vote changing as new information emerges
  • Automatic convergence detection (early resolution)
  • Deliverable: Dynamic agreement tracking in live multi-agent sessions

Pros & Cons

Pros

Cons / Risks

  • Token cost — Phase 1 spawns additional judge agents to vote, multiplying LLM calls
  • Depends on #344 (Multi-Agent Architecture) — Without workflow infrastructure, consensus is just a post-hoc comparison step
  • Voting quality depends on judge quality — Garbage-in, garbage-out — if the judge prompt is bad, votes are meaningless
  • Overkill for simple tasks — Most delegate_task usage is "do this one thing." Consensus is only valuable for multi-option or multi-reviewer scenarios.
  • Tiebreaking is hard — What happens when 2 out of 4 agents vote for A and 2 for B? Need clear tiebreaking rules.
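One possible tiebreaking policy for the 2-2 case, sketched under assumptions: pick the top-voted option, break ties with a caller-supplied priority order, and otherwise return nothing so the caller can escalate (for example, to the parent agent). The function name and the priority mechanism are hypothetical, and this uses plurality rather than strict majority.

```python
from collections import Counter
from typing import List, Optional, Sequence


def resolve_with_tiebreak(
    votes: Sequence[str],
    priority: Optional[List[str]] = None,   # hypothetical preference order for ties
) -> Optional[str]:
    """Return the plurality winner, using priority to break ties."""
    tally = Counter(votes)
    if not tally:
        return None
    top = max(tally.values())
    leaders = [choice for choice, count in tally.items() if count == top]
    if len(leaders) == 1:
        return leaders[0]                   # clear winner, no tie
    for option in priority or []:
        if option in leaders:
            return option                   # first preferred option among the tied leaders
    return None                             # unresolved tie: caller decides what to do


print(resolve_with_tiebreak(["A", "A", "B", "B"]))                        # None
print(resolve_with_tiebreak(["A", "A", "B", "B"], priority=["B", "A"]))  # B
```

Returning None instead of picking arbitrarily keeps the outcome deterministic and makes the unresolved case explicit.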

Open Questions

  1. Should judges be separate agents or the same agents that produced the work? Separate judges are more objective but cost more. Same agents have context but may be biased toward their own output.
  2. How should votes be expressed? Free-text reasoning + structured choice? JSON? A dedicated cast_vote tool?
  3. Should weighted voting weights be static (configured) or dynamic (based on past performance)?
  4. What's the right default strategy? Majority is simplest but supermajority prevents slim-margin decisions.
  5. How does this interact with #356 (Acceptance Criteria)? Consensus is a group decision; acceptance criteria are a pass/fail quality gate. They're complementary, but the interaction needs design.
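For question 2, one option is a JSON vote embedded in the subagent's return value. The sketch below assumes a payload with `choice`, `reasoning`, and `confidence` keys (mirroring the proposed cast_vote arguments) and rejects anything malformed; the `Vote` type and `parse_vote` function are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Vote:
    choice: str
    reasoning: str
    confidence: float


def parse_vote(raw: str, options: Sequence[str]) -> Optional[Vote]:
    """Parse a JSON vote; return None if it is malformed or out of range."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    choice = data.get("choice")
    if choice not in options:
        return None                          # unknown or missing option
    confidence = data.get("confidence", 1.0)  # assumed default when omitted
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        return None
    if not 0.0 <= confidence <= 1.0:
        return None
    return Vote(choice, str(data.get("reasoning", "")), float(confidence))


raw = '{"choice": "B", "reasoning": "fewer dependencies", "confidence": 0.8}'
print(parse_vote(raw, ["A", "B", "C"]))
print(parse_vote('{"choice": "D"}', ["A", "B", "C"]))  # None: not a valid option
```

Strict validation keeps free-text reasoning available while ensuring the tally only ever sees well-formed choices.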
