Feature: Consensus & Voting Engine for Multi-Agent Decision Making (inspired by AgentWorkforce/relay)

## Overview

When multiple agents work on the same problem — whether in a debate, a parallel fan-out, or a review loop — there's currently no structured mechanism for them to **reach agreement**. The parent agent reads all summaries and makes its own judgment, but the agents themselves can't vote, signal confidence, or formally converge on a decision.

[AgentWorkforce/relay](https://github.com/AgentWorkforce/relay) implements a client-side **consensus engine** (`packages/sdk/src/consensus.ts`) that provides structured voting primitives for multi-agent workflows. This is valuable independently of the debate mode (#376): debate is about iterative refinement between agents, while consensus is about **collective decision-making** across N agents.

**Inspired by:** relay's consensus engine — supports majority, supermajority, unanimous, weighted, and quorum voting strategies with features like vote changing, early resolution, and expiry timers.

**Related:** #344 (Multi-Agent Architecture), #376 (Adversarial Debate Mode), #356 (Acceptance Criteria & Independent Judge)

---

## Research Findings

### How Relay's Consensus Engine Works

The consensus engine runs entirely client-side (no server extension needed). Key components:

**Voting strategies:**
- **Majority** — >50% agreement
- **Supermajority** — ≥2/3 agreement
- **Unanimous** — 100% agreement
- **Weighted** — Votes weighted by agent role/expertise (e.g., a security reviewer's vote counts 2x on security questions)
- **Quorum** — Minimum participation threshold before any strategy is evaluated
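
A minimal sketch of how the count-based thresholds above could be evaluated, assuming each voter casts exactly one vote, shares are computed over votes cast (not over all eligible voters), and quorum is a plain minimum vote count checked before any strategy. `evaluate` and `THRESHOLDS` are hypothetical names, not relay's API; weighted voting is left out here.

```python
from collections import Counter
from fractions import Fraction
from typing import Optional

# Hypothetical sketch (not relay's API): share of cast votes the leading option needs.
THRESHOLDS = {
    "majority": Fraction(1, 2),       # strictly more than half
    "supermajority": Fraction(2, 3),  # at least two thirds
    "unanimous": Fraction(1, 1),      # every vote
}


def evaluate(votes: dict[str, str], strategy: str, quorum: int = 1) -> Optional[str]:
    """Return the winning option, or None if quorum or the threshold is not met.

    `votes` maps voter id -> chosen option, so each voter counts once.
    """
    if not votes or len(votes) < quorum:
        return None  # quorum gates every strategy
    leader, count = Counter(votes.values()).most_common(1)[0]
    share = Fraction(count, len(votes))
    threshold = THRESHOLDS[strategy]
    passed = share > threshold if strategy == "majority" else share >= threshold
    return leader if passed else None
```

Using `Fraction` avoids floating-point edge cases at exactly 2/3. No two options can both clear any of these thresholds, so `most_common` never has to pick between tied winners.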

**Features:**
- **Vote changing** — Agents can revise their vote as new information emerges
- **Early resolution** — Auto-resolves as soon as a strategy threshold is met, without waiting for all votes
- **Expiry timers** — Votes expire after a configurable window; prevents stale votes from affecting outcomes
- **Configurable quorum** — Minimum number of voters required before the result is considered valid
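
The vote-changing, expiry, and early-resolution features can be illustrated with a small session object. This is a sketch under assumptions, not relay's implementation: `ConsensusSession` is a hypothetical name, it only supports simple majority, it measures the majority against all eligible voters (which is what makes resolving early safe), and it closes once resolved. The clock is injectable so expiry can be tested without sleeping.

```python
import time
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ConsensusSession:
    """Hypothetical single-question session using a simple-majority rule.

    Assumptions: the majority is measured against all eligible voters, votes
    older than `vote_ttl` seconds are ignored, and the session accepts no
    further votes once it has resolved.
    """

    voters: set[str]
    vote_ttl: float = 300.0
    clock: Callable[[], float] = time.monotonic
    result: Optional[str] = field(default=None, init=False)
    _votes: dict[str, tuple[str, float]] = field(default_factory=dict, init=False)

    def cast(self, voter: str, choice: str) -> Optional[str]:
        if self.result is not None:
            raise RuntimeError("session already resolved")
        if voter not in self.voters:
            raise ValueError(f"{voter!r} is not an eligible voter")
        # Overwriting the voter's entry is what implements vote changing.
        self._votes[voter] = (choice, self.clock())
        return self._try_resolve()

    def live_votes(self) -> dict[str, str]:
        """Votes that have not yet expired."""
        now = self.clock()
        return {v: c for v, (c, cast_at) in self._votes.items() if now - cast_at <= self.vote_ttl}

    def _try_resolve(self) -> Optional[str]:
        for choice, count in Counter(self.live_votes().values()).items():
            # Early resolution: once a choice holds more than half of all eligible
            # voters, no other choice can also reach a majority.
            if count * 2 > len(self.voters):
                self.result = choice
                break
        return self.result
```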

**Relay's implementation** is tightly coupled to its broker protocol, but the logic is simple and portable (~200 lines of core voting logic).

### Use Cases for Hermes

1. **Quality gating on fan-out results** — When 3 subagents independently implement something, have them vote on which implementation is best before returning to the parent
2. **Go/no-go decisions** — Multiple reviewers vote on whether code is ready to merge
3. **Research synthesis** — Multiple research agents vote on the most relevant findings
4. **Conflict resolution** — When debate (#376) doesn't converge, fall back to a structured vote
5. **Cascade routing** — Agents vote on whether to escalate to a more expensive model
6. **Weighted expertise** — In a security audit workflow, the security-focused agent's vote should carry more weight than the general-purpose agent's

---

## Current State in Hermes Agent

**No voting or consensus mechanism exists.** The closest things are:

- **`mixture_of_agents`** — Four models generate responses and an aggregator synthesizes them. This is pure synthesis (merging all perspectives), not voting (picking one option or deciding yes/no).
- **`delegate_task` batch mode** — Returns all results to the parent, which manually picks the best one. No structured comparison.
- **#356 (Acceptance Criteria)** — An independent judge evaluates quality, but this is a single evaluator, not collective decision-making across agents.

**Gap:** No mechanism for N agents to express preferences and resolve them via a defined strategy.

---

## Implementation Plan

### Skill vs. Tool Classification

This should be a **codebase change**: a new utility module (`agent/consensus.py`) used by `delegate_tool.py` and future workflow orchestration. Vote tallying, quorum checking, and early resolution need precise logic that behaves deterministically. That rules out a skill (it requires exact logic), and it should not be a standalone tool either, since it is infrastructure for multi-agent coordination.

### What We'd Need

1. **`ConsensusEngine` class** — Manages a voting session with configurable strategy
2. **Vote collection mechanism** — Way for subagents to cast votes (via their return value or a dedicated `cast_vote` tool)
3. **Strategy implementations** — Majority, supermajority, unanimous, weighted, quorum
4. **Resolution logic** — Early resolution, expiry, tiebreaking
5. **Integration with delegate_task** — Optional `consensus` parameter on workflow steps
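
For the vote collection mechanism, one option is to have each subagent end its return value with a small JSON object and parse it on the parent side. The sketch below assumes that convention (a flat object with `choice`, `reasoning`, and `confidence` keys); the format, `Vote`, and `parse_vote` are all hypothetical, and a dedicated `cast_vote` tool remains the alternative raised in the open questions.

```python
import json
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    voter: str
    choice: str
    reasoning: str = ""
    confidence: float = 1.0


# Matches flat {...} objects; assumes no braces appear inside the vote's values.
_JSON_OBJECT = re.compile(r"\{[^{}]*\}")


def parse_vote(voter: str, summary: str, options: set[str]) -> Vote:
    """Extract a vote from a subagent's free-text summary.

    Hypothetical convention: the last JSON object in the summary is the vote,
    e.g. {"choice": "B", "reasoning": "...", "confidence": 0.8}.
    """
    matches = _JSON_OBJECT.findall(summary)
    if not matches:
        raise ValueError(f"no vote object found in output from {voter!r}")
    data = json.loads(matches[-1])
    choice = str(data.get("choice", "")).strip()
    if choice not in options:
        raise ValueError(f"{voter!r} voted for unknown option {choice!r}")
    # Clamp so a malformed confidence cannot dominate a weighted tally.
    confidence = min(max(float(data.get("confidence", 1.0)), 0.0), 1.0)
    return Vote(voter, choice, str(data.get("reasoning", "")), confidence)
```

Rejecting unknown options up front keeps hallucinated choices out of the tally instead of silently counting them.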

### Phased Rollout

**Phase 1: Simple Majority Voting on Batch Results (Depends on #344 Phase 1)**

Add an optional `consensus` parameter to `delegate_task`:

```python
delegate_task(
    tasks=[
        {"goal": "Implement auth module approach A", "context": "..."},
        {"goal": "Implement auth module approach B", "context": "..."},
        {"goal": "Implement auth module approach C", "context": "..."},
    ],
    consensus={
        "strategy": "majority",
        "question": "Which implementation is the most secure and maintainable?",
        "voter": "A security-focused code reviewer"
    }
)
```

When `consensus` is provided:
1. All tasks run in parallel (existing behavior)
2. After completion, spawn N judge agents (1 per result, or a panel) that evaluate all results
3. Each judge casts a vote (A, B, or C) with reasoning
4. Apply the strategy to determine the winner
5. Return the winning result plus vote breakdown

- Deliverable: Structured selection from parallel alternatives
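
Steps 4 and 5 above could look roughly like the following, assuming the judges' votes have already been collected as option labels. `select_winner` and the shape of its return value are hypothetical; when no option has a strict majority it returns no winner rather than guessing, leaving the tiebreak policy to the caller.

```python
from collections import Counter
from typing import Any


def select_winner(results: dict[str, str], ballots: dict[str, str]) -> dict[str, Any]:
    """Hypothetical Phase 1 step: pick one batch result by simple majority.

    `results` maps option label ("A", "B", ...) -> subagent summary;
    `ballots` maps judge id -> option label that judge voted for.
    """
    breakdown = Counter(ballots.values())
    unknown = set(breakdown) - set(results)
    if unknown:
        raise ValueError(f"votes for unknown options: {sorted(unknown)}")
    winner, count = breakdown.most_common(1)[0] if breakdown else (None, 0)
    has_majority = count * 2 > len(ballots)
    return {
        # No majority (e.g. a 2-2 split): return no winner and let the caller
        # apply a tiebreak rule or fall back to the parent's own judgment.
        "winner": winner if has_majority else None,
        "result": results[winner] if has_majority else None,
        "breakdown": {label: breakdown.get(label, 0) for label in results},
        "total_votes": len(ballots),
    }
```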

**Phase 2: In-Workflow Consensus (Depends on #344 Phase 1)**

Consensus as a workflow step type:

```python
delegate_task(
    workflow=[
        {"id": "impl_a", "goal": "Implement approach A"},
        {"id": "impl_b", "goal": "Implement approach B"},
        {"id": "decide", "type": "consensus",
         "needs": ["impl_a", "impl_b"],
         "strategy": "weighted",
         "weights": {"security_reviewer": 2, "default": 1},
         "question": "Which approach should we proceed with?"}
    ]
)
```

- Deliverable: Decision gates within workflow DAGs
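
The `weights` mapping in the example above suggests a role-to-weight lookup with a `"default"` fallback. A minimal sketch of that tally follows, assuming each ballot is tagged with the voter's role and the leading choice must hold more than half of the total weight; `weighted_winner` and the threshold rule are hypothetical.

```python
from collections import defaultdict
from typing import Optional


def weighted_winner(
    ballots: list[tuple[str, str]],
    weights: dict[str, float],
    threshold: float = 0.5,
) -> Optional[str]:
    """Hypothetical weighted strategy.

    `ballots` holds (voter_role, choice) pairs; each ballot counts with
    weights[role], falling back to weights["default"] (or 1 if absent).
    The leading choice wins only if its share of total weight exceeds `threshold`.
    """
    fallback = weights.get("default", 1.0)
    totals: dict[str, float] = defaultdict(float)
    for role, choice in ballots:
        totals[choice] += weights.get(role, fallback)
    grand_total = sum(totals.values())
    if grand_total == 0:
        return None
    choice, weight = max(totals.items(), key=lambda item: item[1])
    return choice if weight / grand_total > threshold else None
```

With `{"security_reviewer": 2, "default": 1}`, a security reviewer voting `impl_b` against one generalist voting `impl_a` resolves to `impl_b` (2 of 3 weight), where an unweighted count would be a 1-1 tie.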

**Phase 3: Live Consensus During Agent Collaboration (Depends on #344 Phase 3)**

Agents in a shared workflow can call `cast_vote(topic, choice, confidence)` during execution, not just at step boundaries:

- Real-time vote tracking during collaborative work
- Vote changing as new information emerges
- Automatic convergence detection (early resolution)
- Deliverable: Dynamic agreement tracking in live multi-agent sessions
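
A live `cast_vote(topic, choice, confidence)` call implies shared state keyed by topic. The sketch below is one way to back it, under loudly stated assumptions: each agent holds one vote per topic (recasting replaces it), confidence in [0, 1] acts as the vote's weight, and a topic converges once the leading choice's summed confidence exceeds half the number of agents. `VoteBoard` and this convergence rule are hypothetical, not part of relay or Hermes.

```python
import threading
from collections import defaultdict
from typing import Optional


class VoteBoard:
    """Hypothetical shared board backing a `cast_vote(topic, choice, confidence)` tool.

    Assumptions: one vote per agent per topic (recasting replaces it), confidence
    in [0, 1] is the vote's weight, and a topic converges once the leading choice's
    summed confidence exceeds half the number of eligible agents.
    """

    def __init__(self, agents: set[str]):
        self.agents = set(agents)
        self._lock = threading.Lock()  # agents may vote concurrently
        self._votes: dict[str, dict[str, tuple[str, float]]] = defaultdict(dict)
        self.decided: dict[str, str] = {}

    def cast_vote(self, agent: str, topic: str, choice: str, confidence: float) -> Optional[str]:
        if agent not in self.agents:
            raise ValueError(f"unknown agent {agent!r}")
        if not 0.0 <= confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")
        with self._lock:
            if topic in self.decided:
                return self.decided[topic]  # already converged; this vote is ignored
            self._votes[topic][agent] = (choice, confidence)
            mass: dict[str, float] = defaultdict(float)
            for c, conf in self._votes[topic].values():
                mass[c] += conf
            leader, leader_mass = max(mass.items(), key=lambda item: item[1])
            if leader_mass > len(self.agents) / 2:
                self.decided[topic] = leader  # convergence detected
            return self.decided.get(topic)
```

Using confidence as a weight means a few hesitant votes cannot trigger early resolution on their own; whether that is the right semantics is part of the open design questions below.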

---

## Pros & Cons

### Pros
- **Solves a real gap** — Today the parent agent manually picks from batch results with no structure. Voting formalizes this.
- **Composable with existing plans** — Works with debate (#376), acceptance criteria (#356), and workflow DAGs (#344)
- **Proven pattern** — relay already ships a working implementation, and voting systems are well understood
- **Low implementation cost** — Phase 1 is ~150-200 LOC of core voting logic + integration
- **Weighted voting is powerful** — Agents in different roles can influence a decision in proportion to their relevant expertise

### Cons / Risks
- **Token cost** — Phase 1 spawns additional judge agents to vote, multiplying LLM calls
- **Depends on #344** — Without workflow infrastructure, consensus is just a post-hoc comparison step
- **Voting quality depends on judge quality** — Garbage in, garbage out: if the judge prompt is bad, the votes are meaningless
- **Overkill for simple tasks** — Most `delegate_task` usage is "do this one thing." Consensus is only valuable for multi-option or multi-reviewer scenarios.
- **Tiebreaking is hard** — What happens when 2 out of 4 agents vote for A and 2 for B? Need clear tiebreaking rules.
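
One possible tiebreak cascade, sketched under assumptions: each ballot carries a confidence score (as in the Phase 3 `cast_vote` signature), the tied option with the higher summed confidence wins first, then a designated tiebreaker voter decides, and otherwise the result stays undecided so it can be escalated. `break_tie` and this rule order are hypothetical, not something relay or Hermes defines.

```python
from collections import defaultdict
from typing import Optional


def break_tie(
    ballots: list[tuple[str, str, float]],
    tied: set[str],
    tiebreaker: Optional[str] = None,
) -> Optional[str]:
    """Hypothetical tiebreak cascade for options with equal vote counts.

    `ballots` holds (voter, choice, confidence) triples. Rules, in order:
    1. the tied option with the highest summed confidence wins;
    2. otherwise the `tiebreaker` voter's choice wins if it is a tied option;
    3. otherwise return None so the caller can escalate (e.g. to the parent).
    """
    confidence: dict[str, float] = defaultdict(float)
    for _, choice, conf in ballots:
        if choice in tied:
            confidence[choice] += conf
    ranked = sorted(confidence.items(), key=lambda item: item[1], reverse=True)
    if ranked and (len(ranked) == 1 or ranked[0][1] > ranked[1][1]):
        return ranked[0][0]
    for voter, choice, _ in ballots:
        if voter == tiebreaker and choice in tied:
            return choice
    return None
```

For the 2-2 case above, judges voting A with confidence 0.9 and 0.6 would beat judges voting B with 0.5 and 0.5.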

---

## Open Questions

1. **Should judges be separate agents or the same agents that produced the work?** Separate judges are more objective but cost more. Same agents have context but may be biased toward their own output.
2. **How should votes be expressed?** Free-text reasoning + structured choice? JSON? A dedicated `cast_vote` tool?
3. **Should weighted voting weights be static (configured) or dynamic (based on past performance)?**
4. **What's the right default strategy?** Majority is simplest but supermajority prevents slim-margin decisions.
5. **How does this interact with #356 (Acceptance Criteria)?** Consensus is a group decision; acceptance criteria are a pass/fail quality gate. The two are complementary, but their interaction needs design.

---

## References

- [AgentWorkforce/relay](https://github.com/AgentWorkforce/relay) — `packages/sdk/src/consensus.ts` (Apache-2.0)
- relay consensus features: majority, supermajority, unanimous, weighted, quorum voting with early resolution and expiry
- #344 — Multi-Agent Architecture (prerequisite — provides workflow infrastructure)
- #376 — Adversarial Debate Mode (complementary — debate for refinement, consensus for decision)
- #356 — Acceptance Criteria (complementary — quality gating vs collective decision)
- #377 — Shared Memory Pools (complementary — consensus agents may need shared context)

