Experiment code and data for:

**From Debate to Deliberation: Structured Collective Reasoning with Typed Epistemic Acts**
Sunil Prakash · arXiv:2603.11781
Multi-agent LLM systems typically interact through unstructured debate, majority voting, or rigid orchestration pipelines. None of these model deliberation — a phased process where differentiated participants exchange typed reasoning moves, preserve disagreements, and converge on an explicit outcome.
DCI treats collective reasoning as a first-class computational object:
| Component | Description |
|---|---|
| 4 Reasoning Archetypes | Framer (structures the problem), Explorer (generates alternatives), Challenger (stress-tests proposals), Integrator (synthesizes toward decision) |
| 14 Typed Epistemic Acts | propose, challenge, evidence, reframe, synthesize, concede, object, qualify, defer, escalate, poll, commit, dissent, reopen |
| Phased Sessions | Opening → Divergence → Convergence → Closure, with explicit phase transition rules |
| Shared Workspace | Tension register, option table, evidence log — all agents read/write a structured state |
| DCI-CF Algorithm | Convergent flow that always terminates, producing a decision packet with: selected option, residual objections, minority report, reopen conditions |
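A minimal sketch of how such a move grammar can be enforced, using the archetypes, acts, and phases from the table above. The names below are hypothetical; the repository's actual schema lives in `src/grammar/moves.py`:

```python
from dataclasses import dataclass

# Hypothetical vocabulary mirroring the table above; not the repository's schema.
ARCHETYPES = {"framer", "explorer", "challenger", "integrator"}
ACTS = {
    "propose", "challenge", "evidence", "reframe", "synthesize", "concede",
    "object", "qualify", "defer", "escalate", "poll", "commit", "dissent", "reopen",
}
PHASES = ("opening", "divergence", "convergence", "closure")

@dataclass
class Move:
    archetype: str  # which reasoning archetype speaks
    act: str        # which typed epistemic act is performed
    phase: str      # session phase in which the move occurs
    content: str    # free-text payload of the move

def validate(move: Move) -> bool:
    """Reject moves outside the archetype/act/phase vocabulary."""
    return (
        move.archetype in ARCHETYPES
        and move.act in ACTS
        and move.phase in PHASES
    )
```

Restricting every utterance to a typed move is what lets the shared workspace and the DCI-CF algorithm treat the conversation as structured state rather than free text.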
Evaluated on 45 tasks across 7 domains with 8 conditions (185 scored runs, 388 total JSONL-logged runs):
| Condition | n | Quality (0-10) |
|---|---|---|
| DCI (full) | 40 | 8.24 |
| Unstructured Debate | 25 | 8.43 |
| Majority Voting | 25 | 8.83 |
| Self-Consistency | 25 | 8.69 |
| Single Agent | 25 | 8.89 |
| Ablation: No Archetypes | 15 | 8.73 |
| Ablation: No Grammar | 15 | 8.29 |
| Ablation: No DCI-CF | 15 | 8.31 |
Finding: On non-routine tasks (n=40), DCI significantly outperforms unstructured debate (+0.95, 95% CI [+0.41, +1.54]). DCI excels on hidden-profile tasks that require integrating partial perspectives (9.56, the highest score of any system in any domain) but fails on routine decisions (5.39), confirming strong task-dependence. However, DCI consumes roughly 62x the tokens of a single agent.
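This README does not state how the 95% CI for the score difference was computed; a percentile bootstrap over per-task scores is one standard way to obtain such an interval. A sketch only, not the repository's procedure:

```python
import random

def bootstrap_mean_diff_ci(a, b, iters=10_000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for mean(a) - mean(b).

    Sketch only: resample each group with replacement, record the mean
    difference, and take the empirical (alpha/2, 1 - alpha/2) quantiles.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(iters):
        sa = [rng.choice(a) for _ in a]
        sb = [rng.choice(b) for _ in b]
        diffs.append(sum(sa) / len(sa) - sum(sb) / len(sb))
    diffs.sort()
    lo_idx = int((alpha / 2) * iters)
    hi_idx = int((1 - alpha / 2) * iters) - 1
    return diffs[lo_idx], diffs[hi_idx]
```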
| Domain | Tasks | Description |
|---|---|---|
| Architectural Decision | 10 | Software architecture tradeoff analysis |
| Policy Analysis | 10 | Organizational and technology policy decisions |
| Hidden Profile | 5 | Decisions requiring combination of distributed information |
| Late Evidence | 5 | Decisions disrupted by new contradictory evidence |
| Risk Analysis | 5 | Risk-identification-heavy decisions |
| Routine Decision | 5 | Simple decisions (negative control) |
| Disagreement Decision | 5 | Decisions with legitimate expert disagreement |
```
dci-research/
├── src/                          # DCI framework implementation
│   ├── agents/                   # Delegate agents with archetype prompts
│   │   ├── archetypes.py         # Framer, Explorer, Challenger, Integrator
│   │   ├── base.py               # Base agent interface
│   │   └── llm_client.py         # LLM provider abstraction
│   ├── workflow/                 # DCI-CF session management
│   │   ├── dci_cf.py             # Convergent flow algorithm
│   │   └── session.py            # Phased session orchestration
│   ├── workspace/                # Shared workspace state
│   │   └── state.py              # Tension register, option table, evidence log
│   ├── grammar/                  # 14 typed epistemic acts
│   │   └── moves.py              # Move schema and validation
│   ├── scoring/                  # Convergence scoring
│   │   └── convergence.py        # Termination conditions
│   └── baselines/                # 4 baseline implementations
│       ├── single_agent.py
│       ├── unstructured_debate.py
│       ├── voting.py
│       └── self_consistency.py
├── experiments/                  # Experiment infrastructure
│   ├── runners/                  # Automated experiment execution
│   ├── evaluation/               # LLM-as-judge scoring pipeline
│   ├── analysis/                 # Results analysis + LaTeX table generation
│   ├── human_eval/               # Human evaluation protocol
│   └── configs/                  # Experiment configurations
├── benchmarks/                   # Task definitions
│   └── tasks.py                  # 45 tasks across 7 domains
├── results/                      # Experiment data
│   ├── expanded_results.json     # All 185 scored experiment results
│   ├── logs/                     # 22 JSONL files (388 logged runs)
│   └── tables/                   # Summary statistics per condition/domain
├── run_all_experiments.py        # Main experiment runner
├── run_expanded_experiments.py   # Extended 5-domain experiments
├── run_cross_judge.py            # Cross-model judge validation
├── run_diverse_council.py        # Diverse council experiments
├── smoke_test.py                 # Quick validation test
├── .env.example                  # API key template
└── requirements.txt
```
```shell
git clone https://github.com/sunilp/dci-research.git
cd dci-research
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# Configure API keys
cp .env.example .env
# Edit .env with your Anthropic and/or Google Gemini API keys
```

```shell
# Quick smoke test (1 task, 1 condition)
python smoke_test.py

# Full experiment suite
python run_all_experiments.py

# Extended 5-domain experiments
python run_expanded_experiments.py

# Cross-model judge validation
python run_cross_judge.py
```

```python
import json
from collections import defaultdict

# Load all results
with open("results/expanded_results.json") as f:
    results = json.load(f)

# Per-condition averages
by_cond = defaultdict(list)
for r in results:
    score = r["scores"]["overall"]
    if score is not None:
        by_cond[r["condition"]].append(float(score))

for cond, scores in sorted(by_cond.items()):
    print(f"{cond:30s} n={len(scores):3d} mean={sum(scores)/len(scores):.2f}")
```

Each entry in `expanded_results.json`:
```json
{
  "condition": "dci",
  "task_id": "hidden-03",
  "scores": {
    "overall": 9.0,
    "reasoning_depth": 8.0,
    "risk_identification": 9.0,
    "actionability": 8.0
  },
  "tokens": 45230,
  "llm_calls": 12,
  "rounds": 3,
  "latency_ms": 89450,
  "convergence_method": "consensus",
  "decision": "..."
}
```

- LDP (Lightweight Delegation Protocol): arXiv:2603.08852. DCI provides the reasoning layer; LDP provides the delegation protocol for inter-agent communication.
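Given the entry format above, per-domain averages can be grouped by task ID prefix. A sketch that assumes the domain is encoded as the prefix of `task_id` (as in `hidden-03`), which is not guaranteed by this README:

```python
from collections import defaultdict

def per_domain_means(results):
    """Average overall score per domain, grouping scored runs by the
    prefix of task_id (e.g. "hidden-03" -> "hidden"); unscored runs
    (overall is null) are skipped."""
    by_domain = defaultdict(list)
    for r in results:
        score = r["scores"]["overall"]
        if score is not None:
            by_domain[r["task_id"].rsplit("-", 1)[0]].append(float(score))
    return {domain: sum(s) / len(s) for domain, s in by_domain.items()}

# Usage with the repository's data file:
# import json
# with open("results/expanded_results.json") as f:
#     print(per_domain_means(json.load(f)))
```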
```bibtex
@article{prakash2026dci,
  title={From Debate to Deliberation: Structured Collective Reasoning
         with Typed Epistemic Acts},
  author={Prakash, Sunil},
  journal={arXiv preprint arXiv:2603.11781},
  year={2026}
}
```

MIT