Repository files navigation

# Deliberative Collective Intelligence (DCI)

Experiment code and data for:

> **From Debate to Deliberation: Structured Collective Reasoning with Typed Epistemic Acts**
> Sunil Prakash · arXiv:2603.11781 · PDF

## What is DCI?

Multi-agent LLM systems typically interact through unstructured debate, majority voting, or rigid orchestration pipelines. None of these model deliberation: a phased process in which differentiated participants exchange typed reasoning moves, preserve disagreements, and converge on an explicit outcome.

DCI treats collective reasoning as a first-class computational object:

| Component | Description |
|---|---|
| 4 Reasoning Archetypes | Framer (structures the problem), Explorer (generates alternatives), Challenger (stress-tests proposals), Integrator (synthesizes toward a decision) |
| 14 Typed Epistemic Acts | propose, challenge, evidence, reframe, synthesize, concede, object, qualify, defer, escalate, poll, commit, dissent, reopen |
| Phased Sessions | Opening → Divergence → Convergence → Closure, with explicit phase-transition rules |
| Shared Workspace | Tension register, option table, evidence log; all agents read and write a structured state |
| DCI-CF Algorithm | Convergent flow that always terminates, producing a decision packet with: selected option, residual objections, minority report, reopen conditions |

## Key Results

Evaluated on 45 tasks across 7 domains with 8 conditions (185 scored runs, 388 total JSONL-logged runs):

| Condition | n | Quality (0–10) |
|---|---|---|
| DCI (full) | 40 | 8.24 |
| Unstructured Debate | 25 | 8.43 |
| Majority Voting | 25 | 8.83 |
| Self-Consistency | 25 | 8.69 |
| Single Agent | 25 | 8.89 |
| Ablation: No Archetypes | 15 | 8.73 |
| Ablation: No Grammar | 15 | 8.29 |
| Ablation: No DCI-CF | 15 | 8.31 |

**Finding:** On non-routine tasks (n=40), DCI significantly outperforms unstructured debate (+0.95, 95% CI [+0.41, +1.54]). DCI excels on hidden-profile tasks that require integrating partial perspectives (9.56, the highest score of any system on any domain) but fails on routine decisions (5.39), confirming strong task dependence. However, DCI consumes roughly 62× the tokens of a single agent.
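A confidence interval like the one above can be computed with a percentile bootstrap on per-run scores. This is a generic sketch, not the paper's exact procedure, and the score arrays below are placeholders rather than the real data:

```python
import random

def bootstrap_diff_ci(a, b, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for mean(a) - mean(b)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        ra = [rng.choice(a) for _ in a]  # resample with replacement
        rb = [rng.choice(b) for _ in b]
        diffs.append(sum(ra) / len(ra) - sum(rb) / len(rb))
    diffs.sort()
    lo = diffs[int((alpha / 2) * n_boot)]
    hi = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Usage with placeholder scores (not the real per-run data):
dci_scores = [9.2, 8.8, 9.5, 7.9, 9.1]
debate_scores = [8.1, 7.8, 8.6, 8.0, 7.5]
lo, hi = bootstrap_diff_ci(dci_scores, debate_scores)
```

With the real per-run scores from `results/expanded_results.json`, the same function would give the interval for the DCI-vs-debate gap on any task subset.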

## Task Domains

| Domain | Tasks | Description |
|---|---|---|
| Architectural Decision | 10 | Software architecture tradeoff analysis |
| Policy Analysis | 10 | Organizational and technology policy decisions |
| Hidden Profile | 5 | Decisions requiring combination of distributed information |
| Late Evidence | 5 | Decisions disrupted by new contradictory evidence |
| Risk Analysis | 5 | Risk-identification-heavy decisions |
| Routine Decision | 5 | Simple decisions (negative control) |
| Disagreement Decision | 5 | Decisions with legitimate expert disagreement |

## Repository Structure

```
dci-research/
├── src/                        # DCI framework implementation
│   ├── agents/                 #   Delegate agents with archetype prompts
│   │   ├── archetypes.py       #     Framer, Explorer, Challenger, Integrator
│   │   ├── base.py             #     Base agent interface
│   │   └── llm_client.py       #     LLM provider abstraction
│   ├── workflow/               #   DCI-CF session management
│   │   ├── dci_cf.py           #     Convergent flow algorithm
│   │   └── session.py          #     Phased session orchestration
│   ├── workspace/              #   Shared workspace state
│   │   └── state.py            #     Tension register, option table, evidence log
│   ├── grammar/                #   14 typed epistemic acts
│   │   └── moves.py            #     Move schema and validation
│   ├── scoring/                #   Convergence scoring
│   │   └── convergence.py      #     Termination conditions
│   └── baselines/              #   4 baseline implementations
│       ├── single_agent.py
│       ├── unstructured_debate.py
│       ├── voting.py
│       └── self_consistency.py
├── experiments/                # Experiment infrastructure
│   ├── runners/                #   Automated experiment execution
│   ├── evaluation/             #   LLM-as-judge scoring pipeline
│   ├── analysis/               #   Results analysis + LaTeX table generation
│   ├── human_eval/             #   Human evaluation protocol
│   └── configs/                #   Experiment configurations
├── benchmarks/                 # Task definitions
│   └── tasks.py                #   45 tasks across 7 domains
├── results/                    # Experiment data
│   ├── expanded_results.json   #   All 185 scored experiment results
│   ├── logs/                   #   22 JSONL files (388 logged runs)
│   └── tables/                 #   Summary statistics per condition/domain
├── run_all_experiments.py      # Main experiment runner
├── run_expanded_experiments.py # Extended 5-domain experiments
├── run_cross_judge.py          # Cross-model judge validation
├── run_diverse_council.py      # Diverse council experiments
├── smoke_test.py               # Quick validation test
├── .env.example                # API key template
└── requirements.txt
```

## Reproducing Experiments

### Setup

```bash
git clone https://github.com/sunilp/dci-research.git
cd dci-research
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# Configure API keys
cp .env.example .env
# Edit .env with your Anthropic and/or Google Gemini API keys
```

### Running

```bash
# Quick smoke test (1 task, 1 condition)
python smoke_test.py

# Full experiment suite
python run_all_experiments.py

# Extended 5-domain experiments
python run_expanded_experiments.py

# Cross-model judge validation
python run_cross_judge.py
```

### Analyzing Results

```python
import json
from collections import defaultdict

# Load all scored results
with open("results/expanded_results.json") as f:
    results = json.load(f)

# Group overall quality scores by condition, skipping unscored runs
by_cond = defaultdict(list)
for r in results:
    score = r["scores"]["overall"]
    if score is not None:
        by_cond[r["condition"]].append(float(score))

# Per-condition sample size and mean quality
for cond, scores in sorted(by_cond.items()):
    print(f"{cond:30s}  n={len(scores):3d}  mean={sum(scores)/len(scores):.2f}")
```

## Data Format

Each entry in `expanded_results.json`:

```json
{
  "condition": "dci",
  "task_id": "hidden-03",
  "scores": {
    "overall": 9.0,
    "reasoning_depth": 8.0,
    "risk_identification": 9.0,
    "actionability": 8.0
  },
  "tokens": 45230,
  "llm_calls": 12,
  "rounds": 3,
  "latency_ms": 89450,
  "convergence_method": "consensus",
  "decision": "..."
}
```
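The `tokens` and `condition` fields above are enough to check the cost claim by averaging tokens per condition. The records below are placeholders in the same shape (with the real file, load the list via `json.load` and apply the same grouping); the condition names and token values are assumptions, not the actual data:

```python
from collections import defaultdict

# Placeholder records mimicking the expanded_results.json shape (not real data)
results = [
    {"condition": "dci", "tokens": 45230},
    {"condition": "dci", "tokens": 51800},
    {"condition": "single_agent", "tokens": 790},
    {"condition": "single_agent", "tokens": 810},
]

# Collect token counts per condition
tokens_by_cond = defaultdict(list)
for r in results:
    tokens_by_cond[r["condition"]].append(r["tokens"])

# Mean token cost per condition, and the DCI vs. single-agent ratio
mean_tokens = {c: sum(t) / len(t) for c, t in tokens_by_cond.items()}
ratio = mean_tokens["dci"] / mean_tokens["single_agent"]
```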

## Related

- **LDP (Lightweight Delegation Protocol)**: arXiv:2603.08852 · Code. DCI provides the reasoning layer; LDP provides the delegation protocol for inter-agent communication.

## Citation

```bibtex
@article{prakash2026dci,
  title={From Debate to Deliberation: Structured Collective Reasoning
         with Typed Epistemic Acts},
  author={Prakash, Sunil},
  journal={arXiv preprint arXiv:2603.11781},
  year={2026}
}
```

## License

MIT
