Experiment code, data, and paper source for:

- *LDP: An Identity-Aware Protocol for Multi-Agent LLM Systems*, Sunil Prakash, Indian School of Business (arXiv:2603.08852)
- *The Provenance Paradox in Multi-Agent LLM Routing: Delegation Contracts and Attested Identity in LDP*, Sunil Prakash, Indian School of Business (arXiv:2603.18043)
This repository contains everything needed to reproduce the empirical evaluation of the LLM Delegate Protocol (LDP) against A2A and random baselines. The study evaluates six research questions spanning routing quality, payload efficiency, provenance impact, session overhead, security boundaries, and fallback reliability.
| RQ | Finding | Evidence |
|---|---|---|
| RQ1 Routing | ~12x lower latency on easy tasks; quality comparable (n.s.) | Empirical |
| RQ2 Payload | 37% token reduction (p=0.031) with no observed quality loss | Empirical |
| RQ3 Provenance | Noisy provenance degrades quality below no-provenance baseline | Empirical |
| RQ4 Sessions | 39% token overhead eliminated at 10 conversation rounds | Empirical |
| RQ5 Security | 96% vs 6% attack detection rate | Simulated |
| RQ6 Fallback | 100% vs 35% task completion under failures | Simulated |
```
ldp-research/
├── paper/                     # Paper 1: LaTeX source, figures, tables
│   ├── main.tex               # Paper source
│   ├── figures/               # 9 publication-quality PDF figures
│   ├── tables/                # 7 LaTeX tables
│   └── generate_figures.py    # Figure generation script
├── paper2/                    # Paper 2: Provenance Paradox (arXiv:2603.18043)
│   ├── main.tex               # Paper source
│   └── figures/               # Publication figures
├── baselines/                 # Protocol baseline implementations
│   ├── protocol.py            # ProtocolBaseline abstract interface
│   ├── ldp_baseline.py        # LDP: metadata routing + identity prompts
│   ├── a2a_baseline.py        # A2A: skill-match routing + generic prompts
│   ├── ablation_baselines.py  # 2x2 factorial ablation conditions
│   └── llm_client.py          # Unified LLM client (Ollama, Gemini)
├── experiments/
│   ├── runners/               # Experiment runner and main entry point
│   ├── evaluation/            # LLM-as-judge, metrics, logging
│   ├── analysis/              # Results analysis and LaTeX table generation
│   └── configs/               # YAML experiment configurations
├── results/
│   ├── tables/                # Aggregated results (JSON)
│   └── logs/                  # Raw experiment logs (JSONL)
├── src/                       # Rust protocol implementation (see ldp-protocol)
├── tests/                     # Integration tests
└── docs/                      # Design documentation
```
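The baseline implementations all plug into the `ProtocolBaseline` abstract interface in `baselines/protocol.py`. As a minimal sketch of what such an interface might look like (the method and field names below are illustrative assumptions, not the repository's actual API):

```python
# Hypothetical sketch of a ProtocolBaseline-style interface; method and
# field names are illustrative assumptions, not the repository's API.
import random
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class RoutingDecision:
    agent_id: str  # which agent the task is delegated to
    prompt: str    # the prompt that agent will receive


class ProtocolBaseline(ABC):
    """Common surface each protocol condition (LDP, A2A, random) implements."""

    @abstractmethod
    def route(self, task: str, agents: list[str]) -> RoutingDecision:
        """Pick an agent and build the prompt it will receive for `task`."""


class RandomBaseline(ProtocolBaseline):
    """Control condition: uniform-random routing with the raw task as prompt."""

    def route(self, task: str, agents: list[str]) -> RoutingDecision:
        return RoutingDecision(agent_id=random.choice(agents), prompt=task)
```

In this shape, an LDP condition would override `route` to use identity metadata when selecting the agent and building the prompt, while an A2A condition would match on advertised skills.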
Prerequisites:

- Ollama with the models `qwen3:8b`, `qwen2.5-coder:7b`, and `llama3.2:3b`
- Python 3.10+ with dependencies: `pip install -r requirements.txt`
- A Google Gemini API key (for LLM-as-judge evaluation)

Setup:

```sh
cp .env.example .env
# Add your GOOGLE_API_KEY to .env

# Pull Ollama models
ollama pull qwen3:8b
ollama pull qwen2.5-coder:7b
ollama pull llama3.2:3b
```

Running the experiments:

```sh
# Run all experiments (local Ollama config)
python -m experiments.runners.main --config experiments/configs/local.yaml

# Run a specific experiment
python -m experiments.runners.main --config experiments/configs/local.yaml --experiments routing

# Generate paper figures
python paper/generate_figures.py

# Generate paper tables
python -m experiments.analysis.generate_latex
```

All experiments were run on a single Apple Silicon machine (36 GB RAM) using local Ollama inference. Total compute: roughly 8 hours for all experiments, including the ablations.
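The experiment configurations under `experiments/configs/` are YAML files. As a rough, hypothetical sketch of the shape a file like `local.yaml` might take (every key here is an assumption for illustration, not the runner's actual schema):

```yaml
# Hypothetical config sketch; keys are illustrative assumptions,
# not the actual schema consumed by experiments.runners.main.
experiments: [routing, payload, provenance, sessions]
models:
  worker: qwen3:8b
  coder: qwen2.5-coder:7b
  small: llama3.2:3b
judge:
  provider: gemini        # LLM-as-judge backend
runs_per_condition: 3
output_dir: results/logs
```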
- Why Multi-Agent AI Systems Need Identity-Aware Routing
- From Debate to Deliberation: When Multi-Agent Reasoning Needs Structure
- ldp-protocol — Protocol specification (RFC) and Rust reference implementation
- JamJet — Agent runtime that hosts the LDP adapter
License: Apache-2.0