Parallel Reasoning & Inference Synthesis Machine
PRISM is an evaluation and reasoning orchestration system for large language models.
Instead of relying on a single model output, PRISM runs the same prompt across multiple LLMs, analyzes their responses, and synthesizes a transparent final answer. The system exposes the full reasoning surface area of each model while providing a structured comparison and a synthesized result.
PRISM is designed for developers, researchers, and teams building AI systems who want reliable outputs, model transparency, and a structured evaluation pipeline.
Modern AI systems often rely on a single model response, even though different models reason differently. This creates problems:
- A single model may hallucinate or fail silently
- Outputs can vary significantly across providers
- There is limited visibility into model disagreement
- Developers lack systematic tools for comparing reasoning quality
PRISM addresses this by treating models as parallel reasoning engines and providing tools to analyze and synthesize their outputs.
Instead of asking “What did the model say?”, PRISM helps answer:
- Which models agree?
- Where do they disagree?
- Which response is most complete?
- What is the best synthesized answer across models?
PRISM sends a prompt to multiple LLMs simultaneously and collects their responses asynchronously.
This allows the system to compare reasoning across models from providers such as OpenAI and Gemini, or from any custom provider.
PRISM evaluates responses using lightweight similarity and coverage heuristics, including:
- Token overlap (Jaccard similarity)
- Response length comparison
- Keyword coverage
- Pairwise disagreement analysis
This provides insight into how models differ, not just what they output.
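As a rough illustration of the heuristics listed above, the following sketch computes Jaccard token overlap, keyword coverage, and pairwise disagreement over a set of responses. The function names and the tokenization rule are assumptions made for this example and are not taken from the PRISM codebase.

```python
import re
from itertools import combinations


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity |A & B| / |A | B|, in the range [0, 1]."""
    ta, tb = tokenize(a), tokenize(b)
    if not ta and not tb:
        return 1.0  # two empty responses are treated as identical
    return len(ta & tb) / len(ta | tb)


def keyword_coverage(response: str, keywords: list[str]) -> float:
    """Fraction of the given keywords that appear as tokens in the response."""
    if not keywords:
        return 0.0
    tokens = tokenize(response)
    return sum(1 for k in keywords if k.lower() in tokens) / len(keywords)


def pairwise_disagreement(responses: dict[str, str]) -> dict[tuple[str, str], float]:
    """Disagreement (1 - Jaccard) for every unordered pair of models."""
    return {
        (m1, m2): 1.0 - jaccard(responses[m1], responses[m2])
        for m1, m2 in combinations(sorted(responses), 2)
    }
```

A high disagreement score between two models flags a pair worth inspecting by hand; these are surface-level token measures and do not capture whether two differently worded answers mean the same thing.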
After evaluating responses, PRISM produces a synthesized final answer using configurable strategies.
Current synthesis behavior includes:
- Majority agreement when models converge
- Coverage-based selection when responses diverge
- Transparent rationale for the chosen answer
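One way the behavior above could fit together is sketched below: pick a response that a majority of models agree with, fall back to keyword coverage when they diverge, and always return a rationale. The `Synthesis` dataclass, the `synthesize` function, and the 0.6 agreement threshold are hypothetical choices for this example, not PRISM's actual implementation.

```python
import re
from dataclasses import dataclass


@dataclass
class Synthesis:
    answer: str
    chosen_model: str
    rationale: str


def _tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def _jaccard(a: str, b: str) -> float:
    ta, tb = _tokens(a), _tokens(b)
    return len(ta & tb) / len(ta | tb) if (ta or tb) else 1.0


def synthesize(responses: dict[str, str], keywords: list[str],
               agree_threshold: float = 0.6) -> Synthesis:
    """Select one response and explain why it was selected."""
    if not responses:
        raise ValueError("at least one response is required")
    models = sorted(responses)

    # For each model, count how many other models it agrees with.
    agreement = {
        m: sum(1 for o in models
               if o != m and _jaccard(responses[m], responses[o]) >= agree_threshold)
        for m in models
    }
    best = max(models, key=lambda m: agreement[m])

    # Majority agreement: the model plus its agreeing peers exceed half of all models.
    if (agreement[best] + 1) * 2 > len(models):
        return Synthesis(
            responses[best], best,
            f"majority agreement: {agreement[best] + 1} of {len(models)} models converge",
        )

    # Divergence: fall back to the response covering the most keywords.
    def coverage(m: str) -> int:
        toks = _tokens(responses[m])
        return sum(1 for k in keywords if k.lower() in toks)

    best = max(models, key=coverage)
    return Synthesis(
        responses[best], best,
        f"coverage-based selection: {coverage(best)} of {len(keywords)} keywords covered",
    )
```

Returning the rationale alongside the answer keeps the selection auditable, which matches the transparency goal described above.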
PRISM itself runs entirely locally and requires no external infrastructure beyond the model provider APIs you choose to call.
There is no dependency on:
- hosted databases
- managed queues
- deployment platforms
This makes PRISM easy to experiment with and extend.
PRISM provides a clean abstraction for integrating new model providers.
Adding support for a new LLM only requires implementing a client that subclasses LLMClient.
PRISM consists of three main components.
The backend receives prompts and orchestrates parallel model calls.
Responsibilities:
- fan-out prompt execution
- manage model timeouts
- collect responses and metadata
- compute similarity metrics
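The fan-out and timeout responsibilities above can be sketched with `asyncio`. This is a minimal illustration under stated assumptions: `ModelResult`, `call_one`, `fan_out`, and the two stand-in model coroutines are hypothetical names, and the real backend may structure this differently.

```python
import asyncio
import time
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional

ModelFn = Callable[[str], Awaitable[str]]


@dataclass
class ModelResult:
    model: str
    text: Optional[str]
    latency_s: float
    error: Optional[str] = None


async def call_one(name: str, fn: ModelFn, prompt: str, timeout_s: float) -> ModelResult:
    """Call a single model, recording latency and converting failures to data."""
    start = time.perf_counter()
    try:
        text = await asyncio.wait_for(fn(prompt), timeout=timeout_s)
        return ModelResult(name, text, time.perf_counter() - start)
    except asyncio.TimeoutError:
        return ModelResult(name, None, time.perf_counter() - start, error="timeout")
    except Exception as exc:  # one failing provider must not sink the whole run
        return ModelResult(name, None, time.perf_counter() - start, error=repr(exc))


async def fan_out(prompt: str, models: dict[str, ModelFn], timeout_s: float) -> list[ModelResult]:
    """Send the same prompt to every model concurrently; results keep input order."""
    tasks = [call_one(name, fn, prompt, timeout_s) for name, fn in models.items()]
    return await asyncio.gather(*tasks)


# Stand-in models for demonstration only.
async def echo_model(prompt: str) -> str:
    return prompt


async def slow_model(prompt: str) -> str:
    await asyncio.sleep(0.5)
    return "late answer"


if __name__ == "__main__":
    results = asyncio.run(
        fan_out("hello", {"echo": echo_model, "slow": slow_model}, timeout_s=0.05)
    )
    for r in results:
        print(r.model, r.text, r.error)
```

Because each call catches its own timeout and errors, a slow or failing provider shows up as a result with an `error` field rather than aborting the comparison.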
After collecting model outputs, PRISM applies synthesis strategies to produce a final answer.
This layer is modular and allows experimentation with different reasoning aggregation methods.
The frontend provides a simple interface for running evaluations.
Users can:
- enter prompts
- select models
- view responses side-by-side
- inspect synthesis decisions
- explore model disagreement
- A user submits a prompt through the UI or API.
- The evaluation engine sends the prompt to multiple models.
- Each model returns a response.
- PRISM compares the responses using similarity heuristics.
- The synthesis engine selects or constructs the final answer.
- The system returns:
  - all raw model outputs
  - comparison metrics
  - synthesized response
  - rationale
curl -X POST http://127.0.0.1:8000/evaluate \
-H "Content-Type: application/json" \
-d '{
"prompt": "List three benefits of testing",
"models": ["mock:echo","mock:pseudo"],
"temperature": 0,
"max_tokens": 256,
"timeout_s": 12,
"synthesis_method": "best_of_n"
}'

The response includes:
- model outputs
- latency metadata
- similarity metrics
- synthesized result
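The same request can be issued from Python using only the standard library. The sketch below mirrors the payload from the curl example; the helper names are hypothetical, and because the response schema is not specified here, the parsed JSON is returned as-is rather than mapped to named fields.

```python
import json
import urllib.request


def build_evaluate_request(prompt: str, models: list[str],
                           base_url: str = "http://127.0.0.1:8000") -> urllib.request.Request:
    """Build a POST /evaluate request with the same fields as the curl example."""
    payload = {
        "prompt": prompt,
        "models": models,
        "temperature": 0,
        "max_tokens": 256,
        "timeout_s": 12,
        "synthesis_method": "best_of_n",
    }
    return urllib.request.Request(
        url=f"{base_url}/evaluate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def evaluate(prompt: str, models: list[str],
             base_url: str = "http://127.0.0.1:8000") -> dict:
    """Send the request to a running backend and return the decoded JSON body."""
    request = build_evaluate_request(prompt, models, base_url)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the backend running, `evaluate("List three benefits of testing", ["mock:echo", "mock:pseudo"])` should return the evaluation result without requiring any API keys, since only mock models are involved.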
PRISM
│
├── backend/
│ FastAPI evaluation service
│ LLM provider abstractions
│ evaluation pipeline
│ synthesis strategies
│
├── frontend/
│ React + Vite interface
│ prompt runner and model comparison UI
│
└── docs/
architecture notes
evaluation methodology
roadmap
- Node.js 20+
- Python 3.10+ (3.11 recommended)
cd backend
python -m venv .venv
# Windows
.venv\Scripts\Activate.ps1
# macOS / Linux
source .venv/bin/activate
pip install -r requirements.txt
# PowerShell (from backend/)
$env:PYTHONPATH="."
uvicorn app.main:app --reload --host 127.0.0.1 --port 8000
# Alternative (from repo root)
# $env:PYTHONPATH="backend"
# uvicorn app.main:app --reload --app-dir backend --host 127.0.0.1 --port 8000

API docs will be available at:
http://127.0.0.1:8000/docs
Create a .env file in the project root if you want to use real models.
OPENAI_API_KEY=...
GEMINI_API_KEY=...
PRISM_GEMINI_THINKING_BUDGET=0
Mock models will work without any keys.
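To make the relationship between keys and models concrete, here is a small sketch that filters a requested model list down to those usable with the current environment. It assumes the `provider:model` naming shown elsewhere in this README; the `usable_models` helper and its lookup tables are hypothetical and not part of PRISM.

```python
import os
from typing import Mapping

# Environment variable that holds the API key for each keyed provider.
PROVIDER_KEYS = {"openai": "OPENAI_API_KEY", "gemini": "GEMINI_API_KEY"}
# Providers that need no key at all.
NO_KEY_PROVIDERS = {"mock"}


def usable_models(requested: list[str], env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the requested models whose provider is keyless or has a key set."""
    usable = []
    for model in requested:
        provider = model.split(":", 1)[0]
        key_var = PROVIDER_KEYS.get(provider)
        if provider in NO_KEY_PROVIDERS or (key_var and env.get(key_var)):
            usable.append(model)
    return usable
```

For example, with only `OPENAI_API_KEY` set, a request for `mock:echo`, `openai:gpt-4o-mini`, and `gemini:2.5-flash` would keep the first two and drop the Gemini model.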
cd frontend
npm install
echo VITE_API_BASE_URL=http://127.0.0.1:8000 > .env.local
npm run dev

Open:
http://localhost:5173
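Terminal 1 (from backend/, with the virtual environment active):
uvicorn app.main:app --reload --host 127.0.0.1 --port 8000
Terminal 2 (from frontend/):
npm run dev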
Then open the UI and run prompt evaluations.
$env:PYTHONPATH="backend"
pytest backend/tests -q

Create a subclass of:
backend/app/llms/LLMClient
and implement the generate() interface.
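A new provider might look roughly like the sketch below. The `LLMClient` base class defined here is a stand-in written for this example so the block runs on its own; the real base class in the repository may use a different `generate()` signature, and the async form and parameter names are assumptions.

```python
import asyncio
from abc import ABC, abstractmethod


class LLMClient(ABC):
    """Stand-in for PRISM's base class; the real signature may differ."""

    @abstractmethod
    async def generate(self, prompt: str, *, temperature: float = 0.0,
                       max_tokens: int = 256) -> str:
        """Return the model's text response for the given prompt."""


class ReverseClient(LLMClient):
    """Toy provider: returns the prompt's words reversed, capped at max_tokens words."""

    async def generate(self, prompt: str, *, temperature: float = 0.0,
                       max_tokens: int = 256) -> str:
        words = prompt.split()[::-1]
        return " ".join(words[:max_tokens])


if __name__ == "__main__":
    print(asyncio.run(ReverseClient().generate("parallel reasoning engines")))
```

A real client would replace the body of `generate()` with a call to the provider's SDK or HTTP API and return the response text.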
Extend:
backend/app/evaluation/pipeline.py
Add implementations in:
backend/app/synthesis/
| Provider | Example Model |
|---|---|
| OpenAI | openai:gpt-4o-mini |
| Gemini | gemini:2.5-flash |
| Mock | mock:echo, mock:pseudo |
Future improvements planned for PRISM include:
- reasoning trace comparison
- structured output evaluation
- weighted model voting
- dataset-level benchmarking
- experiment tracking
- evaluation dashboards
PRISM treats LLMs as parallel reasoning systems rather than single sources of truth.
The goal is not just to generate answers, but to understand how models reason, where they disagree, and how to combine their strengths.