Model Guide

Own your agent stack.

ModelGuide is the open-source orchestration layer for production voice-first agents.
Keep your runtime. Wire up integrations once. Define agent behavior with playbooks, SOPs, and guardrails.
Build → generate tests → simulate → score → improve → ship. A closed feedback loop you own.

No vendor lock-in. Bring your own models, runtimes, channels, and deployment.

Start with a reference implementation → LiveKit · Pipecat · ElevenLabs · Mastra

Quick Start · Reference Implementations · Connect Your Agent · Admin Guide · Build a Connector · Roadmap

The Missing Feedback Loop

Getting an agent to talk is easy. Making it reliable is the hard part.

A bad conversation happens. Someone reviews it manually. A prompt gets tweaked. But no reusable test is created, no eval is added, and the same failure comes back later in a slightly different form.

The missing layer is the feedback loop around the runtime: business tool access, policy enforcement, session history, QA workflows, evals, provisioning, and deployment.

ModelGuide gives you that layer as open source — so you can turn failures into tests, tests into better instructions, and ship voice agents on any stack without rebuilding production infrastructure from scratch. Start with voice. Extend to other customer-facing channels when needed.

Architecture diagram

What ModelGuide Is

ModelGuide sits between your agent runtime and your business systems. It is not a voice runtime and it is not a hosted black box. It is the orchestration layer you own.

  • Connect business systems once over MCP
  • Assign the right tools to each agent with confirmation gates and secure credentials
  • Compile SOPs and guardrails into agent behavior
  • Record sessions with transcripts, tool traces, CSAT, and QA tags
  • Run evals and simulations against real workflows
  • Provision new organizations from repeatable YAML blueprints
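A minimal sketch of what per-agent tool assignment with a confirmation gate could look like. Every name here (`AgentConfig`, `ToolAssignment`, `checkToolCall`, the tool names) is hypothetical and chosen for illustration; none of it is taken from the ModelGuide codebase.

```typescript
// Hypothetical types for illustration only; not ModelGuide's actual schema.
type ConfirmationPolicy = "never" | "always";

interface ToolAssignment {
  tool: string; // illustrative tool name, e.g. "orders.refund"
  confirmation: ConfirmationPolicy;
}

interface AgentConfig {
  agentName: string;
  tools: ToolAssignment[];
}

type GateDecision =
  | { allowed: false; reason: string }
  | { allowed: true; needsConfirmation: boolean };

// Decide whether an agent may call a tool and whether the customer must
// confirm first. Tools that are not assigned to the agent are rejected.
function checkToolCall(
  agent: AgentConfig,
  tool: string,
  customerConfirmed: boolean,
): GateDecision {
  const assignment = agent.tools.find((t) => t.tool === tool);
  if (!assignment) {
    return {
      allowed: false,
      reason: `tool "${tool}" is not assigned to ${agent.agentName}`,
    };
  }
  const needsConfirmation =
    assignment.confirmation === "always" && !customerConfirmed;
  return { allowed: true, needsConfirmation };
}

const supportAgent: AgentConfig = {
  agentName: "support-agent",
  tools: [
    { tool: "orders.lookup", confirmation: "never" },
    { tool: "orders.refund", confirmation: "always" },
  ],
};

// The refund is assigned but gated until the customer confirms.
console.log(checkToolCall(supportAgent, "orders.refund", false));
```

Keeping the gate decision separate from the tool call itself means the same check can run in front of any runtime.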

Why Builders Use ModelGuide

| Builder need | What ModelGuide gives you |
| --- | --- |
| Closed feedback loop | Run simulations and evals, turn failed conversations into reusable test cases and evaluators, and recompile better instructions |
| Less production glue code | Connect tools, sessions, SOPs, evals, and operator workflows without rebuilding the harness around every runtime |
| Runtime portability | Keep LiveKit, Pipecat, ElevenLabs, Mastra, or your own runtime. The business layer stays portable. |
| One place for agent context | Manage tools, SOPs, guardrails, confirmation policies, and review workflows from a single control layer |
| Reviewable behavior | Full session records, tool traces, CSAT, QA tags, and eval results; complements your observability stack |
| Self-hostable production infrastructure | Open-source, self-hostable, with multi-tenant auth, encrypted secrets, and row-level security |

ModelGuide focuses on agent behavior and review: transcripts, tool traces, CSAT, QA tags, SOP adherence, and eval results. Keep Langfuse, Datadog, Honeycomb, or OpenTelemetry for lower-level runtime telemetry and infrastructure tracing.

Screenshots: Connect Tools · Review Conversations · Define Behavior · Write Playbooks · Track Quality · Run Evals

Quick Start

Prerequisites: Docker 24+, Bun 1.1+, Node 22+

```bash
git clone https://github.com/modelguide/modelguide.git
cd modelguide
make quickstart
```

Then in separate terminals:

```bash
make api-dev    # API at http://localhost:3000
make ui-dev     # Dashboard at http://localhost:3001
```

Open http://localhost:3001. The seed creates three industry-vertical organizations (retail, medical call center, and B2B industrial), each with Medusa e-commerce and Zendesk helpdesk connectors, two agents, and ~300 realistic sessions. Log in as delivered+admin-glowbox@resend.dev; the magic link is printed to the API console.

Full vertical matrix, dev accounts, and session scenarios: docs/guide/seed-data.md.

How Teams Use ModelGuide

1. Define what your agent should do. Describe the persona, connect your business systems, set the rules and guardrails. ModelGuide keeps that operational context in one place.

2. Generate the instructions your runtime uses. ModelGuide compiles that context into agent instructions and exposes the approved business tools over MCP.

3. Generate test assets automatically. ModelGuide creates synthetic conversations, eval suites, evaluators, and QA workflows to test the agent before it reaches production traffic.

4. Run the feedback loop. ModelGuide runs simulations, scores behavior, and gives your team transcripts, tool traces, CSAT, QA tags, and eval results to review.

5. Tighten the operating context. Use failures to update SOPs, guardrails, persona, tools, and compiled instructions until the automated checks consistently look right.

6. Validate manually before launch. Once the agent passes the automated checks, run manual tests in your runtime and confirm the experience is good enough to ship.

The closed feedback loop is already here: define the context, compile the instructions, generate tests, run simulations, score behavior, and improve the agent from failures. Over time, more of the prompt and context fixes can be automated.
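The "turn failures into tests" step of the loop can be sketched as follows. This is an illustration under stated assumptions, not ModelGuide's evals engine: `SessionRecord`, `TestCase`, `toTestCase`, and `scoreToolUse` are hypothetical names, and the evaluator only checks which tools were called.

```typescript
// Hypothetical shapes for illustration; not ModelGuide's actual schema.
interface SessionRecord {
  id: string;
  customerTurns: string[]; // customer messages only, for brevity
  toolCalls: string[]; // names of the tools the agent called
  qaTag: "pass" | "fail";
}

interface TestCase {
  sourceSessionId: string;
  customerTurns: string[];
  requiredTools: string[]; // tools the agent is expected to call
}

// Turn a failed session into a reusable test case. The reviewer supplies
// the tools the agent should have called.
function toTestCase(session: SessionRecord, requiredTools: string[]): TestCase {
  return {
    sourceSessionId: session.id,
    customerTurns: [...session.customerTurns],
    requiredTools: [...requiredTools],
  };
}

// A deliberately simple evaluator: the fraction of required tools that
// were actually called during a (simulated) run.
function scoreToolUse(testCase: TestCase, actualToolCalls: string[]): number {
  if (testCase.requiredTools.length === 0) return 1;
  const called = new Set(actualToolCalls);
  const hits = testCase.requiredTools.filter((t) => called.has(t)).length;
  return hits / testCase.requiredTools.length;
}

const failedSession: SessionRecord = {
  id: "session-1",
  customerTurns: ["Where is my order?"],
  toolCalls: [], // the agent answered without looking anything up
  qaTag: "fail",
};

const regressionCase = toTestCase(failedSession, ["orders.lookup"]);
const scoreBeforeFix = scoreToolUse(regressionCase, failedSession.toolCalls);
const scoreAfterFix = scoreToolUse(regressionCase, ["orders.lookup"]);
console.log({ scoreBeforeFix, scoreAfterFix });
```

Once the failure exists as a test case, the same check can be rerun after every SOP or instruction change, so the regression cannot return unnoticed.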

Reference Implementations

The reference implementations prove that the orchestration layer stays portable across runtimes and channels.

Start with the LiveKit implementation for the fastest end-to-end path. Use the Pipecat or ElevenLabs examples if your team already runs there. The Mastra example shows the same orchestration layer extending beyond voice when you need another customer-facing channel.

| Runtime | Why it exists | Path |
| --- | --- | --- |
| LiveKit Agents (flagship) | Fastest path to a production voice agent with telephony, MCP tool wiring, session tracking, eval tests, and deployment docs | examples/agents/livekit-agent/ |
| Pipecat | Same orchestration model for teams already committed to Pipecat | examples/agents/pipecat-agent/ |
| ElevenLabs Conversational AI | Manage platform agent config, tools, and prompts from version-controlled local definitions | examples/agents/elevenlabs-agent/ |
| Mastra Email | "Where Is My Order?" example showing the orchestration layer extends beyond voice when you need another customer-facing channel | examples/agents/mastra-wismo-email-agent/ |

Provisioning an Organization

The mg CLI provisions a new organization from a directory of YAML files — users, connectors, agents with compiled instructions, SOPs, guardrails, and demo sessions — in one command. Safe to re-run against the same directory.

```bash
bun run src/cli/mg.ts setup /path/to/my-org/
```

Full flag reference, per-command usage, and Railway instructions: docs/guide/cli.md.
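One common way to make a provisioning command safe to re-run is to upsert each resource by a stable key. The sketch below shows that pattern with an in-memory map. It is an assumption about how such a command could behave, not a description of how `mg` is implemented, and `AgentBlueprint` and `applyBlueprint` are hypothetical names.

```typescript
// Hypothetical blueprint shape; the real YAML schema is documented in
// docs/guide/cli.md and is not reproduced here.
interface AgentBlueprint {
  slug: string; // stable key taken from the blueprint file
  instructions: string;
}

type ApplyResult = "created" | "updated" | "unchanged";

// Upsert by slug: create when missing, update when changed, otherwise
// leave the stored record alone. Re-running with the same input is a no-op.
function applyBlueprint(
  agentStore: Map<string, AgentBlueprint>,
  desired: AgentBlueprint,
): ApplyResult {
  const existing = agentStore.get(desired.slug);
  if (!existing) {
    agentStore.set(desired.slug, { ...desired });
    return "created";
  }
  if (existing.instructions === desired.instructions) {
    return "unchanged";
  }
  agentStore.set(desired.slug, { ...desired });
  return "updated";
}

// In-memory stand-in for a database table.
const agentStore = new Map<string, AgentBlueprint>();
const frontDesk: AgentBlueprint = {
  slug: "front-desk",
  instructions: "Greet the caller and ask for the order number.",
};

const firstRun = applyBlueprint(agentStore, frontDesk); // "created"
const secondRun = applyBlueprint(agentStore, frontDesk); // "unchanged"
console.log(firstRun, secondRun, agentStore.size);
```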

Roadmap

🚧 Sub-agents & Workflow Builder — Compose multi-step agent workflows with branching and handoffs

🚧 OTEL + A/B Testing via Langfuse — OpenTelemetry traces, prompt variant experiments, side-by-side comparison

🚧 Agentic Insights — Custom funnels tracking agent behavior through business-defined conversion paths

🚧 Closed-loop instruction tuning — turn repeated eval and simulation failures into suggested SOP, guardrail, and instruction fixes

📋 More Blueprints — Contact center ships first; healthcare intake, field service, B2B sales next

📋 Connector Marketplace — Community-built integrations

Deployment

Use Docker Compose for local and staging environments (make docker-up) and Railway for production. The Railway architecture is PostgreSQL + API + UI + a Caddy load balancer. The load balancer is the only public-facing service: it routes /api/* and /mcp to the API and everything else to the UI over Railway's internal network. Configuration is kept as code in one railway.toml per service. Full setup and deploy steps are in railway/DEPLOY.md.
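The routing rule described above can be expressed as a small pure function. This is only a sketch of the rule, not the Caddy configuration used in the repository. `routeRequest` and `Upstream` are hypothetical names, and treating paths under /mcp/ as API traffic is an assumption.

```typescript
type Upstream = "api" | "ui";

// Route /api/* and /mcp to the API service and everything else to the UI.
// Assumption: sub-paths such as /mcp/foo also belong to the API.
function routeRequest(pathname: string): Upstream {
  const isApi = pathname.startsWith("/api/");
  const isMcp = pathname === "/mcp" || pathname.startsWith("/mcp/");
  return isApi || isMcp ? "api" : "ui";
}

for (const path of ["/api/sessions", "/mcp", "/", "/settings"]) {
  console.log(path, "->", routeRequest(path));
}
```

Matching on "/api/" with the trailing slash keeps unrelated UI paths that merely start with the same letters, such as /apidocs, on the UI side.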

Tech Stack

| Layer | Technology |
| --- | --- |
| API | Hono + Bun.js |
| Agent Protocol | MCP (@modelcontextprotocol/sdk) |
| Database | PostgreSQL 16 + Drizzle ORM |
| Dashboard | TanStack Start + React 19 + Tailwind CSS v4 |
| Auth | JWT + magic links (users) · API keys (agents) |
| API Docs | Scalar (auto-generated from OpenAPI) |

No proprietary components. Every layer is inspectable, replaceable, forkable.

Production foundations include RBAC with separate admin/support/agent auth paths, encrypted secrets, row-level security, and a full CI pipeline running lint, typecheck, unit, integration, and MCP-protocol tests on every PR. See ADR-005 for the SOP primitive, ADR-007 and ADR-009 for the evals engine.
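To illustrate the idea of separate auth paths, here is a minimal sketch in which human users carry a JWT-derived role and agent runtimes carry an API key. The permission matrix, the surface names, and every identifier (`Credential`, `Surface`, `canAccess`) are hypothetical. The actual RBAC rules live in the codebase and are not reproduced here.

```typescript
type Role = "admin" | "support" | "agent";

// Two credential kinds, mirroring "JWT + magic links (users)" and
// "API keys (agents)". Field names are illustrative assumptions.
type Credential =
  | { kind: "jwt"; role: "admin" | "support" }
  | { kind: "apiKey"; agentId: string };

// Hypothetical surfaces a request might target.
type Surface = "dashboard:manage" | "dashboard:review" | "mcp:tools";

// Hypothetical permission matrix, for illustration only.
const surfacesByRole: Record<Role, Surface[]> = {
  admin: ["dashboard:manage", "dashboard:review"],
  support: ["dashboard:review"],
  agent: ["mcp:tools"],
};

// API keys always map to the agent role; JWTs carry their own role.
function roleOf(credential: Credential): Role {
  return credential.kind === "jwt" ? credential.role : "agent";
}

function canAccess(credential: Credential, surface: Surface): boolean {
  return surfacesByRole[roleOf(credential)].includes(surface);
}

const supportUser: Credential = { kind: "jwt", role: "support" };
const voiceAgent: Credential = { kind: "apiKey", agentId: "agent-1" };

console.log(canAccess(supportUser, "dashboard:manage")); // false
console.log(canAccess(voiceAgent, "mcp:tools")); // true
```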

Documentation

| Resource | Description |
| --- | --- |
| MCP Integration Guide | Connect your AI agent via MCP |
| Admin Guide | Configure connectors, agents, and tools through the dashboard |
| Adding a Connector | Build a new connector manifest, handlers, and tests |
| mg CLI: Provisioning | Provision organizations from YAML |
| Seed Data | Dev accounts, orgs, and session scenarios |
| Architecture Decisions | ADRs for significant design choices |
| Deployment Guide | Railway production deployment |
| Contributing | Setup, workflow, project structure, conventions |

Contributing

Contributions welcome. No CLA. See CONTRIBUTING.md for the full guide.

```bash
# Run checks before submitting
make api-test          # Unit + integration tests
make ui-test           # UI component tests
make api-lint-check    # Linting
make api-typecheck     # Type checking
```

Check the open issues and look for the "good first issue" label. Fork → branch → PR with tests.

License

MIT


Built by ModelGuide · The open-source orchestration framework for production voice-first agents · 🇵🇱 Poland
