Test how well LLM agents use your MCP tools, compare different models, and track quality over time with automated testing and detailed reports.
Rich visual reports, detailed traces, and interactive dashboards.
Track pass rates, latency trends, and recent runs at a glance.
Up and running in under a minute.
1. Install
2. Create eval config
servers:
  my-server:
    transport: "http"
    url: "http://localhost:3000/mcp"

agents:
  claude:
    provider: "anthropic"
    model: "claude-haiku-4-5-20251001"
    temperature: 0

scenarios:
  - id: "basic-test"
    agent: "claude"
    servers: ["my-server"]
    prompt: "Use the tools to complete this task..."
    eval:
      tool_constraints:
        required_tools: ["my_tool"]
      response_assertions:
        - type: "regex"
          pattern: "success|completed"

3. Run evaluation
Built-in AI assistants to supercharge your workflow.
AI chat to help design and refine evaluation scenarios: describe what you want to test and get ready-to-use YAML configurations.
AI chat to analyze and explain completed run results. Understand failures, spot patterns, and get actionable improvement suggestions.
Automated review of your MCP tool definitions for quality, safety, and LLM-friendliness. Get recommendations before testing.
Documentation
The docs are split into focused pages so you can jump straight to installation, config, usage, or the app workflow.