The ZeroEval CLI ships with the Python SDK and gives you terminal access to monitoring, prompts, judges, and optimization. It is designed to be agent-friendly and works well in CI pipelines, scripted workflows, and agent toolchains.
Setup
Run the interactive setup to save your API key and resolve your project:

zeroeval setup

This opens the ZeroEval dashboard, prompts for your API key, saves it to your shell config (e.g. ~/.zshrc, ~/.bashrc), and links your project automatically.
For non-interactive environments like CI or coding agents, use auth set instead:

zeroeval auth set --api-key-env ZEROEVAL_API_KEY
Auth commands
zeroeval auth set --api-key <key>            # Set API key directly
zeroeval auth set --api-key-env MY_KEY_VAR   # Read API key from an env var
zeroeval auth set --api-base-url <url>       # Override API base URL
zeroeval auth show --redact                  # Show current config (key masked)
zeroeval auth clear                          # Wipe all stored config
zeroeval auth clear --api-key-only           # Clear only the API key and project
Auth resolution
The CLI resolves credentials in this order:
1. Explicit CLI flags (--project-id, --api-base-url)
2. Environment variables (ZEROEVAL_API_KEY, ZEROEVAL_PROJECT_ID, ZEROEVAL_BASE_URL)
3. Global CLI config file
Config file location:
macOS / Linux: ~/.config/zeroeval/config.json (or $XDG_CONFIG_HOME/zeroeval/config.json)
Windows: %APPDATA%/zeroeval/config.json
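The macOS/Linux path can be computed the same way the CLI resolves it. A minimal sketch, assuming the standard XDG fallback described above:

```shell
# Resolve the config path: $XDG_CONFIG_HOME if set, else ~/.config.
CONFIG="${XDG_CONFIG_HOME:-$HOME/.config}/zeroeval/config.json"
echo "$CONFIG"
# Inspect it if it exists:
[ -f "$CONFIG" ] && cat "$CONFIG" || true
```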
Global flags
Global flags must appear before the subcommand:
zeroeval --output json --project-id <id> --timeout 30.0 traces list
| Flag | Default | Description |
| --- | --- | --- |
| --output | text | Output format: text or json. json emits stable JSON to stdout, errors to stderr. |
| --project-id | env / config | Project context. Required for monitoring, prompts, judges, and optimization commands. |
| --api-base-url | https://api.zeroeval.com | Override the API URL. |
| --quiet | off | Suppress non-essential output. |
| --timeout | 20.0 | HTTP request timeout in seconds. |
Output modes
text (default) — human-readable; dict/list payloads are pretty-printed as JSON.
json — stable, machine-readable JSON to stdout. Errors go to stderr as structured JSON. Confirmation prompts (e.g. optimize promote) are auto-skipped in JSON mode.
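Because json mode emits each listing as stable JSON on stdout, output composes directly with jq. A sketch; the sample payload here stands in for real `zeroeval --output json traces list` output:

```shell
# Illustrative: extract ids from a JSON-mode listing.
echo '[{"id":"t1","status":"completed"},{"id":"t2","status":"error"}]' \
  | jq -r '.[].id'
```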
Exit codes
| Code | Meaning |
| --- | --- |
| 0 | Success |
| 2 | User / validation error |
| 3 | Auth or permission error |
| 4 | Remote API or network error |
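In CI it is often useful to treat only network failures (exit code 4) as retryable. A minimal sketch against the documented exit-code contract; `handle_exit` is a hypothetical helper, called as `zeroeval traces list; handle_exit $?`:

```shell
# Map the documented exit codes to CI behavior.
# handle_exit is a hypothetical helper; invoke it right after a
# zeroeval command with `handle_exit $?`.
handle_exit() {
  case "$1" in
    0) echo "ok" ;;
    2) echo "validation error" ; return 1 ;;
    3) echo "auth error" ; return 1 ;;
    4) echo "network error, retry" ;;   # transient: worth retrying
    *) echo "unknown failure" ; return 1 ;;
  esac
}

handle_exit 4
```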
Querying and filtering
Most list and get commands support --where, --select, and --order for client-side filtering, projection, and sorting.
--where
Filter rows. Repeatable — multiple clauses are AND-ed.
# Exact match
zeroeval judges list --where "name=Quality Check"
# Substring match (case-insensitive)
zeroeval traces list --where "status~completed"
# Set membership
zeroeval spans list --where 'kind in ["llm","tool"]'
Supported operators: = (exact), ~ (substring), in (JSON array).
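Since filtering is client-side, the AND-ing of repeated --where clauses can be approximated with jq. A sketch with illustrative sample rows, combining an exact match on kind with a substring match on status:

```shell
# Two clauses AND-ed: kind=llm AND status~error (approximated with jq).
echo '[{"kind":"llm","status":"error_timeout"},{"kind":"tool","status":"error_timeout"}]' \
  | jq '[.[] | select(.kind == "llm" and (.status | test("error")))] | length'
```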
--select
Project specific fields onto the output (i.e. keep only the named columns). Comma-separated; dotted paths are supported.
zeroeval judges list --select "id,name,evaluation_type"
--order
Sort results by a field. Defaults to ascending.
zeroeval traces list --order "created_at:desc"
Monitoring
Monitoring commands require a --project-id (resolved automatically after zeroeval setup).
Sessions
zeroeval sessions list --start-date 2025-01-01 --end-date 2025-02-01 --limit 50
zeroeval sessions get <session_id>
| Flag | Description |
| --- | --- |
| --start-date | ISO date string lower bound |
| --end-date | ISO date string upper bound |
| --limit | Max results (default 50) |
| --offset | Pagination offset (default 0) |
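The --limit/--offset pair supports a standard pagination loop. A sketch in which `fetch_page` is a stand-in for `zeroeval --output json sessions list --limit 50 --offset "$1"`, with illustrative sample rows:

```shell
# fetch_page mocks the CLI call so the loop is runnable as-is.
fetch_page() {
  if [ "$1" -eq 0 ]; then echo '[{"id":"s1"},{"id":"s2"}]'; else echo '[]'; fi
}

offset=0
while :; do
  page=$(fetch_page "$offset")
  count=$(echo "$page" | jq length)
  [ "$count" -eq 0 ] && break          # empty page: done
  echo "$page" | jq -r '.[].id'        # process this page
  offset=$((offset + count))           # advance the offset
done
```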
Traces
zeroeval traces list --start-date 2025-01-01 --limit 50
zeroeval traces get <trace_id>
zeroeval traces spans <trace_id> --limit 100
traces spans returns the spans belonging to a specific trace, which is useful for debugging individual requests.
Spans
zeroeval spans list --start-date 2025-01-01 --limit 50
zeroeval spans get <span_id>
Prompts
List and inspect
zeroeval prompts list
zeroeval prompts get <prompt_slug>
zeroeval prompts get <prompt_slug> --version 3
zeroeval prompts get <prompt_slug> --tag production
zeroeval prompts versions <prompt_slug>
zeroeval prompts tags <prompt_slug>
Submit feedback
Provide feedback on a prompt completion for DSPy optimization and prompt tuning:
zeroeval prompts feedback create \
--prompt-slug customer-support \
--completion-id <uuid> \
--thumbs-up \
--reason "Clear and helpful response"
For scored judges, add judge-specific fields:
zeroeval prompts feedback create \
--prompt-slug customer-support \
--completion-id <uuid> \
--thumbs-down \
--judge-id <judge_uuid> \
--expected-score 3.5 \
--score-direction too_high \
--reason "Score should be lower"
--thumbs-up and --thumbs-down are mutually exclusive; exactly one is required.
Judges
List and inspect
zeroeval judges list
zeroeval judges get <judge_id>
zeroeval judges criteria <judge_id>
zeroeval judges evaluations <judge_id> --limit 100
zeroeval judges insights <judge_id>
zeroeval judges performance <judge_id>
zeroeval judges calibration <judge_id>
zeroeval judges versions <judge_id>
Filter evaluations
zeroeval judges evaluations <judge_id> \
--start-date 2025-01-01 \
--end-date 2025-02-01 \
--evaluation-result true \
--feedback-state pending \
--limit 200
| Flag | Description |
| --- | --- |
| --evaluation-result | true or false |
| --feedback-state | Filter by feedback state |
| --start-date / --end-date | Date range |
Create a judge
zeroeval judges create \
--name "Tone Check" \
--prompt "Evaluate whether the response maintains a professional tone." \
--evaluation-type binary \
--sample-rate 1.0 \
--temperature 0.0
Or load the prompt from a file:
zeroeval judges create \
--name "Quality Scorer" \
--prompt-file judge_prompt.txt \
--evaluation-type scored \
--score-min 0 \
--score-max 10 \
--pass-threshold 7
| Flag | Default | Description |
| --- | --- | --- |
| --name | required | Judge name |
| --prompt | — | Inline prompt text (mutually exclusive with --prompt-file) |
| --prompt-file | — | Path to file containing the prompt |
| --evaluation-type | binary | binary or scored |
| --score-min | 0.0 | Minimum score (scored only) |
| --score-max | 10.0 | Maximum score (scored only) |
| --pass-threshold | — | Pass threshold (scored only) |
| --sample-rate | 1.0 | Fraction of spans to evaluate |
| --backfill | 100 | Number of existing spans to backfill |
| --tag | — | Tag filter in key=value1,value2 format. Repeatable. |
| --tag-match | all | all or any |
| --target-prompt-id | — | Scope judge to a specific prompt |
| --temperature | 0.0 | LLM temperature for judge |
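A judge prompt used with --prompt-file is ordinary text, so it can be kept in a versioned file next to your CI config. A minimal sketch; the rubric wording here is illustrative, not a recommended prompt:

```shell
# Write an illustrative scored-judge rubric to a file.
cat > judge_prompt.txt <<'EOF'
Score the response from 0 (unusable) to 10 (excellent) for factual accuracy.
Penalize unsupported claims heavily.
EOF

# Then pass it to the CLI, e.g.:
# zeroeval judges create --name "Quality Scorer" --prompt-file judge_prompt.txt --evaluation-type scored
```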
Submit judge feedback
zeroeval judges feedback create \
--span-id <span_uuid> \
--thumbs-down \
--reason "Missed safety issue" \
--expected-output "Should flag harmful content"
For scored judges with per-criterion feedback:
zeroeval judges feedback create \
--span-id <span_uuid> \
--thumbs-down \
--expected-score 2.0 \
--score-direction too_high \
--criteria-feedback '{"clarity": {"expected_score": 1.0, "reason": "Confusing response"}}'
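The --criteria-feedback value is raw JSON, which is easy to mangle with shell quoting; building it with jq -n sidesteps that. A sketch in which the criterion name is illustrative:

```shell
# Build the criteria-feedback JSON programmatically instead of by hand.
CRITERIA=$(jq -n '{clarity: {expected_score: 1.0, reason: "Confusing response"}}')
echo "$CRITERIA"

# Then pass it through, e.g.:
# zeroeval judges feedback create --span-id <span_uuid> --thumbs-down --criteria-feedback "$CRITERIA"
```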
Optimization
Start, inspect, and promote prompt or judge optimization runs. All optimization commands require --project-id.
Prompt optimization
zeroeval optimize prompt list <task_id>
zeroeval optimize prompt get <task_id> <run_id>
zeroeval optimize prompt start <task_id> --optimizer-type quick_refine
zeroeval optimize prompt promote <task_id> <run_id> --yes
Judge optimization
zeroeval optimize judge list <judge_id>
zeroeval optimize judge get <judge_id> <run_id>
zeroeval optimize judge start <judge_id> --optimizer-type dspy_bootstrap
zeroeval optimize judge promote <judge_id> <run_id> --yes
| Flag | Default | Description |
| --- | --- | --- |
| --optimizer-type | quick_refine | quick_refine, dspy_bootstrap, or dspy_gepa |
| --config | — | JSON string of extra optimizer configuration |
| --yes | off | Skip the confirmation prompt (also skipped in --output json mode) |
Spec (machine-readable manual)
The spec commands dump the CLI’s command and parameter contract as JSON or Markdown, useful for agents and toolchains that need to discover available commands programmatically.
zeroeval spec cli --format json
zeroeval spec command "judges create" --format markdown
CI / automation recipes
Get the latest traces as JSON
zeroeval --output json traces list --limit 10 --order "created_at:desc"
Check judge pass rate
zeroeval --output json judges evaluations <judge_id> \
--evaluation-result true --limit 1000 \
--select "id" | jq length
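To turn the count into a percentage, fetch the passing count and the total count and divide; a sketch of the arithmetic with jq, where the numbers are illustrative stand-ins for two `jq length` results:

```shell
# e.g. passed from --evaluation-result true, total from an unfiltered listing
passed=870
total=1000
jq -n "$passed * 100 / $total"   # pass rate as a percentage
```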
Promote an optimization run non-interactively

zeroeval --output json optimize prompt promote <task_id> <run_id> --yes
- Tracing quickstart: Get your first trace in under 5 minutes
- Judges: How calibrated judges evaluate your production traffic
- Prompt setup: Add ze.prompt() to your Python or TypeScript codebase
- Skills: Let your coding agent handle SDK install and judge setup