zeroeval-skills

Agent skills for integrating ZeroEval into AI applications. Works with Cursor, Claude Code, Codex, and 30+ other coding agents.

Skills included

zeroeval-install -- Get from zero to production-ready. Walks through SDK install (Python or TypeScript), first trace, ze.prompt migration, and starter judge recommendations. Routes to custom-tracing for non-SDK languages.

custom-tracing -- Send traces to ZeroEval without the SDK. Covers direct REST API ingestion (POST /spans) and OpenTelemetry (OTLP) export for any language.

create-judge -- Design and create automated judges. Covers binary vs scored evaluation, template writing, criteria design, and the full creation API.

prompt-migration -- Migrate hardcoded prompts to ze.prompt. Covers the full migration workflow, feedback wiring, judge linkage, staged rollout, and prompt optimization for both Python and TypeScript.
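
As a rough illustration of the REST path that custom-tracing covers, the sketch below posts a JSON file to the /spans endpoint with curl. Only the POST /spans path comes from this README. The base URL, the ZEROEVAL_API_KEY variable name, the Bearer authorization header, and both helper function names are assumptions made for illustration; the payload schema is deliberately not shown here and should be taken from references/api-integration-playbook.md in the custom-tracing skill.

```shell
# Hypothetical helpers: only the "/spans" path is taken from this README.

# Build the ingestion URL from a base URL, tolerating one trailing slash.
spans_url() {
  printf '%s/spans' "${1%/}"
}

# POST a JSON payload file to the spans endpoint.
#   $1 = API base URL (assumption: supplied by you, not documented here)
#   $2 = path to a JSON file whose schema follows the custom-tracing playbook
# Assumes Bearer-token auth via ZEROEVAL_API_KEY; verify against the playbook.
post_spans() {
  curl -sS -X POST "$(spans_url "$1")" \
    -H "Authorization: Bearer ${ZEROEVAL_API_KEY:?set ZEROEVAL_API_KEY first}" \
    -H "Content-Type: application/json" \
    --data-binary "@$2"
}
```

A call would look like post_spans "$YOUR_BASE_URL" spans.json, where both arguments are placeholders you supply.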

LLM Stats Benchmark Hub

manage-data -- Create, push, version, and manage benchmark datasets for the LLM Stats benchmark hub. Covers the full dataset lifecycle: Python SDK, CSV, git-based workflows, subsets, versioning, and multimodal data.

run-evals -- Write tasks, evaluations, and scoring pipelines for LLM Stats benchmarks. Covers @ze.task, dataset.eval(), row/column/run evaluators, column_map, signals, execution config, repeat/resume, and CLI inspection.

Which skill do I need?

I want to...	Use
Add ZeroEval to my project for the first time	zeroeval-install
Set up tracing and observability (Python or TypeScript)	zeroeval-install
Send traces without the SDK (Go, Ruby, Java, etc.)	custom-tracing
Send traces via REST API or OpenTelemetry (OTLP)	custom-tracing
Migrate hardcoded prompts to ze.prompt	prompt-migration
Wire feedback collection for prompt optimization	prompt-migration
Connect judges to a prompt for automated evaluation	prompt-migration
Understand the staged rollout (explicit / auto / latest)	prompt-migration
Get recommendations for which judges to create	zeroeval-install
Create a new judge with a custom template	create-judge
Design scoring criteria for multi-dimensional evaluation	create-judge
Understand the judge creation API	create-judge
Add data to an LLM Stats benchmark	manage-data
Push datasets via git or Python SDK	manage-data
Work with dataset versions and subsets	manage-data
Add multimodal data (images, audio, video)	manage-data
Write a `@ze.task` and run evals on a benchmark	run-evals
Score eval results with evaluators and `column_map`	run-evals
Emit signals during task execution	run-evals
Configure workers, retries, checkpoints, resume	run-evals
Inspect eval status, errors, and traces via CLI	run-evals

Install

Option 1: CLI install (recommended)

# Install all skills
npx skills add zeroeval/zeroeval-skills

# Install a specific skill
npx skills add zeroeval/zeroeval-skills --skill zeroeval-install

# List available skills
npx skills add zeroeval/zeroeval-skills --list

Option 2: Claude Code plugin

# Add the marketplace
/plugin marketplace add zeroeval/zeroeval-skills

# Install the plugins you need (one command per plugin)
/plugin install zeroeval-install@zeroeval-skills
/plugin install custom-tracing@zeroeval-skills
/plugin install create-judge@zeroeval-skills
/plugin install prompt-migration@zeroeval-skills
/plugin install manage-data@zeroeval-skills
/plugin install run-evals@zeroeval-skills

# Reload plugins if the new commands do not appear immediately
/reload-plugins

Claude Code plugin skills are namespaced by plugin name. After installing from the marketplace, invoke them as:

/zeroeval-install:zeroeval-install
/custom-tracing:custom-tracing
/create-judge:create-judge
/prompt-migration:prompt-migration
/manage-data:manage-data
/run-evals:run-evals

Option 3: Clone and copy

git clone https://github.com/zeroeval/zeroeval-skills.git
mkdir -p .agents/skills
cp -r zeroeval-skills/skills/* .agents/skills/

Or copy individual skills into an agent-specific directory:

# Cursor
mkdir -p .cursor/skills
cp -r zeroeval-skills/skills/zeroeval-install .cursor/skills/zeroeval-install
cp -r zeroeval-skills/skills/custom-tracing .cursor/skills/custom-tracing
cp -r zeroeval-skills/skills/create-judge .cursor/skills/create-judge
cp -r zeroeval-skills/skills/prompt-migration .cursor/skills/prompt-migration
cp -r zeroeval-skills/skills/manage-data .cursor/skills/manage-data
cp -r zeroeval-skills/skills/run-evals .cursor/skills/run-evals

# Claude Code
mkdir -p .claude/skills
cp -r zeroeval-skills/skills/zeroeval-install .claude/skills/zeroeval-install
cp -r zeroeval-skills/skills/custom-tracing .claude/skills/custom-tracing
cp -r zeroeval-skills/skills/create-judge .claude/skills/create-judge
cp -r zeroeval-skills/skills/prompt-migration .claude/skills/prompt-migration
cp -r zeroeval-skills/skills/manage-data .claude/skills/manage-data
cp -r zeroeval-skills/skills/run-evals .claude/skills/run-evals
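
The per-skill cp commands above can also be written as a loop. The following is a minimal sketch, assuming the repository has already been cloned into ./zeroeval-skills; copy_skills is a hypothetical helper name, not something shipped with the repo.

```shell
# Hypothetical helper: copy the named skills from a source directory
# into a destination skills directory.
#   $1 = source directory (for example zeroeval-skills/skills)
#   $2 = destination directory (for example .cursor/skills)
#   remaining arguments = skill names to copy
copy_skills() {
  src="$1"
  dest="$2"
  shift 2
  mkdir -p "$dest"
  for skill in "$@"; do
    if [ -d "$src/$skill" ]; then
      # Copying into "$dest/" keeps the skill's own directory name.
      cp -r "$src/$skill" "$dest/"
    else
      echo "skipping $skill: not found in $src" >&2
    fi
  done
}

# Example (run from the directory that contains the clone):
# copy_skills zeroeval-skills/skills .cursor/skills zeroeval-install create-judge
```

Passing the destination as an argument lets the same function serve .cursor/skills, .claude/skills, or .agents/skills.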

Note: on Windows without symlink support, use Option 1 (npx skills) or Option 3 (manual copy). The plugins/ directory contains symlinks that may not resolve on Windows.

Repo structure

zeroeval-skills/
├── .claude-plugin/
│   └── marketplace.json               # Claude Code plugin marketplace
├── README.md
├── skills/                             # Canonical skill content (npx skills + manual copy)
│   ├── zeroeval-install/
│   │   ├── SKILL.md
│   │   └── references/
│   │       ├── python-integration-playbook.md
│   │       ├── typescript-integration-playbook.md
│   │       ├── judges-playbook.md
│   │       └── troubleshooting.md
│   ├── custom-tracing/
│   │   ├── SKILL.md
│   │   └── references/
│   │       └── api-integration-playbook.md
│   ├── create-judge/
│   │   ├── SKILL.md
│   │   └── references/
│   │       ├── judge-creation-guide.md
│   │       ├── evaluation-patterns.md
│   │       └── defaults-and-api-reference.md
│   ├── prompt-migration/
│   │   ├── SKILL.md
│   │   └── references/
│   │       ├── python-prompt-migration-playbook.md
│   │       └── typescript-prompt-migration-playbook.md
│   ├── manage-data/                    # LLM Stats benchmark hub
│   │   ├── SKILL.md
│   │   └── references/
│   │       ├── dataset-lifecycle.md
│   │       └── git-workflow.md
│   └── run-evals/                      # LLM Stats benchmark hub
│       ├── SKILL.md
│       └── references/
│           ├── task-and-eval-patterns.md
│           ├── execution-config.md
│           └── end-to-end-examples.md
└── plugins/                            # Claude Code plugin wrappers (symlinked)
    ├── zeroeval-install/
    ├── custom-tracing/
    ├── create-judge/
    ├── prompt-migration/
    ├── manage-data/
    └── run-evals/
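
Given the layout above, a manual copy can be sanity-checked by looking for each skill's SKILL.md. This is a small sketch under that assumption; check_skills is a hypothetical helper name, and the skill list is taken from the tree shown here.

```shell
# Hypothetical helper: report which of the six skills are present
# (identified by their SKILL.md file) in a skills directory.
#   $1 = skills directory to inspect (for example .claude/skills)
# Returns 0 if every skill is present, 1 otherwise.
check_skills() {
  dir="$1"
  rc=0
  for skill in zeroeval-install custom-tracing create-judge \
               prompt-migration manage-data run-evals; do
    if [ -f "$dir/$skill/SKILL.md" ]; then
      echo "ok       $skill"
    else
      echo "missing  $skill"
      rc=1
    fi
  done
  return "$rc"
}

# Example:
# check_skills .claude/skills
```

A nonzero exit status only means some skills were not copied, which is expected if you installed a subset on purpose.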

Requirements

A ZeroEval account and API key -- zeroeval.com
SDK path: Python 3.8+ or Node 18+, plus an LLM provider SDK (OpenAI, Vercel AI, LangChain, etc.)
Direct API / OTLP path: any language with an HTTP client or OpenTelemetry exporter
