Getting started
Install pickled, write a pickled.yml, and run your first check.
Install
Pickled runs on Bun. Use bunx for one-off runs:
bunx @pickled-dev/cli init
This writes a starter pickled.yml in the current directory.
The smallest config
A config needs schemaVersion: 2, a product, at least one agent, one named context, and one question or build. A question carries at least one expects fact or rejects misstatement.
schemaVersion: 2
product:
  name: my-product
  description: short one-liner about what your product does
sources:
  readme: { path: ./README.md }
agents:
  quick:
    provider: claude-code
    model: claude-haiku-4-5
contexts:
  from_readme: { mode: inject, source: readme }
facts:
  install_command:
    statement: my-product installs with bunx my-product.
    match:
      allOf: ["bunx my-product"]
questions:
  - id: install
    question: How do I install my-product?
    agents: [quick]
    contexts: [from_readme]
    expects: [install_command]
thresholds:
  questions: 80
thresholds is optional. Set thresholds.questions to make pickled check pass or fail on the score; omit it to see the score with no verdict.
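As a rough mental model of how this config is scored, the sketch below reads match.allOf as "every listed string must appear in the answer" and the threshold as a minimum percentage. Both readings are assumptions made for illustration, as is the >= comparison; matchesAllOf, scorePercent, and verdict are hypothetical names, not pickled's API.

```typescript
// Hypothetical sketch: not pickled's implementation.

// Assumed reading of match.allOf: every listed string must appear in the answer.
function matchesAllOf(answer: string, allOf: string[]): boolean {
  return allOf.every((needle) => answer.indexOf(needle) !== -1);
}

// Score as the percentage of expected facts the answer satisfied.
function scorePercent(passed: number, total: number): number {
  return total === 0 ? 0 : Math.round((passed / total) * 100);
}

// Without a threshold there is only a score; with one, the run passes or fails.
// Treating "meets the threshold" as >= is an assumption.
function verdict(score: number, threshold?: number): "passes" | "fails" | "no verdict" {
  if (threshold === undefined) return "no verdict";
  return score >= threshold ? "passes" : "fails";
}

const answer = "Run bunx my-product in your project directory.";
const hit = matchesAllOf(answer, ["bunx my-product"]);
const score = scorePercent(hit ? 1 : 0, 1);
console.log(score, verdict(score, 80), verdict(score)); // prints: 100 passes no verdict
```

Under this reading, an answer that recommends a different install command would score 0 and fail the threshold of 80.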
Preview the run
--plan walks the expansion and prints the cells without calling any model:
bunx @pickled-dev/cli check . --plan
Run a check
bunx @pickled-dev/cli check .
Pickled spawns the agent, injects the registered source, asks the question, and scores the answer against the facts:
Task: How do I install my-product?
[quick · from_readme] ✓ Well grounded 1/1
Overall: 100 / 100 · threshold 80 · run passes
Every question met its checks.
Save the receipt
Use --output when you want a durable receipt, then render it without rerunning the agent:
bunx @pickled-dev/cli check . --output pickled-report.json
bunx @pickled-dev/cli report pickled-report.json --format markdown
Default JSON is CI-safe. It keeps the verdicts and evidence ids, but strips source text, full agent answers, transcripts, diffs, and command output. Use --verbose for a forensic receipt.
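To make the difference between the two receipt flavors concrete, here is a minimal sketch of the kind of projection that "CI-safe" describes. The field names (verdict, evidenceIds, sourceText, and so on) and the toCiSafe helper are hypothetical stand-ins; the real report schema is not shown on this page and may differ.

```typescript
// Hypothetical receipt shapes: the real pickled report schema may differ.
interface VerboseCell {
  verdict: string;
  evidenceIds: string[];
  sourceText?: string;
  agentAnswer?: string;
  transcript?: string;
  diff?: string;
  commandOutput?: string;
}

interface CiSafeCell {
  verdict: string;
  evidenceIds: string[];
}

// Copy only the fields a CI-safe receipt is described as keeping.
function toCiSafe(cell: VerboseCell): CiSafeCell {
  return { verdict: cell.verdict, evidenceIds: cell.evidenceIds.slice() };
}

const verboseCell: VerboseCell = {
  verdict: "pass",
  evidenceIds: ["install_command"],
  sourceText: "full README text",
  agentAnswer: "Run bunx my-product.",
};

console.log(JSON.stringify(toCiSafe(verboseCell)));
```

Building the CI-safe object from an allow-list, rather than deleting known-sensitive fields, means a newly added verbose field cannot leak into CI logs by accident.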
Compare context paths
The point of pickled is the comparison. Add a memory context and run the same question down both:
contexts:
  memory: { mode: memory } # model memory only
  from_readme: { mode: inject, source: readme } # your README injected
questions:
  - id: install
    question: How do I install my-product?
    agents: [quick]
    contexts: [memory, from_readme]
    expects: [install_command]
Now pickled check . runs two cells. If memory fails and from_readme passes, your README is doing the work. If memory already passes, the model knew it without your docs. If from_readme fails, the answer is not in the source you registered: fix the context.
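The sketch below restates this comparison in code: it expands one question into its (agent × context) cells, then maps the memory and from_readme outcomes to the three readings above. It is an illustration only; expandCells and diagnose are hypothetical helpers, not part of pickled.

```typescript
// Hypothetical helpers that restate the comparison; not pickled's API.
type CellResult = "pass" | "fail";

interface Question { id: string; agents: string[]; contexts: string[]; }
interface Cell { question: string; agent: string; context: string; }

// One cell per (agent × context) pair for the question.
function expandCells(q: Question): Cell[] {
  const cells: Cell[] = [];
  for (const agent of q.agents) {
    for (const context of q.contexts) {
      cells.push({ question: q.id, agent, context });
    }
  }
  return cells;
}

// Map the two outcomes to the readings described in the text.
// More than one note can apply, for example when memory passes but from_readme fails.
function diagnose(memory: CellResult, fromReadme: CellResult): string[] {
  const notes: string[] = [];
  if (memory === "fail" && fromReadme === "pass") {
    notes.push("Your README is doing the work.");
  }
  if (memory === "pass") {
    notes.push("The model knew the answer without your docs.");
  }
  if (fromReadme === "fail") {
    notes.push("The answer is not in the registered source: fix the context.");
  }
  return notes;
}

const install: Question = { id: "install", agents: ["quick"], contexts: ["memory", "from_readme"] };
console.log(expandCells(install).length); // prints: 2
console.log(diagnose("fail", "pass"));
```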
Reach the live site with tools
Add a web context to make the agent discover the answer instead of reading injected content:
contexts:
  web_open: { mode: web }
A web (or mcp) cell that answers without invoking a tool is vetoed to NO: model memory does not count as evidence for a tool path.
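The veto rule can be pictured as a guard applied after the answer is scored. This is a sketch under assumptions: the "YES" label, the toolCalls counter, and the finalVerdict function are hypothetical; only the four mode names and the NO outcome come from this page.

```typescript
// Hypothetical sketch of the veto rule; not pickled's implementation.
type Mode = "memory" | "inject" | "web" | "mcp";
type Verdict = "YES" | "NO"; // "YES" is an assumed label for a passing cell

interface ScoredCell {
  mode: Mode;
  toolCalls: number;     // how many tools the agent invoked (assumed to be tracked)
  answerVerdict: Verdict; // verdict from scoring the answer text alone
}

// A tool-path cell with zero tool calls is forced to NO, whatever the answer says.
function finalVerdict(cell: ScoredCell): Verdict {
  const isToolPath = cell.mode === "web" || cell.mode === "mcp";
  if (isToolPath && cell.toolCalls === 0) return "NO";
  return cell.answerVerdict;
}

console.log(finalVerdict({ mode: "web", toolCalls: 0, answerVerdict: "YES" })); // prints: NO
console.log(finalVerdict({ mode: "web", toolCalls: 2, answerVerdict: "YES" })); // prints: YES
```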
Score example answers offline
pickled test scores a question's examples.pass / examples.fail strings against its fact contract with zero model calls, so you can catch a brittle contract before a paid run:
bunx @pickled-dev/cli test .
Builds
The facts above score an agent's answer. A build instead has the agent do the work: it edits a throwaway workspace, and you score the result with a verifier. Declare a workspace and verifier, and run it with pickled build:
builds:
  - id: add-endpoint
    goal: Add a health-check endpoint using my-product.
    agents: [quick]
    contexts: [from_readme]
    trials: 3
    workspace:
      path: ./fixtures/app
      setup: [bun install]
    verifier:
      failToPass:
        - { run: bun test }
      passToPass:
        - { run: bun run typecheck }
    referenceSolution:
      patch: ./fixtures/solutions/add-endpoint.patch
Each (agent × context) cell runs trials times in a fresh workspace; the result is k/n. Builds run on edit-capable CLI agents (claude-code, codex-cli). Use pickled build --verify-only to prove the fixture and reference patch before running an agent. See the pickled.yml reference for the full build contract.
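The sketch below shows one way to read the verifier and the k/n result. It assumes the common convention behind these key names: failToPass commands fail on the untouched fixture and pass once the work is done, while passToPass commands pass both before and after. That reading, and the names trialSucceeds, fixtureIsSound, and summarizeCell, are assumptions for illustration; the pickled.yml reference is the authority on the actual contract.

```typescript
// Hypothetical sketch; assumes the conventional meaning of the verifier keys.
// Each boolean records whether one verifier command exited successfully.
interface VerifierResult {
  failToPass: boolean[];
  passToPass: boolean[];
}

// A trial succeeds when every verifier command passes after the agent's edits.
function trialSucceeds(after: VerifierResult): boolean {
  return after.failToPass.every((ok) => ok) && after.passToPass.every((ok) => ok);
}

// A fixture is sound when failToPass commands fail before the reference patch,
// passToPass commands pass before it, and everything passes once it is applied.
function fixtureIsSound(before: VerifierResult, afterPatch: VerifierResult): boolean {
  const failsFirst = before.failToPass.every((ok) => !ok);
  const staysGreen = before.passToPass.every((ok) => ok);
  return failsFirst && staysGreen && trialSucceeds(afterPatch);
}

// Report a cell as k/n: k successful trials out of n.
function summarizeCell(trials: VerifierResult[]): string {
  const k = trials.filter(trialSucceeds).length;
  return `${k}/${trials.length}`;
}

const solved: VerifierResult = { failToPass: [true], passToPass: [true] };
const unsolved: VerifierResult = { failToPass: [false], passToPass: [true] };
console.log(summarizeCell([solved, unsolved, solved])); // prints: 2/3
console.log(fixtureIsSound(unsolved, solved));          // prints: true
```

Checking the fixture first matters because a failToPass command that already passes on the untouched workspace would let every trial succeed without the agent doing anything.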
Next steps
- Read the pickled.yml reference for every field.
- Wire pickled into CI with GitHub Actions.