Getting started

Install

Pickled runs on Bun. Use bunx for one-off runs:

bunx @pickled-dev/cli init

This writes a starter pickled.yml in the current directory.

The smallest config

A config needs schemaVersion: 2, a product, at least one agent, one named context, and one question or build. A question carries at least one expects fact or rejects misstatement.

schemaVersion: 2

product:
  name: my-product
  description: short one-liner about what your product does

sources:
  readme: { path: ./README.md }

agents:
  quick:
    provider: claude-code
    model: claude-haiku-4-5

contexts:
  from_readme: { mode: inject, source: readme }

facts:
  install_command:
    statement: my-product installs with bunx my-product.
    match:
      allOf: ["bunx my-product"]

questions:
  - id: install
    question: How do I install my-product?
    agents: [quick]
    contexts: [from_readme]
    expects: [install_command]

thresholds:
  questions: 80

thresholds is optional. Set thresholds.questions to make pickled check pass or fail on the score; omit it to see the score with no verdict.
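As a rough illustration of how an optional threshold turns a score into a verdict, consider the sketch below. The function name verdictFor and the choice of >= for the comparison are assumptions made for illustration; they are not taken from pickled's implementation.

```typescript
// Hypothetical sketch, not pickled's real code.
type Verdict = "pass" | "fail" | "none";

// score is on a 0-100 scale; threshold mirrors thresholds.questions
// and is undefined when the key is omitted from pickled.yml.
function verdictFor(score: number, threshold?: number): Verdict {
  if (threshold === undefined) {
    return "none"; // the score is shown, but no verdict is given
  }
  // Assumption: meeting the threshold exactly counts as a pass.
  return score >= threshold ? "pass" : "fail";
}

console.log(verdictFor(100, 80));
console.log(verdictFor(60, 80));
console.log(verdictFor(60));
```

With thresholds.questions: 80 a score of 100 passes and a score of 60 fails; with no threshold the same score carries no verdict.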

Preview the run

--plan walks the expansion and prints the cells without calling any model:

bunx @pickled-dev/cli check . --plan
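The expansion that --plan walks can be pictured as a cross product of each question's agents and contexts. The sketch below assumes that reading; PlannedQuestion, PlanCell, and expandCells are hypothetical names chosen for illustration and are not part of pickled.

```typescript
// Hypothetical sketch of cell expansion; all names are illustrative.
interface PlannedQuestion {
  id: string;
  agents: string[];
  contexts: string[];
}

interface PlanCell {
  question: string;
  agent: string;
  context: string;
}

// Assumption: one cell per (agent, context) pair of each question.
function expandCells(questions: PlannedQuestion[]): PlanCell[] {
  const cells: PlanCell[] = [];
  for (const q of questions) {
    for (const agent of q.agents) {
      for (const context of q.contexts) {
        cells.push({ question: q.id, agent, context });
      }
    }
  }
  return cells;
}

// The question from the smallest config: one agent, one context.
const planned = expandCells([
  { id: "install", agents: ["quick"], contexts: ["from_readme"] },
]);
for (const cell of planned) {
  console.log(`${cell.question} [${cell.agent} · ${cell.context}]`);
}
```

Under this assumption, each extra agent or context on a question multiplies its cell count, which is why previewing the plan before a paid run is worthwhile.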

Run a check

bunx @pickled-dev/cli check .

Pickled spawns the agent, injects the registered source, asks the question, and scores the answer against the facts:

Task: How do I install my-product?
  [quick · from_readme] ✓ Well grounded 1/1

Overall: 100 / 100 · threshold 80 · run passes
Every question met its checks.

Save the receipt

Use --output when you want a durable receipt, then render it without rerunning the agent:

bunx @pickled-dev/cli check . --output pickled-report.json
bunx @pickled-dev/cli report pickled-report.json --format markdown

Default JSON is CI-safe. It keeps the verdicts and evidence ids, but strips source text, full agent answers, transcripts, diffs, and command output. Use --verbose for a forensic receipt.

Compare context paths

The point of pickled is the comparison. Add a memory context and run the same question down both:

contexts:
  memory: { mode: memory } # model memory only
  from_readme: { mode: inject, source: readme } # your README injected

questions:
  - id: install
    question: How do I install my-product?
    agents: [quick]
    contexts: [memory, from_readme]
    expects: [install_command]

Now pickled check . runs two cells. If memory fails and from_readme passes, your README is doing the work. If memory already passes, the model knew it without your docs. If from_readme fails, the answer is not in the source you registered: fix the context.
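The reading guide above can be restated as a small decision helper. This is only a sketch: interpretCells is a hypothetical function, and it takes plain booleans rather than anything pickled actually emits.

```typescript
// Hypothetical helper that restates the reading guide; not part of pickled.
function interpretCells(memoryPasses: boolean, fromReadmePasses: boolean): string[] {
  const notes: string[] = [];
  if (!memoryPasses && fromReadmePasses) {
    notes.push("Your README is doing the work.");
  }
  if (memoryPasses) {
    notes.push("The model knew it without your docs.");
  }
  if (!fromReadmePasses) {
    notes.push("The answer is not in the source you registered: fix the context.");
  }
  return notes;
}

console.log(interpretCells(false, true));
console.log(interpretCells(true, true));
console.log(interpretCells(false, false));
```

Returning a list rather than a single string keeps both observations when memory passes but from_readme fails.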

Reach the live site with tools

Add a web context to make the agent discover the answer instead of reading injected content:

contexts:
  web_open: { mode: web }

A web (or mcp) cell that answers without invoking a tool is vetoed to NO: model memory does not count as evidence for a tool path.
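A minimal sketch of that veto rule follows. The applyToolVeto function, the toolCalls counter, and the "YES" label for a passing outcome are assumptions for illustration; only the modes and the NO outcome come from the text above.

```typescript
// Hypothetical sketch of the tool-path veto; not pickled's real code.
type ContextMode = "memory" | "inject" | "web" | "mcp";
type Outcome = "YES" | "NO"; // "YES" is an assumed label for a passing cell

// toolCalls is the assumed number of tool invocations made in the cell.
function applyToolVeto(mode: ContextMode, toolCalls: number, scored: Outcome): Outcome {
  const isToolPath = mode === "web" || mode === "mcp";
  if (isToolPath && toolCalls === 0) {
    return "NO"; // answered from memory on a tool path: vetoed
  }
  return scored; // otherwise the scored outcome stands
}

console.log(applyToolVeto("web", 0, "YES"));
console.log(applyToolVeto("web", 2, "YES"));
console.log(applyToolVeto("memory", 0, "YES"));
```

The veto only ever downgrades a result: a cell that used a tool and still missed the facts stays at NO.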

Score example answers offline

pickled test scores a question's examples.pass / examples.fail strings against its fact contract with zero model calls, so you can catch a brittle contract before a paid run:

bunx @pickled-dev/cli test .
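To see why this catches a brittle contract, here is a sketch of the idea. It assumes that match.allOf means "every listed string appears in the answer" and that matching is case-sensitive; both are guesses, and matchesAllOf and mislabeledExamples are hypothetical names rather than pickled functions.

```typescript
// Hypothetical sketch of an offline contract check; not pickled's real code.

// Assumption: allOf passes when every listed string occurs in the answer,
// compared case-sensitively.
function matchesAllOf(answer: string, allOf: string[]): boolean {
  return allOf.every((needle) => answer.indexOf(needle) !== -1);
}

interface ExampleSet {
  pass: string[]; // answers the contract should accept
  fail: string[]; // answers the contract should reject
}

// Returns every example whose match result disagrees with its label.
function mislabeledExamples(examples: ExampleSet, allOf: string[]): string[] {
  const wrong: string[] = [];
  for (const text of examples.pass) {
    if (!matchesAllOf(text, allOf)) wrong.push(text);
  }
  for (const text of examples.fail) {
    if (matchesAllOf(text, allOf)) wrong.push(text);
  }
  return wrong;
}

const problems = mislabeledExamples(
  {
    pass: ["Run bunx my-product to install it."],
    fail: ["Install it with npm install my-product."],
  },
  ["bunx my-product"],
);
console.log(problems.length === 0 ? "contract holds" : problems);
```

A pass example such as "Use bunx to run my-product" would land in the mislabeled list under these assumptions, which is the kind of brittleness worth finding before spending on model calls.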

Builds

The facts above score an agent's answer. A build instead has the agent do the work: it edits a throwaway workspace, and you score the result with a verifier. Declare a workspace and verifier, and run it with pickled build:

builds:
  - id: add-endpoint
    goal: Add a health-check endpoint using my-product.
    agents: [quick]
    contexts: [from_readme]
    trials: 3
    workspace:
      path: ./fixtures/app
      setup: [bun install]
    verifier:
      failToPass:
        - { run: bun test }
      passToPass:
        - { run: bun run typecheck }
    referenceSolution:
      patch: ./fixtures/solutions/add-endpoint.patch

Each (agent × context) cell runs once per trial, for the number of trials you set, and every trial starts in a fresh workspace; the result is reported as k/n, where n is the number of trials. Builds run on edit-capable CLI agents (claude-code, codex-cli). Use pickled build --verify-only to prove the fixture and reference patch before running an agent. See the pickled.yml reference for the full build contract.
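The k/n figure can be read as a simple tally. The sketch below assumes that k counts the trials whose verifier passed; summarizeTrials is a hypothetical helper, not a pickled API.

```typescript
// Hypothetical tally of build trials; not pickled's real code.
// Each entry is assumed to be one trial's verifier outcome.
function summarizeTrials(results: boolean[]): string {
  const k = results.filter((ok) => ok).length; // trials that passed
  return `${k}/${results.length}`;
}

// With trials: 3, a cell where two of three trials pass:
console.log(summarizeTrials([true, false, true])); // prints "2/3"
```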

Next steps

Read the pickled.yml reference for every field.
Wire pickled into CI with GitHub Actions.
