pickled

Getting started

Install pickled, write a pickled.yml, and run your first check.

Install

Pickled runs on Bun. Use bunx for one-off runs:

bunx @pickled-dev/cli init

This writes a starter pickled.yml in the current directory.

The smallest config

A config needs schemaVersion: 2, a product, at least one agent, one named context, and one question or build. A question carries at least one expects fact or rejects misstatement.

schemaVersion: 2

product:
  name: my-product
  description: short one-liner about what your product does

sources:
  readme: { path: ./README.md }

agents:
  quick:
    provider: claude-code
    model: claude-haiku-4-5

contexts:
  from_readme: { mode: inject, source: readme }

facts:
  install_command:
    statement: my-product installs with bunx my-product.
    match:
      allOf: ["bunx my-product"]

questions:
  - id: install
    question: How do I install my-product?
    agents: [quick]
    contexts: [from_readme]
    expects: [install_command]

thresholds:
  questions: 80

thresholds is optional. Set thresholds.questions to make pickled check pass or fail on the score; omit it to see the score with no verdict.
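
The verdict rule described above can be pictured with a small sketch. This is an illustrative model, not pickled's implementation: the function name `verdictFor` is hypothetical, and treating a score equal to the threshold as a pass is an assumption.

```typescript
// Hypothetical model of the verdict rule; not pickled's actual code.
type Verdict = "pass" | "fail" | "no verdict";

// `score` is the overall 0-100 score; `threshold` mirrors thresholds.questions.
function verdictFor(score: number, threshold?: number): Verdict {
  // thresholds omitted: the score is shown with no verdict.
  if (threshold === undefined) return "no verdict";
  // Assumption: a score equal to the threshold counts as a pass.
  return score >= threshold ? "pass" : "fail";
}

console.log(verdictFor(100, 80)); // pass
console.log(verdictFor(75, 80)); // fail
console.log(verdictFor(75)); // no verdict
```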

Preview the run

--plan walks the expansion and prints the cells without calling any model:

bunx @pickled-dev/cli check . --plan
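
To make "the expansion" concrete, here is a rough sketch of how cells could be enumerated, assuming one cell per question, agent, and context combination. The `Question` and `Cell` shapes and the `expandCells` name are hypothetical; only the `id`, `agents`, and `contexts` keys come from pickled.yml.

```typescript
// Hypothetical shapes; field names follow the pickled.yml keys shown above.
interface Question {
  id: string;
  agents: string[];
  contexts: string[];
}

interface Cell {
  question: string;
  agent: string;
  context: string;
}

// Assumption: one cell per (question, agent, context) combination.
function expandCells(questions: Question[]): Cell[] {
  const cells: Cell[] = [];
  for (const q of questions) {
    for (const agent of q.agents) {
      for (const context of q.contexts) {
        cells.push({ question: q.id, agent, context });
      }
    }
  }
  return cells;
}

// The smallest config has one question, one agent, and one context.
const cells = expandCells([
  { id: "install", agents: ["quick"], contexts: ["from_readme"] },
]);
for (const c of cells) {
  console.log(`${c.question} [${c.agent} · ${c.context}]`);
}
```

Under this assumption the smallest config expands to a single cell, and each extra agent or context on a question multiplies the count.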

Run a check

bunx @pickled-dev/cli check .

Pickled spawns the agent, injects the registered source, asks the question, and scores the answer against the facts:

Task: How do I install my-product?
  [quick · from_readme] ✓ Well grounded 1/1

Overall: 100 / 100 · threshold 80 · run passes
Every question met its checks.

Save the receipt

Use --output when you want a durable receipt, then render it without rerunning the agent:

bunx @pickled-dev/cli check . --output pickled-report.json
bunx @pickled-dev/cli report pickled-report.json --format markdown

The default JSON output is CI-safe. It keeps the verdicts and evidence ids, but strips source text, full agent answers, transcripts, diffs, and command output. Use --verbose for a forensic receipt.

Compare context paths

The point of pickled is the comparison. Add a memory context and run the same question down both:

contexts:
  memory: { mode: memory } # model memory only
  from_readme: { mode: inject, source: readme } # your README injected

questions:
  - id: install
    question: How do I install my-product?
    agents: [quick]
    contexts: [memory, from_readme]
    expects: [install_command]

Now pickled check . runs two cells, one per context. If memory fails and from_readme passes, your README is doing the work. If memory already passes, the model knew the answer without your docs. If from_readme fails, the answer is not in the source you registered, so fix the context.
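
The three readings above can be written down as a small lookup, which helps when both cells need interpreting at once. This is only a reading aid: `interpret` is a hypothetical helper, not part of pickled, and it takes the two cell outcomes as plain booleans.

```typescript
// Hypothetical helper: maps the outcome of the two cells to the readings
// given in the paragraph above. More than one reading can apply.
function interpret(memoryPassed: boolean, readmePassed: boolean): string[] {
  const notes: string[] = [];
  if (!memoryPassed && readmePassed) {
    notes.push("README is doing the work");
  }
  if (memoryPassed) {
    notes.push("model knew it without your docs");
  }
  if (!readmePassed) {
    notes.push("answer is not in the registered source");
  }
  return notes;
}

console.log(interpret(false, true)); // [ "README is doing the work" ]
```

Note that when memory passes and from_readme fails, two readings apply together: the model already knew the answer, and the registered source still does not carry it.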

Reach the live site with tools

Add a web context to make the agent discover the answer instead of reading injected content:

contexts:
  web_open: { mode: web }

A web (or mcp) cell that answers without invoking a tool is vetoed to NO: model memory does not count as evidence for a tool path.
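
As a sketch of that veto rule: a tool-path cell with zero tool invocations ends up as NO regardless of how its answer scored. The `CellResult` shape, the `toolCalls` counter, and the "YES" label for a passing cell are assumptions made for illustration; only the NO outcome and the web and mcp modes come from the text above.

```typescript
// Hypothetical shapes for illustration; not pickled's receipt format.
type Mode = "memory" | "inject" | "web" | "mcp";

interface CellResult {
  mode: Mode;
  toolCalls: number; // assumed count of tool invocations in the run
  verdict: "YES" | "NO"; // "YES" is an assumed label for a passing cell
}

// A web or mcp cell that never invoked a tool is forced to NO.
function applyToolVeto(cell: CellResult): CellResult {
  const isToolPath = cell.mode === "web" || cell.mode === "mcp";
  if (isToolPath && cell.toolCalls === 0) {
    return { ...cell, verdict: "NO" };
  }
  return cell;
}

console.log(applyToolVeto({ mode: "web", toolCalls: 0, verdict: "YES" }).verdict); // NO
console.log(applyToolVeto({ mode: "inject", toolCalls: 0, verdict: "YES" }).verdict); // YES
```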

Score example answers offline

pickled test scores a question's examples.pass / examples.fail strings against its fact contract with zero model calls, so you can catch a brittle contract before a paid run:

bunx @pickled-dev/cli test .
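
The idea behind an offline check can be sketched as follows. This assumes `allOf` passes when every listed string appears verbatim in the answer, which is a guess at the matching rule rather than pickled's documented behavior; `matchesAllOf` and `brittleExamples` are hypothetical names.

```typescript
// Assumption: allOf passes when every listed string appears verbatim
// (case-sensitive) in the answer. The real matcher may differ.
function matchesAllOf(answer: string, allOf: string[]): boolean {
  return allOf.every((needle) => answer.includes(needle));
}

// Returns the examples that contradict the contract: pass examples the
// contract rejects, followed by fail examples the contract accepts.
function brittleExamples(allOf: string[], pass: string[], fail: string[]): string[] {
  const wronglyRejected = pass.filter((a) => !matchesAllOf(a, allOf));
  const wronglyAccepted = fail.filter((a) => matchesAllOf(a, allOf));
  return [...wronglyRejected, ...wronglyAccepted];
}

const flagged = brittleExamples(
  ["bunx my-product"],
  ["Run bunx my-product.", "Use the bunx runner with my-product."],
  ["npm install my-product"],
);
console.log(flagged); // [ "Use the bunx runner with my-product." ]
```

The second pass example is a correct answer in spirit, but it does not contain the exact phrase, so a strict substring contract would reject it. That is the kind of brittleness worth catching before a paid run.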

Builds

The facts above score an agent's answer. A build instead has the agent do the work: it edits a throwaway workspace, and you score the result with a verifier. Declare a workspace and verifier, and run it with pickled build:

builds:
  - id: add-endpoint
    goal: Add a health-check endpoint using my-product.
    agents: [quick]
    contexts: [from_readme]
    trials: 3
    workspace:
      path: ./fixtures/app
      setup: [bun install]
    verifier:
      failToPass:
        - { run: bun test }
      passToPass:
        - { run: bun run typecheck }
    referenceSolution:
      patch: ./fixtures/solutions/add-endpoint.patch

Each (agent × context) cell runs once per trial (trials: 3 above), each time in a fresh workspace; the result is reported as k/n, the number of passing trials out of the total. Builds run on edit-capable CLI agents (claude-code, codex-cli). Use pickled build --verify-only to prove the fixture and reference patch before running an agent. See the pickled.yml reference for the full build contract.
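
A minimal sketch of the k/n summary, assuming k counts the trials whose verifier passed and n is the configured number of trials. `summarizeTrials` is a hypothetical helper, and each trial is reduced here to a single boolean.

```typescript
// Assumption: each entry is true when that trial's verifier passed.
function summarizeTrials(results: boolean[]): string {
  const k = results.filter((passed) => passed).length;
  return `${k}/${results.length}`;
}

// With trials: 3, two passing runs and one failing run summarize as 2/3.
console.log(summarizeTrials([true, false, true])); // 2/3
```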

