Test what agents actually understand.
For products developers and agents read. An open-source CLI that tests whether agents can answer and build with your product across context paths, scored by deterministic evidence. No LLM grades another LLM.
bun add -g @pickled-dev/cli
One config. Your real docs. Your real checks.
Drop a pickled.yml next to your sources. Declare the sources agents should use, the tasks they should complete, and how each one is checked. Whether agents reach your product through a public API, SDK docs, llms.txt, CLAUDE.md, AGENTS.md, JSDoc, or internal runbooks, Pickled tests whether they can answer and build from the sources you declared. The example below is a public library.
# pickled.yml
schemaVersion: 2
product:
  name: zod
  description: TypeScript-first schema validation
sources:
  llms: { url: https://zod.dev/llms.txt }
agents:
  quick:
    provider: claude-code
    model: claude-haiku-4-5
contexts:
  injected: { mode: inject, source: llms }
facts:
  error_api:
    statement: Errors are read with z.treeifyError.
    match:
      allOf: ["z.treeifyError"]
misstatements:
  deprecated_format:
    statement: Recommends the removed ZodError.format().
    match:
      anyOf: ["ZodError.format()"]
questions:
  - id: error-handling
    question: How do I get error messages from failed validation?
    agents: [quick]
    contexts: [injected]
    expects: [error_api]
    rejects: [deprecated_format]
    examples:
      pass: ["Read issues with z.treeifyError(err)."]
      fail: ["Call ZodError.format() on the error."]
builds:
  - id: add-validation
    goal: Add Zod validation to the signup form.
    agents: [quick]
    contexts: [injected]
    trials: 3
    workspace: { path: ./fixtures/app }
    verifier:
      failToPass:
        - { run: bun test }
      passToPass:
        - { run: bun run typecheck }
    referenceSolution:
      patch: ./fixtures/solutions/add-validation.patch
thresholds:
  questions: 80
  builds: 80
A plausible answer can still be wrong, and a plausible edit can still fail verification. Pickled keeps both receipts deterministic.
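To make the idea of deterministic evidence concrete, here is a minimal TypeScript sketch of how the allOf and anyOf rules in the config could be evaluated. It is not Pickled's implementation: the Match type, the matches and gradeAnswer functions, and the choice of verbatim substring matching are all assumptions made for illustration. The inputs are the pass and fail examples from the config.

```typescript
// Hypothetical rule shape mirroring the allOf / anyOf keys in pickled.yml.
type Match = { allOf?: string[]; anyOf?: string[] };

// Assumption: a pattern matches when it appears verbatim in the answer.
// Note that a rule with neither key matches every answer under this sketch.
function matches(answer: string, rule: Match): boolean {
  const all = rule.allOf ?? [];
  const any = rule.anyOf ?? [];
  const allOk = all.every((pattern) => answer.includes(pattern));
  const anyOk = any.length === 0 || any.some((pattern) => answer.includes(pattern));
  return allOk && anyOk;
}

// An answer passes when every expected fact matches and no misstatement matches.
function gradeAnswer(answer: string, expects: Match[], rejects: Match[]): boolean {
  const hasAllFacts = expects.every((rule) => matches(answer, rule));
  const hasMisstatement = rejects.some((rule) => matches(answer, rule));
  return hasAllFacts && !hasMisstatement;
}

// Rules and example answers copied from the config.
const errorApi: Match = { allOf: ["z.treeifyError"] };
const deprecatedFormat: Match = { anyOf: ["ZodError.format()"] };

const passExample = "Read issues with z.treeifyError(err).";
const failExample = "Call ZodError.format() on the error.";

console.log(gradeAnswer(passExample, [errorApi], [deprecatedFormat])); // true
console.log(gradeAnswer(failExample, [errorApi], [deprecatedFormat])); // false
```

Because the check is plain string logic, the same answer always produces the same verdict, which is what makes a receipt diffable between runs.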
Pickled runs locally. Runs in CI. Each run leaves a receipt you can diff and threshold. No dashboard required.
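As a rough illustration of thresholding a receipt, the sketch below computes pass rates and compares them with the thresholds block from the config. The Result and Thresholds types, the function names, and the reading of 80 as a minimum pass percentage are assumptions; the actual receipt format is not shown here.

```typescript
// Hypothetical receipt entries: one per graded question or build trial.
type Result = { id: string; passed: boolean };
type Thresholds = { questions: number; builds: number };

// Pass rate as a percentage. An empty list scores 0 so that a run with
// no results cannot clear a threshold by accident (a design assumption).
function passRate(results: Result[]): number {
  if (results.length === 0) return 0;
  const passed = results.filter((result) => result.passed).length;
  return (passed / results.length) * 100;
}

// Assumption: each threshold is the minimum pass percentage for its category.
function meetsThresholds(questions: Result[], builds: Result[], limits: Thresholds): boolean {
  return passRate(questions) >= limits.questions && passRate(builds) >= limits.builds;
}

const limits: Thresholds = { questions: 80, builds: 80 };

// Illustrative data only: one question, and three trials of one build.
const questionResults: Result[] = [{ id: "error-handling", passed: true }];
const buildResults: Result[] = [
  { id: "add-validation", passed: true },
  { id: "add-validation", passed: true },
  { id: "add-validation", passed: false },
];

// Two of three build trials passed, which is below 80 percent.
console.log(meetsThresholds(questionResults, buildResults, limits)); // false
```

In CI, a gate like this would typically map a false result to a non-zero exit code so the job fails.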
See the full example
Find out what agents say about your product.
Open source. MIT. Install in 30 seconds. See your first score in two minutes.
bun add -g @pickled-dev/cli
A pickle isn't fresh. A pickle is preserved. Same idea for your product context.