Record and deterministically replay AI agent executions.
Litmus captures every LLM and tool call your agent makes, saving structured trace files you can inspect, share, and replay.
```
pip install litmus-trace
```

```
# Record your agent (wraps the process, captures all LLM calls)
litmus run python my_agent.py

# View the trace
litmus view ./traces/lt-abc123.trace.json
```

Your agent code stays completely unchanged. Litmus patches the SDK transport layer at runtime.
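Trace files are plain JSON (note the `.trace.json` extension), so you can also poke at one programmatically with nothing but the standard library. The field names below are illustrative, not Litmus's actual schema:

```python
import json

# Illustrative only: the real trace schema may differ.
sample_trace = json.loads("""
{
  "steps": [
    {"provider": "anthropic", "model": "claude-sonnet-4", "latency_ms": 812},
    {"provider": "openai", "model": "gpt-4o", "latency_ms": 430}
  ]
}
""")

for i, step in enumerate(sample_trace["steps"], 1):
    print(f"step {i}: {step['provider']}/{step['model']} ({step['latency_ms']} ms)")
```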
- **Record** — Intercepts every HTTP call to LLM APIs (Anthropic, OpenAI, Mistral, 14+ providers). Saves the full request and response as a trace file. API keys are automatically redacted.
- **View** — Pretty-print traces with step-by-step details, latency, and model info.
- **Replay** — Feed recorded responses back to your agent. Same code path, same output, no real API calls.
- **Fault Injection** — Mutate recorded responses to test resilience. What happens when Claude refuses? When GPT returns a 500? When the API times out?
- **CI Gating** — Score your trace corpus for reliability and block deploys that drop below a threshold.
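Fault injection hasn't launched yet, but the core idea can be sketched today: take a recorded response and mutate it before handing it back to the agent. Everything below (field names, error shape, the helper function) is hypothetical, not Litmus's API:

```python
import copy

def inject_overloaded_error(recorded_response: dict) -> dict:
    """Replace a recorded success with a synthetic overload error (hypothetical shape)."""
    faulty = copy.deepcopy(recorded_response)  # leave the original recording intact
    faulty["status_code"] = 529
    faulty["body"] = {"type": "error",
                      "error": {"type": "overloaded_error", "message": "Overloaded"}}
    return faulty

recorded = {"status_code": 200, "body": {"content": [{"type": "text", "text": "ok"}]}}
faulty = inject_overloaded_error(recorded)
```

Replaying `faulty` instead of `recorded` lets you watch how your retry and error-handling paths behave without waiting for a real outage.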
Join the Discord to get notified when these features launch.
```
litmus run python my_agent.py
```

```python
import litmus

litmus.record()
# ... your existing agent code, unchanged ...
litmus.stop()
```

```
litmus proxy --mode record
# Then point your SDK:
ANTHROPIC_BASE_URL=http://localhost:8787/anthropic python my_agent.py
```

Works with any LLM API out of the box:
| Provider | Status |
|---|---|
| Anthropic (Claude) | Tested |
| OpenAI (GPT) | Tested |
| Google (Gemini) | Supported |
| Mistral | Supported |
| Cohere | Supported |
| Groq | Supported |
| Together AI | Supported |
| Fireworks AI | Supported |
| DeepSeek | Supported |
| Perplexity | Supported |
| OpenRouter | Supported |
| Ollama (local) | Supported |
| vLLM (local) | Supported |
| LM Studio (local) | Supported |
Custom/self-hosted models:
```
litmus proxy --provider my-model=https://my-finetuned-llama.example.com/v1
```

| Command | Description |
|---|---|
| `litmus run` | Wrap a command to record (zero code changes) |
| `litmus view` | Pretty-print a trace file |
| `litmus proxy` | Start the recording proxy server |
| `litmus providers` | List all supported providers |
| `litmus replay` | Replay a trace (coming soon — requires Litmus Cloud) |
| `litmus ci` | Score traces and gate deploys (coming soon — requires Litmus Cloud) |
Litmus monkey-patches the `httpx` transport layer used by both the Anthropic and OpenAI Python SDKs. When you call `client.messages.create(...)`, Litmus intercepts the HTTP request before it leaves your machine.
Record mode: The real API call goes through. Litmus captures the request and response, then saves them to a trace file. API keys are automatically redacted.
Replay mode: The real API is never called. Litmus serves the recorded response directly from the trace file. Your agent gets the exact same response it got during recording — same tool calls, same content, same stop reason.
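The two modes follow the standard record/replay pattern. Here is a minimal stdlib-only sketch of that pattern; Litmus's real implementation wraps `httpx`'s transport classes and its internals may differ:

```python
class RecordReplayTransport:
    """Toy record/replay layer in front of a 'real' send function (illustrative only)."""

    def __init__(self, real_send, mode="record", tape=None):
        self.real_send = real_send      # callable(request) -> response
        self.mode = mode                # "record" or "replay"
        self.tape = tape if tape is not None else []
        self._cursor = 0

    def send(self, request):
        if self.mode == "record":
            response = self.real_send(request)  # the real call goes through
            self.tape.append(response)          # capture it for later replay
            return response
        # Replay: serve recorded responses in order, never touching the network.
        response = self.tape[self._cursor]
        self._cursor += 1
        return response

def no_network(request):
    raise RuntimeError("replay mode should never call the real API")

# Record one call, then replay it with the network "unplugged".
recorder = RecordReplayTransport(lambda req: {"echo": req}, mode="record")
live = recorder.send({"prompt": "hi"})
replayer = RecordReplayTransport(no_network, mode="replay", tape=recorder.tape)
replayed = replayer.send({"prompt": "hi"})
```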
- API keys (`Authorization`, `x-api-key`) are automatically redacted from trace headers
- Use `--compact` to strip request bodies for smaller trace files
- Note: message content in request/response bodies is NOT redacted — don't include secrets in your prompts
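Header redaction of this kind is straightforward to reason about; a sketch of what it might look like (Litmus's actual redaction logic is its own, and the placeholder value here is an assumption):

```python
SENSITIVE_HEADERS = {"authorization", "x-api-key"}

def redact_headers(headers: dict) -> dict:
    """Replace sensitive header values, preserving everything else."""
    return {k: ("[REDACTED]" if k.lower() in SENSITIVE_HEADERS else v)
            for k, v in headers.items()}

clean = redact_headers({"x-api-key": "sk-ant-secret",
                        "content-type": "application/json"})
```

Matching case-insensitively matters because HTTP header names are case-insensitive, so `Authorization` and `authorization` must both be caught.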
- **Python only** — the monkey-patch approach (`litmus run`, `litmus.record()`) requires Python. Use proxy mode for other languages.
- **httpx-based SDKs** — works with SDKs that use `httpx` under the hood (Anthropic, OpenAI, Mistral, Cohere, etc.). SDKs using `requests` or `aiohttp` are not intercepted.
- **Sequential replay** — responses are served in recorded order. Agents that make calls in a different order on replay will get mismatched responses.
- **No tool call recording** — only LLM API calls are captured. External tool calls (database, HTTP APIs) are not recorded.
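The sequential-replay limitation means replay is positional, not keyed by request content. A toy illustration (not Litmus code) of why a reordered agent gets the wrong response:

```python
class SequentialTape:
    """Serves recorded (request, response) pairs strictly in recorded order."""

    def __init__(self, entries):
        self.entries = list(entries)
        self.cursor = 0

    def next_response(self, request):
        recorded_request, response = self.entries[self.cursor]
        self.cursor += 1
        if recorded_request != request:
            # Litmus itself may behave differently; this just shows the hazard.
            print(f"warning: replaying response recorded for {recorded_request!r} "
                  f"against {request!r}")
        return response

tape = SequentialTape([("summarize", "summary text"),
                       ("translate", "texte traduit")])
# An agent that swaps its call order gets a mismatched response:
first = tape.next_response("translate")  # actually the 'summarize' response
```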
- Discord — fastest way to get help, share traces, and request features
- GitHub Issues — bug reports and feature requests
- PyPI — the `litmus-trace` package
I'm building Litmus in the open and I want to hear from you — whether it's a bug, a feature idea, or just telling me about your agent setup. I personally respond to everything.
- Email: romirj@gmail.com
- Discord: romirj (join the server)
- Twitter/X: @romir_jain
If you're running agents in production and want to use Litmus, I'll personally help you set it up. DM me anywhere.
Observability tools (LangSmith, Langfuse) tell you what happened. They log traces.
Litmus captures the full picture. Every LLM call, every response, every token — in a structured trace file you can inspect, share, and (soon) replay deterministically with fault injection.
LangSmith is the dashcam. Litmus is building the crash test facility.
MIT
