Litmus

Record and deterministically replay AI agent executions.

[Demo: record, replay, fault injection]

Litmus captures every LLM call your agent makes, saving structured trace files you can inspect, share, and replay.

pip install litmus-trace

Quick Start — Zero Code Changes

# Record your agent (wraps the process, captures all LLM calls)
litmus run python my_agent.py

# View the trace
litmus view ./traces/lt-abc123.trace.json

Your agent code stays completely unchanged. Litmus patches the SDK transport layer at runtime.

What It Does

Free (works offline, no account needed)

Record — Intercepts every HTTP call to LLM APIs (Anthropic, OpenAI, Mistral, 14+ providers). Saves the full request and response as a trace file. API keys are automatically redacted.

View — Pretty-print traces with step-by-step details, latency, and model info.

Coming Soon (Litmus Cloud)

Replay — Feed recorded responses back to your agent. Same code path, same output, no real API calls.

Fault Injection — Mutate recorded responses to test resilience. What happens when Claude refuses? When GPT returns a 500? When the API times out?

CI Gating — Score your trace corpus for reliability and block deploys that drop below a threshold.
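To make the fault-injection idea concrete, here is what mutating a recorded response might look like. This is purely illustrative: the trace-entry shape and function name are hypothetical, not the actual Litmus schema or API.

```python
import copy

# A hypothetical recorded trace step (the real trace format may differ).
recorded_step = {
    "response": {"status": 200, "body": {"content": "Sure, here is the plan..."}}
}

def inject_server_error(step):
    """Replace a recorded response with a 500 to exercise the agent's retry path."""
    mutated = copy.deepcopy(step)  # leave the original recording intact
    mutated["response"] = {"status": 500, "body": {"error": "internal server error"}}
    return mutated

faulty = inject_server_error(recorded_step)
```

On replay, serving `faulty` instead of `recorded_step` would show whether the agent retries, degrades gracefully, or crashes.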

Join the Discord to get notified when these features launch.

Three Ways to Record

1. CLI Wrapper (recommended — zero code changes)

litmus run python my_agent.py

2. One-Line Python API

import litmus

litmus.record()
# ... your existing agent code, unchanged ...
litmus.stop()

3. Proxy Mode (any language, advanced use)

litmus proxy --mode record
# Then point your SDK at the proxy:
ANTHROPIC_BASE_URL=http://localhost:8787/anthropic python my_agent.py

Supported Providers

Works out of the box with every major LLM API:

Provider              Status
--------              ------
Anthropic (Claude)    Tested
OpenAI (GPT)          Tested
Google (Gemini)       Supported
Mistral               Supported
Cohere                Supported
Groq                  Supported
Together AI           Supported
Fireworks AI          Supported
DeepSeek              Supported
Perplexity            Supported
OpenRouter            Supported
Ollama (local)        Supported
vLLM (local)          Supported
LM Studio (local)     Supported

Custom/self-hosted models:

litmus proxy --provider my-model=https://my-finetuned-llama.example.com/v1

CLI Reference

litmus run          Wrap a command to record (zero code changes)
litmus view         Pretty-print a trace file
litmus proxy        Start the recording proxy server
litmus providers    List all supported providers
litmus replay       Replay a trace (coming soon — requires Litmus Cloud)
litmus ci           Score traces and gate deploys (coming soon — requires Litmus Cloud)

How It Works

Litmus monkey-patches the httpx transport layer used by the Anthropic and OpenAI Python SDKs (and other httpx-based SDKs). When you call client.messages.create(...), Litmus intercepts the HTTP request before it leaves your machine.

Record mode: The real API call goes through. Litmus captures the request and response, then saves them to a trace file. API keys are automatically redacted.
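The record-mode pattern can be sketched in plain Python. This is an illustration of transport-level patching in general, not Litmus's actual internals: the class and function names are invented, and the real implementation patches httpx rather than a toy class.

```python
# A stand-in for an SDK's HTTP transport (hypothetical; real SDKs route
# their requests through httpx).
class Transport:
    def send(self, request):
        return {"status": 200, "body": {"content": "real API response"}}

recorded = []  # trace entries captured during the run

def install_recorder(transport_cls):
    """Wrap send() so every request/response pair is captured."""
    original_send = transport_cls.send

    def recording_send(self, request):
        response = original_send(self, request)  # the real call still goes through
        recorded.append({"request": request, "response": response})
        return response

    transport_cls.send = recording_send  # the monkey-patch

install_recorder(Transport)
Transport().send({"url": "/v1/messages", "body": {"model": "some-model"}})
```

After the patched call, `recorded` holds the request/response pair, ready to be serialized to a trace file.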

Replay mode: The real API is never called. Litmus serves the recorded response directly from the trace file. Your agent gets the exact same response it got during recording — same tool calls, same content, same stop reason.
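Replay can be sketched with the same pattern: instead of calling through, serve recorded responses in order. Again a sketch of the sequential-replay idea, not the real implementation; the names and trace shape are hypothetical.

```python
class ReplayTransport:
    """Serves responses from a recorded trace in order; never touches the network."""

    def __init__(self, trace):
        self._responses = iter(entry["response"] for entry in trace)

    def send(self, request):
        try:
            return next(self._responses)  # the exact response from the recording
        except StopIteration:
            raise RuntimeError("agent made more calls than the trace contains")

trace = [
    {"request": {"url": "/v1/messages"}, "response": {"status": 200, "body": "step 1"}},
    {"request": {"url": "/v1/messages"}, "response": {"status": 200, "body": "step 2"}},
]
replay = ReplayTransport(trace)
```

The iterator also makes the "sequential replay" limitation below visible: responses come back in recorded order, regardless of what the agent actually asks for.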

Security

  • API keys (Authorization, x-api-key) are automatically redacted from trace headers
  • Use --compact to strip request bodies for smaller trace files
  • Note: message content in request/response bodies is NOT redacted — don't include secrets in your prompts
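Header redaction of the kind described above can be sketched like this. The header names come from the list above; the function itself is illustrative, not Litmus's API.

```python
REDACTED_HEADERS = {"authorization", "x-api-key"}

def redact_headers(headers):
    """Return a copy with secret-bearing headers masked (case-insensitive)."""
    return {
        name: ("[REDACTED]" if name.lower() in REDACTED_HEADERS else value)
        for name, value in headers.items()
    }

clean = redact_headers({
    "Authorization": "Bearer sk-secret",
    "x-api-key": "secret-key",
    "Content-Type": "application/json",
})
```

Note that this only covers headers; as stated above, secrets placed in prompt bodies would pass through untouched.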

Limitations

  • Python only — the monkey-patch approach (litmus run, litmus.record()) requires Python. Use proxy mode for other languages.
  • httpx-based SDKs — works with SDKs that use httpx under the hood (Anthropic, OpenAI, Mistral, Cohere, etc). SDKs using requests or aiohttp are not intercepted.
  • Sequential replay — responses are served in recorded order. Agents that make calls in a different order on replay will get mismatched responses.
  • No tool call recording — only LLM API calls are captured. External tool calls (database, HTTP APIs) are not recorded.

Community

  • Discord — fastest way to get help, share traces, and request features
  • GitHub Issues — bug reports and feature requests
  • PyPI — package

Talk to Me

I'm building Litmus in the open and I want to hear from you — whether it's a bug, a feature idea, or just telling me about your agent setup. I personally respond to everything.

If you're running agents in production and want to use Litmus, I'll personally help you set it up. DM me anywhere.

Why Litmus?

Observability tools (LangSmith, Langfuse) tell you what happened. They log traces.

Litmus captures the full picture. Every LLM call, every response, every token — in a structured trace file you can inspect, share, and (soon) replay deterministically with fault injection.

LangSmith is the dashcam. Litmus is building the crash test facility.

License

MIT
