Introduction
Replace expensive LLM classification calls with fast ML surrogates trained on your own production traces.
What is TRACER?
TRACER (Trace-Based Adaptive Cost-Efficient Routing) is an open-source Python package that learns to route LLM classification traffic to cheap ML surrogates.
Most LLM classification pipelines call a large language model on every single input. In practice, the vast majority of that traffic is predictable: on those inputs, a lightweight ML model reproduces the LLM's label with near-perfect agreement.
TRACER learns the decision boundary between "easy" and "hard" inputs directly from your LLM's own classification traces. It fits a fast surrogate on the easy partition, gates it with a calibrated acceptor, and defers only the uncertain inputs back to the LLM.
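The easy/hard split described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not TRACER's implementation. It assumes query embeddings are already computed and swaps in a nearest-centroid classifier as the surrogate. It uses the gap between the two nearest centroids as the acceptor score, and it picks the threshold `t` on held-out traces as the lowest value whose accepted set still meets a target teacher agreement. `fit_centroids`, `predict`, and `calibrate_threshold` are hypothetical helpers.

```python
import math
from collections import defaultdict


# Hypothetical stand-in surrogate (not TRACER's): nearest centroid over precomputed embeddings.
def fit_centroids(embeddings, labels):
    """Average the embeddings that the LLM assigned to each label."""
    sums = {}
    counts = defaultdict(int)
    for vec, label in zip(embeddings, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        for i, x in enumerate(vec):
            sums[label][i] += x
        counts[label] += 1
    return {lab: [s / counts[lab] for s in total] for lab, total in sums.items()}


def predict(centroids, vec):
    """Return (label, margin); margin = 2nd-nearest distance minus nearest distance."""
    dists = sorted((math.dist(vec, c), lab) for lab, c in centroids.items())
    best_dist, best_label = dists[0]
    margin = dists[1][0] - best_dist if len(dists) > 1 else math.inf
    return best_label, margin


def calibrate_threshold(centroids, held_out, target_agreement=0.95):
    """Lowest margin threshold t whose accepted held-out traces meet the target.

    held_out is a list of (embedding, llm_label) pairs. Returns (t, coverage),
    or (None, 0.0) if no threshold reaches the target.
    """
    scored = [(predict(centroids, vec), label) for vec, label in held_out]
    for t in sorted({margin for (_, margin), _ in scored}):
        agrees = [pred == label for (pred, margin), label in scored if margin >= t]
        if agrees and sum(agrees) / len(agrees) >= target_agreement:
            return t, len(agrees) / len(scored)
    return None, 0.0  # task too hard for this surrogate: defer everything to the LLM
```

Taking the lowest qualifying threshold maximizes coverage subject to the agreement target. When no threshold qualifies, the sketch returns `None` and all traffic stays with the LLM.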
Key features
- Zero manual labeling: Every deferred LLM call produces a free labeled trace, so no annotation pipeline is needed.
- Parity-gated deployment: The surrogate goes live only when its teacher agreement on held-out data exceeds your threshold. If the task is too hard, the system reports that instead of deploying the surrogate.
- Continual learning: `tracer.update()` refits on new traces, so coverage grows automatically as traces accumulate.
- Interpretability: Slice summaries, boundary pairs, and disagreement cards explain what the surrogate handles and why.
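The parity-gated deployment described in the list above reduces to a held-out check. The sketch below is a minimal illustration. It assumes you already have surrogate predictions, LLM (teacher) labels, and acceptor scores for a held-out set of traces. `parity_gate`, `GateReport`, and the default `min_agreement=0.95` are hypothetical names and values, not TRACER's API.

```python
from dataclasses import dataclass


@dataclass
class GateReport:
    coverage: float   # share of held-out traces the surrogate would answer
    agreement: float  # teacher agreement on those accepted traces
    deploy: bool      # True only if agreement clears the parity threshold


def parity_gate(preds, teacher_labels, accept_scores, t, min_agreement=0.95):
    """Hypothetical parity check on held-out traces (not TRACER's API)."""
    # Keep only traces whose acceptor score clears t; record whether each agrees with the teacher.
    agrees = [p == y for p, y, s in zip(preds, teacher_labels, accept_scores) if s >= t]
    coverage = len(agrees) / len(preds) if preds else 0.0
    agreement = sum(agrees) / len(agrees) if agrees else 0.0
    deploy = bool(agrees) and agreement >= min_agreement
    return GateReport(coverage, agreement, deploy)
```

Raising `t` trades coverage for agreement. A report with `deploy=False` at every threshold is the "task is too hard" outcome.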
How it works
User query -> [Embedder] -> [ML Surrogate] -> [Acceptor Gate], which branches on the acceptor score and a threshold t:
- score >= t: local answer from the surrogate
- score < t: defer to the LLM
The surrogate is not another LLM. It is a classical ML model (logistic regression, random forest, or a small neural net) that runs on CPU with sub-millisecond inference. This is what makes the cost reduction real.
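The serving path can be sketched as a single function. This assumes the embedder, surrogate, acceptor, and LLM client are passed in as plain callables; `route` and its parameters are hypothetical, not TRACER's API. The sketch also shows why labeling is free: each deferred call appends a (query, LLM label) trace that a later refit can use.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def route(query: str,
          embed: Callable[[str], Vector],
          surrogate: Callable[[Vector], str],
          acceptor: Callable[[Vector], float],
          call_llm: Callable[[str], str],
          t: float,
          traces: List[Tuple[str, str]]) -> Tuple[str, str]:
    """Hypothetical router: return (label, source), where source is "local" or "llm"."""
    vec = embed(query)                     # [Embedder]
    label = surrogate(vec)                 # [ML Surrogate]
    if acceptor(vec) >= t:                 # [Acceptor Gate]: score >= t
        return label, "local"
    teacher_label = call_llm(query)        # score < t: defer to the LLM
    traces.append((query, teacher_label))  # the deferred call doubles as a labeled trace
    return teacher_label, "llm"
```

With toy stubs, a query whose acceptor score clears `t` returns `(label, "local")` and leaves `traces` untouched. A low-scoring query instead returns the LLM's label with source `"llm"` and adds one trace.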
Benchmark results
| Metric | Value |
|---|---|
| Coverage | 92% of traffic handled locally |
| Teacher agreement | 0.96 on handled traffic |
| Annual savings (10K queries/day) | $16,244 |
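The savings figure depends on per-call prices that this page does not state. A back-of-envelope model, assuming each locally handled query avoids exactly one LLM call and ignoring embedding and infrastructure costs, is: annual savings = queries per day × 365 × coverage × (LLM cost per call − surrogate cost per call). `annual_savings` is a hypothetical helper, and the prices below are illustrative, not the benchmark's.

```python
def annual_savings(queries_per_day: float, coverage: float,
                   llm_cost_per_call: float, surrogate_cost_per_call: float = 0.0) -> float:
    """Hypothetical estimate: each locally handled query avoids one LLM call."""
    handled_per_year = queries_per_day * 365 * coverage
    return handled_per_year * (llm_cost_per_call - surrogate_cost_per_call)
```

For example, at a hypothetical $0.002 per LLM call, `annual_savings(10_000, 0.92, 0.002)` comes to about $6,716. Savings scale linearly with coverage, which is why growing coverage through continual learning matters.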
Next steps
- Quickstart: Install and run your first routing policy in 5 minutes.
- Concepts: Understand the pipeline, model zoo, acceptor, and parity gate.