Introduction

Replace expensive LLM classification calls with fast ML surrogates trained on your own production traces.

What is TRACER?

TRACER (Trace-Based Adaptive Cost-Efficient Routing) is an open-source Python package that learns to route LLM classification traffic to cheap ML surrogates.

Most LLM classification pipelines call a large language model on every single input. In practice, the vast majority of that traffic is predictable: on those inputs, a lightweight ML model can reproduce the LLM's label with high agreement.

TRACER learns the decision boundary between "easy" and "hard" inputs directly from your LLM's own classification traces. It fits a fast surrogate on the easy partition, gates it with a calibrated acceptor, and defers only the uncertain inputs back to the LLM.
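As a rough illustration of what "gating with a calibrated acceptor" can mean, the sketch below picks an acceptance threshold from held-out traces. It is not TRACER's implementation: the record layout, the function name `pick_threshold`, and the rule "lowest threshold whose accepted set still meets the target agreement" are all assumptions made for this example.

```python
from typing import List, Optional, Tuple

# Hypothetical held-out record: (acceptor score, surrogate label, LLM label).
HeldOut = Tuple[float, str, str]


def pick_threshold(
    records: List[HeldOut], target_agreement: float
) -> Optional[Tuple[float, float, float]]:
    """Return (threshold, coverage, agreement) for the lowest threshold whose
    accepted set (score >= threshold) meets the target agreement.
    Returns None when no threshold qualifies, i.e. the task is too hard."""
    ranked = sorted(records, key=lambda r: r[0], reverse=True)
    best = None
    agree = 0
    for i, (score, predicted, teacher) in enumerate(ranked, start=1):
        agree += int(predicted == teacher)
        # Records with tied scores are accepted together, so only
        # evaluate once the whole tie group has been counted.
        if i < len(ranked) and ranked[i][0] == score:
            continue
        agreement = agree / i
        if agreement >= target_agreement:
            best = (score, i / len(ranked), agreement)
    return best


held_out = [
    (0.95, "a", "a"),
    (0.90, "b", "b"),
    (0.80, "a", "a"),
    (0.60, "b", "a"),  # surrogate disagrees with the LLM
    (0.40, "a", "b"),  # surrogate disagrees with the LLM
]
strict = pick_threshold(held_out, target_agreement=0.9)
relaxed = pick_threshold(held_out, target_agreement=0.75)
```

With this toy data, the strict target accepts only the top three records (threshold 0.80, coverage 0.6), while the relaxed target lowers the threshold to 0.60 and raises coverage to 0.8. That is the coverage versus agreement trade-off the acceptor controls.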

Key features

Zero manual labeling: Every deferred LLM call produces a free labeled trace. No annotation pipeline needed.

Parity-gated deployment: The surrogate goes live only when its teacher agreement exceeds your threshold on held-out data. If the task is too hard, the system says so.

Continual learning: tracer.update() refits on new traces. Coverage grows automatically over time.

Interpretability: Slice summaries, boundary pairs, and disagreement cards explain what the surrogate handles and why.
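The parity gate can be pictured as a simple check on held-out traces. The following is a minimal sketch under stated assumptions, not TRACER's API: `teacher_agreement`, `parity_gate`, and the `(text, llm_label)` trace shape are hypothetical names chosen for illustration, and whether a tie with the threshold passes is also an assumption.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical trace shape: (input text, label the LLM assigned).
Trace = Tuple[str, str]


def teacher_agreement(
    surrogate: Callable[[str], str], held_out: Sequence[Trace]
) -> float:
    """Fraction of held-out traces where the surrogate matches the LLM label."""
    if not held_out:
        raise ValueError("need at least one held-out trace")
    matches = sum(surrogate(text) == label for text, label in held_out)
    return matches / len(held_out)


def parity_gate(
    surrogate: Callable[[str], str],
    held_out: Sequence[Trace],
    min_agreement: float,
) -> Tuple[bool, float]:
    """Return (deploy?, measured agreement). Ties pass in this sketch."""
    agreement = teacher_agreement(surrogate, held_out)
    return agreement >= min_agreement, agreement


def keyword_surrogate(text: str) -> str:
    # Stand-in for a trained model, used only to exercise the gate.
    return "refund" if "refund" in text else "other"


held_out = [
    ("please refund my order", "refund"),
    ("how do I reset my password", "other"),
    ("refund status?", "refund"),
    ("I want my money back", "refund"),  # the keyword rule misses this one
]
deploy, agreement = parity_gate(keyword_surrogate, held_out, min_agreement=0.9)
```

Here the stand-in surrogate agrees with the LLM on 3 of 4 traces, so `deploy` is `False` and all traffic would keep going to the LLM.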

How it works

User query -> [Embedder] -> [ML Surrogate] -> [Acceptor Gate]
                                                |          |
                                            score >= t   score < t
                                                |          |
                                          Local answer   Defer to LLM

The surrogate is not another LLM. It is a classical ML model (logistic regression, random forest, or a small neural net) running on CPU in sub-millisecond time. This is what makes the cost reduction real.
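The routing flow in the diagram can be sketched in a few lines. This is an illustrative outline rather than TRACER's actual code: `route`, `toy_surrogate`, and `fake_llm` are hypothetical, the embedding step is folded into the surrogate callable, and the surrogate is assumed to return a label together with an acceptor score.

```python
from typing import Callable, List, Tuple


def route(
    query: str,
    surrogate: Callable[[str], Tuple[str, float]],
    call_llm: Callable[[str], str],
    threshold: float,
    trace_log: List[Tuple[str, str]],
) -> Tuple[str, str]:
    """Answer locally when the acceptor score clears the threshold,
    otherwise defer to the LLM and keep its answer as a new labeled trace."""
    label, score = surrogate(query)
    if score >= threshold:
        return label, "local"
    llm_label = call_llm(query)
    trace_log.append((query, llm_label))  # deferred calls become training data
    return llm_label, "llm"


def toy_surrogate(query: str) -> Tuple[str, float]:
    # Stand-in for embedder + classical model + acceptor score.
    if "invoice" in query:
        return "billing", 0.97
    return "other", 0.55


def fake_llm(query: str) -> str:
    # Stand-in for the real (expensive) LLM call.
    return "shipping"


traces: List[Tuple[str, str]] = []
easy = route("where is my invoice", toy_surrogate, fake_llm, 0.9, traces)
hard = route("package is late", toy_surrogate, fake_llm, 0.9, traces)
```

The first query is answered locally and costs no LLM call. The second falls below the threshold, goes to the LLM, and leaves behind a labeled trace that a later refit could use.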

Benchmark results

Metric                             Value
Coverage                           92% of traffic handled locally
Teacher agreement                  0.96 on handled traffic
Annual savings (10K queries/day)   $16,244
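To relate these figures to your own workload, a back-of-the-envelope calculation helps. The sketch below assumes a 365-day year and that savings scale linearly with the number of queries handled locally. The benchmark's actual cost model is not stated here, so the function names and parameters are illustrative only.

```python
def annual_savings(
    queries_per_day: float,
    coverage: float,
    llm_cost_per_query: float,
    local_cost_per_query: float = 0.0,
    days_per_year: int = 365,
) -> float:
    """Savings from answering a `coverage` fraction of queries locally."""
    local_queries = queries_per_day * days_per_year * coverage
    return local_queries * (llm_cost_per_query - local_cost_per_query)


def implied_saving_per_local_query(
    annual: float, queries_per_day: float, coverage: float, days_per_year: int = 365
) -> float:
    """Invert the formula: net saving per locally handled query."""
    return annual / (queries_per_day * days_per_year * coverage)


# Figures from the table above: $16,244 per year, 10K queries/day, 92% coverage.
per_query = implied_saving_per_local_query(16_244, 10_000, 0.92)

# Plugging the implied figure back in recovers the annual number.
round_trip = annual_savings(10_000, 0.92, per_query)
```

Under these assumptions, the table implies a net saving of a little under half a cent per locally handled query. Substituting your own per-query LLM cost and traffic volume into `annual_savings` gives a comparable estimate for your pipeline.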

Next steps