Research
An ICLR 2025 paper on unified routing and cascading reports a consistent 4% improvement. The pattern: route each request to the cheapest model first, score its output with a quality evaluator, and escalate to the next model only if the score falls below a threshold. This complements the router's existing Thompson Sampling and EMA routing.
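The cascade pattern described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's method or the router's actual code. `Provider`, `cost`, `generate`, `evaluate`, `threshold` and `CascadeResult` are hypothetical names. The evaluator is any callable returning a score, and when no provider clears the threshold the sketch falls back to the most expensive provider's answer.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence


@dataclass
class Provider:
    name: str
    cost: float                      # hypothetical relative cost, used only for ordering
    generate: Callable[[str], str]   # hypothetical: prompt -> model output


@dataclass
class CascadeResult:
    provider: str
    output: str
    score: float
    attempts: List[str] = field(default_factory=list)


def cascade(prompt: str,
            providers: Sequence[Provider],
            evaluate: Callable[[str, str], float],
            threshold: float) -> CascadeResult:
    """Try providers from cheapest to most expensive and stop at the first
    output whose quality score reaches `threshold`. If none does, return the
    answer from the last (most expensive) provider."""
    if not providers:
        raise ValueError("cascade needs at least one provider")
    attempts: List[str] = []
    result: Optional[CascadeResult] = None
    for provider in sorted(providers, key=lambda p: p.cost):
        output = provider.generate(prompt)
        score = evaluate(prompt, output)
        attempts.append(provider.name)
        result = CascadeResult(provider.name, output, score, list(attempts))
        if score >= threshold:
            break  # good enough: do not pay for a more expensive provider
    assert result is not None  # guaranteed by the non-empty check above
    return result


# Usage with stand-in providers and a toy length-based evaluator.
small = Provider("small", cost=0.1, generate=lambda p: "short")
large = Provider("large", cost=1.0, generate=lambda p: "a much longer answer")
toy_eval = lambda prompt, out: min(len(out) / 20, 1.0)

print(cascade("q", [large, small], toy_eval, threshold=0.5).attempts)
```

Here the cheap provider's output scores below the threshold, so the request escalates. Recording `attempts` makes it easy to log how often escalation happens, which is the cost side of the threshold trade-off.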
Proposal
- Add a cascade mode to the router that tries the cheapest provider first
- Implement a lightweight output-quality classifier trained on self-learning data
- Escalate to the next provider only if the predicted quality is below the threshold
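One lightweight option for the classifier and escalation rule in the list above is online logistic regression over logged outcomes. This is an assumption, not something the proposal specifies. The sketch assumes the self-learning data can be reduced to `(features, label)` pairs with label 1 meaning the output was acceptable. The feature encoding, the `QualityClassifier` and `should_escalate` names, and the learning-rate and L2 defaults are all hypothetical.

```python
import math
from typing import List, Sequence


class QualityClassifier:
    """Online logistic regression estimating P(output is acceptable)
    from a feature vector describing a (prompt, provider, output) triple."""

    def __init__(self, n_features: int, lr: float = 0.5, l2: float = 1e-4) -> None:
        self.w: List[float] = [0.0] * n_features
        self.b = 0.0
        self.lr = lr  # SGD step size (hypothetical default)
        self.l2 = l2  # small L2 penalty to keep weights bounded

    def predict(self, x: Sequence[float]) -> float:
        if len(x) != len(self.w):
            raise ValueError("feature vector has the wrong length")
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        # Numerically stable sigmoid.
        if z >= 0:
            return 1.0 / (1.0 + math.exp(-z))
        e = math.exp(z)
        return e / (1.0 + e)

    def update(self, x: Sequence[float], label: int) -> None:
        """One SGD step on log loss for a logged example (label 1 = acceptable)."""
        err = self.predict(x) - label  # derivative of log loss w.r.t. z
        for i, xi in enumerate(x):
            self.w[i] -= self.lr * (err * xi + self.l2 * self.w[i])
        self.b -= self.lr * err


def should_escalate(clf: QualityClassifier, x: Sequence[float], threshold: float) -> bool:
    """Escalate to the next provider only if predicted quality is below threshold."""
    return clf.predict(x) < threshold


# Toy self-learning log: feature 0 high -> accepted, feature 1 high -> rejected.
log = [([1.0, 0.0], 1), ([0.0, 1.0], 0)]
clf = QualityClassifier(n_features=2)
for _ in range(200):
    for features, label in log:
        clf.update(features, label)

print(should_escalate(clf, [0.0, 1.0], threshold=0.5))
```

Logistic regression is chosen here only because it is cheap to score on every request and can be updated one example at a time as new self-learning data arrives. Any calibrated classifier could fill the same role.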
Sources