Finding
BaRP: Bandit Routing with Preference-Tunable Trade-offs (arXiv:2510.07429)
Trains a bandit router under real deployment feedback (partial-feedback bandit). Operator can dial performance–cost trade-off at test time without retraining. Outperforms offline routers by ≥12.46%.
Applicability to Zeph
Zeph's LinUCB bandit router (zeph-llm/src/router/bandit.rs, PR #2390/#2230) is purely accuracy-focused. BaRP's preference-conditioned extension would allow operators to specify cost vs. quality weight at runtime — e.g., "prefer cheaper models in this session".
Proposed design:
[llm.router.bandit]
cost_weight = 0.3 # range 0.0 to 1.0; 0.0 = quality only, 1.0 = strongest cost penalty
The LinUCB arm score subtracts cost_weight * cost_penalty(provider) from each arm's UCB value, making expensive providers less attractive when cost_weight is high.
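For reference, the adjusted score can be written out as below. This is a sketch that assumes the textbook disjoint LinUCB form; Zeph's actual implementation may differ. Here x is the context vector, A_a and \hat{\theta}_a are the per-arm design matrix and ridge estimate, \alpha is the exploration coefficient, \lambda stands for cost_weight, and c_a stands for the cost penalty of provider a (assumed here to be normalized to the same scale as the reward).

```latex
s_a(x) = \hat{\theta}_a^{\top} x + \alpha \sqrt{x^{\top} A_a^{-1} x} - \lambda \, c_a,
\qquad
a^{*} = \arg\max_a \; s_a(x)
```

In this additive form, \lambda = 1 does not ignore quality; it weights one unit of normalized cost equally with one unit of estimated reward. If a strictly cost-only endpoint is wanted, a convex combination such as (1 - \lambda)(\hat{\theta}_a^{\top} x + \alpha \sqrt{x^{\top} A_a^{-1} x}) - \lambda c_a is one possible alternative.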
Implementation sketch
// In LinUCB arm selection:
let adjusted_ucb = quality_ucb - config.cost_weight * provider_cost_estimate(arm);
This is a minimal change to the existing bandit implementation and directly maps to the [cost] tracking already in Zeph.
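A slightly fuller version of the selection step might look like the following. This is a minimal sketch and not the code in bandit.rs: the `ArmEstimate` struct, the `select_arm` function, and the assumption that cost is pre-normalized to [0, 1] are all hypothetical and introduced only for illustration.

```rust
/// Hypothetical per-arm inputs. In Zeph these would presumably come from the
/// LinUCB state and the existing cost tracking; the names are illustrative.
#[derive(Debug, Clone, Copy)]
struct ArmEstimate {
    /// Optimistic quality score: predicted reward plus exploration bonus.
    quality_ucb: f64,
    /// Cost estimate, assumed normalized to [0, 1] so it is comparable to the reward scale.
    normalized_cost: f64,
}

/// Returns the index of the arm with the highest cost-adjusted score,
/// or `None` if `arms` is empty.
fn select_arm(arms: &[ArmEstimate], cost_weight: f64) -> Option<usize> {
    // Clamp so an out-of-range config value cannot turn the penalty into a bonus.
    let w = cost_weight.clamp(0.0, 1.0);
    arms.iter()
        .enumerate()
        .map(|(i, arm)| (i, arm.quality_ucb - w * arm.normalized_cost))
        // total_cmp gives a total order on f64, so NaN scores cannot cause a panic.
        .max_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(i, _)| i)
}
```

With two arms, one slightly better but expensive (quality_ucb 0.90, cost 1.0) and one slightly worse but cheap (quality_ucb 0.80, cost 0.1), a cost_weight of 0.0 selects the expensive arm while 0.3 selects the cheap one. Normalizing cost before applying the weight keeps cost_weight meaningful across providers whose raw prices differ by orders of magnitude.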
Priority
P2 — extends existing LinUCB infrastructure with a high-value config knob; small implementation surface.
Source
- arXiv:2510.07429 — BaRP: Bandit Routing with Preference-Tunable Performance–Cost Trade-offs