Zero-dependency load balancing for LLM agent calls — part of the arsenal collection.
- RoundRobin — even distribution across endpoints
- WeightedRandom — probabilistic selection by weight
- LeastConnections — always route to least-busy endpoint
- Health tracking — per-endpoint success/failure stats, latency, error rate
- Automatic failover — unhealthy endpoints are bypassed; auto-recovery after timeout
- Thread-safe — concurrent calls handled correctly
- Zero dependencies — stdlib only
pip install agent-balancer

Or install from source:
git clone https://github.com/darshjme/agent-balancer
cd agent-balancer
pip install -e .

from agent_balancer import Endpoint, RoundRobinBalancer, WeightedRandomBalancer, LeastConnectionsBalancer
# Define your LLM endpoints
endpoints = [
    Endpoint("openai-us", "https://api.openai.com/v1", weight=2.0),
    Endpoint("openai-eu", "https://api.openai.eu/v1", weight=1.0),
    Endpoint("anthropic", "https://api.anthropic.com/v1", weight=1.5),
]
# Round Robin
lb = RoundRobinBalancer(endpoints)
ep = lb.next()  # get next endpoint
# call_llm and prompt are placeholders for your own client function and input
try:
    result = call_llm(ep.url, prompt)
    lb.release(ep, success=True, latency_ms=320)
except Exception:
    lb.release(ep, success=False)
# Weighted Random
lb = WeightedRandomBalancer(endpoints)
# Least Connections (best for concurrent workloads)
lb = LeastConnectionsBalancer(endpoints)

# Endpoints auto-marked unhealthy after max_failures consecutive failures
ep = Endpoint("fragile", "http://flaky-api.com", max_failures=2, recovery_timeout=30.0)
# Check stats
print(lb.stats())
# {
#   "openai-us": {"status": "healthy", "active_connections": 3,
#                 "success_count": 1200, "failure_count": 5,
#                 "avg_latency_ms": 340.2, "error_rate": 0.004},
#   ...
# }
# Force status
ep.force_unhealthy() # manual circuit-break
ep.force_healthy()  # manual recovery

| Attribute | Description |
|---|---|
| `is_healthy` | `True` if HEALTHY or DEGRADED |
| `status` | `HealthStatus.HEALTHY` / `DEGRADED` / `UNHEALTHY` |
| `active_connections` | Current in-flight calls |
| `stats.avg_latency_ms` | Rolling average latency |
| `stats.error_rate` | Fraction of failed calls |
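
The failure counting and timed recovery described above can be pictured with a small stand-alone sketch. It is not the library's implementation: `SketchEndpoint`, its fields, and the injected `clock` are hypothetical names, and it assumes that `recovery_timeout` is in seconds, that a success resets the consecutive-failure counter, and that latency is a plain mean over successful calls rather than a rolling window.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SketchEndpoint:
    """Hypothetical stand-in for illustration; not agent_balancer.Endpoint."""
    name: str
    max_failures: int = 3
    recovery_timeout: float = 30.0              # assumed to be seconds
    clock: Callable[[], float] = time.monotonic  # injectable so tests can fake time
    consecutive_failures: int = 0
    success_count: int = 0
    failure_count: int = 0
    total_latency_ms: float = 0.0
    unhealthy_since: Optional[float] = None

    def record(self, success: bool, latency_ms: float = 0.0) -> None:
        """Update counters after one call and trip the breaker if needed."""
        if success:
            self.success_count += 1
            self.total_latency_ms += latency_ms
            self.consecutive_failures = 0
            return
        self.failure_count += 1
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.max_failures and self.unhealthy_since is None:
            self.unhealthy_since = self.clock()

    @property
    def is_healthy(self) -> bool:
        """True unless the breaker tripped less than recovery_timeout ago."""
        if self.unhealthy_since is None:
            return True
        if self.clock() - self.unhealthy_since >= self.recovery_timeout:
            # Timeout elapsed: reset state and give the endpoint another chance.
            self.unhealthy_since = None
            self.consecutive_failures = 0
            return True
        return False

    @property
    def error_rate(self) -> float:
        total = self.success_count + self.failure_count
        return self.failure_count / total if total else 0.0

    @property
    def avg_latency_ms(self) -> float:
        return self.total_latency_ms / self.success_count if self.success_count else 0.0
```

Passing a fake `clock` (for example `lambda: now[0]` over a mutable list) makes the recovery path testable without sleeping.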
| Class | Strategy |
|---|---|
| `RoundRobinBalancer` | Cycles through endpoints in order |
| `WeightedRandomBalancer` | Random pick weighted by `endpoint.weight` |
| `LeastConnectionsBalancer` | Always picks least-busy endpoint |
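
To make the three strategies in the table concrete, here is a minimal stdlib-only sketch of each selection rule. `Target` and the three functions are hypothetical names used only for illustration; the actual balancers may differ, for example in how they skip unhealthy endpoints or break ties.

```python
import itertools
import random
from dataclasses import dataclass


@dataclass
class Target:
    """Hypothetical stand-in for an endpoint; not the library's Endpoint class."""
    name: str
    weight: float = 1.0
    active_connections: int = 0


def round_robin(targets):
    """Return an iterator that yields targets in order, wrapping around forever."""
    return itertools.cycle(targets)


def weighted_random(targets, rng=random):
    """Pick one target with probability proportional to its weight."""
    return rng.choices(targets, weights=[t.weight for t in targets], k=1)[0]


def least_connections(targets):
    """Pick the target with the fewest in-flight calls (first one wins ties)."""
    return min(targets, key=lambda t: t.active_connections)


targets = [
    Target("openai-us", weight=2.0, active_connections=3),
    Target("openai-eu", weight=1.0, active_connections=1),
    Target("anthropic", weight=1.5, active_connections=2),
]

rr = round_robin(targets)
print([next(rr).name for _ in range(4)])  # ['openai-us', 'openai-eu', 'anthropic', 'openai-us']
print(least_connections(targets).name)    # openai-eu
print(weighted_random(targets).name)      # varies from run to run
```

Under this model, the quick-start weights (2.0, 1.0, 1.5) would send roughly 2.0 / 4.5 ≈ 44% of picks to `openai-us` over many calls.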
All balancers share the same interface:
- `lb.next()` → `Endpoint`
- `lb.release(ep, success, latency_ms)`
- `lb.add_endpoint(ep)` / `lb.remove_endpoint(name)`
- `lb.healthy_endpoints` → `List[Endpoint]`
- `lb.stats()` → `Dict`
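
Because every balancer exposes the same `next()` / `release()` pair, retry logic can be written once and reused with any strategy. The helper below is a sketch, not part of the package: `call_with_failover`, `StubBalancer`, and `flaky` are hypothetical names, and the stub only mimics the two methods so the example runs without the library installed.

```python
import time


def call_with_failover(lb, call, max_attempts=3):
    """Try up to max_attempts endpoints, reporting each outcome to the balancer.

    lb   -- any object with next() and release(ep, success, latency_ms)
    call -- your own function that takes an endpoint and returns a result
    """
    last_error = None
    for _ in range(max_attempts):
        ep = lb.next()
        start = time.perf_counter()
        try:
            result = call(ep)
        except Exception as exc:
            lb.release(ep, success=False)
            last_error = exc
            continue
        latency_ms = (time.perf_counter() - start) * 1000.0
        lb.release(ep, success=True, latency_ms=latency_ms)
        return result
    raise RuntimeError(f"all {max_attempts} attempts failed") from last_error


class StubBalancer:
    """Test double that mimics only next()/release(); not part of agent_balancer."""

    def __init__(self, endpoints):
        self._endpoints = list(endpoints)
        self._i = 0
        self.log = []

    def next(self):
        ep = self._endpoints[self._i % len(self._endpoints)]
        self._i += 1
        return ep

    def release(self, ep, success, latency_ms=0.0):
        self.log.append((ep, success))


def flaky(ep):
    """Simulated LLM call: the endpoint named 'bad' always fails."""
    if ep == "bad":
        raise ConnectionError("simulated outage")
    return f"ok from {ep}"


lb = StubBalancer(["bad", "good"])
print(call_with_failover(lb, flaky))  # ok from good
print(lb.log)                         # [('bad', False), ('good', True)]
```

Reporting every attempt through `release()` is what lets the health tracking see failures; a wrapper that swallowed errors without releasing would leave the balancer's statistics and connection counts stale.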
pip install pytest
pytest tests/ -v

35 tests covering all strategies, edge cases, and thread safety.
MIT © Darshankumar Joshi