agent-balancer

Zero-dependency load balancing for LLM agent calls — part of the arsenal collection.

Python 3.9+ · License: MIT · Zero Dependencies

Features

  • RoundRobin — even distribution across endpoints
  • WeightedRandom — probabilistic selection by weight
  • LeastConnections — always route to least-busy endpoint
  • Health tracking — per-endpoint success/failure stats, latency, error rate
  • Automatic failover — unhealthy endpoints are bypassed; auto-recovery after timeout
  • Thread-safe — concurrent calls handled correctly
  • Zero dependencies — stdlib only
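
As a rough illustration of what the thread-safety bullet means for a round-robin picker, the sketch below guards the shared index with a lock so that concurrent callers never read the same position twice. `LockedRoundRobin` and `demo` are hypothetical names used only for this example; this is not the library's implementation, just one common way to get the behaviour described.

```python
import threading
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


class LockedRoundRobin:
    """Illustrative only: a lock-protected round-robin picker."""

    def __init__(self, names):
        self._names = list(names)
        self._index = 0
        self._lock = threading.Lock()

    def next(self):
        # The lock makes "read index, advance index" a single atomic step.
        with self._lock:
            name = self._names[self._index % len(self._names)]
            self._index += 1
            return name


def demo(calls=300, workers=8):
    """Call next() from several threads and count how often each name is picked."""
    rr = LockedRoundRobin(["a", "b", "c"])
    with ThreadPoolExecutor(max_workers=workers) as pool:
        picks = list(pool.map(lambda _: rr.next(), range(calls)))
    return Counter(picks)
```

Because every index is handed out exactly once, the counts stay even regardless of how the threads interleave.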

Installation

pip install agent-balancer

Or install from source:

git clone https://github.com/darshjme/agent-balancer
cd agent-balancer
pip install -e .

Quick Start

from agent_balancer import Endpoint, RoundRobinBalancer, WeightedRandomBalancer, LeastConnectionsBalancer

# Define your LLM endpoints
endpoints = [
    Endpoint("openai-us",   "https://api.openai.com/v1",   weight=2.0),
    Endpoint("openai-eu",   "https://api.openai.eu/v1",    weight=1.0),
    Endpoint("anthropic",   "https://api.anthropic.com/v1", weight=1.5),
]

# Round Robin
lb = RoundRobinBalancer(endpoints)
ep = lb.next()           # get next endpoint
try:
    # call_llm and prompt are placeholders for your own client code
    result = call_llm(ep.url, prompt)
    lb.release(ep, success=True, latency_ms=320)  # report the measured latency
except Exception:
    lb.release(ep, success=False)

# Weighted Random
lb = WeightedRandomBalancer(endpoints)

# Least Connections (best for concurrent workloads)
lb = LeastConnectionsBalancer(endpoints)
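
The next()/release() pairing shown in the Quick Start is easy to get wrong when a call raises, so one option is to wrap it in a context manager. The sketch below relies only on the `next()` and `release(ep, success, latency_ms)` calls shown above; `balanced_call` and `RecordingBalancer` are hypothetical helpers, and the latter is a stand-in so the example runs without the library installed.

```python
import time
from contextlib import contextmanager


@contextmanager
def balanced_call(lb):
    """Yield an endpoint from lb, then report the outcome via lb.release()."""
    ep = lb.next()
    start = time.perf_counter()
    try:
        yield ep
    except Exception:
        lb.release(ep, success=False)
        raise
    else:
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        lb.release(ep, success=True, latency_ms=elapsed_ms)


class RecordingBalancer:
    """Hypothetical stand-in that records release() calls for inspection."""

    def __init__(self, endpoints):
        self._endpoints = list(endpoints)
        self._i = 0
        self.released = []

    def next(self):
        ep = self._endpoints[self._i % len(self._endpoints)]
        self._i += 1
        return ep

    def release(self, ep, success, latency_ms=None):
        self.released.append((ep, success, latency_ms))
```

With a real balancer the body of the `with balanced_call(lb) as ep:` block would make the LLM request; the release call then happens on both the success and the failure path.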

Health & Failover

# Endpoints are automatically marked unhealthy after max_failures consecutive failures
ep = Endpoint("fragile", "http://flaky-api.com", max_failures=2, recovery_timeout=30.0)

# Check stats
print(lb.stats())
# {
#   "openai-us": {"status": "healthy", "active_connections": 3,
#                 "success_count": 1200, "failure_count": 5,
#                 "avg_latency_ms": 340.2, "error_rate": 0.004},
#   ...
# }

# Force status
ep.force_unhealthy()   # manual circuit-break
ep.force_healthy()     # manual recovery
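
To make the max_failures and recovery_timeout parameters concrete, here is a minimal sketch of consecutive-failure tracking with a timed recovery window. It is not the library's code: `HealthTracker`, `record`, `is_available`, and the injectable `clock` are hypothetical, and the sketch assumes that "auto-recovery" means the endpoint becomes eligible again once recovery_timeout seconds have passed.

```python
import time


class HealthTracker:
    """Illustrative consecutive-failure tracker with timed recovery."""

    def __init__(self, max_failures=3, recovery_timeout=60.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.recovery_timeout = recovery_timeout
        self._clock = clock  # injectable so tests need not sleep
        self._consecutive_failures = 0
        self._unhealthy_since = None

    def record(self, success):
        """Record one call outcome."""
        if success:
            # Any success resets the failure streak and clears the unhealthy mark.
            self._consecutive_failures = 0
            self._unhealthy_since = None
            return
        self._consecutive_failures += 1
        if (self._consecutive_failures >= self.max_failures
                and self._unhealthy_since is None):
            self._unhealthy_since = self._clock()

    @property
    def is_available(self):
        """True if the endpoint may receive traffic right now."""
        if self._unhealthy_since is None:
            return True
        # Assumption: after recovery_timeout seconds the endpoint is retried.
        return self._clock() - self._unhealthy_since >= self.recovery_timeout
```

Using a monotonic clock avoids surprises when the wall-clock time is adjusted while an endpoint is waiting to recover.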

API Reference

Endpoint(name, url, weight=1.0, max_failures=3, recovery_timeout=60.0, metadata={})

Attribute             Description
is_healthy            True if HEALTHY or DEGRADED
status                HealthStatus.HEALTHY / DEGRADED / UNHEALTHY
active_connections    Current in-flight calls
stats.avg_latency_ms  Rolling average latency
stats.error_rate      Fraction of failed calls
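
The two stats attributes can be pictured with a small sketch. `CallStats` is a hypothetical class, not the library's; it assumes error_rate is failures divided by total calls and uses a cumulative running mean for latency, since the window the library uses for its rolling average is not stated here.

```python
from dataclasses import dataclass


@dataclass
class CallStats:
    """Illustrative per-endpoint counters."""

    success_count: int = 0
    failure_count: int = 0
    avg_latency_ms: float = 0.0
    _latency_samples: int = 0

    def record(self, success, latency_ms=None):
        """Count one call and fold its latency, if given, into the mean."""
        if success:
            self.success_count += 1
        else:
            self.failure_count += 1
        if latency_ms is not None:
            self._latency_samples += 1
            # Incremental mean: no need to store every sample.
            self.avg_latency_ms += (
                latency_ms - self.avg_latency_ms
            ) / self._latency_samples

    @property
    def error_rate(self):
        """Fraction of recorded calls that failed (0.0 when nothing recorded)."""
        total = self.success_count + self.failure_count
        return self.failure_count / total if total else 0.0
```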

Balancers

Class                     Strategy
RoundRobinBalancer        Cycles through endpoints in order
WeightedRandomBalancer    Random pick weighted by endpoint.weight
LeastConnectionsBalancer  Always picks least-busy endpoint
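
For readers unfamiliar with the three strategies, each can be sketched in a few lines of standard-library Python. The function names below are hypothetical and operate on plain names and dicts rather than Endpoint objects; they illustrate the selection rules only and leave out the health filtering and locking a real balancer needs.

```python
import random
from itertools import cycle


def round_robin(names):
    """Return a zero-argument picker that cycles through names in order."""
    it = cycle(names)
    return lambda: next(it)


def weighted_random(weights, rng=random):
    """Pick one name with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


def least_connections(active):
    """Pick the name with the fewest in-flight calls (first one wins ties)."""
    return min(active, key=active.get)
```

Passing a seeded `random.Random` instance as `rng` makes the weighted pick reproducible in tests.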

All balancers share the same interface:

  • lb.next() → Endpoint
  • lb.release(ep, success, latency_ms)
  • lb.add_endpoint(ep) / lb.remove_endpoint(name)
  • lb.healthy_endpoints → List[Endpoint]
  • lb.stats() → Dict

Tests

pip install pytest
pytest tests/ -v

The suite contains 35 tests covering all strategies, edge cases, and thread safety.

License

MIT © Darshankumar Joshi
