[model-gateway] Optimize router selection with lock-free snapshots #15672
slin1237 merged 13 commits into sgl-project:main from
/gemini review
Please remove the benchmark |
Code Review
This pull request introduces a significant and well-executed performance optimization for the router selection logic. By leveraging arc_swap to implement a lock-free snapshot mechanism, you've successfully eliminated per-request heap allocations and lock contention on the hot path, leading to impressive performance gains as demonstrated by the new benchmarks. The code is clear, and the changes are logically sound. I have one minor suggestion to refactor a small piece of duplicated logic to enhance maintainability. Overall, this is an excellent improvement.
I have removed the file.
Motivation
The `RouterManager::select_router_for_request` function is on the critical hot path of the gateway, executed for every incoming inference request. The previous implementation suffered from two performance bottlenecks:
- `collect::<Vec<_>>()` and `vec![router]` triggered heap allocations on every request, leading to system memory allocator overhead and potential lock contention under high concurrency.
- Accessing `DashMap` entries requires acquiring internal shard-level locks. While efficient for point lookups, this added microsecond-scale latency and jitter during high-throughput routing.

This PR introduces a lock-free snapshot mechanism using the `arc_swap` crate. By caching the router list in a contiguous, read-optimized `Vec` that is updated only during registration, we achieve nanosecond-scale routing performance.

Modifications
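To make the snapshot idea from the Motivation concrete, here is a minimal sketch that contrasts a per-request `collect` with a cached, shared list. It is not the PR's code: `NamedRouter`, `candidates_allocating`, and `candidates_snapshot` are hypothetical names, a locked `HashMap` stands in for the `DashMap` registry, and `RwLock<Arc<Vec<..>>>` stands in for `ArcSwap<Vec<..>>` so the example builds with the standard library alone. The stand-in is therefore not lock-free; with the `arc_swap` crate the read would be `routers_snapshot.load()` and the publish `routers_snapshot.store(Arc::new(fresh))`.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

// Hypothetical stand-in for the gateway's router trait.
trait RouterTrait: Send + Sync {
    fn name(&self) -> &str;
}

// Hypothetical router type used only for this illustration.
struct NamedRouter(String);

impl RouterTrait for NamedRouter {
    fn name(&self) -> &str {
        &self.0
    }
}

type RouterList = Vec<Arc<dyn RouterTrait>>;

struct RouterManager {
    // Assumption: a locked HashMap stands in for the DashMap registry.
    routers: RwLock<HashMap<String, Arc<dyn RouterTrait>>>,
    // Assumption: RwLock<Arc<..>> stands in for ArcSwap<Vec<..>>.
    routers_snapshot: RwLock<Arc<RouterList>>,
}

impl RouterManager {
    fn new() -> Self {
        Self {
            routers: RwLock::new(HashMap::new()),
            routers_snapshot: RwLock::new(Arc::new(Vec::new())),
        }
    }

    // Cold path: update the registry, then rebuild and publish the snapshot.
    fn register_router(&self, id: &str, router: Arc<dyn RouterTrait>) {
        let mut map = self.routers.write().unwrap();
        map.insert(id.to_string(), router);
        let fresh: RouterList = map.values().cloned().collect();
        *self.routers_snapshot.write().unwrap() = Arc::new(fresh);
    }

    // Old-style hot path: allocates a new Vec on every call.
    fn candidates_allocating(&self) -> RouterList {
        self.routers.read().unwrap().values().cloned().collect()
    }

    // Snapshot hot path: hands out the shared Vec by cloning one Arc pointer.
    fn candidates_snapshot(&self) -> Arc<RouterList> {
        let guard = self.routers_snapshot.read().unwrap();
        Arc::clone(&*guard)
    }
}

fn main() {
    let manager = RouterManager::new();
    manager.register_router("http", Arc::new(NamedRouter("http".to_string())));
    manager.register_router("grpc", Arc::new(NamedRouter("grpc".to_string())));

    let first = manager.candidates_snapshot();
    let second = manager.candidates_snapshot();
    // Both reads point at the same heap buffer; nothing was copied.
    assert!(Arc::ptr_eq(&first, &second));
    // The allocating path returns the same number of routers in a new Vec.
    assert_eq!(first.len(), manager.candidates_allocating().len());
    for router in first.iter() {
        println!("candidate: {}", router.name());
    }
}
```

The point of the sketch is the shape of the two read paths: the snapshot path only bumps a reference count, while the rebuild cost is paid once per registration.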
- Snapshot Architecture: Integrated `routers_snapshot: ArcSwap<Vec<Arc<dyn RouterTrait>>>` into the `RouterManager` struct to provide zero-allocation access to candidate routers.
- Atomic Updates: Modified `register_router` to pre-calculate and store a fresh snapshot whenever a new router joins the registry, shifting overhead from the "hot" request path to the "cold" registration path.
- Zero-Allocation Selection: Refactored `select_router_for_request` to use lock-free `.load()` calls.
- Logic Refinement:
- Dependencies: Added `arc_swap = "1.7.1"` to `Cargo.toml`.

Accuracy Tests
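One property the snapshot design relies on is that a reader holding an older snapshot is unaffected when registration publishes a new one. The following is a hypothetical check of that property, not taken from the PR's test suite. `SnapshotCell` and its `load` and `push` methods are invented for illustration, and the cell uses `RwLock<Arc<Vec<T>>>` as a standard-library stand-in for `ArcSwap<Vec<T>>`, so it shows the copy-on-write semantics but not the lock-free read.

```rust
use std::sync::{Arc, RwLock};

// Hypothetical cell; a std-only stand-in for ArcSwap<Vec<T>>.
struct SnapshotCell<T> {
    current: RwLock<Arc<Vec<T>>>,
}

impl<T: Clone> SnapshotCell<T> {
    fn new() -> Self {
        Self {
            current: RwLock::new(Arc::new(Vec::new())),
        }
    }

    // Returns the currently published snapshot by cloning the Arc pointer.
    fn load(&self) -> Arc<Vec<T>> {
        let guard = self.current.read().unwrap();
        Arc::clone(&*guard)
    }

    // Copy-on-write: build a fresh Vec, then publish it in one assignment.
    // Snapshots handed out earlier are never mutated.
    fn push(&self, item: T) {
        let mut guard = self.current.write().unwrap();
        let mut next = Vec::with_capacity(guard.len() + 1);
        next.extend(guard.iter().cloned());
        next.push(item);
        *guard = Arc::new(next);
    }
}

fn main() {
    let cell = SnapshotCell::new();
    cell.push("router-a");

    // A reader takes a snapshot before the next registration.
    let before = cell.load();
    cell.push("router-b");
    let after = cell.load();

    // The old snapshot is unchanged; new readers see the new list.
    assert_eq!(*before, vec!["router-a"]);
    assert_eq!(*after, vec!["router-a", "router-b"]);
    assert!(!Arc::ptr_eq(&before, &after));
}
```

Because each registration copies the whole list, this approach suits registries that change rarely and are read on every request, which matches the hot and cold paths described in the list above.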
Benchmarking and Profiling
Benchmarks were conducted ($1\mu s = 1{,}000ns$). The results show a dramatic reduction in routing overhead.
Checklist