Adaptive Concurrency Control HTTP Filter
TL;DR:
L7 filter to dynamically update request concurrency values based on sampled request latencies.
Motivation
At Lyft, we’re using Envoy’s circuit breaking system to defend our services from load caused by too many concurrent requests. The circuit breakers currently require service owners to manually set limits for each service/upstream cluster. While the feature is beneficial, the service owners are expected to understand the ideal concurrency limits of their services, configure them, and update the limits as their service evolves. This is not a problem if the concurrency values are always tuned properly, but the inevitable stale values have caused several incidents in production.
This work aims to remove the need for manual tuning of these circuit breaking parameters and keep latencies low by using techniques from Netflix’s concurrency limits Java library.
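To make the idea of deriving a concurrency limit from sampled latencies concrete, here is a minimal sketch of a gradient-style update, loosely in the spirit of the latency-based limits in Netflix's library. It is not the algorithm the filter will ship with and not the library's API: the `GradientLimiter` class, the `percentile` helper, the choice of the 90th percentile, and the square-root headroom term are all assumptions made for illustration.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile of a non-empty list of latency samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100.0 * len(ordered)))
    return ordered[rank - 1]


class GradientLimiter:
    """Hypothetical limiter: shrinks the limit as latency rises above the best seen."""

    def __init__(self, initial_limit=10, min_limit=1, max_limit=1000):
        self.limit = initial_limit
        self.min_limit = min_limit
        self.max_limit = max_limit
        self.min_rtt = None  # lowest sampled latency seen so far, in seconds

    def update(self, samples):
        """Recompute the limit from one window of latency samples (seconds)."""
        sample_rtt = percentile(samples, 90)  # assumed summary statistic
        if sample_rtt <= 0:
            return self.limit  # ignore degenerate windows
        if self.min_rtt is None or sample_rtt < self.min_rtt:
            self.min_rtt = sample_rtt
        # gradient is 1.0 when latency matches the best seen, and < 1.0 when it is worse
        gradient = self.min_rtt / sample_rtt
        headroom = math.sqrt(self.limit)  # assumed allowance so the limit can grow
        new_limit = self.limit * gradient + headroom
        self.limit = int(min(self.max_limit, max(self.min_limit, new_limit)))
        return self.limit


limiter = GradientLimiter(initial_limit=100)
fast_window = [0.010] * 20  # 10 ms responses
slow_window = [0.050] * 20  # 50 ms responses
limit_after_fast = limiter.update(fast_window)
limit_after_slow = limiter.update(slow_window)
```

In this sketch the limit grows while latencies stay at the best observed value and drops sharply once they rise, which is the behavior a manually configured circuit breaker cannot provide on its own.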
Planned Patches
The filter will be committed in stages to make the review process easier. Here's an ordered list of what is currently planned:
- L7 filter with calls to a noop concurrency control framework.
- Implement concurrency control framework and unit tests.
- Make parameters runtime configurable.
- Integration tests.
- Documentation.
Nice-to-haves:
- Benchmark.
Preliminary Data
To test the various concurrency control algorithms, I used bufferbloater (whose functionality will eventually be added to Nighthawk) and a proof-of-concept implementation in my envoy-filter-example fork.
Identical tests were run against two setups: one with sane circuit breaker (CB) settings configured, and one with adaptive concurrency control configured. The client RPS to the server remained constant, but the server's response times changed every 30 seconds.
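As a rough illustration of why this kind of test stresses a fixed limit, Little's law says the average number of in-flight requests equals the arrival rate times the average latency. The sketch below applies it to a constant request rate and a few response times; the RPS and latency values are hypothetical and are not the ones used in the tests.

```python
def required_concurrency(rps, latency_s):
    """Little's law: mean in-flight requests = arrival rate * mean latency."""
    return rps * latency_s


RPS = 200  # hypothetical constant client request rate
latencies_ms = (10, 50, 250)  # hypothetical server response times

# Mean concurrency needed to sustain RPS at each response time.
needed = {ms: required_concurrency(RPS, ms / 1000.0) for ms in latencies_ms}
```

With the request rate held constant, the concurrency needed to serve it scales with response time, so a limit tuned for one latency regime is either too tight or too loose in another.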
[Figure: circuit breaker configured]
[Figure: adaptive concurrency control configured]
So far, the results seem promising. The proof-of-concept filter was able to replicate the circuit breaker behavior with latencies no higher than those in the CB test, all without requiring any manual concurrency configuration.

