
Adaptive Concurrency Control L7 Filter #7789

@tonya11en


Adaptive Concurrency Control HTTP Filter

TL;DR:
L7 filter to dynamically update request concurrency values based on sampled request latencies.

Motivation

At Lyft, we’re using Envoy’s circuit breaking system to defend our services from load caused by too many concurrent requests. The circuit breakers currently require service owners to manually set limits for each service/upstream cluster. While the feature is beneficial, the service owners are expected to understand the ideal concurrency limits of their services, configure them, and update the limits as their service evolves. This is not a problem if the concurrency values are always tuned properly, but the inevitable stale values have caused several incidents in production.

This work aims to remove the need for manual tuning of these circuit breaking parameters and keep latencies low by using techniques from Netflix’s concurrency limits Java library.
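As a rough illustration of the idea, here is a minimal sketch of a gradient-style limit update: the limit shrinks when sampled latency rises above an ideal (unloaded) latency and creeps upward otherwise. It is loosely modeled on the gradient approach and is not necessarily what the filter will implement. The class name, the parameter names, the 0.5 clamp, and the headroom term are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class GradientLimit:
    """Hypothetical gradient-style concurrency limit (illustration only)."""

    limit: float = 10.0        # current concurrency limit (assumed starting value)
    min_limit: float = 1.0     # never throttle below this
    max_limit: float = 1000.0  # never grow above this
    headroom: float = 2.0      # additive slack so the limit can probe upward

    def update(self, min_rtt_ms: float, sample_rtt_ms: float) -> float:
        """Recompute the limit from an ideal (unloaded) and a sampled latency."""
        if min_rtt_ms <= 0 or sample_rtt_ms <= 0:
            raise ValueError("latencies must be positive")
        # The gradient drops below 1 when sampled latency exceeds the ideal
        # latency, which shrinks the limit. The clamp to [0.5, 1.0] keeps a
        # single bad sample from collapsing the limit.
        gradient = max(0.5, min(1.0, min_rtt_ms / sample_rtt_ms))
        new_limit = self.limit * gradient + self.headroom
        self.limit = max(self.min_limit, min(self.max_limit, new_limit))
        return self.limit


limiter = GradientLimit(limit=100.0)
healthy = limiter.update(min_rtt_ms=10.0, sample_rtt_ms=10.0)   # latency at ideal
degraded = limiter.update(min_rtt_ms=10.0, sample_rtt_ms=40.0)  # latency 4x ideal
print(healthy, degraded)
```

With these made-up numbers the limit grows slightly while latency matches the ideal and is roughly halved once latency quadruples, which is the behavior a manually tuned circuit breaker cannot provide on its own.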

Planned Patches

The filter will be committed in stages to make the review process easier. Here's an ordered list of what is currently planned:

  • L7 filter with calls to a noop concurrency control framework.
  • Implement concurrency control framework and unit tests.
  • Make parameters runtime configurable.
  • Integration tests.
  • Documentation.
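To make the first stage concrete, the sketch below shows what a filter calling into a no-op concurrency control framework might look like. The interface, the method names (`try_admit`, `record_latency`), and the 503 rejection status are hypothetical and chosen only for illustration; the actual patches may be structured differently.

```python
from abc import ABC, abstractmethod


class ConcurrencyController(ABC):
    """Hypothetical interface the filter would call on every request."""

    @abstractmethod
    def try_admit(self) -> bool:
        """Return True to forward the request, False to reject it."""

    @abstractmethod
    def record_latency(self, latency_ms: float) -> None:
        """Report the latency of a completed request."""


class NoopController(ConcurrencyController):
    """Admits everything and ignores latency samples."""

    def try_admit(self) -> bool:
        return True

    def record_latency(self, latency_ms: float) -> None:
        pass


def handle_request(controller: ConcurrencyController, upstream_latency_ms: float) -> int:
    """Stand-in for the filter's request path; returns an HTTP status code."""
    if not controller.try_admit():
        return 503  # assumed rejection status for this sketch
    controller.record_latency(upstream_latency_ms)
    return 200


print(handle_request(NoopController(), 12.5))
```

Splitting the work this way would let the filter plumbing be reviewed on its own, with a real controller swapped in behind the same interface in a later patch.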

Nice-to-haves:

  • Benchmark.

Preliminary Data

To test the various concurrency control algorithms, I used bufferbloater (whose functionality will eventually be added to Nighthawk) and a proof-of-concept implementation in my envoy-filter-example fork.

Identical tests were performed on the same system twice: once with sane circuit breaker (CB) settings configured, and once with adaptive concurrency configured. The client RPS to the server remained constant, but the server's response times changed every 30 seconds.
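To see why this test setup stresses a fixed limit, recall Little's law: the mean number of in-flight requests equals the arrival rate multiplied by the mean latency. The short sketch below applies it with hypothetical numbers (the RPS and latency values are not taken from the actual test runs).

```python
def required_concurrency(rps: float, latency_s: float) -> float:
    """Little's law: mean in-flight requests = arrival rate * mean latency."""
    return rps * latency_s


RPS = 1000.0  # hypothetical constant client request rate

for latency_ms in (10, 50, 200):  # hypothetical per-phase server response times
    in_flight = required_concurrency(RPS, latency_ms / 1000.0)
    print(f"{latency_ms} ms -> about {in_flight:.0f} concurrent requests")
```

At a constant request rate, every change in server latency shifts the concurrency needed to sustain that rate, so a single static limit is only well matched to one of the phases.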

Circuit breaker configured

[Figure: realistic_cb_well_configured]

Adaptive concurrency control configured

[Figure: realistic_server_degrade]

So far, the results seem promising. The proof-of-concept filter was able to replicate the circuit breaker behavior, and it did so with latencies less than or equal to those in the CB test, all without requiring any manual concurrency configuration.


Labels: enhancement, no stalebot
