adaptive concurrency: Filter configuration and integration tests#8244
adaptive concurrency: Filter configuration and integration tests#8244mattklein123 merged 40 commits intoenvoyproxy:masterfrom
Conversation
Signed-off-by: Tony Allen <tallen@lyft.com>
Signed-off-by: Tony Allen <tallen@lyft.com>
Signed-off-by: Tony Allen <tallen@lyft.com>
Signed-off-by: Tony Allen <tallen@lyft.com>
Signed-off-by: Tony Allen <tallen@lyft.com>
Signed-off-by: Tony Allen <tallen@lyft.com>
|
/wait |
...e/extensions/filters/http/adaptive_concurrency/concurrency_controller/gradient_controller.cc
Outdated
Show resolved
Hide resolved
Signed-off-by: Tony Allen <tony@allen.gg>
...extensions/filters/http/adaptive_concurrency/adaptive_concurrency_filter_integration_test.cc
Outdated
Show resolved
Hide resolved
Signed-off-by: Tony Allen <tony@allen.gg>
Signed-off-by: Tony Allen <tony@allen.gg>
|
Marking as ready since the CI release test just flaked out. The |
| @@ -27,8 +27,8 @@ AdaptiveConcurrencyFilter::AdaptiveConcurrencyFilter( | |||
| Http::FilterHeadersStatus AdaptiveConcurrencyFilter::decodeHeaders(Http::HeaderMap&, bool) { | |||
| if (controller_->forwardingDecision() == ConcurrencyController::RequestForwardingAction::Block) { | |||
| // TODO (tonya11en): Remove filler words. | |||
mattklein123
left a comment
There was a problem hiding this comment.
Thanks, awesome to see this coming together. A few nits, but my main comment is I would rather this be done with simulated time or at a minimum have a TODO explaining why it can't be and also do a de-flaking run either way. Thank you!
/wait
.../extensions/filters/http/adaptive_concurrency/adaptive_concurrency_filter_integration_test.h
Outdated
Show resolved
Hide resolved
.../extensions/filters/http/adaptive_concurrency/adaptive_concurrency_filter_integration_test.h
Outdated
Show resolved
Hide resolved
|
|
||
| void initializeFilter() { | ||
| // We use the fault filter (for delays) after the adaptive concurrency filter to introduce a | ||
| // "service latency" to the test. This way, time is moved forward with each response. |
There was a problem hiding this comment.
Why can't you just use simulated time and just advance time before upstream responds?
Wherever we end up on this, please verify that you have done a de-flaking run on these new tests (--runs_per_test=1000) to verify a lack of flakes.
...extensions/filters/http/adaptive_concurrency/adaptive_concurrency_filter_integration_test.cc
Outdated
Show resolved
Hide resolved
Signed-off-by: Tony Allen <tony@allen.gg>
|
/wait |
mattklein123
left a comment
There was a problem hiding this comment.
Thanks, looks good. You can fix the nits in your next change.
.../extensions/filters/http/adaptive_concurrency/adaptive_concurrency_filter_integration_test.h
Outdated
Show resolved
Hide resolved
.../extensions/filters/http/adaptive_concurrency/adaptive_concurrency_filter_integration_test.h
Outdated
Show resolved
Hide resolved
|
Please merge master. (Might as well fix the nits that I pointed out at the same time since you have to push again.) /wait |
mattklein123
left a comment
There was a problem hiding this comment.
Thanks. @tonya11en please confirm that you have run the integration tests under --runs_per_test to check for flakes?
|
Not recently- let me go ahead and kick off 1000 and report back before merge. |
|
@mattklein123 looks stable: I needed to make sure there weren't multiple copies of this integration test running at the same time. When running it with 12 concurrency workers, there was a 2% failure rate, but from my investigations, I think this was due to the integration test framework and the fake upstreams. |
This means it will definitely flake in CI. It should work 100% of the time with concurrent workers, so we will need to investigate this unfortunately. /wait |
Signed-off-by: Tony Allen <tallen@lyft.com>
|
Alright, now we're cooking: 12 concurrent jobs this time. There were actually 2 races that needed to be fixed:
|
mattklein123
left a comment
There was a problem hiding this comment.
Awesome, thanks for fixing the flakes Tony!
…oyproxy#8244) Signed-off-by: Tony Allen <tallen@lyft.com>
This patch brings the concurrency controller and the adaptive concurrency control (ACC) filter together to make the filter fully useable with integration tests.
The integration tests make use of the fault filter, placed after the ACC filter, to emulate request latencies and to also move time forward. Even though this is against the recommendation in the documentation to have the filter come before all other filters, it was necessary to have last. Not every request makes it to the upstream due to occasionally being blocked by the ACC filter, so it was not possible to simply call
timeSystem().sleepand rely on the autonomous upstream to reply to the requests.Patch 3/n towards #7789.
Temporal behavior
Now that all the pieces are together, it's worth comparing the filter's behavior with what we expecte from the proof-of-concept. Time sampled stats show similar behavior to the proof-of-concept filter's data in #7789. Here's a plot of latencies without the adaptive concurrency filter:

Now with the adaptive concurrency filter:
