[WIP] Admission Control Filter #10230

Closed

tonya11en wants to merge 64 commits into envoyproxy:master from tonya11en:admctl

Member

tonya11en commented Mar 3, 2020

This patch starts progress toward #9658. It is still a work in progress, but I'm opening a draft PR because I'd like to discuss the general approach before writing integration tests, documentation, etc.

A quick summary of the motivation and origin of the idea can be found in the description of #9658.

The filter tracks the request success rate (SR) for each worker thread over a rolling window and, before forwarding to the upstream, rejects requests with a probability dictated by that SR. Because the accounting is per-thread, this removes the need for locks such as those found in the local ratelimit filter and the adaptive concurrency filter.
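To make the rejection step concrete, here is a minimal sketch. The description above does not state the exact mapping from SR to rejection probability, so the sketch assumes the simplest one, `1 - SR`; the function name `shouldReject` and the use of `std::mt19937` are also illustrative assumptions, not taken from the patch.

```cpp
#include <cassert>
#include <random>

// Hypothetical helper, not from the patch. ASSUMPTION: the rejection
// probability is 1 - SR; the real filter may use a different mapping.
bool shouldReject(double success_rate, std::mt19937& rng) {
  assert(success_rate >= 0.0 && success_rate <= 1.0);
  // Returns true with probability (1 - success_rate).
  std::bernoulli_distribution reject(1.0 - success_rate);
  return reject(rng);
}
```

If each worker thread owns its own generator, the decision needs no shared state, which matches the lock-free intent described above.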

The per-thread success rate calculation is performed by tracking the total request count and the total number of successes. These values are accumulated each second and inserted into a deque that contains the per-second accumulated values for the entire rolling window. This allows us to efficiently phase out stale SR data once it falls outside the time window.
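The deque-based accounting can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: the class name `RollingSuccessRate`, its method names, passing the current time in explicitly, and returning 1.0 when the window is empty are all hypothetical choices.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <deque>

// Hypothetical sketch of a per-thread rolling window with 1-second buckets.
// Assumes `now` never goes backwards and that one instance lives on each
// worker thread, so no locking is needed.
class RollingSuccessRate {
public:
  explicit RollingSuccessRate(std::chrono::seconds window) : window_(window) {
    assert(window.count() > 0);
  }

  // Record one completed request observed at second `now`.
  void record(std::chrono::seconds now, bool success) {
    expire(now);
    if (samples_.empty() || samples_.back().second != now) {
      samples_.push_back({now, 0, 0});
    }
    ++samples_.back().requests;
    ++total_requests_;
    if (success) {
      ++samples_.back().successes;
      ++total_successes_;
    }
  }

  // Success rate over the window. ASSUMPTION: 1.0 when no data is present.
  double successRate(std::chrono::seconds now) {
    expire(now);
    if (total_requests_ == 0) {
      return 1.0;
    }
    return static_cast<double>(total_successes_) / total_requests_;
  }

private:
  struct Bucket {
    std::chrono::seconds second;
    uint64_t requests;
    uint64_t successes;
  };

  // Drop buckets that have aged out and subtract them from the running totals.
  void expire(std::chrono::seconds now) {
    while (!samples_.empty() && now - samples_.front().second >= window_) {
      total_requests_ -= samples_.front().requests;
      total_successes_ -= samples_.front().successes;
      samples_.pop_front();
    }
  }

  const std::chrono::seconds window_;
  std::deque<Bucket> samples_;
  uint64_t total_requests_{0};
  uint64_t total_successes_{0};
};
```

Keeping running totals alongside the deque means reading the SR is O(1), and expiring stale data only touches the buckets that actually aged out.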

There are a few items I'd appreciate some preliminary feedback on. Once there's a consensus, I'll write up documentation and integration tests. The two major items I'd like any feedback/comments on are:

What would be the best way to configure the definition of a "successful" request? I thought something similar to the retry_on categories would do the trick, but this might not work for gRPC.
Could there be workloads that could see potential problems with this per-thread SR accounting approach?

Tony Allen and others added 30 commits

January 10, 2020 15:13

wip

991d39f

Signed-off-by: Tony Allen <tallen@lyft.com>

wip

84c1b51

Signed-off-by: Tony Allen <tallen@lyft.com>

wip

d3e80e0

Signed-off-by: Tony Allen <tony@allen.gg>

wip

668040d

Signed-off-by: Tony Allen <tony@allen.gg>


          still broken

3727a6d

Signed-off-by: Tony Allen <tony@allen.gg>


          builds wip

b91335f

Signed-off-by: Tony Allen <tallen@lyft.com>


          stats

871fe70

Signed-off-by: Tony Allen <tony@allen.gg>


          thread local

cd1d879

Signed-off-by: Tony Allen <tony@allen.gg>


          format

b768ad1

Signed-off-by: Tony Allen <tony@allen.gg>


          tests

7a4678e

Signed-off-by: Tony Allen <tony@allen.gg>


          Fix bugs and more tests.

d5278b8

Signed-off-by: Tony Allen <tony@allen.gg>


          runtime double

34144fd

Signed-off-by: Tony Allen <tony@allen.gg>


          runtime double

f3ec7f7

Signed-off-by: Tony Allen <tony@allen.gg>


          filter config test

5e3d1ed

Signed-off-by: Tony Allen <tony@allen.gg>

wip

5654fa2

Signed-off-by: Tony Allen <tony@allen.gg>


          compiles, wip, test fails

10472c0

Signed-off-by: Tony Allen <tony@allen.gg>


          tests pass

a15ad6e

Signed-off-by: Tony Allen <tony@allen.gg>


          filter disable test

7f30f79

Signed-off-by: Tony Allen <tony@allen.gg>


          Merge remote-tracking branch 'upstream/master' into admctl

fa3cd1c


          format and fix merge stuff

29478b0

Signed-off-by: Tony Allen <tony@allen.gg>


          more filter tests

6b69513

Signed-off-by: Tony Allen <tony@allen.gg>


          more filter tests

48043b6

Signed-off-by: Tony Allen <tony@allen.gg>


          test stats

d0b8c50

Signed-off-by: Tony Allen <tony@allen.gg>

wip

5358a78

Signed-off-by: Tony Allen <tony@allen.gg>


          Merge remote-tracking branch 'upstream/master' into admctl

6bf90d1

Signed-off-by: Tony Allen <tony@allen.gg>


          format

fbecc7f

Signed-off-by: Tony Allen <tony@allen.gg>

wip

c31da84

Signed-off-by: Tony Allen <tony@allen.gg>


          doc: fix SNI FAQ link (envoyproxy#10227)

a9b9f8b

Signed-off-by: Lizan Zhou <lizan@tetrate.io>


          builds

e01eda5

Signed-off-by: Tony Allen <tallen@lyft.com>

wip

5e97619

Signed-off-by: Tony Allen <tony@allen.gg>

Member Author

tonya11en commented Mar 13, 2020

/wait-any

repokitteh-read-only bot added the waiting:any label

tonya11en added 2 commits

March 13, 2020 13:24


          error bars

6b893f6

Signed-off-by: Tony Allen <tony@allen.gg>


          make test less flaky

1379ac5

Signed-off-by: Tony Allen <tony@allen.gg>

repokitteh-read-only bot removed the waiting:any label

Member Author

tonya11en commented Mar 13, 2020

Basic HTTP integration test is implemented. Since we're validating probabilities, I ensured it wasn't a flaky test:

± % btd //test/extensions/filters/http/admission_control:admission_control_integration_test --runs_per_test=1000
WARNING: The following rc files are no longer being read, please transfer their contents or import their path into one of the standard rc files:
/home/tallen/src/envoy/tools/bazel.rc
INFO: Invocation ID: c4325847-2684-473e-8c0a-7ac2ce6ae1c8
INFO: Analyzed target //test/extensions/filters/http/admission_control:admission_control_integration_test (0 packages loaded, 0 targets configured).
INFO: Found 1 test target...
Target //test/extensions/filters/http/admission_control:admission_control_integration_test up-to-date:
  bazel-bin/test/extensions/filters/http/admission_control/admission_control_integration_test
INFO: Elapsed time: 1012.032s, Critical Path: 49.45s
INFO: 1003 processes: 1 remote cache hit, 1002 linux-sandbox.
INFO: Build completed successfully, 1004 total actions
//test/extensions/filters/http/admission_control:admission_control_integration_test PASSED in 24.8s
  Stats over 1000 runs: max = 24.8s, min = 18.4s, avg = 23.5s, dev = 0.7s

Executed 1 out of 1 test: 1 test passes.
There were tests whose specified size is too big. Use the --test_verbose_timeout_warnings command line option to see which ones these are.
INFO: Build completed successfully, 1004 total actions


          minor test changes

Signed-off-by: Tony Allen <tony@allen.gg>

mattklein123 added the waiting label

stale bot commented Mar 20, 2020

This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

stale bot added the stale label

Member Author

tonya11en commented Mar 20, 2020

Actively wrestling with GRPC integration test.

stale bot removed the stale label

tonya11en added 3 commits

March 25, 2020 18:23

wip

9cb2499

Signed-off-by: Tony Allen <tony@allen.gg>


          Fix the grpc integration test.

80c219a

Signed-off-by: Tony Allen <tony@allen.gg>


          fix format

43d479b

Signed-off-by: Tony Allen <tony@allen.gg>

repokitteh-read-only bot removed the waiting label

Member Author

tonya11en commented Mar 26, 2020

Sorry for the delay. It turns out that when the filter sends the local reply 503, the gRPC status is not set. As a result, the integration client observed the upstream probabilistically returning either an HTTP 503 with no gRPC status (50% of the time) or an HTTP 200 with the “Unknown” gRPC status in the trailer.

I'm forgoing documentation until the general behavior of this filter has ossified. Once that happens, I'll take this PR out of draft and push a bunch of docs.

mattklein123 added the waiting label


          Kick CI

20f6036

Signed-off-by: Tony Allen <tony@allen.gg>

repokitteh-read-only bot removed the waiting label

Member Author

tonya11en commented Mar 31, 2020

@mattklein123 after you take your first pass through the code, I'll pull this PR out of draft and remove the WIP assuming no major architectural changes are required.

tonya11en commented

source/extensions/filters/http/admission_control/admission_control.h

+              private:
+                std::vector<std::function<bool(uint64_t)>> http_success_fns_;
+                std::unordered_set<uint64_t> grpc_success_codes_;

Member Author

tonya11en Mar 31, 2020

It just occurred to me that there are a handful of gRPC success codes. It'll be more performant to just do a sequential search over a vector instead of hashing in an unordered set.

I'll make that change next round.
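A rough sketch of what that change could look like, assuming the success codes are stored as plain integers. The class name `GrpcSuccessCodes` and the non-empty check are hypothetical and not part of the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical replacement for the unordered_set member: a small vector
// scanned linearly. For a handful of entries this avoids hashing entirely.
class GrpcSuccessCodes {
public:
  explicit GrpcSuccessCodes(std::vector<uint64_t> codes) : codes_(std::move(codes)) {
    // ASSUMPTION: an empty list is a configuration error, since it would
    // classify every gRPC response as a failure.
    assert(!codes_.empty());
  }

  // True if `status` is one of the configured success codes.
  bool contains(uint64_t status) const {
    return std::find(codes_.begin(), codes_.end(), status) != codes_.end();
  }

private:
  std::vector<uint64_t> codes_;
};
```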

mattklein123 requested changes

Member

mattklein123 left a comment

Thanks, I left a few comments. In general this looks great, but I take back what I said about a single PR. Can we split this into at least 1) the ancillary changes, 2) the controller, and 3) the filter?

Thank you!

/wait

api/envoy/api/v2/core/base.proto

                 string runtime_key = 3 [(validate.rules).string = {min_bytes: 1}];
               }
+              // Runtime derived double with a default when not specified.

Member

mattklein123 Apr 1, 2020

Since this PR is so large, please split this and its associated tests into a different PR (if there is anything else that can be split out, please do so).

api/envoy/config/filter/http/admission_control/v2alpha/admission_control.proto

+                // values.
+                message DefaultSuccessCriteria {
+                  // If HTTP statuses are unspecified, defaults to 2xx.
+                  repeated HttpStatusRange http_status = 1;

Member

mattklein123 Apr 1, 2020

I think I would make this an actual numeric range for flexibility. See Int32Range or similar as something you might potentially use.
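For illustration, matching against numeric ranges might look like the sketch below. It assumes half-open `[start, end)` semantics like Int32Range and keeps the "defaults to 2xx when unspecified" behavior from the proto comment; `StatusRange` and `isHttpSuccess` are hypothetical names, not part of the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical mirror of a numeric range message. ASSUMPTION: half-open
// interval [start, end).
struct StatusRange {
  int32_t start;
  int32_t end;
};

// True if `status` falls inside any configured range. With no ranges
// configured, falls back to treating 2xx as success.
bool isHttpSuccess(const std::vector<StatusRange>& ranges, int32_t status) {
  if (ranges.empty()) {
    return status >= 200 && status < 300;
  }
  for (const auto& range : ranges) {
    assert(range.start < range.end);
    if (status >= range.start && status < range.end) {
      return true;
    }
  }
  return false;
}
```

With ranges, a config such as `{200, 300}` plus `{404, 405}` can express "2xx and 404 count as success" without enumerating categories.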

api/envoy/config/filter/http/admission_control/v2alpha/admission_control.proto

+                api.v2.core.RuntimeFeatureFlag enabled = 1;
+                // The time window over which the success rate is calculated. The window is rounded to the nearest
+                // second. Defaults to 120s.

Member

mattklein123 Apr 1, 2020

Bump: this still needs more verbiage about the sliding window, etc.

api/envoy/type/grpc/v2/grpc_status.proto

+              // GRPC status.
+              message GrpcStatus {
+                // Supplies GRPC response code.
+                Status status = 1 [(validate.rules).enum = {defined_only: true not_in: 0}];

Member

mattklein123 Apr 1, 2020

What does the not_in 0 do?

api/envoy/type/grpc/v2/grpc_status.proto

+              // [#protodoc-title: GRPC status codes]
+              // GRPC response codes supported.
+              enum Status {

Member

mattklein123 Apr 1, 2020

@lizan does this exist anywhere else? Any thoughts on defining this in our API?

Member

lizan Apr 2, 2020

No, this doesn't exist. I don't think we should define this as an enum but just as an int32 (that's what google.rpc.Status does), because in the data plane there will be custom-defined error codes as well.

source/extensions/filters/http/admission_control/admission_control.h

+               * The look-back window for request samples is accurate up to a hard-coded 1-second granularity.
+               * TODO (tonya11en): Allow the granularity to be configurable.
+               */
+              class ThreadLocalController {

Member

mattklein123 Apr 1, 2020

Yeah given the size of this PR if possible maybe split the controller out into its own PR? If it's a big deal I can do in one review but it would be nice to not have so much code to look at.

repokitteh-read-only bot added the waiting label

Member

htuch commented Apr 7, 2020

Please merge master to pick up #10672. We no longer accept changes to v2 (without explicit exception), so any API modifications should happen in v3. If this PR is adding a new proto, please follow the updated instructions in https://github.com/envoyproxy/envoy/blob/master/api/STYLE.md#adding-an-extension-configuration-to-the-api.

stale bot commented Apr 14, 2020

This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

stale bot added the stale label

stale bot commented Apr 25, 2020

This pull request has been automatically closed because it has not had activity in the last 14 days. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

stale bot closed this

tonya11en mentioned this pull request

Admission Control Filter #10985

Closed


Labels

api stale waiting