
Add ability to filter ConfigDump. #16774

Merged
mattklein123 merged 23 commits into envoyproxy:main from paul-r-gall:FilterConfigDump on Jul 2, 2021

Conversation

@paul-r-gall
Contributor

paul-r-gall commented Jun 2, 2021

Signed-off-by: Paul Gallagher pgal@google.com

Commit Message: admin: Add filtering and matching code to ConfigDump.
Additional Description: Config Dump callbacks now expect to receive a StringMatcher parameter, which they should apply to names before adding configs to the dump.
Risk Level: Low
Testing: unit testing, integration testing for new matcher usage.
Docs Changes: Documentation updated with instructions and example usage for name_regex query parameter.
Release Notes:
Platform Specific Features:

Signed-off-by: Paul Gallagher <pgal@google.com>
@paul-r-gall paul-r-gall changed the title Add ability to filter ConfigDump. DRAFT: Add ability to filter ConfigDump. Jun 2, 2021
@junr03
Member

junr03 commented Jun 2, 2021

Hi @paul-r-gall I noticed this PR is in Draft. Would you like me to assign a maintainer to start the code review process, or wait until you mark ready for review?

/wait-any

@paul-r-gall
Contributor Author

@junr03 Yes, that would be great to get a first pass from someone who can LGTM the approach, at which point I can finish up the rest of the testing.

@zuercher
Member

zuercher left a comment

I'm a little confused by the approach. It seems to me that the implementation of each ConfigMatchDumper is closely tied to the code that registers the ConfigTracker callback. Taking ListenerManagerImpl as an example, what's the advantage of using the factory-style lookup vs just adding an isMatch method (in an unnamed namespace) that performs the necessary checks on the matching params and protobuf object?

If we are to stick with the factory-style lookup, I don't think we need to separate the factory and the matcher itself -- they can be combined into just the matcher type. UntypedFactory isn't necessary either -- these aren't extensions with their own configuration types. Finally, I think we'd want to use a bit of template code to allow implementors to just write a single match method that takes the concrete protobuf type and matching parameters. That is name() and the generic match can be templatized in a base class, leaving the implementor to write (again using Listener as the example) bool match(const envoy::config::listener::v3::Listener&, const Matcher::ConfigDump::MatchingParameters&). But unless I've misunderstood something, I think just implementing the filtering in the callback implementations will be easier to understand.

@paul-r-gall
Copy link
Copy Markdown
Contributor Author

Thanks for the feedback! I'm new to Envoy, so really appreciate the comments.

The advantage of factory-style lookup is that it allows end users to customize the matching however they like. For example, using Listener again, if there's some_specific_parameter buried deep inside the externally populated metadata field that the user wants to filter by, there's no way for a standard isMatch method to know about where this parameter lives, or how to match against it.

With regards to not separating the Factory and the Matcher, I agree, probably unnecessary, and a consequence of me not yet fully getting Envoy style.

In terms of templating, I would be fine with that approach rather than registering all separate matchers. Just to make sure that I understand: integrations would make template specializations for whatever types needed custom matching (Listener, RouteConfiguration, etc.)?
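For illustration, the factory-style lookup under discussion -- a map from the dumped proto's type name to a registered matcher -- could be sketched roughly like this. All names and signatures here are hypothetical stand-ins, not Envoy's actual registry API:

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Hypothetical matcher registry: callers register a matcher under the dumped
// proto's type name; the config dump handler looks it up at request time.
using MatchingParameters = std::map<std::string, std::string>;
using MatcherFn = std::function<bool(const std::string& resource_name,
                                     const MatchingParameters& params)>;

std::map<std::string, MatcherFn>& matcherRegistry() {
  static std::map<std::string, MatcherFn> registry;
  return registry;
}

void registerMatcher(const std::string& type_name, MatcherFn fn) {
  matcherRegistry()[type_name] = std::move(fn);
}

// The dump callback consults the registry; types with no registered matcher
// are never filtered out.
bool isMatch(const std::string& type_name, const std::string& resource_name,
             const MatchingParameters& params) {
  auto it = matcherRegistry().find(type_name);
  if (it == matcherRegistry().end()) {
    return true;
  }
  return it->second(resource_name, params);
}
```

A test (or an integration) could then register its own matcher under a type name to inject resource-specific matching behavior at runtime.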

@zuercher
Member

zuercher commented Jun 3, 2021

The advantage of factory-style lookup is that it allows end users to customize the matching however they like. For example, using Listener again, if there's some_specific_parameter buried deep inside the externally populated metadata field that the user wants to filter by, there's no way for a standard isMatch method to know about where this parameter lives, or how to match against it.

I'm not proposing a single isMatch that would work for all the object types; I'm really just questioning whether a factory lookup is necessary vs. just writing an isMatch method for each type that can be dumped.

For the metadata case, since the matching is always controlled by the DumpMatcher object registered for the protobuf type (e.g. Listener or Cluster), that DumpMatcher implementation would need to extract the Metadata object from Listener (or Cluster) and then compare that with the match params. You could implement that by making another call to look up a DumpMatcher for Metadata, but you could also just write a utility method that matched Metadata and call it directly.

I took the DumpMatcher implementation in the listener test as an example of what the real implementation would look like, but maybe that's too trivial an example?

@paul-r-gall
Contributor Author

So, the scenario that I'm thinking of is closer to the following more complicated metadata case:

An Envoy integration has an external proto some.external.Proto, that is one of the fields inside of the metadata field of Cluster. So a cluster proto will look like:

name: cluster_name
# non-metadata values
metadata {
  filter_metadata {
    key: "some.external.Proto"
    value {
      fields {
        key: "field_of_interest"
        value { string_value: "important_value" }
      }
      # other fields may be here
    }
    # other filter_metadata may be here
  }
}

Ideally, the external user would be able to send something like /config_dump?match_some_external_proto_field_of_interest=important_value, which returns only clusters whose metadata looks like the above proto. So they'd write their Cluster specialization of match as follows:

bool match(const envoy::config::cluster::v3::Cluster& cluster,
           const MatchingParameters& params) override {
  bool is_match = true;
  if (params.contains("some_external_proto_field_of_interest")) {
    // parse cluster.metadata["some.external.Proto"] into an instance of some.external.Proto
    is_match &= parsed_proto.field_of_interest() ==
                params.at("some_external_proto_field_of_interest");
  }
  // other matching logic w/ other params
  return is_match;
}

Whereas, if all this were to be done on the Envoy side, there'd need to be an Envoy function like

bool isMatch(const envoy::config::core::v3::Metadata& metadata,
             const MatchingParameters& params) {
  // somehow parse/match params and metadata in such a way that's flexible enough to cover every
  // scenario that users can come up with.
}

Admittedly, this is an extreme case of complexity, but I think I would prefer to provide the end user with as much matching flexibility as possible.

As a non-metadata example, similar flexibility would be needed to filter fragment information from a ScopedRouteConfiguration.

@paul-r-gall
Contributor Author

Another thought about why I used name-based factory lookup instead of templates -- the name-based factory lookup allows tests to inject resource-specific behavior, whereas with a template approach all matching behavior for a resource type is fully decided at compile time.

@zuercher
Member

zuercher commented Jun 4, 2021

So they'd write their Cluster specialization of match as follows

It seems like you want to be able to extend Envoy with custom implementations that might override built-in matchers. In that case, if the registered name is just the type name (e.g. "envoy.config.cluster.v3.Cluster") there's an initialization order problem. Envoy's registry system makes use of static initializers. This works because we expect each extension to register a unique name (which is enforced by the code: https://github.com/envoyproxy/envoy/blob/main/envoy/registry/registry.h#L216). We could relax that check, but C++ offers no guarantees about whether the default implementation for Cluster would be registered before or after a custom one. The ordering could easily change from build to build, depending on the compiler. You could have custom filtering extensions and request one using a URL parameter (e.g. config_dump?filter_ext=envoy.admin_filters.my_custom_filter&filter=...), but that doesn't feel like a very natural extension point for Envoy.

Honestly, I think you could make use of descriptor and message to implement simple filtering in a generic way. I agree that coming up with a way to encode arbitrarily complex queries would be difficult, but I'm also not convinced that should be the goal. I expect most use cases for this are "name=x" or "metadata.filter_metadata.com.namespace.custom.key=value" and combinations via logical and. I think that's mostly just a matter of deciding the right way to delimit and escape proto field names, map keys, and array indexes and writing the filtering code. All of which could live in source/server/admin. Since the output of config dump is JSON-encoded protobuf objects, I don't think it's too much to expect people to provide their own scripting for complex cases.
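The "path=value, combined with logical AND" idea can be sketched by treating a dumped resource as a flat map from dotted field paths to string values. This is a toy stand-in for protobuf reflection, and it sidesteps the delimiting/escaping question by assuming keys arrive pre-flattened:

```cpp
#include <map>
#include <string>

// Illustrative sketch, not Envoy code: a resource flattened to dotted paths,
// and "path=value" filters applied with logical AND.
using FlatResource = std::map<std::string, std::string>;
using Filters = std::map<std::string, std::string>;

bool matchesAllFilters(const FlatResource& resource, const Filters& filters) {
  for (const auto& [path, want] : filters) {
    auto it = resource.find(path);
    // Every filter must find its path and match its value exactly.
    if (it == resource.end() || it->second != want) {
      return false;
    }
  }
  return true; // an empty filter set matches everything
}
```

A real implementation would walk the protobuf Descriptor/Message instead of a flat map, which is exactly where the escaping rules for dots inside map keys would come in.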

@paul-r-gall
Contributor Author

I was not actually planning on creating any default matchers, so there wouldn't be a name conflict.

CPU profiling indicates that about 35% of the CPU time in handlerConfigDump is spent in API_RECOVER_ORIGINAL and Any::PackFrom, so moving filtering to source/server/admin won't fix that issue (as all protos will still have API_RECOVER_ORIGINAL and Any::PackFrom called on them). Additionally, another ~40% of time is spent in redact, and an additional 10% is spent in getJsonStringFromMessage, so expecting integrations to do their own filtering for complex cases after the JSON is returned also doesn't solve the CPU time issue (which, along with memory usage, is the main reason for this PR).

Based on the complexity of the code in trimResourceMessage to apply a FieldMask to config dump, and the various TODOs in there, I'm skeptical that it'd be easy to do simple filtering in a generic way.

All that said, if you think it would be better to just do really simple filtering/matching by name (maybe w/ a regex or something), and leave the more complicated material for a future PR to allow more discussion and consideration, I would be fine with that (and it would satisfy the current Google-internal requirements).

@zuercher
Member

zuercher commented Jun 7, 2021

Thanks for taking the time to provide those measurements. I can see why you want to push the filtering down as far as possible.

Historically, we've not provided extension points that aren't used at all from within Envoy. That said, I think we could have a relatively simple default implementation (maybe something that only filters on name and simple metadata) and a way to configure Admin to use a different implementation, provided as an extension. I think this provides similar performance to what's in the PR now: the filtering occurs as each object is retrieved and therefore avoids the expensive operations for objects that are filtered out.

The differences are:

  1. Instead of looking the extension up by the type of object being dumped, we'd add a config_dump_filter field to Admin. (Feel free to pick a better name.) The type would be a message with the usual name/typed_config fields that we use for specifying extensions. If unset, the default would be used, and your filter can be provided as an extension.
  2. Rather than passing MatchParameters into the callback, you'd look up the extension factory, construct a filter implementation from the factory by passing the query param(s), and then pass the filter into the callback.
  3. It's probably easiest if the filter implementations are expected to handle all the various object types (and ignore unknown types), rather than having a separate filter implementation for each type. This leaves open the possibility of someone writing a completely generic filter, but doesn't require it. You can just check the message type, cast and examine fields directly.

I think that addresses my concerns and leaves you free to implement private filtering code that does what you need without worrying about what other operators might want.
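Point 3 above -- one filter implementation that sees every dumped object type and passes through types it doesn't understand -- could look roughly like the following. DumpedObject is a stand-in for a real protobuf Message, and every name here is hypothetical:

```cpp
#include <string>

// Stand-in for a protobuf Message plus its type URL.
struct DumpedObject {
  std::string type_url;
  std::string name;
};

// Hypothetical shape of the extension interface: one filter handles all types.
class ConfigDumpFilter {
public:
  virtual ~ConfigDumpFilter() = default;
  virtual bool match(const DumpedObject& obj) const = 0;
};

// A filter constructed from a query param; here reduced to an exact-name
// check on Listener objects. Unknown types are not filtered out.
class ListenerNameFilter : public ConfigDumpFilter {
public:
  explicit ListenerNameFilter(std::string name) : name_(std::move(name)) {}
  bool match(const DumpedObject& obj) const override {
    if (obj.type_url != "envoy.config.listener.v3.Listener") {
      return true; // ignore types this filter doesn't understand
    }
    return obj.name == name_;
  }

private:
  std::string name_;
};
```

The "check the message type, cast, and examine fields directly" step would replace the type_url string comparison in a real implementation.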

Signed-off-by: Paul Gallagher <pgal@google.com>
This reverts commit 818ec8d.

Signed-off-by: Paul Gallagher <pgal@google.com>
Signed-off-by: Paul Gallagher <pgal@google.com>
Signed-off-by: Paul Gallagher <pgal@google.com>
@repokitteh-read-only

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to api/envoy/.
envoyproxy/api-shepherds assignee is @adisuissa
CC @envoyproxy/api-watchers: FYI only for changes made to api/envoy/.

🐱

Caused by: #16774 was synchronize by paul-r-gall.

see: more, trace.

@paul-r-gall
Contributor Author

paul-r-gall commented Jun 9, 2021

Added new commit implementing @zuercher's suggestions. @htuch please let me know what you think.

Signed-off-by: Paul Gallagher <pgal@google.com>
@zuercher
Member

Thanks! Yes, that looks like the correct approach to me. In order to make sure there's consensus, I would still like to have another maintainer chime in and then I'll do a closer read. Perhaps @htuch or @mattklein123?

@htuch
Member

htuch commented Jun 15, 2021

@paul-r-gall just to make sure we're doing due diligence here. In the near future, API_RECOVER_ORIGINAL will go away and I think we can filter prior to redaction. If we do this, what is the overhead of using a field mask matcher?

I.e. what is the cost of doing field mask match at the exact same points at which the ConfigDumpFilter match is applied?

@paul-r-gall
Contributor Author

paul-r-gall commented Jun 15, 2021

@htuch I still definitely think that filtering needs to happen within the callback. Even if API_RECOVER_ORIGINAL goes away, there's still the overhead of Any::PackFrom, which accounts for about a third of the CPU time during the callback.

By field mask match, I'm assuming you mean something like path.to.first.field=regex1, path.to.second.field=regex2, right? The only issue I have there is that FieldMasks don't by default work with Any, and so to do metadata matching, we'd have to do something like the trimming function from config_dump_handler.cc.

As I mentioned above, at the moment simple name/metadata regexp matching is sufficient for Google's current (edit: but perhaps not future) usecase, so I'll defer to the maintainers as to whether having this capability be extendable is worth the extra review, effort, and maintenance.

@mattklein123
Member

My personal vote is to not add an extension point right now. I would just build the matching functionality in, and if we need to extend it later I think we can add an extension point and convert the existing default implementation into a default extension.

Apart from that, I wonder if there is a way we can avoid every config dump handler being modified to do this matching, which seems fragile. That seems really unfortunate. Is the issue that at the admin layer we have an Any, and at that point we lose the ability to do reflection and field matching (or we would have to unpack from the Any again and waste a lot of time)? If that is the case, I wonder if there is some way we could refactor the config dump handlers so that before packing the Any the admin code gets a callback to do filtering? This is not fully formed, but something like: Admin passes to the handler some type of maybePackAny lambda, and the config dump handler somehow uses this to do the packing (and also the filtering). I haven't thought about it a ton, but it seems like something like this would be better than missing filtering in new config dump handlers.
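The maybePackAny idea floated here could look something like this toy version, in which the admin layer bakes the filter into the packing lambda so a handler cannot forget to filter. Resource and the "packed:" string stand in for real protos and Any::PackFrom; none of this is actual Envoy code:

```cpp
#include <functional>
#include <string>
#include <vector>

struct Resource {
  std::string name;
};

// The lambda the admin layer would hand to each config dump handler.
using MaybePack =
    std::function<void(const Resource&, std::vector<std::string>& dump)>;

// Admin constructs the lambda with the request's filter already applied, so
// the expensive pack step is skipped entirely for filtered-out objects.
MaybePack makeMaybePackAny(std::function<bool(const Resource&)> filter) {
  return [filter](const Resource& r, std::vector<std::string>& dump) {
    if (!filter(r)) {
      return; // filtered out before any packing cost is paid
    }
    dump.push_back("packed:" + r.name); // stands in for Any::PackFrom
  };
}
```

The key property is that filtering and packing travel together, so a newly added handler gets filtering for free.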

@htuch
Member

htuch commented Jun 16, 2021

@paul-r-gall maybe just do simple name matching for now then. I think anything that needs to reach inside metadata is going to end up involving some amount of Any interaction which is not going to be super fast. We could later add specific metadata matchers as well, which probably solves most use cases (between metadata matching and top-level field matching).

@mattklein123
Member

mattklein123 left a comment

Thanks, this makes sense to me. Just 2 small comments from a quick pass.

/wait


.. http:get:: /config_dump?name_regex={}

Dump only the currently loaded configurations whose names match the specified regex. Can be used with
Member

doc nit: Can you ref link some example name fields for the user so it's clearer what name we are referring to? Separately, this means that effectively we need a name field in all xDS protos to make this consistent, right? (Just trying to make sure I understand.)

Contributor Author

So currently all xDS protos have a name-like field; most of those name-like fields are actually called name, except for endpoints, which have cluster_name. Only the protos whose name-like field matches the given regexp will be added to the dump. E.g., putting in name_regex=prefix.* will return routes such that route.name() matches, and endpoints such that endpoint.cluster_name() matches.
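As an illustration only (the PR builds an RE2-based matcher via Regex::Utility::parseRegex; std::regex here is just a stand-in), the described behavior amounts to testing each resource's name-like field against the pattern:

```cpp
#include <regex>
#include <string>

// Toy sketch of name_regex semantics: the resource's name-like field (name,
// or cluster_name for endpoints) is tested against the whole pattern. Not
// Envoy code; Envoy's real matcher uses RE2.
bool nameMatches(const std::string& name_regex, const std::string& name) {
  return std::regex_match(name, std::regex(name_regex));
}
```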

Member

Yup that makes sense. I would just ref link some fields or just list them in a bullet list to be more clear to the user. Right now I think it might not be clear to the user what this will do.

Contributor Author

Done. For some reason clang-format is complaining about this file, and tools/code_format/check_format.py fix isn't making any changes -- do you know what that's about?

Member

Hmm, not sure how it would be complaining about the RST file. What is the error? cc @phlax

Contributor Author

Wound up forcing with --no-verify, but the error was

~/git/envoy$ git push origin FilterConfigDump
Running pre-push check; to skip this step use 'push --no-verify'
  Checking format for docs/root/operations/admin.rst - ERROR: From ./docs/root/operations/admin.rst
ERROR: Unable to find Envoy namespace or NOLINT(namespace-envoy) for file: ./docs/root/operations/admin.rst
ERROR: clang-format check failed for file: ./docs/root/operations/admin.rst
ERROR: check format failed. run 'tools/code_format/check_format.py fix'
error: failed to push some refs to 'github.com:paul-r-gall/envoy.git'

Member

Hmm this sounds like some type of check format regression. cc @phlax to help out.

Signed-off-by: Paul Gallagher <pgal@google.com>
@mattklein123
Member

mattklein123 left a comment

Thanks!

*matcher.mutable_google_re2() = envoy::type::matcher::v3::RegexMatcher::GoogleRE2();
matcher.set_regex(*name_regex);
TRY_ASSERT_MAIN_THREAD
return Regex::Utility::parseRegex(matcher);
Member

Ultimately it would be nice to have a version of this that doesn't throw given this very short call stack, but not a blocker for this PR.

@paul-r-gall
Contributor Author

Coverage run indicates that source/server coverage is lower than needed; 94.3 vs 94.4, but source/server indicates that coverage is 94.4, and all the files I changed have the correct coverage. @mattklein123 what action should I take here besides /retest ?

@mattklein123
Member

Coverage run indicates that source/server coverage is lower than needed; 94.3 vs 94.4, but source/server indicates that coverage is 94.4, and all the files I changed have the correct coverage. @mattklein123 what action should I take here besides /retest ?

cc @alyssawilk. Coverage can be flaky so you could try a retest? I'm not sure.

@paul-r-gall
Contributor Author

/retest

@repokitteh-read-only

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #16774 (comment) was created by @paul-r-gall.

see: more, trace.

@paul-r-gall
Contributor Author

/retest due to timeout

@repokitteh-read-only

Retrying Azure Pipelines:
Check envoy-presubmit isn't fully completed, but will still attempt retrying.
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #16774 (comment) was created by @paul-r-gall.

see: more, trace.

@mattklein123
Member

Sorry can you merge main which should fix coverage?

/wait

Signed-off-by: Paul Gallagher <pgal@google.com>
@mattklein123
Member

Sorry one more merge should fix this.

/wait

Signed-off-by: Paul Gallagher <pgal@google.com>
@paul-r-gall
Contributor Author

/retest

@repokitteh-read-only

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #16774 (comment) was created by @paul-r-gall.

see: more, trace.

@paul-r-gall
Contributor Author

/retest

macos presubmit continues to time out

@repokitteh-read-only

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #16774 (comment) was created by @paul-r-gall.

see: more, trace.

@mattklein123 mattklein123 merged commit 38c7b1f into envoyproxy:main Jul 2, 2021
@paul-r-gall paul-r-gall deleted the FilterConfigDump branch July 7, 2021 11:50
baojr added a commit to baojr/envoy that referenced this pull request Jul 7, 2021
* main:
  listener: match rebalancer to listener IP family type (envoyproxy#16914)
  jwt_authn: implementation of www-authenticate header (envoyproxy#16216)
  listener: reset the file event in framework instead of listener filter doing itself (envoyproxy#17227)
  Small typo fix (envoyproxy#17247)
  Doc: Clarify request/response attributes are http-only (envoyproxy#17204)
  bazel/README.md: add aspell comment (envoyproxy#17072)
  docs: Fix broken URL links in HTTP upgrades doc (envoyproxy#17225)
  remove the wrong comment on test (envoyproxy#17233)
  upstream: allow clusters to skip waiting on warmup for initialization (envoyproxy#17179)
  JwtAuthn: support completing padding on forward jwt payload header (envoyproxy#16752)
  remove support for v2 UNSUPPORTED_REST_LEGACY (envoyproxy#16968)
  metrics service: fix wrong argument arrange on MetricsServiceSink (envoyproxy#17127)
  deps: update cel-cpp to 0.6.1 (envoyproxy#16293)
  Add ability to filter ConfigDump. (envoyproxy#16774)
  examples: fix Wasm example. (envoyproxy#17218)
  upstream: update host's socket factory when metadata is updated. (envoyproxy#16708)

Signed-off-by: Garrett Bourg <bourg@squareup.com>
chrisxrepo pushed a commit to chrisxrepo/envoy that referenced this pull request Jul 8, 2021
Signed-off-by: Paul Gallagher <pgal@google.com>
Signed-off-by: chris.xin <xinchuantao@qq.com>
leyao-daily pushed a commit to leyao-daily/envoy that referenced this pull request Sep 30, 2021
Signed-off-by: Paul Gallagher <pgal@google.com>