
Add ability to filter ConfigDump. #16774

Merged
mattklein123 merged 23 commits into envoyproxy:main from paul-r-gall:FilterConfigDump on Jul 2, 2021

Conversation

@paul-r-gall
Contributor

paul-r-gall commented Jun 2, 2021

Signed-off-by: Paul Gallagher pgal@google.com

Commit Message: admin: Add filtering and matching code to ConfigDump.
Additional Description: Config Dump callbacks now expect to receive a StringMatcher parameter, which they should apply to names before adding configs to the dump.
Risk Level: Low
Testing: unit testing, integration testing for new matcher usage.
Docs Changes: Documentation updated with instructions and example usage for name_regex query parameter.
Release Notes:
Platform Specific Features:

Signed-off-by: Paul Gallagher <pgal@google.com>
@paul-r-gall paul-r-gall changed the title Add ability to filter ConfigDump. DRAFT: Add ability to filter ConfigDump. Jun 2, 2021
@junr03
Member

junr03 commented Jun 2, 2021

Hi @paul-r-gall I noticed this PR is in Draft. Would you like me to assign a maintainer to start the code review process, or wait until you mark ready for review?

/wait-any

@paul-r-gall
Contributor Author

@junr03 Yes, that would be great to get a first pass from someone who can LGTM the approach, at which point I can finish up the rest of the testing.

@zuercher
Member

zuercher left a comment

I'm a little confused by the approach. It seems to me that the implementation of each ConfigMatchDumper is closely tied to the code that registers the ConfigTracker callback. Taking ListenerManagerImpl as an example, what's the advantage of using the factory-style lookup vs just adding an isMatch method (in an unnamed namespace) that performs the necessary checks on the matching params and protobuf object?

If we are to stick with the factory-style lookup, I don't think we need to separate the factory and the matcher itself -- they can be combined into just the matcher type. UntypedFactory isn't necessary either -- these aren't extensions with their own configuration types. Finally, I think we'd want to use a bit of template code to allow implementors to just write a single match method that takes the concrete protobuf type and matching parameters. That is name() and the generic match can be templatized in a base class, leaving the implementor to write (again using Listener as the example) bool match(const envoy::config::listener::v3::Listener&, const Matcher::ConfigDump::MatchingParameters&). But unless I've misunderstood something, I think just implementing the filtering in the callback implementations will be easier to understand.

@paul-r-gall
Copy link
Copy Markdown
Contributor Author

Thanks for the feedback! I'm new to Envoy, so really appreciate the comments.

The advantage of factory-style lookup is that it allows end users to customize the matching however they like. For example, using Listener again, if there's some_specific_parameter buried deep inside the externally populated metadata field that the user wants to filter by, there's no way for a standard isMatch method to know about where this parameter lives, or how to match against it.

With regards to not separating the Factory and the Matcher, I agree, probably unnecessary, and a consequence of me not yet fully getting Envoy style.

In terms of templating, I would be fine with that approach rather than registering all separate matchers. Just to make sure that I understand: integrations would make template specializations for whatever types needed custom matching (Listener, RouteConfiguration, etc.)?
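For illustration, the factory-style lookup under discussion -- a map from the dumped proto's type name to a registered matcher -- could be sketched roughly like this. All names and signatures here are hypothetical stand-ins, not Envoy's actual registry API:

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Hypothetical matcher registry: callers register a matcher under the dumped
// proto's type name; the config dump handler looks it up at request time.
using MatchingParameters = std::map<std::string, std::string>;
using MatcherFn = std::function<bool(const std::string& resource_name,
                                     const MatchingParameters& params)>;

std::map<std::string, MatcherFn>& matcherRegistry() {
  static std::map<std::string, MatcherFn> registry;
  return registry;
}

void registerMatcher(const std::string& type_name, MatcherFn fn) {
  matcherRegistry()[type_name] = std::move(fn);
}

// The dump callback consults the registry; types with no registered matcher
// are never filtered out.
bool isMatch(const std::string& type_name, const std::string& resource_name,
             const MatchingParameters& params) {
  auto it = matcherRegistry().find(type_name);
  if (it == matcherRegistry().end()) {
    return true;
  }
  return it->second(resource_name, params);
}
```

A test (or an integration) could then register its own matcher under a type name to inject resource-specific matching behavior at runtime.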

@zuercher
Member

zuercher commented Jun 3, 2021

The advantage of factory-style lookup is that it allows end users to customize the matching however they like. For example, using Listener again, if there's some_specific_parameter buried deep inside the externally populated metadata field that the user wants to filter by, there's no way for a standard isMatch method to know about where this parameter lives, or how to match against it.

I'm not proposing a single isMatch that would work for all the object types; I'm really just questioning whether a factory lookup is necessary vs. just writing an isMatch method for each type that can be dumped.

For the metadata case, since the matching is always controlled by the DumpMatcher object registered for the protobuf type (e.g. Listener or Cluster), that DumpMatcher implementation would need to extract the Metadata object from Listener (or Cluster) and then compare that with the match params. You could implement that by making another call to look up a DumpMatcher for Metadata, but you could also just write a utility method that matched Metadata and call it directly.

I took the DumpMatcher implementation in the listener test as an example of what the real implementation would look like, but maybe that's too trivial an example?

@paul-r-gall
Contributor Author

So, the scenario that I'm thinking of is closer to the following more complicated metadata case:

An Envoy integration has an external proto some.external.Proto, that is one of the fields inside of the metadata field of Cluster. So a cluster proto will look like:

name: cluster_name
# non-metadata values
metadata {
  filter_metadata {
    key: "some.external.Proto"
    value {
      fields {
        key: "field_of_interest"
        value { string_value: "important_value" }
      }
      # other fields may be here
    }
    # other filter_metadata may be here
  }
}

Ideally, the external user would be able to send something like /config_dump?match_some_external_proto_field_of_interest=important_value, which returns only clusters whose metadata looks like the above proto. So they'd write their Cluster specialization of match as follows:

bool match(const envoy::config::cluster::v3::Cluster& cluster,
           const MatchingParameters& params) override {
  bool is_match = true;
  if (params.contains("some_external_proto_field_of_interest")) {
    // parse cluster.metadata["some.external.Proto"] into an instance of some.external.Proto
    is_match &= parsed_proto.field_of_interest() ==
                params.at("some_external_proto_field_of_interest");
  }
  // other matching logic w/ other params
  return is_match;
}

Whereas, if all this were to be done on the Envoy side, there'd need to be an Envoy function like

bool isMatch(const envoy::config::core::v3::Metadata& metadata,
             const MatchingParameters& params) {
  // somehow parse/match params and metadata in such a way that's flexible enough to cover every
  // scenario that users can come up with.
}

Admittedly, this is an extreme case of complexity, but I think I would prefer to provide the end user with as much matching flexibility as possible.

As a non-metadata example, similar flexibility would be needed to filter fragment information from a ScopedRouteConfiguration.

@paul-r-gall
Contributor Author

Another thought about why I used name-based factory lookup instead of templates -- the name-based factory lookup allows tests to inject resource-specific behavior, whereas with a template approach all matching behavior for a resource type is fully decided at compile time.

@zuercher
Member

zuercher commented Jun 4, 2021

So they'd write their Cluster specialization of match as follows

It seems like you want to be able to extend Envoy with custom implementations that might override built-in matchers. In that case, if the registered name is just the type name (e.g. "envoy.config.cluster.v3.Cluster") there's an initialization order problem. Envoy's registry system makes use of static initializers. This works because we expect each extension to register a unique name (which is enforced by the code: https://github.com/envoyproxy/envoy/blob/main/envoy/registry/registry.h#L216). We could relax that check, but C++ offers no guarantees about whether the default implementation for Cluster would be registered before or after a custom one. The ordering could easily change from build to build, depending on the compiler. You could have custom filtering extensions and request one using a URL parameter (e.g. config_dump?filter_ext=envoy.admin_filters.my_custom_filter&filter=...), but that doesn't feel like a very natural extension point for Envoy.

Honestly, I think you could make use of descriptor and message to implement simple filtering in a generic way. I agree that coming up with a way to encode arbitrarily complex queries would be difficult, but I'm also not convinced that should be the goal. I expect most use cases for this are "name=x" or "metadata.filter_metadata.com.namespace.custom.key=value" and combinations via logical and. I think that's mostly just a matter of deciding the right way to delimit and escape proto field names, map keys, and array indexes and writing the filtering code. All of which could live in source/server/admin. Since the output of config dump is JSON-encoded protobuf objects, I don't think it's too much to expect people to provide their own scripting for complex cases.
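The "path=value, combined with logical AND" idea can be sketched by treating a dumped resource as a flat map from dotted field paths to string values. This is a toy stand-in for protobuf reflection, and it sidesteps the delimiting/escaping question by assuming keys arrive pre-flattened:

```cpp
#include <map>
#include <string>

// Illustrative sketch, not Envoy code: a resource flattened to dotted paths,
// and "path=value" filters applied with logical AND.
using FlatResource = std::map<std::string, std::string>;
using Filters = std::map<std::string, std::string>;

bool matchesAllFilters(const FlatResource& resource, const Filters& filters) {
  for (const auto& [path, want] : filters) {
    auto it = resource.find(path);
    // Every filter must find its path and match its value exactly.
    if (it == resource.end() || it->second != want) {
      return false;
    }
  }
  return true; // an empty filter set matches everything
}
```

A real implementation would walk the protobuf Descriptor/Message instead of a flat map, which is exactly where the escaping rules for dots inside map keys would come in.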

@paul-r-gall
Contributor Author

I was not actually planning on creating any default matchers, so there wouldn't be a name conflict.

CPU profiling indicates that about 35% of the CPU time in handlerConfigDump is spent in API_RECOVER_ORIGINAL and Any::PackFrom, so moving filtering to source/server/admin won't fix that issue (as all protos will still have API_RECOVER_ORIGINAL and Any::PackFrom called on them). Additionally, another ~40% of time is spent in redact, and an additional 10% is spent in getJsonStringFromMessage, so expecting integrations to do their own filtering for complex cases after the JSON is returned also doesn't solve the CPU time issue (which, along with memory usage, is the main reason for this PR).

Based on the complexity of the code in trimResourceMessage to apply a FieldMask to config dump, and the various TODOs in there, I'm skeptical that it'd be easy to do simple filtering in a generic way.

All that said, if you think it would be better to just do really simple filtering/matching by name (maybe w/ a regex or something), and leave the more complicated material for a future PR to allow more discussion and consideration, I would be fine with that (and it would satisfy the current Google-internal requirements).

@zuercher
Member

zuercher commented Jun 7, 2021

Thanks for taking the time to provide those measurements. I can see why you want to push the filtering down as far as possible.

Historically, we've not provided extension points that aren't used at all from within Envoy. That said, I think we could have a relatively simple default implementation (maybe something that only filters on name and simple metadata) and a way to configure Admin to use a different implementation, provided as an extension. I think this provides similar performance to what's in the PR now: the filtering occurs as each object is retrieved and therefore avoids the expensive operations for objects that are filtered out.

The differences are:

  1. Instead of looking the extension up by the type of object being dumped, we'd add a config_dump_filter field to Admin. (Feel free to pick a better name.) The type would be a message with the usual name/typed_config fields that we use for specifying extensions. If unset, the default would be used, and your filter can be provided as an extension.
  2. Rather than passing MatchParameters into the callback, you'd look up the extension factory, construct a filter implementation from the factory by passing the query param(s), and then pass the filter into the callback.
  3. It's probably easiest if the filter implementations are expected to handle all the various object types (and ignore unknown types), rather than having a separate filter implementation for each type. This leaves open the possibility of someone writing a completely generic filter, but doesn't require it. You can just check the message type, cast and examine fields directly.

I think that addresses my concerns and leaves you free to implement private filtering code that does what you need without worrying about what other operators might want.
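Point 3 above -- one filter implementation that sees every dumped object type and passes through types it doesn't understand -- could look roughly like the following. DumpedObject is a stand-in for a real protobuf Message, and every name here is hypothetical:

```cpp
#include <string>

// Stand-in for a protobuf Message plus its type URL.
struct DumpedObject {
  std::string type_url;
  std::string name;
};

// Hypothetical shape of the extension interface: one filter handles all types.
class ConfigDumpFilter {
public:
  virtual ~ConfigDumpFilter() = default;
  virtual bool match(const DumpedObject& obj) const = 0;
};

// A filter constructed from a query param; here reduced to an exact-name
// check on Listener objects. Unknown types are not filtered out.
class ListenerNameFilter : public ConfigDumpFilter {
public:
  explicit ListenerNameFilter(std::string name) : name_(std::move(name)) {}
  bool match(const DumpedObject& obj) const override {
    if (obj.type_url != "envoy.config.listener.v3.Listener") {
      return true; // ignore types this filter doesn't understand
    }
    return obj.name == name_;
  }

private:
  std::string name_;
};
```

The "check the message type, cast, and examine fields directly" step would replace the type_url string comparison in a real implementation.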

Signed-off-by: Paul Gallagher <pgal@google.com>
This reverts commit 818ec8d.

Signed-off-by: Paul Gallagher <pgal@google.com>
Signed-off-by: Paul Gallagher <pgal@google.com>
Signed-off-by: Paul Gallagher <pgal@google.com>
@repokitteh-read-only

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to api/envoy/.
envoyproxy/api-shepherds assignee is @adisuissa
CC @envoyproxy/api-watchers: FYI only for changes made to api/envoy/.

🐱

Caused by: #16774 was synchronize by paul-r-gall.

see: more, trace.

@paul-r-gall
Contributor Author

paul-r-gall commented Jun 9, 2021

Added new commit implementing @zuercher's suggestions. @htuch please let me know what you think.

Signed-off-by: Paul Gallagher <pgal@google.com>
@zuercher
Member

Thanks! Yes, that looks like the correct approach to me. In order to make sure there's consensus, I would still like to have another maintainer chime in and then I'll do a closer read. Perhaps @htuch or @mattklein123?

@htuch
Member

htuch commented Jun 15, 2021

@paul-r-gall just to make sure we're doing due diligence here. In the near future, API_RECOVER_ORIGINAL will go away and I think we can filter prior to redaction. If we do this, what is the overhead of using a field mask matcher?

I.e. what is the cost of doing field mask match at the exact same points at which the ConfigDumpFilter match is applied?

@paul-r-gall
Contributor Author

paul-r-gall commented Jun 15, 2021

@htuch I still definitely think that filtering needs to happen within the callback. Even if API_RECOVER_ORIGINAL goes away, there's still the overhead of Any::PackFrom, which accounts for about a third of the CPU time during the callback.

By field mask match, I'm assuming you mean something like path.to.first.field=regex1, path.to.second.field=regex2, right? The only issue I have there is that FieldMasks don't by default work with Any, and so to do metadata matching, we'd have to do something like the trimming function from config_dump_handler.cc.

As I mentioned above, at the moment simple name/metadata regexp matching is sufficient for Google's current (edit: but perhaps not future) usecase, so I'll defer to the maintainers as to whether having this capability be extendable is worth the extra review, effort, and maintenance.

@mattklein123
Member

My personal vote is to not add an extension point right now. I would just build the matching functionality in, and if we need to extend it later I think we can add an extension point and convert the existing default implementation into a default extension.

Apart from that, I wonder if there is a way we can avoid every config dump handler being modified to do this matching, which seems fragile. That seems really unfortunate. Is the issue that at the admin layer we have an Any, and at that point we lose the ability to do reflection and field matching (or we would have to unpack from the Any again and waste a lot of time)? If that is the case, I wonder if there is some way we could refactor the config dump handlers so that before packing the Any the admin code gets a callback to do filtering? This is not fully formed, but something like: Admin passes to the handler some type of maybePackAny lambda, and the config dump handler somehow uses this to do the packing (and also the filtering). I haven't thought about it a ton, but it seems like something like this would be better than missing filtering in new config dump handlers.
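The maybePackAny idea floated here could look something like this toy version, in which the admin layer bakes the filter into the packing lambda so a handler cannot forget to filter. Resource and the "packed:" string stand in for real protos and Any::PackFrom; none of this is actual Envoy code:

```cpp
#include <functional>
#include <string>
#include <vector>

struct Resource {
  std::string name;
};

// The lambda the admin layer would hand to each config dump handler.
using MaybePack =
    std::function<void(const Resource&, std::vector<std::string>& dump)>;

// Admin constructs the lambda with the request's filter already applied, so
// the expensive pack step is skipped entirely for filtered-out objects.
MaybePack makeMaybePackAny(std::function<bool(const Resource&)> filter) {
  return [filter](const Resource& r, std::vector<std::string>& dump) {
    if (!filter(r)) {
      return; // filtered out before any packing cost is paid
    }
    dump.push_back("packed:" + r.name); // stands in for Any::PackFrom
  };
}
```

The key property is that filtering and packing travel together, so a newly added handler gets filtering for free.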

@htuch
Member

htuch commented Jun 16, 2021

@paul-r-gall maybe just do simple name matching for now then. I think anything that needs to reach inside metadata is going to end up involving some amount of Any interaction which is not going to be super fast. We could later add specific metadata matchers as well, which probably solves most use cases (between metadata matching and top-level field matching).

@mattklein123
Member

mattklein123 left a comment

Thanks, this makes sense to me. Just 2 small comments from a quick pass.

/wait


.. http:get:: /config_dump?name_regex={}

Dump only the currently loaded configurations whose names match the specified regex. Can be used with
Member

doc nit: Can you ref link some example name fields for the user so it's clearer what name we are referring to? Separately, this means that effectively we need a name field in all xDS protos to make this consistent, right? (Just trying to make sure I understand.)

Contributor Author

So currently all xDS protos have a name-like field; most of those name-like fields are actually called name, except for endpoints, which have cluster_name. Only the protos whose name-like field matches the given regexp will be added to the dump. E.g., putting in name_regex=prefix.* will return routes such that route.name() matches, and endpoints such that endpoint.cluster_name() matches.
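As an illustration only (the PR builds an RE2-based matcher via Regex::Utility::parseRegex; std::regex here is just a stand-in), the described behavior amounts to testing each resource's name-like field against the pattern:

```cpp
#include <regex>
#include <string>

// Toy sketch of name_regex semantics: the resource's name-like field (name,
// or cluster_name for endpoints) is tested against the whole pattern. Not
// Envoy code; Envoy's real matcher uses RE2.
bool nameMatches(const std::string& name_regex, const std::string& name) {
  return std::regex_match(name, std::regex(name_regex));
}
```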

Member

Yup that makes sense. I would just ref link some fields or just list them in a bullet list to be more clear to the user. Right now I think it might not be clear to the user what this will do.

Contributor Author

Done. For some reason clang-format is complaining about this file, and tools/code_format/check_format.py fix isn't making any changes -- do you know what that's about?

Member

Hmm, not sure how it would be complaining about the RST file. What is the error? cc @phlax

Contributor Author

Wound up forcing with --no-verify, but the error was

~/git/envoy$ git push origin FilterConfigDump
Running pre-push check; to skip this step use 'push --no-verify'
  Checking format for docs/root/operations/admin.rst - ERROR: From ./docs/root/operations/admin.rst
ERROR: Unable to find Envoy namespace or NOLINT(namespace-envoy) for file: ./docs/root/operations/admin.rst
ERROR: clang-format check failed for file: ./docs/root/operations/admin.rst
ERROR: check format failed. run 'tools/code_format/check_format.py fix'
error: failed to push some refs to 'github.com:paul-r-gall/envoy.git'

Member

Hmm this sounds like some type of check format regression. cc @phlax to help out.

Signed-off-by: Paul Gallagher <pgal@google.com>
@mattklein123
Member

mattklein123 left a comment

Thanks!

*matcher.mutable_google_re2() = envoy::type::matcher::v3::RegexMatcher::GoogleRE2();
matcher.set_regex(*name_regex);
TRY_ASSERT_MAIN_THREAD
return Regex::Utility::parseRegex(matcher);
Member

Ultimately it would be nice to have a version of this that doesn't throw given this very short call stack, but not a blocker for this PR.

@paul-r-gall
Contributor Author

Coverage run indicates that source/server coverage is lower than needed; 94.3 vs 94.4, but source/server indicates that coverage is 94.4, and all the files I changed have the correct coverage. @mattklein123 what action should I take here besides /retest ?

@mattklein123
Member

Coverage run indicates that source/server coverage is lower than needed; 94.3 vs 94.4, but source/server indicates that coverage is 94.4, and all the files I changed have the correct coverage. @mattklein123 what action should I take here besides /retest ?

cc @alyssawilk. Coverage can be flaky so you could try a retest? I'm not sure.

@paul-r-gall
Contributor Author

/retest

@repokitteh-read-only

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #16774 (comment) was created by @paul-r-gall.

see: more, trace.

@paul-r-gall
Contributor Author

/retest due to timeout

@repokitteh-read-only

Retrying Azure Pipelines:
Check envoy-presubmit isn't fully completed, but will still attempt retrying.
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #16774 (comment) was created by @paul-r-gall.

see: more, trace.

@mattklein123
Member

Sorry can you merge main which should fix coverage?

/wait

Signed-off-by: Paul Gallagher <pgal@google.com>
@mattklein123
Member

Sorry one more merge should fix this.

/wait

Signed-off-by: Paul Gallagher <pgal@google.com>
@paul-r-gall
Contributor Author

/retest

@repokitteh-read-only

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #16774 (comment) was created by @paul-r-gall.

see: more, trace.

@paul-r-gall
Contributor Author

/retest

macos presubmit continues to time out

@repokitteh-read-only

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #16774 (comment) was created by @paul-r-gall.

see: more, trace.

@mattklein123 mattklein123 merged commit 38c7b1f into envoyproxy:main Jul 2, 2021
@paul-r-gall paul-r-gall deleted the FilterConfigDump branch July 7, 2021 11:50
baojr added a commit to baojr/envoy that referenced this pull request Jul 7, 2021
* main:
  listener: match rebalancer to listener IP family type (envoyproxy#16914)
  jwt_authn: implementation of www-authenticate header (envoyproxy#16216)
  listener: reset the file event in framework instead of listener filter doing itself (envoyproxy#17227)
  Small typo fix (envoyproxy#17247)
  Doc: Clarify request/response attributes are http-only (envoyproxy#17204)
  bazel/README.md: add aspell comment (envoyproxy#17072)
  docs: Fix broken URL links in HTTP upgrades doc (envoyproxy#17225)
  remove the wrong comment on test (envoyproxy#17233)
  upstream: allow clusters to skip waiting on warmup for initialization (envoyproxy#17179)
  JwtAuthn: support completing padding on forward jwt payload header (envoyproxy#16752)
  remove support for v2 UNSUPPORTED_REST_LEGACY (envoyproxy#16968)
  metrics service: fix wrong argument arrange on MetricsServiceSink (envoyproxy#17127)
  deps: update cel-cpp to 0.6.1 (envoyproxy#16293)
  Add ability to filter ConfigDump. (envoyproxy#16774)
  examples: fix Wasm example. (envoyproxy#17218)
  upstream: update host's socket factory when metadata is updated. (envoyproxy#16708)

Signed-off-by: Garrett Bourg <bourg@squareup.com>
chrisxrepo pushed a commit to chrisxrepo/envoy that referenced this pull request Jul 8, 2021
Signed-off-by: Paul Gallagher <pgal@google.com>
Signed-off-by: chris.xin <xinchuantao@qq.com>
leyao-daily pushed a commit to leyao-daily/envoy that referenced this pull request Sep 30, 2021
Signed-off-by: Paul Gallagher <pgal@google.com>