[Question] - (route) metadata vs dynamic metadata #12556

@jeremybaumont

Context

We were reviewing the changelog of v1.15.0 and noticed some changes that could improve our rate limiting setup. We decided to spike one of the new features and make sure we understood it correctly:

  • ratelimit: added support for use of dynamic metadata dynamic_metadata as a ratelimit action.

When we first read that line and the linked documentation, we thought it would make our life much easier. Let me explain succinctly what we mean by that.

We map the path of a request to an operation identifier for our own reasons (for example, a PUT request to the "/admin/user" path is associated with the operation identifier "create-user"), and then we use this operation identifier in Envoy’s rate limit filter.

To configure that, we are doing quite a “pasodoble”. We temporarily add a header containing the operation identifier for the route that matches the path. This header is matched by the rate limit action (header matcher) configuration and passed as a descriptor key in the gRPC call to the rate limit service. To add the header that contains the operation identifier, and to remove it again afterwards, we use HTTP Lua filters. So when we saw that change, we thought we could simplify our configuration. Joy was in our hearts.
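A minimal sketch of what such a header-based setup might look like is shown below. It is an illustration under stated assumptions, not the configuration actually in use: the header name `x-operation-id` and the cluster name `backend` are hypothetical, the rate limit filter's own `typed_config` is omitted, and the Lua calls (`request_handle:metadata()`, `headers():add()`, `headers():remove()`) are used as described in the HTTP Lua filter documentation.

```yaml
# Route side: the operation identifier lives in route metadata, under the
# Lua filter's name so that request_handle:metadata() can read it.
routes:
  - name: create-user
    match:
      path: "/admin/user"
    metadata:
      filter_metadata:
        envoy.filters.http.lua:
          operation: "create-user"
    route:
      cluster: backend                        # hypothetical cluster name
      rate_limits:
        - actions:
            - request_headers:
                header_name: "x-operation-id" # hypothetical header name
                descriptor_key: "operation"

# Filter chain side: order matters.
http_filters:
  # 1. Copy the route metadata into a temporary request header.
  - name: envoy.filters.http.lua
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.filters.http.lua.v3.Lua
      inline_code: |
        function envoy_on_request(request_handle)
          local operation = request_handle:metadata():get("operation")
          if operation ~= nil then
            request_handle:headers():add("x-operation-id", operation)
          end
        end
  # 2. The rate limit filter turns the header into a descriptor entry.
  - name: envoy.filters.http.ratelimit
    # typed_config (domain, rate_limit_service) omitted for brevity
  # 3. Remove the temporary header before the request goes upstream.
  - name: envoy.filters.http.lua
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.filters.http.lua.v3.Lua
      inline_code: |
        function envoy_on_request(request_handle)
          request_handle:headers():remove("x-operation-id")
        end
  - name: envoy.filters.http.router
```

The two Lua filters exist only to shuttle a value from the route to the rate limit action, which is the part the new action looked like it could replace.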

We thought we could set metadata at the route level with filter_metadata, and use this metadata to fill the operation descriptor key through the dynamic_metadata rate limit action, like the following:

                  virtual_hosts:
                    - name: backend
                      domains:
                        - "*"
                      rate_limits:
                        - actions:
                            - dynamic_metadata:
                                descriptor_key: "operation"
                                metadata_key:
                                  key: "envoy.filters.http.ratelimit"
                                  path:
                                    - key: "descriptor"
                                    - key: "operation"
                        - actions:
                            - request_headers:
                                header_name: "x-auth-client"
                                descriptor_key: "client"
                      routes:
                        - name: example-operation-1
                          match:
                            prefix: "/anything/shouldRateLimit"
                          metadata:
                            filter_metadata:
                              envoy.filters.http.ratelimit:
                                descriptor:
                                  operation: "operation-1"
                          route:
                            cluster: "httpbin"
                            include_vh_rate_limits: true
                        - name: default-route
                          match:
                            prefix: "/"
                          route:
                            cluster: httpbin
                            include_vh_rate_limits: false

We tested this rate limit configuration on a vanilla Envoy setup, eliminating anything related to our internal setup. We used an Envoy use case examples repository meant to “learn Envoy's core feature set and how to configure it by hand”. You can find the Envoy configuration corresponding to the extract above, and a small docker-compose repository to reproduce what we mean.

To our surprise, the descriptor key "operation" was not filled when the Envoy rate limit filter called the rate limit service mock.

In a previous troubleshooting investigation, we learned how to set up remote gdb debugging of an Envoy binary with debug symbols in our local docker-compose world. We decided to look at what was happening in the Envoy code base in a live debugging session.

In source/common/router/router_ratelimit.cc, we could display the rate limit action configuration in metadata_key_ and look for the metadata we expected from filter_metadata in dynamic_metadata. We observed that dynamic_metadata was empty. Something was wrong with how we had configured filter_metadata.

Our next troubleshooting step was to look at the integration tests in the Envoy code base and how they configure dynamic metadata. We find this is often a good place to look, since the tests use YAML configuration samples. We could not find anything obvious that we were doing wrong, having literally copied and pasted the YAML used there. We think those integration tests tricked us: they use filter_metadata, but we think they should not. More on this later.

Our last procrastination before asking for help publicly was to review the Envoy Slack channel history and prior GitHub issues. That’s the minimum you can do; you don’t want to waste people’s time. That’s when we found a crucial clue to this story: a discussion in the Envoy Slack that referred to this GitHub issue. A user had a problem with metadata set by filter_metadata in a route configuration that could not be found in the access log filter. That is almost identical to our scenario, but with the access log filter instead of the rate limit filter. In particular, the comments on the GitHub issue suggested that route metadata may be different from dynamic metadata.

We think that the route metadata set by filter_metadata is different "data" from the dynamic metadata. They are different abstractions. We understand that route metadata is used, for example, by HTTP Lua filters. We know that for sure because we are already using it in our approach that adds and removes the operation header with Lua filters. If we are correct, the documentation is not really clear about the distinction (and we are not alone in being confused). We found that the best explanation resides in the HTTP Lua filter section: it is one of the rare places where you can work with both route metadata and dynamic metadata, so the separation is obvious.

But the change adds support for dynamic metadata as a rate limit action, not route metadata. Fair enough, but how can we set dynamic metadata for a route, we asked ourselves. From Dio's comments in the GitHub issue, we understand there are several ways: we can implement an Envoy filter that matches a route and uses the StreamInfo API, use the header-to-metadata filter (but that would mean we need a header again, which is not a big win for us; remember, we are trying to get rid of that), or use HTTP Lua filters directly (same, not a big win).
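To make the Lua option concrete, here is a hedged sketch of how a single Lua filter could copy route metadata into dynamic metadata so that the dynamic_metadata rate limit action finds it. It is untested and rests on a few assumptions: the route metadata sits under the `envoy.filters.http.lua` key (where `request_handle:metadata()` reads it), the dynamic metadata namespace `envoy.filters.http.ratelimit` is an arbitrary choice that only has to match `metadata_key.key`, and the value is stored as a flat string under `operation` rather than in a nested `descriptor` struct.

```yaml
routes:
  - name: example-operation-1
    match:
      prefix: "/anything/shouldRateLimit"
    metadata:
      filter_metadata:
        # Route metadata: static, attached to the route, read by Lua below.
        envoy.filters.http.lua:
          operation: "operation-1"
    route:
      cluster: httpbin
      rate_limits:
        - actions:
            # Reads per-request dynamic metadata, not the route metadata.
            - dynamic_metadata:
                descriptor_key: "operation"
                metadata_key:
                  key: "envoy.filters.http.ratelimit"
                  path:
                    - key: "operation"

http_filters:
  # Must run before the rate limit filter so the value is already set.
  - name: envoy.filters.http.lua
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.filters.http.lua.v3.Lua
      inline_code: |
        function envoy_on_request(request_handle)
          local operation = request_handle:metadata():get("operation")
          if operation ~= nil then
            -- Publish the route-level value as dynamic metadata.
            request_handle:streamInfo():dynamicMetadata():set(
              "envoy.filters.http.ratelimit", "operation", operation)
          end
        end
  - name: envoy.filters.http.ratelimit
    # typed_config (domain, rate_limit_service) omitted for brevity
  - name: envoy.filters.http.router
```

Compared with the header-based approach, this needs one Lua filter instead of two and nothing to clean up afterwards, but it still relies on Lua to bridge route metadata and dynamic metadata.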

At this point, we were deeply confused about dynamic metadata, and even more confused by the change: why would somebody add support for dynamic metadata as a rate limit action when, in our mind, most actions in Envoy are driven by an HTTP route? That did not make sense. We were biased, obviously. We decided to take a step back, have a little walk, and brew some coffee. We asked ourselves: what really is dynamic metadata? For which purposes do people use it?

We looked at the PR that brought the change, and found that it originated from this proposal. When we read the description, we understood the intent of the change. The users wanted to pass information directly from the dynamic metadata to the rate limit action. They wanted to avoid a Lua filter that reads the dynamic metadata and appends it to a forged header to be matched by a rate limit action. The key element is that they already have the information in the dynamic metadata. There is no mention of using a route to set dynamic metadata.

Dynamic metadata is actually a great idea (you folks rock). We found this advanced section in the documentation that completed our understanding. We understand that filters can emit dynamic metadata that is consumed by other filters. Take, for example, the Postgres network filter. It sets the table names and SQL operations seen in the SQL statement as dynamic metadata. That is fantastic: now you can have, for example, a rate limit action that finds the table name in the dynamic metadata, and rate limit based on this descriptor key.

Question

Sorry for the long context setup, but we think it helps to understand where we are coming from. Our question is more or less a request to confirm or correct our understanding, as detailed in the previous paragraphs: what is dynamic metadata, and what is the difference between (route) metadata and dynamic metadata?
