Skip to content

Proposal: support for dynamic rate limits #11595

@Pchelolo

Description

@Pchelolo

Rationale

Wikimedia Foundation is going to adopt envoy as an API Gateway. One of the features we are interested in is rate limiting. In our design, it is a fundamental property of the API that requests are rate limited differently based on who is making the request. Specifically, they stipulate that we will rate limit on a client ID basis, where clients are grouped into classes that share a common limit. Thus, within our system the rate limits are a property of the collection of users.

Within the current architecture of the ratelimiter service in envoy, it is possible to duplicate the client classes and the limit values into the static config of the rate limiter service, however, duplication of the config creates all kinds of problems, especially if the client classes and their limits are dynamic.
Instead, we want to have a single source of truth for client properties, including the client classes and rate limits assigned to them, and the natural place to hold this state is user management/authentication system. We can include the limits/classes into the user authentication token via JWT and pass this state into envoy. However, there’s currently no way to pass the state further into the rate limiting service.

We have come up with a proposal solution outlined below, but we’re very open to alternatives and seeking guidance on the best way to solve our problem. We intend to contribute all the code changes.

Proposal

Expand the RateLimitDescriptor message to optionally include a limit override. Thus, the RateLimiterRequest would be expanded to look like this:

# (pseudo-yaml, will be gRPC message, thus it will be backwards-compatible)
RateLimitRequest:
  domain: gateway
  descriptors:
    - value: ( client_id: 1234 )
       limits: # optional
         value: 10
         unit: second

Within the rate limiting service the semantics of the override would be the following: if the limit override is provided, it’s treated as if a config matching the descriptor existed in the limiter static configuration. If the override is not provided, the static config is used. Thus, the changes are entirely backwards-compatible and optional.

The rate limit filter in envoy doesn’t need to be changed, essentially it still would still get the list of descriptors and send them into the rate limiter service.

The rate limits actions will have to be changed to support specifying the limit overrides. Strawman config could look somewhat like this:

rate_limits:
  - stage: 0
    actions:
      - {request_headers: {header_name: "x-client-id", descriptor_key: "client-id"}}
    limit: { static: { value: 10, unit: seconds } } # optional

The limit actions could have various kinds, roughly based on the descriptor action types - we’d be able to provide a static override, take the value/units from headers, or from dynamic metadata.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions