# proposal: BackendTrafficPolicy

## Relates to
- https://github.com/envoyproxy/gateway/issues/1492
- https://github.com/envoyproxy/gateway/issues/1821
- https://github.com/envoyproxy/gateway/issues/1845

## What is this?
`BackendTrafficPolicy` is a proposal for a new policy attachment resource that can be applied to `Gateway` and `xRoute` resources. It is meant to fill in configuration that Gateway API currently lacks, while also providing a way to specify global defaults. It focuses on configuring the behaviour of traffic between EnvoyProxy and the backend service.

## What if Gateway API implements a GEP that delivers the same functionality as some of the fields of this resource?
If Gateway API improves any of its existing resources to deliver functionality that meets all of the needs of any of the config below,
then we will deprecate the corresponding field in this resource and use the Gateway API config instead. Ideally, much of this config will
eventually be upstreamed into Gateway API in some form, since this resource only exists to fill the gaps where Gateway API falls short of Envoy Gateway's needs.

## What about traffic between EnvoyProxy and the downstream clients?
I will shortly open a sister issue proposing an `InboundTrafficPolicy` (or similarly named) resource that focuses on this use case.

## What happens when I attach it to a `Gateway` resource?
If you apply this resource to a `Gateway` resource, any config that applies at the "listener level" will be configured there.
Much of the config is "route level." Those fields may have no effect if all you have is a `Gateway`, but they will be used as the default configuration for all child `xRoute` resources of the `Gateway` this resource is attached to.
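A minimal sketch of Gateway-level defaults, assuming the field names from the API outline in this proposal (which is still subject to change) and a hypothetical `Gateway` named `eg`. The `retryStrategy` set here would act as the default for every `xRoute` attached to `eg`:

```yaml
# Sketch only: field names follow the proposed API outline; "eg" is a hypothetical Gateway.
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: gateway-defaults
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: Gateway
    name: eg
  # Route-level field: used as the default for every xRoute attached to "eg",
  # unless a route has its own BackendTrafficPolicy that sets retryStrategy.
  retryStrategy:
    type: http
    http:
      retryOn: "5xx"
    numRetries: 2
```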

## What happens when I attach it to an `xRoute` resource (`HTTPRoute`, `GRPCRoute`, etc.)
If you apply this config to an `xRoute` resource, it applies route-level config to that specific route.
If you have a `BackendTrafficPolicy` attached to both a `Gateway` and an `xRoute` resource, then the config from the one
attached to the `xRoute` wins in a conflict. The "merging" in this case will be primitive: if the `xRoute` policy sets any
config for a top-level field such as `circuitBreakers`, then the whole field is replaced rather than its subfields being
compared and merged/patched.
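A sketch of the primitive merge, assuming a hypothetical `Gateway` named `eg`, a hypothetical `HTTPRoute` named `backend-route`, and field names from the API outline in this proposal. The comments at the end show the effective route config under my reading of the rule above: the route policy's `circuitBreakers` replaces the Gateway's list wholesale, while top-level fields it leaves unset (here `timeouts`) are assumed to still come from the Gateway policy.

```yaml
# Sketch only: "eg" and "backend-route" are hypothetical resource names.
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: gateway-policy
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: Gateway
    name: eg
  circuitBreakers:
  - priority: default
    maxConnections: 2048
    maxPendingRequests: 512
  timeouts:
    connectTimeout: 5s
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: route-policy
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: backend-route
  circuitBreakers:
  - priority: default
    maxConnections: 100
# Effective config for backend-route (assumed, per the merge rule above):
#   circuitBreakers: [{priority: default, maxConnections: 100}]
#     maxPendingRequests falls back to the outline default (1024), not the
#     Gateway's 512, because the whole circuitBreakers field is replaced.
#   timeouts.connectTimeout: 5s (inherited; route-policy does not set timeouts)
```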

## Can you use multiple `BackendTrafficPolicy` resources at the same time?
You may use multiple `BackendTrafficPolicies` that target different resources, but you may not attach multiple `BackendTrafficPolicies`
to the same resource. For example, it is invalid to attach two `BackendTrafficPolicies` to the same `Gateway`. Merging config in this scenario would quickly become convoluted and confusing, and there are many edge-case scenarios that make supporting this undesirable.
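For illustration, the following pair would be rejected under this rule because both policies target the same hypothetical `Gateway` `eg`. How the conflict is reported is not yet specified in this proposal, so the sketch only marks the configuration as invalid:

```yaml
# INVALID under this proposal: both policies target the same Gateway "eg" (hypothetical name).
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: policy-a
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: Gateway
    name: eg
  timeouts:
    connectTimeout: 5s
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: policy-b
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: Gateway
    name: eg          # same target as policy-a, so this pair is not allowed
  timeouts:
    idleTimeout: 60s
```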

## How will this be developed/implemented?

This issue serves as a proposal for the high-level design and fields of the API, but that does not mean that every field included below will be immediately available. My plan for the implementation is to add and implement one high-level field at a time until the whole resource is implemented. 

## API outline

```yaml
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: example-policy
spec:
  dns: # optional
    type: enum ("logical"/"strict") # optional, default=?
    respectDnsTtl: bool # optional
  protocols: # optional
    enableIPv4: bool # optional, default=true
    enableIPv6: bool # optional, default=false
    # Should we add something here for upgrades (websockets/spdy)?
  circuitBreakers: # optional
  - priority: Enum("default"/"high") # optional, default=default
    maxConnections: int # optional, default=1024
    maxPendingRequests: int # optional, default=1024
    maxParallelRequests: int # optional, default=1024
  keepAliveProbes: # optional
    idleTime: duration # optional, default=7200s
    interval: duration # optional, default=75s
    maxProbes: int # optional, default=9
  retryStrategy: # optional
    type: Enum (grpc/http) # required
    http: # required when type=http
      retryOn: Enum (5xx/gateway-error/disconnect-reset/connect-fail/retriable-4xx/refused-stream/retriable-status-codes) # required
      retriableStatusCodes: []int # required when retryOn=retriable-status-codes
    grpc: # required when type=grpc
      retryOn: Enum (cancelled/deadline-exceeded/internal/resource-exhausted/unavailable)
    numRetries: int # optional default=1
    perRetry: # optional
      timeout: duration # optional
      idleTimeout: duration # optional
      backoff: # optional
        baseInterval: duration # required
        maxInterval: duration # optional
        # we can add rate limited based backoff config here if we want to
    retryLimits: # optional
      type: Enum (budget/static)
      static: # required when type=static
        maxParallel: int # optional, default=3
      retryBudget: # required when type=budget
        activeRequestPercent: int # required, min=1, max=100, default=20
        minConcurrent: int # optional, default=3
  # It may make more sense to pull active health checking out into a new CRD
  # to be used as an extensionRef filter. You could create HTTP health checks in all your services following
  # a standard path and parameters, but for gRPC checks at least, you would need to configure them on a per-service basis 
  healthChecking: # optional
    activeChecks: # optional
      - connectionTimeout: duration # optional
        healthyCheckInterval: duration # optional, default=5s
        unhealthyCheckInterval: duration # optional, default=10s
        unhealthyThreshold: int # optional, default=3
        healthyThreshold: int # optional, default=1
        logging: # optional
          enabled: bool # optional, default=false
          alwaysLogFailures: bool # optional, default=false
        type: Enum (grpc/http)
        grpc: # required when type=grpc
          upstreamName: string # required
          authority: string # optional
        http: # required when type=http
          expectedStatuses: # required, minItems=1
          - min: int # min=100, max=599
            max: int # min=100, max=599
          path: string # optional
          hostname: string # optional
          addRequestHeaders: # optional
            set: # optional
            - name: string # required
              value: string # required
            add: # optional
            - name: string # required
              value: string # required
          removeRequestHeaders: []string
    # There are many more options for outlier detection in Envoy, but let's start small and expand if needed
    outlierDetection: # optional
      checkInterval: duration # required, default=10s
      consecutive5XX: # optional
        threshold: int # optional, default=5
      failurePercentage: # optional
        threshold: int # optional, min=0, max=100, default=85
        minimumRequests: int # optional, default=50
  # The following timeouts field provides functionality in addition to the Gateway API GEP for timeouts,
  # and allows configuring timeouts on xRoute resources other than HTTPRoutes
  # https://gateway-api.sigs.k8s.io/geps/gep-1742/
  timeouts: # optional
    connectTimeout: duration # optional
    idleTimeout: duration # optional
    clusterIdleTimeout: duration # optional
    maxConnectionLifetime: duration # optional
  loadBalancing: # optional
    type: Enum(roundRobin,leastRequest,ringHash,maglev) # required
    sessionAffinity: # optional, applies only to ringHash and maglev (roundRobin and leastRequest have no config)
      type: Enum(cookie,header,sourceIp) # required (note that there is no additional config for sourceIp)
      cookie: # optional
        name: string # required
        path: string # optional
        ttl: string # optional
      header: # optional
        name: string # required
  observability: # optional
    metrics: # optional
      clusterTag: string # optional
      statsName: string # optional
      statsPrefix: string # optional
    tracing:
      samplePercentage: int # (0-100) 
      addTracingTags: # optional
      - type: Enum(literal,requestHeader) # required
        name: string # required 
        literal: # required when type=literal
          value: string # required
        requestHeader: # required when type=requestHeader
          defaultValue: string # required
  matching: # optional
    ignoreAuthorityPort: bool # optional, default=false
    stripAuthorityPort: bool # optional, default=false
    ignorePathParameters: bool # optional, default=false
  integrations:
    addLinkerdHeaders: bool # optional, default=false
  targetRef: # required
    group: string
    kind: string # must be either Gateway or xRoute object reference
    name: string
```
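To make the outline more concrete, here is a sketch of a filled-in policy targeting a hypothetical `HTTPRoute` named `backend-route`. It uses only fields and enum values listed in the outline above; the specific values (status codes, intervals, header name) are illustrative, not recommended defaults:

```yaml
# Sketch only: an instance of the proposed outline; names and values are hypothetical.
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: example-policy
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: backend-route
  retryStrategy:
    type: http
    http:
      retryOn: retriable-status-codes
      retriableStatusCodes: [502, 503]   # required because retryOn=retriable-status-codes
    numRetries: 3
    perRetry:
      timeout: 2s
      backoff:
        baseInterval: 100ms
        maxInterval: 1s
    retryLimits:
      type: budget
      retryBudget:
        activeRequestPercent: 20
        minConcurrent: 3
  timeouts:
    connectTimeout: 5s
  loadBalancing:
    type: ringHash                       # sessionAffinity only applies to ringHash/maglev
    sessionAffinity:
      type: header
      header:
        name: x-user-id                  # hypothetical header name
```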

## Things that I considered but felt might make better `xRoute` `extensionRef` filters

  - regex matching for headers/method/query params (something like a RegexMatchFilter/RegexHeadersFilter)?
  - IP allow/deny lists per route
  - Lua filters (I'm 50/50 on this one; it might be nice to support a global Lua script rather than only allowing it per route)
  - gRPC-JSON transcoding

## Note that I originally chose a `targets` field instead of the Gateway API standard of `targetRef`/`targetRefs`.
Using `targetRef` limits the policy to being attached to one object, and it quickly becomes tedious to create a new `BackendTrafficPolicy` for each resource you want to attach it to. A very common use case is creating one `BackendTrafficPolicy` and attaching it to several `Gateways`, `HTTPRoutes`, etc. (The outline above currently uses a single `targetRef`; see Edits.)

`targetRefs` also quickly becomes tedious, because you have to manually list each resource
you want the `BackendTrafficPolicy` to attach to. I can see many users wanting to create a `BackendTrafficPolicy` that attaches to all of their `HTTPRoutes` but not their `GRPCRoutes` (or a similar variation that enables efficient reuse of a single `BackendTrafficPolicy`). With hundreds or more `xRoutes`, this becomes unmanageable with `targetRefs`, because each one has to be defined individually. The `targets` field I originally added functioned as a compromise between `targetRefs` and a Kubernetes `selector`, but it likely needs further refinement. Definitely looking for feedback/input here.


## Problem: some of this config does not make sense to configure at a route level or gateway level.
For example, it doesn't make sense to configure gRPC health checks at the Gateway level, since they require knowing the service name, which will be different for each route. Likewise, it does not make sense to configure `observability.metrics.statsPrefix` at the route level, since that config applies to an entire listener. Potential solutions include:

1. Keep things similar to the way I have outlined them, and explain via comments and documentation what the expected behaviour is. I don't like this solution, but it is one way of handling it.
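The mismatch can be sketched with two policies that fit the outline's schema but whose meaning is unclear. The `Gateway` `eg`, the `HTTPRoute` `backend-route`, and the gRPC service name are hypothetical:

```yaml
# Sketch of the ambiguity, using field names from the proposed outline.
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: ambiguous-gateway-policy
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: Gateway
    name: eg
  healthChecking:
    activeChecks:
    - type: grpc
      grpc:
        # A gRPC health check names a single service, but each route
        # under "eg" may point at a different backend service.
        upstreamName: orders.v1.OrderService
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: ambiguous-route-policy
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: backend-route
  observability:
    metrics:
      # statsPrefix describes a whole listener, so setting it on one route is unclear.
      statsPrefix: backend-route
```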

## Open Questions
- This resource can be attached to a `Gateway`, but you can configure multiple listeners on a single `Gateway`. Realistically, this resource makes the most sense when targeting listeners rather than being forced to apply to the entire `Gateway`. There should be a way to specify which listeners it applies to, so that you can have different config for each listener.
  - After discussion with @arkodg in the Gateway meeting today, we agree that the best approach is likely to start by making this apply to the entire `Gateway`, and then add support for @zhaohuabing's [SectionName to PolicyTargetReference enhancement](https://github.com/kubernetes-sigs/gateway-api/pull/2283) when it lands and becomes available.
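A hypothetical sketch of per-listener targeting, assuming `targetRef` gains a `sectionName` field once the linked Gateway API enhancement lands. `sectionName` is not part of the current outline, and the listener name `https` on the hypothetical `Gateway` `eg` is illustrative:

```yaml
# Hypothetical: assumes targetRef supports sectionName after the linked enhancement lands.
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: https-listener-policy
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: Gateway
    name: eg
    sectionName: https   # a listener name on Gateway "eg" (hypothetical)
  timeouts:
    idleTimeout: 300s
```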

## Edits

1. `cors` and `bypassAuth` configurations moved to the proposal for the `AuthPolicy` resource.
2. The initial `targets` design was switched to a single `targetRef` for now, but I do think we must solve for the use case of applying a `BackendTrafficPolicy` to multiple resources in a flexible way for the resource to be truly useful.
3. Renamed `UpstreamTrafficPolicy` to `BackendTrafficPolicy` to reduce potential confusion about the traffic that this policy affects.
