
Load balancer fallback to another cluster #7454


Description

@yxue

When using EDS to update the endpoint configuration dynamically, DNS resolution is not allowed for hostnames in the EDS response, as the comment says:

// The form of host address depends on the given cluster type. For STATIC or EDS,
// it is expected to be a direct IP address (or something resolvable by the
// specified :ref:`resolver <envoy_api_field_core.SocketAddress.resolver_name>`
// in the Address). For LOGICAL or STRICT DNS, it is expected to be hostname,
// and will be resolved via DNS.

and DNS is not allowed for a custom resolver:

// The name of the custom resolver. This must have been registered with Envoy. If
// this is empty, a context dependent default applies. If the address is a concrete
// IP address, no resolution will occur. If address is a hostname this
// should be set for resolution other than DNS. Specifying a custom resolver with
// STRICT_DNS or LOGICAL_DNS will generate an error at runtime.

When adding some external services to the load balancing endpoints via EDS, it would be impractical to add the IPs of the endpoints, considering service VMs can and will go up and down, and the service's endpoint IP addresses will change frequently. Supporting hostnames in the EDS response seems to be a reasonable solution.
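To illustrate what is being requested, a hypothetical EDS response carrying a hostname might look like the sketch below. Envoy currently rejects this for EDS clusters, as the quoted comment states; the cluster name and hostname (`external-service`, `external.example.com`) are made up for illustration:

```yaml
# Hypothetical ClusterLoadAssignment with a hostname instead of an IP.
# Not valid today for EDS clusters; shown only to illustrate the
# requested behavior.
version_info: "1"
resources:
- "@type": type.googleapis.com/envoy.api.v2.ClusterLoadAssignment
  cluster_name: external-service
  endpoints:
  - lb_endpoints:
    - endpoint:
        address:
          socket_address:
            address: external.example.com  # hostname; would need DNS resolution
            port_value: 443
```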

Not sure if not supporting DNS resolution for hostnames is by design or an Envoy restriction. Please let me know, and I can help work on this feature if Envoy needs it.

In Istio, we use the priority field to implement failover logic. When the endpoints with higher priority are down, the load balancer selects the endpoints with lower priority. This assumes that all the endpoints have the same settings (e.g. TLS context), but sometimes they may differ. For example, for an external fallback service, mTLS is not required, while inside the service mesh mTLS is required. If the external service endpoints and internal service endpoints are added to one cluster, traffic to the external endpoints will be broken.

Downstream Envoy setting:

clusters:
  - name: proxy
    type: strict_dns
    lb_policy: round_robin
    load_assignment:
      cluster_name: proxy
      endpoints:
      - priority: 1
        lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: proxy
                port_value: 80
      - priority: 0
        lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: proxy
                port_value: 443
    tls_context: {}

Upstream listener setting:

  listeners:
  - address:
      socket_address:
        address: 0.0.0.0
        port_value: 80
      ...
  - address:
      socket_address:
        address: 0.0.0.0
        port_value: 443
      ...
      tls_context:
        common_tls_context:
          tls_certificates:
          - certificate_chain:
              filename: /etc/cert.pem
            private_key:
              filename: /etc/key.pem
          validation_context: {}

When proxy:443 is down, traffic to proxy:80 is broken as well, because proxy:80 doesn't support mTLS.

Thanks @PiotrSikora for the solution. Allowing the load balancer to fall back to another cluster would solve the problem. For the above case, split the cluster configuration into two clusters; the load balancer can then select another cluster when one cluster is down, and use that cluster's own settings.
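A rough sketch of that split is below. The cluster names (`proxy-mtls`, `proxy-plain`) are hypothetical, and the cross-cluster fallback selection itself is the feature being proposed here, so no config is shown for it:

```yaml
clusters:
  # Hypothetical primary cluster: in-mesh endpoint, mTLS required.
  - name: proxy-mtls
    type: strict_dns
    lb_policy: round_robin
    load_assignment:
      cluster_name: proxy-mtls
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: proxy
                port_value: 443
    tls_context: {}  # TLS settings now apply only to this cluster
  # Hypothetical fallback cluster: external endpoint, plaintext.
  - name: proxy-plain
    type: strict_dns
    lb_policy: round_robin
    load_assignment:
      cluster_name: proxy-plain
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: proxy
                port_value: 80
    # no tls_context: the fallback endpoint doesn't support mTLS
```

With this split, the load balancer would prefer proxy-mtls and fall back to proxy-plain when it is unhealthy, applying each cluster's own TLS settings to its traffic.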

cc @duderino @htuch @PiotrSikora @mattklein123
