
Load balancer fallback to another cluster #7454


Description

@yxue

When using EDS to update the endpoint configuration dynamically, DNS resolution is not allowed for hostnames in the EDS response, as the comment says:

// The form of host address depends on the given cluster type. For STATIC or EDS,
// it is expected to be a direct IP address (or something resolvable by the
// specified :ref:`resolver <envoy_api_field_core.SocketAddress.resolver_name>`
// in the Address). For LOGICAL or STRICT DNS, it is expected to be hostname,
// and will be resolved via DNS.

and DNS is not allowed for a custom resolver:

// The name of the custom resolver. This must have been registered with Envoy. If
// this is empty, a context dependent default applies. If the address is a concrete
// IP address, no resolution will occur. If address is a hostname this
// should be set for resolution other than DNS. Specifying a custom resolver with
// STRICT_DNS or LOGICAL_DNS will generate an error at runtime.

When adding some external services to the load balancing endpoints via EDS, it would be impractical to add the IPs of the endpoints, considering service VMs can and will go up and down, and the service's endpoint IP addresses will change frequently. Supporting hostnames in the EDS response seems to be a reasonable solution.
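To illustrate what is being requested, a hypothetical EDS response carrying a hostname might look like the sketch below. Envoy currently rejects this for EDS clusters, as the quoted comment states; the cluster name and hostname (`external-service`, `external.example.com`) are made up for illustration:

```yaml
# Hypothetical ClusterLoadAssignment with a hostname instead of an IP.
# Not valid today for EDS clusters; shown only to illustrate the
# requested behavior.
version_info: "1"
resources:
- "@type": type.googleapis.com/envoy.api.v2.ClusterLoadAssignment
  cluster_name: external-service
  endpoints:
  - lb_endpoints:
    - endpoint:
        address:
          socket_address:
            address: external.example.com  # hostname; would need DNS resolution
            port_value: 443
```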

Not sure if not supporting DNS resolution for hostnames is by design or an Envoy restriction. Please let me know, and I can help work on this feature if Envoy needs it.

In Istio, we use the priority field to implement failover logic. When the endpoints with higher priority are down, the load balancer selects the endpoints with lower priority. This assumes that all the endpoints have the same settings (e.g. TLS context), but sometimes they may differ. For example, for an external fallback service, mTLS is not required, while inside the service mesh mTLS is required. If the external service endpoints and internal service endpoints are added to one cluster, traffic to the external endpoints will be broken.

Downstream Envoy setting:

clusters:
  - name: proxy
    type: strict_dns
    lb_policy: round_robin
    load_assignment:
      cluster_name: proxy
      endpoints:
      - priority: 1
        lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: proxy
                port_value: 80
      - priority: 0
        lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: proxy
                port_value: 443
    tls_context: {}

Upstream listener setting:

  listeners:
  - address:
      socket_address:
        address: 0.0.0.0
        port_value: 80
      ...
  - address:
      socket_address:
        address: 0.0.0.0
        port_value: 443
      ...
      tls_context:
        common_tls_context:
          tls_certificates:
          - certificate_chain:
              filename: /etc/cert.pem
            private_key:
              filename: /etc/key.pem
          validation_context: {}

When proxy:443 is down, traffic to proxy:80 is broken as well, because proxy:80 doesn't support mTLS.

Thanks @PiotrSikora for the solution. Allowing the load balancer to fall back to another cluster would solve the problem. For the above case, split the cluster configuration into two clusters; the load balancer can then select another cluster when one cluster is down, and use that cluster's own settings.
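A rough sketch of that split is below. The cluster names (`proxy-mtls`, `proxy-plain`) are hypothetical, and the cross-cluster fallback selection itself is the feature being proposed here, so no config is shown for it:

```yaml
clusters:
  # Hypothetical primary cluster: in-mesh endpoint, mTLS required.
  - name: proxy-mtls
    type: strict_dns
    lb_policy: round_robin
    load_assignment:
      cluster_name: proxy-mtls
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: proxy
                port_value: 443
    tls_context: {}  # TLS settings now apply only to this cluster
  # Hypothetical fallback cluster: external endpoint, plaintext.
  - name: proxy-plain
    type: strict_dns
    lb_policy: round_robin
    load_assignment:
      cluster_name: proxy-plain
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: proxy
                port_value: 80
    # no tls_context: the fallback endpoint doesn't support mTLS
```

With this split, the load balancer would prefer proxy-mtls and fall back to proxy-plain when it is unhealthy, applying each cluster's own TLS settings to its traffic.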

cc @duderino @htuch @PiotrSikora @mattklein123
