We've noticed inside our org after adding respect_dns_ttl to all clusters globally, that when Envoy finds a host that results in an NXDOMAIN on an attempted DNS lookup, Envoy appears to continue to retry the lookup infinitely, to excess.
I think that the following config is sufficient to reproduce the issue:
static_resources:
clusters:
- name: test
connect_timeout: 5s
type: STRICT_DNS
lb_policy: ROUND_ROBIN
respect_dns_ttl: true
load_assignment:
cluster_name: test
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address:
address: does-not-exist-asdkjashdaskfhasalsfhas.com
port_value: 8080
I think this logic starts somewhere here
and I think it would be good if this fell back on the default DNS refresh rate in the case of an NXDOMAIN or similar error resulting in a lack of a TTL