Bug Description
I've been building a proof-of-concept multi-cluster mesh with multi-primaries in different networks. In my case, one cluster is AWS EKS and another is DigitalOcean managed Kubernetes. The east-west gateway in the EKS cluster is exposed with an AWS NLB.
After linking the clusters as per the documentation, I discovered that the domain name of the AWS NLB in the EKS cluster is not resolved correctly in the DO cluster. It turned out that the upstream DNS configured on the DOKS nodes returns REFUSED to ANY queries, which the Pilot code currently uses (istio/pilot/pkg/model/network.go, lines 485 to 486 in e67c34b):
// TODO figure out how to query only A + AAAA
res := n.client.Query(new(dns.Msg).SetQuestion(dns.Fqdn(name), dns.TypeANY))
$ dig -t ANY k8s-istiomul-istioeas-4d501f177f-9a9de7682aacbcd6.elb.us-west-2.amazonaws.com
; <<>> DiG 9.16.1-Ubuntu <<>> -t ANY k8s-istiomul-istioeas-4d501f177f-9a9de7682aacbcd6.elb.us-west-2.amazonaws.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: REFUSED, id: 44741
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; COOKIE: c367afe288a65985 (echoed)
;; QUESTION SECTION:
;k8s-istiomul-istioeas-4d501f177f-9a9de7682aacbcd6.elb.us-west-2.amazonaws.com. IN ANY
;; Query time: 7 msec
;; SERVER: 10.245.0.10#53(10.245.0.10)
;; WHEN: Mon May 02 03:20:07 UTC 2022
;; MSG SIZE rcvd: 118
ANY queries are not guaranteed to be implemented consistently across DNS servers. For example, Cloudflare considers them deprecated, and its name servers return NOTIMP to ANY queries.
I'd suggest replacing the ANY query with separate A and AAAA queries, as the comment in the code already hints. Although it is technically possible to craft a multi-type query with the library currently in use, such queries do not seem to be implemented consistently either, so we'd likely have to make two separate queries and merge the results. I have a patch tested in my environment and can follow up with a PR.
Version
$ istioctl version
client version: 1.13.3
control plane version: 1.13.3
data plane version: 1.13.3 (1 proxies)
$ kubectl version --short
Client Version: v1.23.6
Server Version: v1.22.8