DNS lookup AUTO settings doesn't fallback to V4 when a single CNAME entry is returned #2634

@glicht

Title: DNS lookup AUTO settings doesn't fallback to V4 when a single CNAME entry is returned

Description:
I am just getting started with Envoy, so I might be wrong here or missing something.

When a cluster's host address resolves to a single CNAME entry (and no address records) for IPv6 lookups, Envoy does not fall back to an IPv4 lookup. From reading dns_impl.cc, it looks like Envoy assumes that a success code (ARES_SUCCESS) from getHostByName means an IP address was returned. This is not necessarily the case: a DNS response may contain only a CNAME entry with no IP address. For example, this happens to me with s3.amazonaws.com. Setting V4_ONLY solves the problem. Here is the dig output for the IPv6 (AAAA) lookup:

$ dig s3.amazonaws.com AAAA

; <<>> DiG 9.10.3-P4-Ubuntu <<>> s3.amazonaws.com AAAA
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 34567
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; MBZ: 0005 , udp: 4096
;; QUESTION SECTION:
;s3.amazonaws.com.              IN      AAAA

;; ANSWER SECTION:
s3.amazonaws.com.       5       IN      CNAME   s3-1.amazonaws.com.

;; AUTHORITY SECTION:
s3-1.amazonaws.com.     5       IN      SOA     ns-1726.awsdns-23.co.uk. awsdns-hostmaster.amazon.com. 1 7200 900 1209600 86
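
To make the expected behavior concrete, here is a minimal sketch of the fallback I have in mind. It is not Envoy's actual code: every name in it (Family, Status, LookupResult, LookupFn, resolveAuto, fakeLookup) is hypothetical, and fakeLookup only imitates the s3.amazonaws.com case above by returning success with no addresses for IPv6. The address 192.0.2.10 is a documentation placeholder, not a real S3 address.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; none of these names come from Envoy or c-ares.
enum class Family { V4, V6 };
enum class Status { Success, Failure };

struct LookupResult {
  Status status;
  // May be empty even when status is Success (e.g. a CNAME-only answer).
  std::vector<std::string> addresses;
};

using LookupFn = std::function<LookupResult(const std::string&, Family)>;

// AUTO behavior as proposed: try IPv6 first, and fall back to IPv4 when the
// IPv6 lookup fails OR succeeds without yielding any address.
std::vector<std::string> resolveAuto(const std::string& host, const LookupFn& lookup) {
  LookupResult v6 = lookup(host, Family::V6);
  if (v6.status == Status::Success && !v6.addresses.empty()) {
    return v6.addresses;
  }
  LookupResult v4 = lookup(host, Family::V4);
  if (v4.status == Status::Success) {
    return v4.addresses;
  }
  return {};
}

// Fake resolver imitating the reported case: the IPv6 query "succeeds" but
// carries no address, while the IPv4 query returns one placeholder address.
LookupResult fakeLookup(const std::string& /*host*/, Family family) {
  if (family == Family::V6) {
    return {Status::Success, {}};
  }
  return {Status::Success, {"192.0.2.10"}};
}
```

With this check, resolveAuto("s3.amazonaws.com", fakeLookup) ends up using the IPv4 result. Checking only the status, as the current code appears to do, would stop at the empty IPv6 result.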

Repro steps:
Use the following envoy.yaml file:

admin:
  access_log_path: "admin_access.log"
  address:
    socket_address:
      address: 0.0.0.0
      port_value: 9901
static_resources:
  listeners:
  - address:
      socket_address:
        address: 0.0.0.0
        port_value: 10000
    filter_chains:
    - filters:
      - name: envoy.http_connection_manager
        config:
          codec_type: auto          
          idle_timeout: 300s
          access_log:
          - name: envoy.file_access_log
            config:
              path: "egress_http.log"
          stat_prefix: egress_http
          http_protocol_options: 
            allow_absolute_url: true
          route_config:
            name: local_route
            virtual_hosts:
            - name: local_service
              domains:
              - "*"
              routes:
              - match:
                  prefix: "/"
                route:
                  cluster: service_s3
          http_filters: 
            - name: envoy.router
              config: {}          
  clusters:
  - name: service_s3
    type: logical_dns
    lb_policy: round_robin
    connect_timeout: 1s
    http_protocol_options: {}
    # dns_lookup_family: V4_ONLY
    hosts:
    - socket_address:
        address: s3.amazonaws.com      
        port_value: 443
    tls_context: 
      sni: "s3.amazonaws.com"
    

Run Envoy, then use curl to send a simple request to S3 through the proxy:

curl -v --proxy http://localhost:10000 http://s3.amazonaws.com/my-test-bucket22/
*   Trying ::1...
* connect to ::1 port 10000 failed: Connection refused
*   Trying 127.0.0.1...
* Connected to localhost (127.0.0.1) port 10000 (#0)
> GET http://s3.amazonaws.com/my-test-bucket22/ HTTP/1.1
> Host: s3.amazonaws.com
> User-Agent: curl/7.47.0
> Accept: */*
> Proxy-Connection: Keep-Alive
>
< HTTP/1.1 503 Service Unavailable
< content-length: 19
< content-type: text/plain
< date: Fri, 16 Feb 2018 15:41:24 GMT
< server: envoy
<
* Connection #0 to host localhost left intact
no healthy upstream

Workaround: uncomment the line dns_lookup_family: V4_ONLY in the yaml file. curl then succeeds:

curl -v --proxy http://localhost:10000 http://s3.amazonaws.com/my-test-bucket22/
*   Trying ::1...
* connect to ::1 port 10000 failed: Connection refused
*   Trying 127.0.0.1...
* Connected to localhost (127.0.0.1) port 10000 (#0)
> GET http://s3.amazonaws.com/my-test-bucket22/ HTTP/1.1
> Host: s3.amazonaws.com
> User-Agent: curl/7.47.0
> Accept: */*
> Proxy-Connection: Keep-Alive
>
< HTTP/1.1 200 OK
< x-amz-id-2: fnwlOvun/jf7YAeI7eMlYm50XlhMoeXzeYu3tfCXY1SbZGvFChRrD+zo1vDFS3s0eoLqhyl9a64=
< x-amz-request-id: 302CF4548DD2CF9A
< date: Fri, 16 Feb 2018 15:46:58 GMT
< x-amz-bucket-region: us-east-1
< content-type: application/xml
< server: envoy
< x-envoy-upstream-service-time: 184
< transfer-encoding: chunked
<
<?xml version="1.0" encoding="UTF-8"?>
* Connection #0 to host localhost left intact
<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Name>my-test-bucket22</Name><Prefix></Prefix><Marker></Marker><MaxKeys>1000</MaxKeys><IsTruncated>false</IsTruncated><Contents><Key>test.txt</Key><LastModified>2018-02-16T15:23:08.000Z</LastModified><ETag>&quot;0b26e313ed4a7ca6904b0e9369e5b957&quot;</ETag><Size>19</Size><StorageClass>STANDARD</StorageClass></Contents></ListBucketResult>

I would be happy to submit a PR to fix this. I just want to be sure I am not missing something basic before moving forward.
