
Memory leak in dynamic forward proxy mode w/ DNS Cache #30999

@rbtz-openai

Description:
We run Envoy in dynamic forward proxy mode. It usually behaves well, but sometimes it gets into a state where it allocates tens of MiB per second and is eventually OOM-killed:
[Screenshot 2023-11-21 at 11:22:01 AM]

Full heap profiles show that most of the memory is held inside DnsCacheImpl, even though the cache should in theory be limited to 10000 entries (max_hosts: 10000).

As a side note, I have also noticed that Envoy tries to resolve names under the internal search domains (*.svc.cluster.local.), even though no_default_search_domain is set to true.

Heap dumps
Diff heap profile: [image: profile001]

Raw profiles:
https://gist.github.com/rbtz-openai/13d35aea14013f12273c6aa7478184cb

Admin and Stats Output:

 "version": "b5ca88acee3453c9459474b8f22215796eff4dde/1.28.0/Clean/RELEASE/BoringSSL",

There are a lot of clusters because of the dynamic forward proxy mode:

$ curl -s localhost:XXX/clusters | fgrep -c hostname
6752
$ curl -s localhost:XXX/stats/prometheus | fgrep dns
# TYPE envoy_dns_cares_get_addr_failure counter
envoy_dns_cares_get_addr_failure{} 310
# TYPE envoy_dns_cares_not_found counter
envoy_dns_cares_not_found{} 178
# TYPE envoy_dns_cares_resolve_total counter
envoy_dns_cares_resolve_total{} 186188
# TYPE envoy_dns_cares_timeouts counter
envoy_dns_cares_timeouts{} 32
# TYPE envoy_dns_cache_dynamic_forward_proxy_cache_config_cache_load counter
envoy_dns_cache_dynamic_forward_proxy_cache_config_cache_load{} 0
# TYPE envoy_dns_cache_dynamic_forward_proxy_cache_config_dns_query_attempt counter
envoy_dns_cache_dynamic_forward_proxy_cache_config_dns_query_attempt{} 184881
# TYPE envoy_dns_cache_dynamic_forward_proxy_cache_config_dns_query_failure counter
envoy_dns_cache_dynamic_forward_proxy_cache_config_dns_query_failure{} 667
# TYPE envoy_dns_cache_dynamic_forward_proxy_cache_config_dns_query_success counter
envoy_dns_cache_dynamic_forward_proxy_cache_config_dns_query_success{} 184213
# TYPE envoy_dns_cache_dynamic_forward_proxy_cache_config_dns_query_timeout counter
envoy_dns_cache_dynamic_forward_proxy_cache_config_dns_query_timeout{} 365
# TYPE envoy_dns_cache_dynamic_forward_proxy_cache_config_dns_rq_pending_overflow counter
envoy_dns_cache_dynamic_forward_proxy_cache_config_dns_rq_pending_overflow{} 0
# TYPE envoy_dns_cache_dynamic_forward_proxy_cache_config_host_added counter
envoy_dns_cache_dynamic_forward_proxy_cache_config_host_added{} 33253
# TYPE envoy_dns_cache_dynamic_forward_proxy_cache_config_host_address_changed counter
envoy_dns_cache_dynamic_forward_proxy_cache_config_host_address_changed{} 90554
# TYPE envoy_dns_cache_dynamic_forward_proxy_cache_config_host_overflow counter
envoy_dns_cache_dynamic_forward_proxy_cache_config_host_overflow{} 0
# TYPE envoy_dns_cache_dynamic_forward_proxy_cache_config_host_removed counter
envoy_dns_cache_dynamic_forward_proxy_cache_config_host_removed{} 26501
# TYPE envoy_dns_cares_pending_resolutions gauge
envoy_dns_cares_pending_resolutions{} 1
# TYPE envoy_dns_cache_dynamic_forward_proxy_cache_config_circuit_breakers_rq_pending_open gauge
envoy_dns_cache_dynamic_forward_proxy_cache_config_circuit_breakers_rq_pending_open{} 0
# TYPE envoy_dns_cache_dynamic_forward_proxy_cache_config_circuit_breakers_rq_pending_remaining gauge
envoy_dns_cache_dynamic_forward_proxy_cache_config_circuit_breakers_rq_pending_remaining{} 1024
# TYPE envoy_dns_cache_dynamic_forward_proxy_cache_config_num_hosts gauge
envoy_dns_cache_dynamic_forward_proxy_cache_config_num_hosts{} 6752
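One quick sanity check on the cache accounting in these stats is to compare host_added - host_removed with the num_hosts gauge: 33253 - 26501 = 6752, which matches both num_hosts and the hostname count from /clusters, and stays below max_hosts: 10000. So the number of cached hosts itself does not appear to exceed the configured bound. Below is a minimal shell/awk sketch of that check, assuming the Prometheus text format shown above. check_host_accounting and stats are hypothetical helper names (not part of Envoy), and stats only replays three lines copied from this output.

```shell
#!/bin/sh
# Sketch: check that the DNS cache num_hosts gauge equals host_added - host_removed.
# Hypothetical helper; reads Prometheus text-format stats on stdin.
check_host_accounting() {
  awk '
    # [{] matches a literal "{", so the "# TYPE ..." comment lines are skipped.
    /_host_added[{]/   { added = $2 }
    /_host_removed[{]/ { removed = $2 }
    /_num_hosts[{]/    { hosts = $2 }
    END {
      printf "added - removed = %d, num_hosts = %d\n", added - removed, hosts
      if (added - removed == hosts) print "consistent"; else print "mismatch"
    }'
}

# Replays three lines copied from the stats output above.
# On a live instance: curl -s localhost:XXX/stats/prometheus | check_host_accounting
stats() {
  cat <<'EOF'
envoy_dns_cache_dynamic_forward_proxy_cache_config_host_added{} 33253
envoy_dns_cache_dynamic_forward_proxy_cache_config_host_removed{} 26501
envoy_dns_cache_dynamic_forward_proxy_cache_config_num_hosts{} 6752
EOF
}

stats | check_host_accounting
```

With the numbers above it reports consistent.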

Config:

The relevant part of the config is the dynamic forward proxy cluster with its DNS cache (the same dns_cache_config is used in http_filters):

    cluster_type:
      name: envoy.clusters.dynamic_forward_proxy
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.clusters.dynamic_forward_proxy.v3.ClusterConfig
        dns_cache_config:
          name: dynamic_forward_proxy_cache_config
          max_hosts: 10000
          dns_lookup_family: V4_ONLY
          typed_dns_resolver_config:
            name: envoy.network.dns_resolver.cares
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.network.dns_resolver.cares.v3.CaresDnsResolverConfig
              resolvers:
                - socket_address:
                    address: 8.8.8.8
                    port_value: 53
                - socket_address:
                    address: 1.1.1.1
                    port_value: 53
                - socket_address:
                    address: 8.8.4.4
                    port_value: 53
                - socket_address:
                    address: 1.0.0.1
                    port_value: 53
              dns_resolver_options:
                use_tcp_for_dns_lookups: true
                # There is no need to use the default search domain when resolving external requests
                no_default_search_domain: true

cc: @alyssawilk @euroelessar
