
Resolving down Swarm service from service with dns: "127.0.0.11" results in hundreds of errors per second in syslog #47716

@elyulka

Description


I've set up haproxy to load balance Swarm services. Following the manuals, I set dns: "127.0.0.11" so that requests are not forwarded to external DNS servers, and I noticed hundreds of errors per second in syslog whenever a backend service goes down:

Apr 14 15:51:08 staging-manager1 dockerd[2653338]: time="2024-04-14T15:51:08.083971294Z" level=error msg="[resolver] failed to query external DNS server" client-addr="udp:127.0.0.1:35393" dns-server="udp:127.0.0.11:53" error="read udp 127.0.0.1:35393->127.0.0.11:53: i/o timeout" question=";tasks.mon_prometheus.\tIN\t A" spanID=0a662e24539c4e08 traceID=3e0e421519bb2e7dcc60adf180880fb7

How can I avoid this log pollution without loading the external DNS servers with queries for the down service?

Reproduce

  1. Create a stack file docker-compose.yml:
version: '3.8'
services:
  dnstest:
    image: nicolaka/netshoot:v0.12
    dns: 127.0.0.11
    entrypoint:
      - sh
      - -c
      - 'while :; do dig non-existing; sleep 1; done'
  2. Deploy by running docker stack deploy -c docker-compose.yml dnstest
  3. Examine syslog, which is flooded by errors: tail -n10 -f /var/log/syslog
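
To put a number on the flood seen in the last step, the resolver errors can be counted per second. The following is a minimal sketch, not part of Docker or haproxy; it assumes the log lines have the key=value shape shown in the Description, and the helper names `parse_fields` and `errors_per_second` are hypothetical.

```python
import shlex
from collections import Counter


def parse_fields(line: str) -> dict:
    """Extract key=value pairs from one dockerd syslog line.

    Tokens without '=' (the syslog prefix) are ignored. Lines with
    unbalanced quotes are treated as unparseable and yield no fields.
    """
    try:
        tokens = shlex.split(line)
    except ValueError:
        return {}
    fields = {}
    for token in tokens:
        key, sep, value = token.partition("=")
        if sep and key:
            fields[key] = value
    return fields


def errors_per_second(lines) -> Counter:
    """Count resolver errors, keyed by the 'time' field truncated to seconds."""
    counts = Counter()
    for line in lines:
        # Assumption: this message text identifies the errors of interest.
        if "failed to query external DNS server" not in line:
            continue
        stamp = parse_fields(line).get("time", "")
        counts[stamp[:19]] += 1  # e.g. "2024-04-14T15:51:08"
    return counts
```

It could be fed the syslog file directly, for example `errors_per_second(open("/var/log/syslog"))`, and the busiest seconds read from `.most_common(5)`.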

Expected behavior

Logs should not be flooded with hundreds of errors when querying a down service while DNS resolvers are limited to the single address 127.0.0.11.

docker version

Client: Docker Engine - Community
 Version:           26.0.0
 API version:       1.45
 Go version:        go1.21.8
 Git commit:        2ae903e
 Built:             Wed Mar 20 15:17:48 2024
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          26.0.0
  API version:      1.45 (minimum version 1.24)
  Go version:       go1.21.8
  Git commit:       8b79278
  Built:            Wed Mar 20 15:17:48 2024
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.28
  GitCommit:        ae07eda36dd25f8a1b98dfbf587313b99c0190bb
 runc:
  Version:          1.1.12
  GitCommit:        v1.1.12-0-g51d5e94
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

docker info

Client: Docker Engine - Community
 Version:    26.0.0
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.13.1
    Path:     /usr/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.17.2
    Path:     /root/.docker/cli-plugins/docker-compose

Server:
 Containers: 51
  Running: 35
  Paused: 0
  Stopped: 16
 Images: 92
 Server Version: 26.0.0
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: fluentd
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 Swarm: active
  NodeID: ye1tqwlj7lag2sy839fce03ca
  Is Manager: true
  ClusterID: kwpn459kfqifaedft9c6naknp
  Managers: 1
  Nodes: 3
  Default Address Pool: 10.0.0.0/8
  SubnetSize: 24
  Data Path Port: 4789
  Orchestration:
   Task History Retention Limit: 5
  Raft:
   Snapshot Interval: 10000
   Number of Old Snapshots to Retain: 0
   Heartbeat Tick: 1
   Election Tick: 10
  Dispatcher:
   Heartbeat Period: 5 seconds
  CA Configuration:
   Expiry Duration: 3 months
   Force Rotate: 0
  Autolock Managers: false
  Root Rotation In Progress: false
  Node Address: 192.168.1.2
  Manager Addresses:
   192.168.1.2:2377
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: ae07eda36dd25f8a1b98dfbf587313b99c0190bb
 runc version: v1.1.12-0-g51d5e94
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: builtin
  cgroupns
  no-new-privileges
 Kernel Version: 5.15.0-101-generic
 Operating System: Ubuntu 22.04.4 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 2
 Total Memory: 3.82GiB
 Name: staging-manager1
 ID: c92b7ce2-fc57-487d-8b93-6b85847c857b
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Registry Mirrors:
  https://*****:5010/
 Live Restore Enabled: false

Additional Info

Initially I was on v25; upgrading to v26 did not help.
I've opened a haproxy issue, but this looks like a Docker edge case.

root@0b11c7683e24:/# dig mon_prometheus
;; communications error to 127.0.0.11#53: timed out
;; communications error to 127.0.0.11#53: timed out
;; communications error to 127.0.0.11#53: timed out

; <<>> DiG 9.18.24-1-Debian <<>> mon_prometheus
;; global options: +cmd
;; no servers could be reached

Here is the output of tcpdump -v -i lo udp:
tcpdump-any-port.txt

I also ran nslookup without overriding dns and got:

/ # nslookup  mon_prometheus
Server:		127.0.0.11
Address:	127.0.0.11:53

** server can't find mon_prometheus: NXDOMAIN

** server can't find mon_prometheus: NXDOMAIN
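
The two outcomes above (a timeout with the dns override, NXDOMAIN without it) can also be told apart programmatically from inside a container. This is a minimal sketch using only the Python standard library; it assumes Python 3 is available in the container, and the function names `build_query`, `rcode_of`, and `probe` are hypothetical.

```python
import socket
import struct


def build_query(name: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query for an A record in RFC 1035 wire format."""
    # Header: ID, flags (recursion desired), QDCOUNT=1, other counts 0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def rcode_of(response: bytes) -> int:
    """Return the RCODE (low 4 bits of the flags word); 3 means NXDOMAIN."""
    (flags,) = struct.unpack("!H", response[2:4])
    return flags & 0x000F


def probe(name: str, server: str = "127.0.0.11", timeout: float = 2.0) -> str:
    """Send one UDP query and report 'timeout', 'NOERROR', 'NXDOMAIN', or the raw rcode."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (server, 53))
        try:
            response, _ = sock.recvfrom(512)
        except socket.timeout:
            return "timeout"
    code = rcode_of(response)
    return {0: "NOERROR", 3: "NXDOMAIN"}.get(code, f"rcode={code}")
```

Calling `probe("mon_prometheus")` in a container with and without the dns override would show which of the two behaviors that container gets.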

OS: Digitalocean image "Docker 25.0.3 on Ubuntu 22.04"
