Deadlock when firewalld is reloaded while the bridge driver is initializing #50619

@taamrooz

Description

Docker sometimes hangs during startup. Judging by the stack traces and the code, the cause appears to be a firewalld reload signal arriving while the daemon is interacting with the network, e.g. while creating a network. The hang seems to depend on specific timing, so I cannot reproduce it reliably.

This behaviour was only seen when testing Docker v28.3.0; it did not happen on v25.0.3.
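
For readers unfamiliar with this class of hang, here is a generic sketch of a lock-order inversion, written in Python for brevity. It is not taken from the attached stack traces and does not claim to be the confirmed cause in dockerd. The names `init_lock` and `reload_lock` are hypothetical stand-ins for whatever state the bridge driver initialization and the firewalld reload handler each protect. Acquire timeouts are used so the demonstration terminates instead of hanging.

```python
import threading


def simulate(timeout=0.2):
    """Model two code paths that take the same two locks in opposite order."""
    init_lock = threading.Lock()      # hypothetical: held during driver init
    reload_lock = threading.Lock()    # hypothetical: held by the reload handler
    both_holding = threading.Barrier(2)
    both_tried = threading.Barrier(2)
    results = {}

    def worker(name, first, second):
        with first:
            # Wait until each side holds its first lock.
            both_holding.wait()
            # Without a timeout this call would block forever.
            got = second.acquire(timeout=timeout)
            results[name] = got
            if got:
                second.release()
            # Keep holding `first` until the other side has also tried.
            both_tried.wait()

    threads = [
        threading.Thread(target=worker, args=("init", init_lock, reload_lock)),
        threading.Thread(target=worker, args=("reload", reload_lock, init_lock)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Both entries in the returned dictionary are `False`: neither side can obtain its second lock while the other still holds it. In a real daemon without timeouts, that state is a permanent hang.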

Reproduce

1. Start Docker (either through something like systemd or manually).
2. Trigger a firewalld reload at a specific moment during Docker startup.
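
Because the timing window is unknown, one way to attempt the steps above is to restart Docker without waiting for it and then reload firewalld repeatedly. The following is a minimal sketch, not a verified reproducer. It assumes a systemd unit named `docker`, `firewall-cmd` on the `PATH`, and root privileges. The reload count and interval are arbitrary guesses, and the function names are hypothetical.

```python
import subprocess
import time


def build_commands(reloads):
    """Return one non-blocking Docker restart followed by `reloads` reloads."""
    restart = ["systemctl", "restart", "--no-block", "docker"]
    reload_cmd = ["firewall-cmd", "--reload"]
    return [restart] + [reload_cmd] * reloads


def reload_storm(reloads=20, interval=0.5, run=subprocess.run, sleep=time.sleep):
    """Restart Docker, then reload firewalld repeatedly while it starts up.

    `run` and `sleep` are injectable so the sequence can be checked
    without touching a real system.
    """
    for cmd in build_commands(reloads):
        run(cmd, check=False)
        sleep(interval)
```

After calling `reload_storm()`, `systemctl is-active docker` can be used to check the unit. If it still reports `activating` long after the reloads have finished, the daemon may be stuck in startup.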

Expected behavior

A firewalld reload signal should not cause Docker to deadlock.

docker version

Client:
 Version:           28.3.0
 API version:       1.51
 Go version:        go1.24.4
 Git commit:        7cbee73f19
 Built:             Wed Jun 25 15:21:12 2025
 OS/Arch:           linux/arm64
 Context:           default

Server:
 Engine:
  Version:          28.3.0
  API version:      1.51 (minimum version 1.24)
  Go version:       go1.24.4
  Git commit:       e0183475e03cd05b6a560d8b22fe0a83cd1cba14
  Built:            Wed Jun 25 15:21:12 2025
  OS/Arch:          linux/arm64
  Experimental:     false
 containerd:
  Version:          v1.6.19.m
  GitCommit:        1e1ea6e986c6c86565bc33d52e34b81b3e2bc71f.m
 runc:
  Version:          1.1.4+dev
  GitCommit:        v1.1.4-8-g974efd2d-dirty
 docker-init:
  Version:          0.19.0
  GitCommit:        b9f42a0-dirty

docker info

Client:
 Version:    28.3.0
 Context:    default
 Debug Mode: false

Server:
 Containers: 5
  Running: 1
  Paused: 0
  Stopped: 4
 Images: 3
 Server Version: 28.3.0
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: local
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 CDI spec directories:
  /etc/cdi
  /var/run/cdi
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 1e1ea6e986c6c86565bc33d52e34b81b3e2bc71f.m
 runc version: v1.1.4-8-g974efd2d-dirty
 init version: b9f42a0-dirty
 Security Options:
  seccomp
   Profile: builtin
 Kernel Version: 5.10
 Operating System: Linux
 OSType: linux
 Architecture: aarch64
 CPUs: 4
 Total Memory: 3.808GiB
 Name: linux
 ID: 0eafc496-9bc5-44db-81a5-cc182dec6fca
 Docker Root Dir: /mnt/data/docker
 Debug Mode: false
 Experimental: false
 Insecure Registries:
  ::1/128
  127.0.0.0/8
 Live Restore Enabled: false

Additional Info

Stacktraces:

goroutine-stacks-2025-08-01T120221Z.log
goroutine-stacks-2025-08-01T100329Z.log

Log output when it hangs:

Jul 31 14:55:05 linux systemd[1]: Starting Docker Application Container Engine...
Jul 31 14:55:06 linux dockerd[1631]: time="2025-07-31T14:55:06.140864671Z" level=info msg="Starting up"
Jul 31 14:55:06 linux dockerd[1631]: time="2025-07-31T14:55:06.154453714Z" level=info msg="OTEL tracing is not configured, using no-op tracer provider"
Jul 31 14:55:06 linux dockerd[1631]: time="2025-07-31T14:55:06.158254423Z" level=info msg="CDI directory does not exist, skipping: failed to monitor for changes: no such file or directory" dir=/etc/cdi
Jul 31 14:55:06 linux dockerd[1631]: time="2025-07-31T14:55:06.159166756Z" level=info msg="CDI directory does not exist, skipping: failed to monitor for changes: no such file or directory" dir=/var/run/cdi
Jul 31 14:55:06 linux dockerd[1631]: time="2025-07-31T14:55:06.334055941Z" level=info msg="Creating a containerd client" address=/run/containerd/containerd.sock timeout=1m0s
Jul 31 14:55:06 linux dockerd[1631]: time="2025-07-31T14:55:06.459883870Z" level=info msg="[graphdriver] using prior storage driver: overlay2"
Jul 31 14:55:06 linux dockerd[1631]: time="2025-07-31T14:55:06.750103316Z" level=info msg="Loading containers: start."
Jul 31 14:55:07 linux dockerd[1631]: time="2025-07-31T14:55:07.050480638Z" level=info msg="Firewalld: docker zone already exists, returning"
Jul 31 14:55:07 linux dockerd[1631]: time="2025-07-31T14:55:07.222686488Z" level=info msg="Firewalld: created docker-forwarding policy"
Jul 31 14:55:55 linux dockerd[1631]: time="2025-07-31T14:55:55.451563672Z" level=warning msg="ip6tables is enabled, but cannot set up ip6tables chains" error="failed to create NAT chain DOCKER: COMMAND_FAILED: '/usr/sbin/ip6tables -w10 -t nat -N DOCKER' failed: ip6tables v1.8.7 (legacy): can't initialize ip6tables table `nat': Table does not exist (do you need to insmod?)\nPerhaps ip6tables or your kernel needs to be upgraded.\n"
Jul 31 14:56:00 linux dockerd[1631]: time="2025-07-31T14:56:00.923698847Z" level=warning msg="xtables contention detected while running [-t nat -C POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE]: Waited for 5.45 seconds and received \"\""
Jul 31 14:56:06 linux dockerd[1631]: time="2025-07-31T14:56:06.645981255Z" level=info msg="Firewalld: interface docker0 already part of docker zone, returning"
Jul 31 14:56:07 linux dockerd[1631]: time="2025-07-31T14:56:07.835572750Z" level=info msg="Firewalld: interface br-69fd3e1c511d already part of docker zone, returning"
Jul 31 14:56:10 linux dockerd[1631]: time="2025-07-31T14:56:10.158445691Z" level=warning msg="Error (Unable to complete atomic operation, key modified) deleting object [endpoint_count 5baa07ec1b39f5bf07ab37c0dfc4d7b1dd00eedad7b49e19cf16c33a12f75eb2], retrying...."
Jul 31 14:56:12 linux dockerd[1631]: time="2025-07-31T14:56:12.290034864Z" level=warning msg="error locating sandbox id dd49945884cc039c38301f46c9c408d31743d60abd17fb02af968516a5bec23f: sandbox dd49945884cc039c38301f46c9c408d31743d60abd17fb02af968516a5bec23f not found"
Jul 31 14:56:12 linux dockerd[1631]: time="2025-07-31T14:56:12.290280739Z" level=warning msg="error locating sandbox id 44716c8fa10846ac9d11e0dd7e8532bb46bca3cfc5391165344d5a4ac377fcaa: sandbox 44716c8fa10846ac9d11e0dd7e8532bb46bca3cfc5391165344d5a4ac377fcaa not found"
Jul 31 14:56:12 linux dockerd[1631]: time="2025-07-31T14:56:12.290455156Z" level=warning msg="error locating sandbox id 66fb10b2d92bdee9c897fe831271ed7d0488932eef853f23ed48eb5b976e2b11: sandbox 66fb10b2d92bdee9c897fe831271ed7d0488932eef853f23ed48eb5b976e2b11 not found"
Jul 31 14:56:12 linux dockerd[1631]: time="2025-07-31T14:56:12.290564822Z" level=warning msg="error locating sandbox id 03f32cbfe23b218f4fb479b75d4d4c7374cc784ac5387337acc1ecd1ef72594e: sandbox 03f32cbfe23b218f4fb479b75d4d4c7374cc784ac5387337acc1ecd1ef72594e not found"
Jul 31 14:56:12 linux dockerd[1631]: time="2025-07-31T14:56:12.290650572Z" level=warning msg="error locating sandbox id 62926769418eb640a25665ae294d91f769ef2ab7a289d8107c43ec732f49c4df: sandbox 62926769418eb640a25665ae294d91f769ef2ab7a289d8107c43ec732f49c4df not found"
Jul 31 14:56:22 linux dockerd[1631]: time="2025-07-31T14:56:22.438066259Z" level=warning msg="xtables contention detected while running [-t raw -C PREROUTING -p tcp -d 172.18.0.2 --dport 80 ! -i br-69fd3e1c511d -j DROP]: Waited for 7.22 seconds and received \"\""
Jul 31 14:56:27 linux dockerd[1631]: time="2025-07-31T14:56:27.704230371Z" level=info msg="ignoring event" container=06dfdb4afa56eb7c61db33d2ba61669aff176be829df48b5faebb69abc05f4b3 module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
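
The log above contains long pauses between consecutive entries, for example roughly 48 seconds between "Firewalld: created docker-forwarding policy" at 14:55:07 and the ip6tables warning at 14:55:55. The sketch below shows one way to find such gaps automatically from the `time="..."` field. The function names and the 5-second default threshold are assumptions chosen for illustration.

```python
import re
from datetime import datetime

# Matches the dockerd timestamp field, e.g. time="2025-07-31T14:55:07.222686488Z"
TS = re.compile(r'time="(\d{4}-\d\d-\d\dT\d\d:\d\d:\d\d)\.(\d+)Z"')


def parse_time(line):
    """Return the dockerd timestamp in `line`, or None if there is none."""
    m = TS.search(line)
    if m is None:
        return None
    # strptime's %f accepts at most 6 digits, so truncate the nanoseconds.
    stamp = f"{m.group(1)}.{m.group(2)[:6]}"
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%f")


def find_gaps(lines, threshold=5.0):
    """Return (seconds, previous_line, line) for each gap >= threshold."""
    gaps = []
    prev_time = prev_line = None
    for line in lines:
        t = parse_time(line)
        if t is None:
            continue  # skip lines without a dockerd timestamp
        if prev_time is not None:
            delta = (t - prev_time).total_seconds()
            if delta >= threshold:
                gaps.append((delta, prev_line, line))
        prev_time, prev_line = t, line
    return gaps
```

Feeding it the output of `journalctl -u docker`, one line per list element, yields the pairs of entries between which the daemon was silent. Those are good places to start when matching the log against the goroutine stack dumps.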
