single manager docker swarm stuck in Down state after reboot #34827

@alexanderkjeldaas

Description

A single-manager swarm cluster gets into the following state after an abrupt reboot:

# docker node ls
ID                            HOSTNAME            STATUS              AVAILABILITY        MANAGER STATUS
ux3enftr6krhlrp8bbodo2q1l *   wsdocker6           Down                Active              Leader

The manager can't leave the swarm, as the docker swarm leave --force command times out.
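To capture the hang in a repeatable way, the command can be wrapped in a hard timeout. This is only a minimal sketch: the helper name run_with_timeout and the 60-second limit are my own choices, not anything provided by Docker.

```python
import subprocess

def run_with_timeout(cmd, seconds):
    """Run cmd and return (exit_code, combined_output).

    Returns (None, "") if the command is still running after `seconds`;
    subprocess.run kills the child process before raising TimeoutExpired.
    """
    try:
        done = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            timeout=seconds,
        )
    except subprocess.TimeoutExpired:
        return None, ""
    return done.returncode, done.stdout.decode(errors="replace")

# Hypothetical usage on the affected host (the 60 s limit is arbitrary):
#   code, output = run_with_timeout(["docker", "swarm", "leave", "--force"], 60)
#   if code is None:
#       print("docker swarm leave --force did not finish within 60 s")
```

An exit code of None means the command was killed by the timeout rather than failing on its own, which is the behaviour described above.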

This state should not be possible. To print the listing above, the node must be a manager, and since there is only one manager, it can infer that it is that manager and therefore cannot be Down.
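The contradiction can be checked mechanically from the docker node ls output. The following is a rough sketch, not swarmkit code: the function names are made up for illustration, and the parser assumes columns are separated by two or more spaces and that only the last column (MANAGER STATUS) can be empty, as in the listing above.

```python
import re

def parse_node_ls(text):
    """Parse the plain-text table printed by `docker node ls` into dicts."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # Assumption: columns are separated by at least two spaces.
    header = re.split(r"\s{2,}", lines[0].strip())
    nodes = []
    for ln in lines[1:]:
        cells = re.split(r"\s{2,}", ln.strip())
        # Workers print an empty MANAGER STATUS column, so pad short rows.
        cells += [""] * (len(header) - len(cells))
        row = dict(zip(header, cells))
        # The asterisk after the ID marks the node the CLI is talking to.
        row["SELF"] = row["ID"].endswith("*")
        row["ID"] = row["ID"].rstrip("* ")
        nodes.append(row)
    return nodes

def contradictory_nodes(nodes):
    """Return nodes that are reported as both Leader and Down."""
    return [
        n for n in nodes
        if n["MANAGER STATUS"] == "Leader" and n["STATUS"] == "Down"
    ]

SAMPLE = """\
ID                            HOSTNAME            STATUS              AVAILABILITY        MANAGER STATUS
ux3enftr6krhlrp8bbodo2q1l *   wsdocker6           Down                Active              Leader
"""

bad = contradictory_nodes(parse_node_ls(SAMPLE))
print([n["HOSTNAME"] for n in bad])  # prints ['wsdocker6']
```

A node that answers the query as Leader is necessarily up, so any row returned by contradictory_nodes points at stale or inconsistent node status rather than a real outage.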

Steps to reproduce the issue:

  1. Create a swarm with 1 manager
  2. Abruptly reboot
  3. Profit

Describe the results you received:

Describe the results you expected:

Additional information you deem important (e.g. issue happens only occasionally):

Output of docker version:

 docker version
Client:
 Version:      17.06.2-ce
 API version:  1.30
 Go version:   go1.8.3
 Git commit:   cec0b72
 Built:        Tue Sep  5 20:00:17 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.06.2-ce
 API version:  1.30 (minimum version 1.12)
 Go version:   go1.8.3
 Git commit:   cec0b72
 Built:        Tue Sep  5 19:59:11 2017
 OS/Arch:      linux/amd64
 Experimental: true

Output of docker info:

root@wsdocker6:/etc/systemd/system# docker info
Containers: 83
 Running: 1
 Paused: 0
 Stopped: 82
Images: 103
Server Version: 17.06.2-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host ipvlan macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: pending
 NodeID: ux3enftr6krhlrp8bbodo2q1l
 Is Manager: true
 ClusterID: nhkjyh6xo2jdu2txkjt95xfua
 Managers: 1
 Nodes: 1
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
  Force Rotate: 0
 Root Rotation In Progress: false
 Node Address: x.x.x.x
 Manager Addresses:
  x.x.x.x:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 6e23458c129b551d5c9871e5174f6b1b7f6d1170
runc version: 810190ceaa507aa2727d7ae6f4790c76ec150bd2
init version: 949e6fa
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.4.0-93-generic
Operating System: Ubuntu 16.04.3 LTS
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 15.63GiB
Name: wsdocker6
ID: KRY5:PQU2:2KPP:7QR6:4EXD:QFXT:KKVB:CZVN:OQUU:S5A3:Z5ZT:XQGN
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Username: xxx
Registry: https://index.docker.io/v1/
Experimental: true
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

WARNING: No swap limit support

Additional environment details (AWS, VirtualBox, physical, etc.):

Labels: area/swarm, kind/bug, version/17.06