Closed
Labels: area/swarm, kind/bug, version/1.12
Description
Output of docker version:
Client:
Version: 1.12.1
API version: 1.24
Go version: go1.6.3
Git commit: 23cf638
Built: Thu Aug 18 17:52:38 2016
OS/Arch: linux/amd64
Server:
Version: 1.12.1
API version: 1.24
Go version: go1.6.3
Git commit: 23cf638
Built: Thu Aug 18 17:52:38 2016
OS/Arch: linux/amd64
Output of docker info:
Containers: 1
Running: 0
Paused: 0
Stopped: 1
Images: 1
Server Version: 1.12.1
Storage Driver: overlay2
Backing Filesystem: extfs
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: null host bridge overlay
Swarm: pending
NodeID:
Is Manager: false
Node Address: 10.199.50.146
Runtimes: runc
Default Runtime: runc
Security Options: seccomp
Kernel Version: 4.4.21-rancher
Operating System: RancherOS v0.7.0
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 15.67 GiB
Name: ip-10-199-50-146.aws.company.com
ID: LQ7Z:2BN3:4DXF:KL5R:BHIF:7HQE:BVTT:PIXQ:XCD7:UF3B:56OJ:UOKR
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
File Descriptors: 15
Goroutines: 35
System Time: 2016-11-04T10:53:42.936383693Z
EventsListeners: 0
Registry: https://index.docker.io/v1/
Insecure Registries:
127.0.0.0/8
Additional environment details (AWS, VirtualBox, physical, etc.):
Docker swarm on three AWS EC2 instances running RancherOS v0.7.0.
Additional Information:
We create a swarm on three machines as they start, using an init script: the first machine to run it creates the swarm and becomes the master, and the other two join as workers. Most of the time the swarm is stable for a while, but eventually the master panics and dies:
time="2016-11-03T17:33:17.911332106Z" level=debug msg="Assigning addresses for endpoint gateway_ingress-sbox's interface on network docker_gwbridge"
time="2016-11-03T17:33:17.911355293Z" level=debug msg="RequestAddress(LocalDefault/172.18.0.0/16, <nil>, map[])"
time="2016-11-03T17:33:17.918615099Z" level=debug msg="Assigning addresses for endpoint gateway_ingress-sbox's interface on network docker_gwbridge"
time="2016-11-03T17:33:17.965977986Z" level=debug msg="Programming external connectivity on endpoint gateway_ingress-sbox (7747f460e03b8686321f24bd5f9814d9ff49d2708bab2f07406343a43314a6d8)"
time="2016-11-03T17:35:13.957511582Z" level=debug msg="Calling GET /version"
time="2016-11-03T17:35:13.960010580Z" level=debug msg="Calling GET /v1.24/swarm"
time="2016-11-03T17:36:37.059151036Z" level=debug msg="Calling GET /v1.24/version"
time="2016-11-03T17:36:58.207552031Z" level=debug msg="Calling GET /version"
time="2016-11-03T17:36:58.209370108Z" level=debug msg="Calling GET /v1.24/swarm"
panic: runtime error: index out of range
goroutine 435 [running]:
panic(0x1b1e1c0, 0xc820012030)
/usr/local/go/src/runtime/panic.go:481 +0x3e6
github.com/docker/swarmkit/manager/keymanager.(*KeyManager).rotateKey(0xc820ee63c0, 0x7fd97453ddf0, 0xc82032fa80, 0x0, 0x0)
/go/src/github.com/docker/docker/vendor/src/github.com/docker/swarmkit/manager/keymanager/keymanager.go:139 +0xd29
github.com/docker/swarmkit/manager/keymanager.(*KeyManager).Run(0xc820ee63c0, 0x7fd97453ddf0, 0xc82032fa80, 0x0, 0x0)
/go/src/github.com/docker/docker/vendor/src/github.com/docker/swarmkit/manager/keymanager/keymanager.go:217 +0x7ae
github.com/docker/swarmkit/manager.(*Manager).Run.func2.2(0x7fd97453ddf0, 0xc82032fa80, 0xc820ee63c0)
/go/src/github.com/docker/docker/vendor/src/github.com/docker/swarmkit/manager/manager.go:358 +0x4a
created by github.com/docker/swarmkit/manager.(*Manager).Run.func2
/go/src/github.com/docker/docker/vendor/src/github.com/docker/swarmkit/manager/manager.go:361 +0x14b9
This puts the daemon into a cycle of restarting and panicking again (with the same error, but a slightly different stack trace after the first time):
time="2016-11-04T05:33:21.725650259Z" level=debug msg="libcontainerd: containerd connection state change: READY"
time="2016-11-04T05:33:25.081305327Z" level=info msg="3da1eb410c6175ff is starting a new election at term 2"
time="2016-11-04T05:33:25.081441634Z" level=info msg="3da1eb410c6175ff became candidate at term 3"
time="2016-11-04T05:33:25.081462656Z" level=info msg="3da1eb410c6175ff received vote from 3da1eb410c6175ff at term 3"
time="2016-11-04T05:33:25.081492789Z" level=info msg="3da1eb410c6175ff became leader at term 3"
time="2016-11-04T05:33:25.081507133Z" level=info msg="raft.node: 3da1eb410c6175ff elected leader 3da1eb410c6175ff at term 3"
time="2016-11-04T05:33:25.083329347Z" level=error msg="agent: session failed" error="session initiation timed out" module=agent
time="2016-11-04T05:33:25.083472100Z" level=debug msg="agent: rebuild session" module=agent
panic: runtime error: index out of range
goroutine 188 [running]:
panic(0x1b1e1c0, 0xc820012030)
/usr/local/go/src/runtime/panic.go:481 +0x3e6
github.com/docker/swarmkit/manager/keymanager.(*KeyManager).Run(0xc820ab2e00, 0x7fe020534e20, 0xc8208c34c0, 0x0, 0x0)
/go/src/github.com/docker/docker/vendor/src/github.com/docker/swarmkit/manager/keymanager/keymanager.go:191 +0x9f5
github.com/docker/swarmkit/manager.(*Manager).Run.func2.2(0x7fe020534e20, 0xc8208c34c0, 0xc820ab2e00)
/go/src/github.com/docker/docker/vendor/src/github.com/docker/swarmkit/manager/manager.go:358 +0x4a
created by github.com/docker/swarmkit/manager.(*Manager).Run.func2
/go/src/github.com/docker/docker/vendor/src/github.com/docker/swarmkit/manager/manager.go:361 +0x14b9
(Full debug log: docker.log.txt)
I fully expect this is something strange we've done to it, but I haven't been able to figure out what...
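For anyone reading the traces above who is unfamiliar with this class of panic: Go raises "index out of range" when a slice is indexed at or beyond its length. The sketch below is a generic illustration of that failure mode and of a length check that turns it into an ordinary error. It is not the swarmkit code, and it makes no claim about what keymanager.go actually does at lines 139 or 191; the names firstUnsafe, firstChecked and panics are hypothetical and exist only for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// firstUnsafe (hypothetical) indexes the slice without checking its length.
// It panics with an "index out of range" runtime error on an empty slice.
func firstUnsafe(keys []string) string {
	return keys[0]
}

// firstChecked (hypothetical) guards the same access and reports an
// ordinary error instead of panicking when the slice is empty.
func firstChecked(keys []string) (string, error) {
	if len(keys) == 0 {
		return "", errors.New("no keys available")
	}
	return keys[0], nil
}

// panics reports whether calling f resulted in a panic.
func panics(f func()) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	f()
	return false
}

func main() {
	var empty []string

	// The unguarded access panics on the empty slice.
	fmt.Println("unsafe access panicked:", panics(func() { firstUnsafe(empty) }))

	// The guarded access returns an error the caller can handle.
	if _, err := firstChecked(empty); err != nil {
		fmt.Println("checked access returned error:", err)
	}
}
```

If something similar is happening here, the open question would be which list the key manager expects to be non-empty on our master, and why it is empty in our setup.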