Large snapshot causes adding a new manager to fail #3113
Closed
moby/moby#45664
Description
A large snapshot (e.g. a few hundred MB) causes adding a new manager to fail.
An earlier fix, #2458, is not sufficient. There is also a SendTimeout that appears to be hardcoded to 2 seconds in sendProcessMessage in manager/state/raft/transport/peer.go.
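To get a feel for why a fixed 2-second window is a problem, the sketch below computes the throughput a transfer would need in order to finish in time. It assumes the whole snapshot has to be delivered within a single timeout window, which the logs suggest but this issue does not confirm. The helper name `min_mib_per_sec` is made up for illustration; the byte count is the snapshot size from the reproduction steps.

```sh
# Hypothetical helper: minimum sustained throughput (whole MiB/s, rounded down)
# needed to move BYTES within SECONDS.
# Assumption: the entire payload must fit inside one timeout window.
min_mib_per_sec() {
  bytes=$1
  seconds=$2
  echo $(( bytes / (seconds * 1024 * 1024) ))
}

# Snapshot size from this report, with the 2-second timeout.
min_mib_per_sec 461774425 2   # prints 220
```

Under that assumption, the link between the leader and the new manager would need to sustain roughly 220 MiB/s, and the requirement grows linearly with snapshot size.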
This issue is easy to reproduce:
- Create many large objects in swarm
    for i in $(seq 1 500)
    do
        dd if=/dev/urandom bs=900k count=1 2>/dev/null | docker config create foo${i} -
    done
- Trigger snapshotting
    docker swarm update --snapshot-interval 1
    docker network create -d overlay dummy
    docker network rm dummy
    docker swarm update --snapshot-interval 10000
- Verify the snapshot is big enough
    /var/lib/docker/swarm/raft/snap-v3-encrypted:
    -rw-r--r--. 1 root root 461774425 Jan 31 11:54 000000000000000b-000000000000042e.snap
- Add a new manager node.
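The "verify the snapshot is big enough" step can be scripted before joining the new manager. This is a minimal sketch: `largest_snap_bytes` is a hypothetical helper name, and the function only looks at `*.snap` files directly inside the directory it is given.

```sh
# Hypothetical helper: print the size in bytes of the largest *.snap file
# directly inside DIR, or 0 if there is none.
largest_snap_bytes() {
  dir=$1
  max=0
  for f in "$dir"/*.snap; do
    # An unmatched glob stays literal, so skip anything that is not a file.
    [ -f "$f" ] || continue
    size=$(wc -c < "$f")
    size=$((size + 0))   # strip padding that some wc implementations emit
    if [ "$size" -gt "$max" ]; then
      max=$size
    fi
  done
  echo "$max"
}
```

On a manager this would be run as root, for example `largest_snap_bytes /var/lib/docker/swarm/raft/snap-v3-encrypted`, using the directory shown in the step above.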
You will see an endless retry loop in the Docker logs:
On the leader node:
Jan 31 11:57:50 centos7 dockerd[4644]: time="2023-01-31T11:57:50.651215634+08:00" level=error msg="error streaming message to peer" error=EOF
Jan 31 11:57:52 centos7 dockerd[4644]: time="2023-01-31T11:57:52.655983276+08:00" level=error msg="error streaming message to peer" error=EOF
Jan 31 11:57:54 centos7 dockerd[4644]: time="2023-01-31T11:57:54.660918294+08:00" level=error msg="error streaming message to peer" error=EOF
On the manager node that is newly added:
Jan 31 11:57:51 centos7-1 dockerd[1326]: time="2023-01-31T11:57:51.009851258+08:00" level=error msg="error while reading from stream" error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Jan 31 11:57:53 centos7-1 dockerd[1326]: time="2023-01-31T11:57:53.014080429+08:00" level=error msg="error while reading from stream" error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Jan 31 11:57:55 centos7-1 dockerd[1326]: time="2023-01-31T11:57:55.019443613+08:00" level=error msg="error while reading from stream" error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"
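The errors on each node recur about every 2 seconds, which matches the suspected timeout. The sketch below measures that cadence from syslog-style lines like the ones above. `retry_gaps` is a hypothetical helper; it assumes the third whitespace-separated field is an `HH:MM:SS` timestamp and does not handle a wrap past midnight.

```sh
# Hypothetical helper: read log lines on stdin and print the gap in seconds
# between consecutive lines that contain the text given as $1.
# Assumption: field 3 of each line is an HH:MM:SS timestamp.
retry_gaps() {
  awk -v pat="$1" '
    index($0, pat) {
      split($3, t, ":")
      now = t[1] * 3600 + t[2] * 60 + t[3]
      if (seen) print now - prev
      prev = now
      seen = 1
    }'
}
```

Assuming the daemon logs to the journal under a unit named `docker`, it could be used as `journalctl -u docker --no-pager | retry_gaps "error streaming message to peer"`. For the three leader-side lines shown above it prints `2` twice.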