Description
We've noticed that containerd performs poorly at parallel image pulls: when multiple pods that all use the same image are started on a single node and the image isn't already present on that node, the pods take much longer to start than a single pod does.
We compared this with Docker using the same pod spec, and Docker performed far better.
Steps to reproduce the issue:
Using the following pod spec:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: test
      namespace: default
    spec:
      replicas: 10
      selector:
        matchLabels:
          app: test
      template:
        metadata:
          labels:
            app: test
        spec:
          nodeName: <node name here>
          containers:
          - name: test
            image: ubuntu:18.04
            imagePullPolicy: Always
            command:
            - sleep
            - "3600"
          terminationGracePeriodSeconds: 0

- Replace the node name with the node you would like to run the test on
- Create the deployment using `kubectl apply -f deploy.yaml`
- Observe the time it takes for all 10 pods to start running
- Repeat above on a docker node and with different replica counts
Describe the results you received:
Using the time from when the Kubelet receives the pod until the pod is running, we received the following results:
| Platform | Replicas | From cold | From warm |
|---|---|---|---|
| Containerd | 10 | 27 seconds | 2 seconds |
| Containerd | 1 | 4 seconds | 1 second |
| Docker | 10 | 5 seconds | 3 seconds |
| Docker | 1 | 4 seconds | 3 seconds |
From cold = image not already present on the node
From warm = image present on the node
The outlier in the above table is what we are concerned about: with 10 replicas from cold, containerd took 27 seconds where Docker took 5 seconds, so containerd's parallel image pulling is much slower than Docker's.
I verified for each of the 10 pods in the slow, 27 second run that the image pulling was the bottleneck:

    $ kubectl get events --sort-by='.metadata.creationTimestamp' -o 'go-template={{range .items}}{{.firstTimestamp}}{{"\t"}}{{.involvedObject.name}}{{"\t"}}{{.involvedObject.kind}}{{"\t"}}{{.message}}{{"\t"}}{{.reason}}{{"\t"}}{{.type}}{{"\n"}}{{end}}' | grep -i test-c867ffb59-4ql6l
    2021-01-14T00:52:35Z test-c867ffb59 ReplicaSet (combined from similar events): Created pod: test-c867ffb59-4ql6l SuccessfulCreate Normal
    2021-01-14T00:52:37Z test-c867ffb59-4ql6l Pod Pulling image "ubuntu:18.04" Pulling Normal
    2021-01-14T00:53:02Z test-c867ffb59-4ql6l Pod Successfully pulled image "ubuntu:18.04" Pulled Normal
    2021-01-14T00:53:02Z test-c867ffb59-4ql6l Pod Created container test Created Normal
    2021-01-14T00:53:03Z test-c867ffb59-4ql6l Pod Started container test Started Normal

In the above case, the image pulling took 25 seconds.
Describe the results you expected:
We expect containerd's parallel image pulling to perform as well as Docker's.
Output of containerd --version:
containerd github.com/containerd/containerd v1.3.9 ea765aba0d05254012b0b9e595e995c09186427f
Any other relevant information:
I suspect that Docker de-duplicates the work when there are multiple simultaneous requests to pull the same image, so the image is only downloaded once. It appears that containerd does not do this.