Poor parallel image pulling performance when cold #4937

**Description**

We've noticed that containerd pulls images slowly when multiple pods that use the same image start on a single node at the same time and the image isn't already present on that node.

We ran the same pod spec on a Docker node and saw much better results: 5 seconds instead of 27 seconds for 10 replicas from cold.

**Steps to reproduce the issue:**

Using the following Deployment spec, saved as `deploy.yaml`:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test
  namespace: default
spec:
  replicas: 10
  selector:
    matchLabels:
      app: test
  template:
    metadata:
      labels:
        app: test
    spec:
      nodeName: <node name here>
      containers:
        - name: test
          image: ubuntu:18.04
          imagePullPolicy: Always
          command:
            - sleep
            - "3600"
      terminationGracePeriodSeconds: 0
```

1. Replace `<node name here>` with the name of the node you would like to run the test on
2. Create the deployment using `kubectl apply -f deploy.yaml`
3. Observe the time it takes for all 10 pods to reach the running state
4. Repeat the steps above on a Docker node, and with different replica counts
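
Step 3 can be timed from the pod objects themselves. The following is a minimal sketch, not the exact method used for the numbers in this report: it measures from `metadata.creationTimestamp` (when the API server created the pod) to the container's `startedAt`, which only approximates the time from when the kubelet receives the pod. The function names and the file name `startup.py` are illustrative.

```python
from datetime import datetime

# Timestamp format used by the Kubernetes API for these fields.
TS_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def startup_seconds(pod):
    """Seconds from pod creation until its last container started running."""
    created = datetime.strptime(pod["metadata"]["creationTimestamp"], TS_FORMAT)
    started = max(
        datetime.strptime(cs["state"]["running"]["startedAt"], TS_FORMAT)
        for cs in pod["status"]["containerStatuses"]
    )
    return (started - created).total_seconds()


def slowest_startup(pod_list):
    """Worst-case startup time across every pod in a `kubectl get pods -o json` list."""
    return max(startup_seconds(pod) for pod in pod_list["items"])


# Possible usage once all pods are running (assumes the script is saved as startup.py
# and ends with: import json, sys; print(slowest_startup(json.load(sys.stdin)))):
#   kubectl get pods -l app=test -o json | python3 startup.py
```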

**Describe the results you received:**

Measuring from when the kubelet receives the pod until the pod is running, we got the following results:

| Platform  | Replicas | From cold | From warm |
| ------------- | ------------- | ------------- | ------------- |
| Containerd  | 10 | <ins>**27 seconds**</ins> | 2 seconds |
| Containerd  | 1  | 4 seconds | 1 second |
| Docker  | 10  | 5 seconds | 3 seconds |
| Docker  | 1  | 4 seconds | 3 seconds |

From cold = image not already present on the node
From warm = image present on the node

The outlier in the table above (containerd, 10 replicas, from cold) is what we are concerned about: containerd took 27 seconds where Docker took 5 seconds.

For each of the 10 pods in the slow 27-second run, I verified that image pulling was the bottleneck:

```
$ kubectl get events  --sort-by='.metadata.creationTimestamp'  -o 'go-template={{range .items}}{{.firstTimestamp}}{{"\t"}}{{.involvedObject.name}}{{"\t"}}{{.involvedObject.kind}}{{"\t"}}{{.message}}{{"\t"}}{{.reason}}{{"\t"}}{{.type}}{{"\n"}}{{end}}' | grep -i test-c867ffb59-4ql6l
2021-01-14T00:52:35Z	test-c867ffb59	ReplicaSet	(combined from similar events): Created pod: test-c867ffb59-4ql6l	SuccessfulCreate	Normal
2021-01-14T00:52:37Z	test-c867ffb59-4ql6l	Pod	Pulling image "ubuntu:18.04"	Pulling	Normal
2021-01-14T00:53:02Z	test-c867ffb59-4ql6l	Pod	Successfully pulled image "ubuntu:18.04"	Pulled	Normal
2021-01-14T00:53:02Z	test-c867ffb59-4ql6l	Pod	Created container test	Created	Normal
2021-01-14T00:53:03Z	test-c867ffb59-4ql6l	Pod	Started container test	Started	Normal
```
In the above case, the image pull took 25 seconds (from the `Pulling` event at 00:52:37 to the `Pulled` event at 00:53:02).
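
The pull duration can be computed directly from the event timestamps. This is a small sketch using the two timestamps from the output above; `pull_seconds` is an illustrative helper name, not part of any tool.

```python
from datetime import datetime

# Timestamp format of the firstTimestamp field in the event output.
TS_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def pull_seconds(events):
    """Seconds between the 'Pulling' and 'Pulled' events of one pod.

    `events` is a list of (timestamp, reason) pairs for a single pod.
    """
    times = {reason: datetime.strptime(ts, TS_FORMAT) for ts, reason in events}
    return (times["Pulled"] - times["Pulling"]).total_seconds()


# Timestamps copied from the events of pod test-c867ffb59-4ql6l.
events = [
    ("2021-01-14T00:52:37Z", "Pulling"),
    ("2021-01-14T00:53:02Z", "Pulled"),
]
print(pull_seconds(events))  # 25.0
```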

**Describe the results you expected:**

We expect parallel image pulling on containerd to perform as well as it does on Docker.

**Output of `containerd --version`:**

```
containerd github.com/containerd/containerd v1.3.9 ea765aba0d05254012b0b9e595e995c09186427f
```

**Any other relevant information:**

I suspect that when Docker receives multiple concurrent requests to pull the same image, it de-duplicates the work. This doesn't appear to be the case for containerd.
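
To illustrate what de-duplication could look like, here is a minimal sketch in which concurrent requests for the same image share one in-flight pull. It is not Docker's or containerd's actual implementation; `PullDeduplicator`, `fake_pull`, and `run_demo` are hypothetical names, and the pull itself is simulated with a sleep.

```python
import threading
import time
from concurrent.futures import Future


class PullDeduplicator:
    """Collapse concurrent pulls of the same image ref into one (hypothetical)."""

    def __init__(self, pull_fn):
        self._pull_fn = pull_fn      # function that actually fetches the image
        self._lock = threading.Lock()
        self._in_flight = {}         # image ref -> Future shared by all callers

    def pull(self, ref):
        with self._lock:
            fut = self._in_flight.get(ref)
            leader = fut is None
            if leader:
                # First caller for this ref: register a Future others can wait on.
                fut = Future()
                self._in_flight[ref] = fut
        if leader:
            try:
                fut.set_result(self._pull_fn(ref))
            except Exception as exc:
                fut.set_exception(exc)
            finally:
                with self._lock:
                    del self._in_flight[ref]
        # Followers block here until the leader finishes; errors propagate to all.
        return fut.result()


def run_demo(replicas=10):
    """Start `replicas` concurrent pulls; return (real pull count, results)."""
    calls = []

    def fake_pull(ref):
        calls.append(ref)
        time.sleep(0.5)              # stand-in for network and unpack time
        return f"pulled {ref}"

    dedup = PullDeduplicator(fake_pull)
    results = []
    threads = [
        threading.Thread(target=lambda: results.append(dedup.pull("ubuntu:18.04")))
        for _ in range(replicas)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(calls), results
```

In this sketch, ten simultaneous requests for `ubuntu:18.04` that arrive while the first pull is still running result in a single call to `fake_pull`, and every caller gets the same result.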
