Poor parallel image pulling performance when cold #4937

@awprice

Description

We've noticed that containerd has poor parallel image pull performance when multiple pods using the same image are started on a single node and the image isn't already present on that node.

We compared this performance with Docker for the same pod spec, and received vastly better results.

Steps to reproduce the issue:

Using the following pod spec:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: test
  namespace: default
spec:
  replicas: 10
  selector:
    matchLabels:
      app: test
  template:
    metadata:
      labels:
        app: test
    spec:
      nodeName: <node name here>
      containers:
        - name: test
          image: ubuntu:18.04
          imagePullPolicy: Always
          command:
            - sleep
            - "3600"
      terminationGracePeriodSeconds: 0

1. Replace the node name with the node you would like to run the test on.
2. Create the deployment using `kubectl apply -f deploy.yaml`.
3. Observe the time it takes for all 10 pods to start running.
4. Repeat the above on a Docker node and with different replica counts.

Describe the results you received:

Using the time from when the Kubelet receives the pod until the pod is running, we received the following results:

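| Platform   | Replicas | From cold  | From warm |
|------------|----------|------------|-----------|
| Containerd | 10       | 27 seconds | 2 seconds |
| Containerd | 1        | 4 seconds  | 1 second  |
| Docker     | 10       | 5 seconds  | 3 seconds |
| Docker     | 1        | 4 seconds  | 3 seconds |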

From cold = image not already present on the node
From warm = image present on the node

The outlier in the above table is what we are concerned about: from cold with 10 replicas, containerd took 27 seconds where Docker took 5 seconds. Containerd's parallel image pulling performance isn't as fast as Docker's.

I verified for each of the 10 pods in the slow, 27 second run that the image pulling was the bottleneck:

$ kubectl get events  --sort-by='.metadata.creationTimestamp'  -o 'go-template={{range .items}}{{.firstTimestamp}}{{"\t"}}{{.involvedObject.name}}{{"\t"}}{{.involvedObject.kind}}{{"\t"}}{{.message}}{{"\t"}}{{.reason}}{{"\t"}}{{.type}}{{"\n"}}{{end}}' | grep -i test-c867ffb59-4ql6l
2021-01-14T00:52:35Z	test-c867ffb59	ReplicaSet	(combined from similar events): Created pod: test-c867ffb59-4ql6l	SuccessfulCreate	Normal
2021-01-14T00:52:37Z	test-c867ffb59-4ql6l	Pod	Pulling image "ubuntu:18.04"	Pulling	Normal
2021-01-14T00:53:02Z	test-c867ffb59-4ql6l	Pod	Successfully pulled image "ubuntu:18.04"	Pulled	Normal
2021-01-14T00:53:02Z	test-c867ffb59-4ql6l	Pod	Created container test	Created	Normal
2021-01-14T00:53:03Z	test-c867ffb59-4ql6l	Pod	Started container test	Started	Normal

In the above case, the image pulling took 25 seconds.

Describe the results you expected:

We expect the parallel image pulling to be as performant as Docker.

Output of containerd --version:

containerd github.com/containerd/containerd v1.3.9 ea765aba0d05254012b0b9e595e995c09186427f

Any other relevant information:

I suspect in Docker's case, when there are multiple requests to pull the same image at the same time, the effort is de-duplicated. It appears for Containerd that this isn't the case.
