Skip to content

Conversation

@adrianreber
Copy link
Contributor

@adrianreber adrianreber commented Jun 19, 2024

This implements container restore as described in:

https://kubernetes.io/blog/2022/12/05/forensic-container-checkpointing-alpha/#restore-checkpointed-container-standalone

For detailed step by step instruction also see contrib/checkpoint/checkpoint-restore-cri-test.sh

The code changes are based on changes I have done in Podman around 2018 and CRI-O around 2020.

The history behind restoring container via CRI/Kubernetes probably requires some explanation. The initial proposal to bring checkpoint/restore to Kubernetes was looking at pod checkpoint and restoring and the corresponding CRI changes.

kubernetes-sigs/cri-tools#662 kubernetes/kubernetes#97194

After discussing this topic for about two years another approach was implemented as described in KEP-2008:

kubernetes/enhancements#2008

"Forensic Container Checkpointing" allowed us to separate checkpointing from restoring. For the "Forensic Container Checkpointing" it is enough to create a checkpoint of the container. Restoring is not necessary as the analysis of the checkpoint archive can happen without restoring the container.

While thinking about a way to restore a container it was by coincidence that we started to look into restoring containers in Kubernetes via Create and Start. The way it was done in CRI-O is to figure out during Create if the container image is a checkpoint image and if that is true we are using another code path. The same was implemented now with this change in containerd.

With this change it is possible to restore the container from a checkpoint tar archive that is created during checkpointing via CRI.

To restore a container via Kubernetes we convert the tar archive to an OCI image as described in the kubernetes.io blog post from above. Using this OCI image it is possible to restore a container in Kubernetes.

At this point I think it should be doable to restore containers in CRI-O and containerd no matter if they have been created by containerd or CRI-O. The biggest difference is the container metadata and that can be adapted during restore.

Open items:

  • It is not clear to me why restoring a container in containerd goes through task/Create(). But as the restore code already exists this change extended the existing code path to restore a container in task/Create() to also restore a container through the CRI via Create and Start.
  • Automatic image pulling. containerd does not pull images automatically if created via the CRI. There is an option in crictl to pull images before starting, but that uses the CRI image pull interface. It is still a separate pull and create operation. Restoring containers from an OCI image is a bit different. The checkpoint OCI image does not include the base image, but just a reference to the image (NAME@DIGEST). Using crictl with pulling will enable the pulling of the checkpoint image, but not of the base image the checkpoint is based on. So during preparation of the checkpoint containerd will automatically pull the base image, but I was not able how to pull an image blockingly in containerd. So there is a for loop waiting for the container image to appear in the internal store. I think this probably can be implemented better.

Anyway, this is a first step towards container restored in Kubernetes when using containerd.

@k8s-ci-robot
Copy link

Hi @adrianreber. Thanks for your PR.

I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@xhejtman
Copy link

xhejtman commented Jul 5, 2024

Hello, I tested in K8s with the following:

Setup Env

K8s: 1.30.2 (RKE2)
Distro: Ubuntu 24.04
Kernel: 6.8.0-36
CRIU: v3.18-266-g6f92787b7
containerd: 2.0.0-rc3 + this PR
kubelet featuregate: ContainerCheckpoint=true

Results

Checkpoint

curl -q -s -X POST "https://localhost:10250/checkpoint/$CONTAINER" \
        --insecure \
        --cert /var/lib/rancher/rke2/server/tls/client-kube-apiserver.crt \
        --key /var/lib/rancher/rke2/server/tls/client-kube-apiserver.key

creates a checkpoint, however, criu (/etc/criu/runc.conf) must NOT set custom log path (e.g. log-file=/tmp/criu.log), or checkpoint will fail (rpc error: code = Unknown desc = open /var/lib/rancher/rke2/agent/containerd/io.containerd.grpc.v1.cri/containers/161976044213c3871a13f8989c11933e240f0735da61bd5cf83694a94b1a1af6/dump.log: no such file or directory)

buildah needs to set different annotation to the one with crio:

  • for crio: --annotation=io.kubernetes.cri-o.annotations.checkpoint.name=
  • for containerd: --annotation=org.criu.checkpoint.container.name=

Restore

  • cgroupv1 support only. with cgroupv2, restored process is killed immediately
  • rootfs-diff.tar is not restored - restore actually fails in criu
  • container after restore does not log (i.e., log output is empty)
  • Restoring simple shell command works (/bin/bash -c i=0; while true; do sleep 1; echo $i; i=$[i+1]; done)

@adrianreber adrianreber force-pushed the 2024-06-19-restore-create-start branch from f602c12 to 25da2f2 Compare July 10, 2024 16:00
@adrianreber
Copy link
Contributor Author

@xhejtman Thanks for testing. I rebased the PR to apply cleanly on the latest git checkout of containerd.

The CRIU log file error you mentioned should be gone now. I do not think you need to explicitly set the feature gate for container checkpointing since it defaults to on since going Beta in 1.30.

@xhejtman
Copy link

The CRIU log file error you mentioned should be gone now. I do not think you need to explicitly set the feature gate for container checkpointing since it defaults to on since going Beta in 1.30.

Yes, tested again, the CRIU log issue is gone.

I just noticed one more thing: if checkpointed container did not specify command/entrypoint in the docker file, Pod manifest for restore needs to specify at least dummy command, or error will be raised: no command specified in restore process.

@adrianreber adrianreber force-pushed the 2024-06-19-restore-create-start branch from 25da2f2 to 29ec7b2 Compare July 15, 2024 15:50
@adrianreber
Copy link
Contributor Author

@xhejtman I added my Kubernetes test script in contrib/checkpoint and container logging should now also work.

@adrianreber adrianreber force-pushed the 2024-06-19-restore-create-start branch 5 times, most recently from 6fef265 to 1d3793e Compare July 16, 2024 16:43
@adrianreber
Copy link
Contributor Author

CI looks finally happy. One more feature is missing before this is ready. The rootfs changes are not yet applied to the restored container, so a container which changes files in the container will probably fail restoring. This should be an easy change as it is not much more than applying the existing tar file to the container rootfs.

@adrianreber adrianreber force-pushed the 2024-06-19-restore-create-start branch from 1d3793e to 23e2f4b Compare July 18, 2024 07:40
@adrianreber adrianreber reopened this Jul 18, 2024
@adrianreber adrianreber marked this pull request as ready for review July 18, 2024 09:21
@adrianreber adrianreber reopened this Jul 18, 2024
@dosubot dosubot bot added the area/cri Container Runtime Interface (CRI) label Jul 18, 2024
Copy link
Member

@AkihiroSuda AkihiroSuda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but a few nits.

Sorry for taking too long to review this

@adrianreber adrianreber force-pushed the 2024-06-19-restore-create-start branch from f5eee87 to 16e8428 Compare March 11, 2025 08:30
@adrianreber
Copy link
Contributor Author

LGTM but a few nits.

Sorry for taking too long to review this

Thanks. I tried to address are your review comments.

@adrianreber adrianreber force-pushed the 2024-06-19-restore-create-start branch from 16e8428 to 8ac7dd4 Compare March 11, 2025 10:47
This implements container restore as described in:

https://kubernetes.io/blog/2022/12/05/forensic-container-checkpointing-alpha/#restore-checkpointed-container-standalone

For detailed step by step instruction also see contrib/checkpoint/checkpoint-restore-cri-test.sh

The code changes are based on changes I have done in Podman around 2018
and CRI-O around 2020.

The history behind restoring container via CRI/Kubernetes probably
requires some explanation. The initial proposal to bring
checkpoint/restore to Kubernetes was looking at pod checkpoint and
restoring and the corresponding CRI changes.

kubernetes-sigs/cri-tools#662
kubernetes/kubernetes#97194

After discussing this topic for about two years another approach was
implemented as described in KEP-2008:

kubernetes/enhancements#2008

"Forensic Container Checkpointing" allowed us to separate checkpointing
from restoring. For the "Forensic Container Checkpointing" it is enough
to create a checkpoint of the container. Restoring is not necessary as
the analysis of the checkpoint archive can happen without restoring the
container.

While thinking about a way to restore a container it was by coincidence
that we started to look into restoring containers in Kubernetes via
Create and Start. The way it was done in CRI-O is to figure out during
Create if the container image is a checkpoint image and if that is true
we are using another code path. The same was implemented now with this
change in containerd.

With this change it is possible to restore the container from a
checkpoint tar archive that is created during checkpointing via CRI.

To restore a container via Kubernetes we convert the tar archive to an
OCI image as described in the kubernetes.io blog post from above. Using
this OCI image it is possible to restore a container in Kubernetes.

At this point I think it should be doable to restore containers in
CRI-O and containerd no matter if they have been created by containerd or
CRI-O. The biggest difference is the container metadata and that can
be adapted during restore.

Open items:

 * It is not clear to me why restoring a container in containerd goes
   through task/Create(). But as the restore code already exists this
   change extended the existing code path to restore a container in
   task/Create() to also restore a container through the CRI via
   Create and Start.
 * Automatic image pulling. containerd does not pull images
   automatically if created via the CRI. There is an option in
   crictl to pull images before starting, but that uses the CRI
   image pull interface. It is still a separate pull and create
   operation. Restoring containers from an OCI image is a bit
   different. The checkpoint OCI image does not include the base
   image, but just a reference to the image (NAME@DIGEST).
   Using crictl with pulling will enable the pulling of the
   checkpoint image, but not of the base image the checkpoint is
   based on. So during preparation of the checkpoint containerd
   will automatically pull the base image, but I was not able how
   to pull an image blockingly in containerd. So there is a for
   loop waiting for the container image to appear in the internal
   store. I think this probably can be implemented better.

Anyway, this is a first step towards container restored in Kubernetes
when using containerd.

Signed-off-by: Adrian Reber <areber@redhat.com>
@adrianreber adrianreber force-pushed the 2024-06-19-restore-create-start branch from 8ac7dd4 to 9e6beaf Compare March 11, 2025 11:55
@github-project-automation github-project-automation bot moved this from Needs Triage to Review In Progress in Pull Request Review Mar 11, 2025
@estesp estesp added this pull request to the merge queue Mar 11, 2025
@AkihiroSuda AkihiroSuda added this to the 2.1 milestone Mar 11, 2025
Merged via the queue into containerd:main with commit 9b552d4 Mar 11, 2025
56 checks passed
@github-project-automation github-project-automation bot moved this from Review In Progress to Done in Pull Request Review Mar 11, 2025
@dmcgowan dmcgowan removed the area/criu checkpoint/resume label Apr 25, 2025
echo "$ctr_id"
rm -f "$RESTORE_JSON" "$RESTORE_POD_JSON"
echo -n "--> Start container from checkpoint: "
crictl start "$ctr_id"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a question. Without testing whether the files and process status are the same as before, it may be impossible to determine if the restore was successful. It is recommended to start a process that increments by 1 per second in the original container, modify some files within it, and then compare the restored container with the original container to see if they are identical.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what is the question - we check the logs of the container after restore:

	lines_after=$(crictl logs "$ctr_id" | wc -l)
	if [ "$lines_before" -ge "$lines_after" ]; then
		echo "number of lines after checkpointing ($lines_after) " \
			"should be larger than before checkpointing ($lines_before)"
		false
	fi

mansikulkarni96 added a commit to mansikulkarni96/containerd that referenced this pull request Dec 4, 2025
containerd 2.1.0

Welcome to the v2.1.0 release of containerd!

The first minor release of containerd 2.x focuses on continued stability alongside
new features and improvements. This is the first time-based released for containerd.
Most the feature set and core functionality has long been stable and harderened in production
environments, so now we transition to a balance of timely delivery of new functionality
with the same high confidence in stability and performance.

* Add no_sync option to boost boltDB performance on ephemeral environments ([containerd#10745](containerd#10745))
* Add content create event ([containerd#11006](containerd#11006))
* Erofs snapshotter and differ ([containerd#10705](containerd#10705))

* Update CRI to use transfer service for image pull by default ([containerd#8515](containerd#8515))
* Support multiple cni plugin bin dirs ([containerd#11311](containerd#11311))
* Support container restore through CRI/Kubernetes ([containerd#10365](containerd#10365))
* Add OCI/Image Volume Source support ([containerd#10579](containerd#10579))
* Enable Writable cgroups for unprivileged containers ([containerd#11131](containerd#11131))
* Fix recursive RLock() mutex acquisition ([containerd/go-cni#126](containerd/go-cni#126))
* Support CNI STATUS Verb ([containerd/go-cni#123](containerd/go-cni#123))

* Retry last registry host on 50x responses ([containerd#11484](containerd#11484))
* Multipart layer fetch ([containerd#10177](containerd#10177))
* Enable HTTP debug and trace for transfer based puller ([containerd#10762](containerd#10762))
* Add support for unpacking custom media types  ([containerd#11744](containerd#11744))
* Add dial timeout field to hosts toml configuration ([containerd#11106](containerd#11106))

* Expose Pod assigned IPs to NRI plugins ([containerd#10921](containerd#10921))

* Support multiple uid/gid mappings ([containerd#10722](containerd#10722))
* Fix race between serve and immediate shutdown on the server ([containerd/ttrpc#175](containerd/ttrpc#175))

* Update FreeBSD defaults and re-organize platform defaults ([containerd#11017](containerd#11017))

* Postpone cri config deprecations to v2.2 ([containerd#11684](containerd#11684))
* Remove deprecated dynamic library plugins ([containerd#11683](containerd#11683))
* Remove the support for Schema 1 images ([containerd#11681](containerd#11681))

Please try out the release binaries and report any issues at
https://github.com/containerd/containerd/issues.

* Derek McGowan
* Phil Estes
* Akihiro Suda
* Maksym Pavlenko
* Jin Dong
* Wei Fu
* Sebastiaan van Stijn
* Samuel Karp
* Mike Brown
* Adrien Delorme
* Austin Vazquez
* Akhil Mohan
* Kazuyoshi Kato
* Henry Wang
* Gao Xiang
* ningmingxiao
* Krisztian Litkey
* Yang Yang
* Archit Kulkarni
* Chris Henzie
* Iceber Gu
* Alexey Lunev
* Antonio Ojea
* Davanum Srinivas
* Marat Radchenko
* Michael Zappa
* Paweł Gronowski
* Rodrigo Campos
* Alberto Garcia Hierro
* Amit Barve
* Andrey Smirnov
* Divya
* Etienne Champetier
* Kirtana Ashok
* Philip Laine
* QiPing Wan
* fengwei0328
* zounengren
* Adrian Reber
* Alfred Wingate
* Amal Thundiyil
* Athos Ribeiro
* Brian Goff
* Cesar Talledo
* ChengyuZhu6
* Chongyi Zheng
* Craig Ingram
* Danny Canter
* David Son
* Fupan Li
* HirazawaUi
* Jing Xu
* Jonathan A. Sternberg
* Jose Fernandez
* Kaita Nakamura
* Kohei Tokunaga
* Lei Liu
* Marco Visin
* Mike Baynton
* Qiyuan Liang
* Sameer
* Shiming Zhang
* Swagat Bora
* Teresaliu
* Tony Fang
* Tõnis Tiigi
* Vered Rosen
* Vinayak Goyal
* bo.jiang
* chriskery
* luchenhan
* mahmut
* zhaixiaojuan

* **github.com/Microsoft/hcsshim**                                                 v0.12.9 -> v0.13.0-rc.3
* **github.com/cilium/ebpf**                                                       v0.11.0 -> v0.16.0
* **github.com/containerd/cgroups/v3**                                             v3.0.3 -> v3.0.5
* **github.com/containerd/containerd/api**                                         v1.8.0 -> v1.9.0
* **github.com/containerd/continuity**                                             v0.4.4 -> v0.4.5
* **github.com/containerd/go-cni**                                                 v1.1.10 -> v1.1.12
* **github.com/containerd/imgcrypt/v2**                                            v2.0.0-rc.1 -> v2.0.1
* **github.com/containerd/otelttrpc**                                              ea5083fda723 -> v0.1.0
* **github.com/containerd/platforms**                                              v1.0.0-rc.0 -> v1.0.0-rc.1
* **github.com/containerd/ttrpc**                                                  v1.2.6 -> v1.2.7
* **github.com/containerd/typeurl/v2**                                             v2.2.2 -> v2.2.3
* **github.com/containernetworking/cni**                                           v1.2.3 -> v1.3.0
* **github.com/containernetworking/plugins**                                       v1.5.1 -> v1.7.1
* **github.com/containers/ocicrypt**                                               v1.2.0 -> v1.2.1
* **github.com/davecgh/go-spew**                                                   d8f796af33cc -> v1.1.1
* **github.com/fsnotify/fsnotify**                                                 v1.7.0 -> v1.9.0
* **github.com/go-jose/go-jose/v4**                                                v4.0.4 -> v4.0.5
* **github.com/google/go-cmp**                                                     v0.6.0 -> v0.7.0
* **github.com/grpc-ecosystem/grpc-gateway/v2**                                    v2.22.0 -> v2.26.1
* **github.com/klauspost/compress**                                                v1.17.11 -> v1.18.0
* **github.com/mdlayher/socket**                                                   v0.4.1 -> v0.5.1
* **github.com/moby/spdystream**                                                   v0.4.0 -> v0.5.0
* **github.com/moby/sys/user**                                                     v0.3.0 -> v0.4.0
* **github.com/opencontainers/image-spec**                                         v1.1.0 -> v1.1.1
* **github.com/opencontainers/runtime-spec**                                       v1.2.0 -> v1.2.1
* **github.com/opencontainers/selinux**                                            v1.11.1 -> v1.12.0
* **github.com/pelletier/go-toml/v2**                                              v2.2.3 -> v2.2.4
* **github.com/petermattis/goid**                                                  4fcff4a6cae7 **_new_**
* **github.com/pmezard/go-difflib**                                                5d4384ee4fb2 -> v1.0.0
* **github.com/prometheus/client_golang**                                          v1.20.5 -> v1.22.0
* **github.com/prometheus/common**                                                 v0.55.0 -> v0.62.0
* **github.com/sasha-s/go-deadlock**                                               v0.3.5 **_new_**
* **github.com/smallstep/pkcs7**                                                   v0.1.1 **_new_**
* **github.com/stretchr/testify**                                                  v1.9.0 -> v1.10.0
* **github.com/tchap/go-patricia/v2**                                              v2.3.1 -> v2.3.2
* **github.com/urfave/cli/v2**                                                     v2.27.5 -> v2.27.6
* **github.com/vishvananda/netlink**                                               v1.3.0 -> 0e7078ed04c8
* **github.com/vishvananda/netns**                                                 v0.0.4 -> v0.0.5
* **go.etcd.io/bbolt**                                                             v1.3.11 -> v1.4.0
* **go.opentelemetry.io/auto/sdk**                                                 v1.1.0 **_new_**
* **go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc**  v0.56.0 -> v0.60.0
* **go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp**                v0.56.0 -> v0.60.0
* **go.opentelemetry.io/otel**                                                     v1.31.0 -> v1.35.0
* **go.opentelemetry.io/otel/exporters/otlp/otlptrace**                            v1.31.0 -> v1.35.0
* **go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc**              v1.31.0 -> v1.35.0
* **go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp**              v1.31.0 -> v1.35.0
* **go.opentelemetry.io/otel/metric**                                              v1.31.0 -> v1.35.0
* **go.opentelemetry.io/otel/sdk**                                                 v1.31.0 -> v1.35.0
* **go.opentelemetry.io/otel/trace**                                               v1.31.0 -> v1.35.0
* **go.opentelemetry.io/proto/otlp**                                               v1.3.1 -> v1.5.0
* **golang.org/x/crypto**                                                          v0.28.0 -> v0.36.0
* **golang.org/x/exp**                                                             aacd6d4b4611 -> 2d47ceb2692f
* **golang.org/x/mod**                                                             v0.21.0 -> v0.24.0
* **golang.org/x/net**                                                             v0.30.0 -> v0.38.0
* **golang.org/x/oauth2**                                                          v0.22.0 -> v0.27.0
* **golang.org/x/sync**                                                            v0.8.0 -> v0.14.0
* **golang.org/x/sys**                                                             v0.26.0 -> v0.33.0
* **golang.org/x/term**                                                            v0.25.0 -> v0.30.0
* **golang.org/x/text**                                                            v0.19.0 -> v0.23.0
* **golang.org/x/time**                                                            v0.3.0 -> v0.7.0
* **google.golang.org/genproto/googleapis/api**                                    5fefd90f89a9 -> 56aae31c358a
* **google.golang.org/genproto/googleapis/rpc**                                    324edc3d5d38 -> 56aae31c358a
* **google.golang.org/grpc**                                                       v1.67.1 -> v1.72.0
* **google.golang.org/protobuf**                                                   v1.35.1 -> v1.36.6
* **k8s.io/api**                                                                   v0.31.2 -> v0.32.3
* **k8s.io/apimachinery**                                                          v0.31.2 -> v0.32.3
* **k8s.io/apiserver**                                                             v0.31.2 -> v0.32.3
* **k8s.io/client-go**                                                             v0.31.2 -> v0.32.3
* **k8s.io/cri-api**                                                               v0.31.2 -> v0.32.3
* **k8s.io/kubelet**                                                               v0.31.2 -> v0.32.3
* **k8s.io/utils**                                                                 18e509b52bc8 -> 3ea5e8cea738
* **sigs.k8s.io/json**                                                             bc3834ca7abd -> 9aa6b5e7a4b3
* **sigs.k8s.io/structured-merge-diff/v4**                                         v4.4.1 -> v4.4.2
* **tags.cncf.io/container-device-interface**                                      v0.8.0 -> v1.0.1
* **tags.cncf.io/container-device-interface/specs-go**                             v0.8.0 -> v1.0.0

Previous release can be found at [v2.0.0](https://github.com/containerd/containerd/releases/tag/v2.0.0)
* `containerd-<VERSION>-<OS>-<ARCH>.tar.gz`:         ✅Recommended. Dynamically linked with glibc 2.35 (Ubuntu 22.04).
* `containerd-static-<VERSION>-<OS>-<ARCH>.tar.gz`:  Statically linked. Expected to be used on Linux distributions that do not use glibc >= 2.35. Not position-independent.

In addition to containerd, typically you will have to install [runc](https://github.com/opencontainers/runc/releases)
and [CNI plugins](https://github.com/containernetworking/plugins/releases) from their official sites too.

See also the [Getting Started](https://github.com/containerd/containerd/blob/main/docs/getting-started.md) documentation.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.