Enable containerd 2.0+ installation #315
Pull Request Overview
This PR updates the provisioning process for containerd installation by switching from the APT package to a binary download method, enabling support for containerd 2.0+ versions.
- Replaces APT-based installation with binary downloads from GitHub for containerd, runc, and CNI plugins.
- Updates the default containerd version from 1.6.27 to 2.0.4.
- Adjusts the container-toolkit installation to use a retry mechanism.
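The retry mechanism mentioned above appears in the diff as `with_retry 3 10s sudo apt-get update`. Its implementation is not shown in this conversation, so the following is only a minimal sketch of what such a helper could look like, assuming the arguments are a maximum attempt count, a delay passed to `sleep`, and the command to run. The `wr_` variable names are illustrative.

```shell
# Hypothetical sketch of a retry helper; the project's real with_retry may differ.
# Usage: with_retry <max_attempts> <delay> <command> [args...]
with_retry() {
  wr_max="$1"
  wr_delay="$2"
  shift 2
  wr_attempt=1
  while true; do
    # Run the command; return immediately on success.
    if "$@"; then
      return 0
    fi
    # Give up once the attempt budget is exhausted.
    if [ "$wr_attempt" -ge "$wr_max" ]; then
      echo "Command failed after ${wr_attempt} attempts: $*" >&2
      return 1
    fi
    echo "Attempt ${wr_attempt}/${wr_max} failed; retrying in ${wr_delay}..." >&2
    sleep "$wr_delay"
    wr_attempt=$((wr_attempt + 1))
  done
}
```

With this shape, `with_retry 3 10s sudo apt-get update` would run the update up to three times with a ten-second pause between attempts.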
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| pkg/provisioner/templates/containerd.go | Removes outdated APT installation and adds binary download, extraction, and service setup for containerd along with runc and CNI plugins installation. |
| examples/aws_kubeadm.yaml | Provides an updated example configuration selecting containerd as the runtime. |
| pkg/provisioner/templates/container-toolkit.go | Updates the installation method for the NVIDIA container toolkit to use a retry mechanism. |
| "$(. /etc/os-release && echo "$VERSION_CODENAME")" stable" | \ | ||
| sudo tee /etc/apt/sources.list.d/docker.list > /dev/null | ||
| with_retry 3 10s sudo apt-get update | ||
| # Download containerd 2.0 from GitHub |
Nit: comment refers to a specific version.
|
|
||
| # Configure containerd and start service | ||
| mkdir -p /etc/containerd | ||
| echo "Downloading containerd from: $CONTAINERD_URL" |
Nit: We could curl and pipe it into tar to not have to remove the file.
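The curl-into-tar suggestion can be sketched roughly as follows. `extract_stream` is a hypothetical helper name, and the destination path in the usage comment is taken from the diff elsewhere in this PR. One trade-off worth noting: when the archive is streamed, it cannot be checksummed before extraction unless the stream is also saved or hashed on the way through.

```shell
# Sketch: extract a gzip-compressed tarball from stdin into a destination
# directory, so no archive file is left on disk to clean up.
# extract_stream is a hypothetical helper name.
extract_stream() {
  es_dest="$1"
  mkdir -p "$es_dest" && tar -xzf - -C "$es_dest"
}

# Intended use in the provisioning script (needs network access and root):
#   curl -fsSL "$CONTAINERD_URL" | sudo tar -xzf - -C /usr/local
```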
| rm -f ${CONTAINERD_TAR} | ||
|
|
||
| # Install runc | ||
| RUNC_VERSION="1.2.5" |
This isn't really a dependency that I want to keep up to date. Is there a "latest"?
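On the "latest" question: GitHub's REST API exposes a latest-release endpoint per repository, and a later revision of this PR queries it for containerd. Below is a minimal sketch of parsing the tag out of that JSON without `jq`. `parse_latest_tag` is a hypothetical name, and the runc URL in the usage comment is an assumption based on the runc project living at opencontainers/runc.

```shell
# Sketch: read a GitHub "latest release" JSON document on stdin and print the
# version with any leading "v" stripped. parse_latest_tag is a hypothetical name.
parse_latest_tag() {
  grep -o '"tag_name": *"[^"]*"' | head -n 1 | cut -d '"' -f 4 | sed 's/^v//'
}

# Intended use (needs network access; unauthenticated API calls are rate-limited):
#   RUNC_VERSION=$(curl -fsSL https://api.github.com/repos/opencontainers/runc/releases/latest | parse_latest_tag)
```

Anchoring the `sed` pattern with `^` strips only a leading "v", so a tag that happens to contain another "v" is left intact.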
| sudo tee /etc/apt/sources.list.d/docker.list > /dev/null | ||
| with_retry 3 10s sudo apt-get update | ||
| # Download containerd 2.0 from GitHub | ||
| ARCH=$(dpkg --print-architecture) |
78a5ef9 to bf62668
Pull Request Overview
This PR enables installation of containerd via binary download and manual setup, deprecating the APT package method.
- Replaces APT installation steps with manual download and extraction of containerd, runc, and CNI plugins
- Updates the default containerd version in the configuration
- Modifies the container-toolkit installation command and adds an AWS example configuration
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| pkg/provisioner/templates/containerd.go | Replaces APT installation with manual installation steps and updates default version |
| examples/aws_kubeadm.yaml | Introduces an example configuration for an AWS kubeadm environment |
| pkg/provisioner/templates/container-toolkit.go | Updates installation command to use the retry helper function |
Comments suppressed due to low confidence (1)
pkg/provisioner/templates/containerd.go:52
- Duplicate assignment of ARCH and redefinition of CONTAINERD_TAR and CONTAINERD_URL observed starting at line 52, which may lead to confusion regarding variable values.
ARCH=$(uname -m)
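The two `ARCH` assignments seen in this PR produce different strings: `dpkg --print-architecture` prints Debian names such as `amd64`, while `uname -m` prints kernel names such as `x86_64`. A rough sketch of normalizing to the `amd64`/`arm64` names used in release asset filenames follows; `map_arch` is a hypothetical helper, and only two architectures are covered.

```shell
# Sketch: normalize an architecture string to the name used in release
# asset filenames. map_arch is a hypothetical helper name.
map_arch() {
  case "$1" in
    x86_64|amd64) echo "amd64" ;;
    aarch64|arm64) echo "arm64" ;;
    *) echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

# Intended use:
#   ARCH=$(map_arch "$(uname -m)")
```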
bf62668 to 7b351a8
Pull Request Overview
This PR enables installation of Containerd via binary downloads and manual setup, deprecating the old APT package method.
- Switches Containerd installation from APT to direct binary download with checksum verification.
- Adds installation steps for RUNC and CNI plugins with similar verification.
- Updates default containerd version in the provisioning script and adds an AWS kubeadm example.
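The checksum verification step described above could look roughly like this. It is a sketch, not the PR's actual code: `verify_sha256` is a hypothetical helper, and the usage comment assumes that a checksum file is published next to the release tarball at `${CONTAINERD_URL}.sha256sum`.

```shell
# Sketch: compare a file's SHA-256 digest against an expected hex string.
# verify_sha256 is a hypothetical helper name.
# Usage: verify_sha256 <file> <expected_hex_digest>
verify_sha256() {
  vs_actual=$(sha256sum "$1" | cut -d ' ' -f 1)
  if [ "$vs_actual" != "$2" ]; then
    echo "Checksum mismatch for $1: expected $2, got $vs_actual" >&2
    return 1
  fi
}

# Intended use (assumes a published checksum file; needs network access):
#   EXPECTED=$(curl -fsSL "${CONTAINERD_URL}.sha256sum" | cut -d ' ' -f 1)
#   verify_sha256 "$CONTAINERD_TAR" "$EXPECTED" || exit 1
```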
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| pkg/provisioner/templates/containerd.go | Replaces APT-based installation with binary downloads and adds checksum verification for containerd, runc, and CNI plugins. |
| examples/aws_kubeadm.yaml | Provides an example configuration for an AWS environment using kubeadm with containerd. |
| pkg/provisioner/templates/container-toolkit.go | Updates installation command to use a retry-based package installer for nvidia-container-toolkit. |
Comments suppressed due to low confidence (1)
pkg/provisioner/templates/containerd.go:66
- The tar extraction command uses an unconventional flag ordering (missing a leading dash) which may affect portability. Consider using a more standard format like 'sudo tar -xzvf - -C /usr/local'.
cat ${CONTAINERD_TAR} | sudo tar Cxzvf /usr/local -
7b351a8 to d7f4136
Pull Request Overview
This PR enables installing containerd via binary download and manual setup, replacing the previous APT package method.
- Replaces APT-based containerd installation with dynamic binary downloads and checksum verification.
- Implements manual installation for containerd, runc, and CNI plugins with corresponding version fetching and verification.
- Updates the default containerd version in the constructor and revises the container-toolkit installation command.
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| pkg/provisioner/templates/containerd.go | Updates installation process for containerd, runc, and CNI plugins; also updates default version. |
| examples/aws_kubeadm.yaml | Adds an example environment configuration YAML file. |
| pkg/provisioner/templates/container-toolkit.go | Revises installation command for nvidia-container-toolkit. |
d7f4136 to 8a71442
Pull Request Overview
This PR enables containerd installation via a binary download and manual setup while deprecating the APT package method, providing more flexibility to install versions not available in DEB repositories.
- Removed APT key and repository configuration for containerd and replaced it with GitHub-based binary downloads and checksum verifications.
- Updated default containerd version from 1.6.27 to 2.0.4 and added logic to fetch/install runc and CNI plugins.
- Modified the container toolkit installation command to use install_packages_with_retry.
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| pkg/provisioner/templates/containerd.go | Swapped APT installation with binary download, extraction, and checksum verification logic. |
| examples/aws_kubeadm.yaml | Added an example environment configuration leveraging the new containerd setup. |
| pkg/provisioner/templates/container-toolkit.go | Updated the container toolkit installation command for consistency. |
|
|
||
| if env.Spec.ContainerRuntime.Version == "" { | ||
| version = "1.6.27" | ||
| version = "2.0.4" |
Do we really want to change the default in this PR? This has implications on Operator testing for example where we may not be specifying the containerd version.
There was a problem hiding this comment.
This change won't affect the Operator, as it has the Holodeck version pinned: https://github.com/NVIDIA/gpu-operator/blob/main/.github/workflows/ci.yaml#L337.
But if the containerd bump is too big a change, I can set the default back to 1.6, or maybe to 1.7.26, which is the latest of the 1.x.y series. WDYT?
There was a problem hiding this comment.
Let's change the default in a separate PR.
| with_retry 3 10s sudo apt-get update | ||
| # Fetch latest stable Containerd version from GitHub | ||
| echo "Fetching latest stable containerd version..." | ||
| CONTAINERD_VERSION=$(curl -fsSL https://api.github.com/repos/containerd/containerd/releases/latest | grep '"tag_name":' | cut -d '"' -f 4 | sed 's/v//') |
Does this mean that we don't install the specified version?
There was a problem hiding this comment.
Umm good point, checking
There was a problem hiding this comment.
changes made to ensure that the specified version is the one used
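The fix described in this thread is not shown in the conversation. One way to express the intended behavior (use the specified version, and fall back to a looked-up one only when none is given) is sketched below. `resolve_version` and the variable names in the usage comment are hypothetical.

```shell
# Sketch: prefer the explicitly requested version; fall back otherwise.
# A leading "v" is stripped from whichever value is chosen.
# resolve_version is a hypothetical helper name.
# Usage: resolve_version <requested_or_empty> <fallback>
resolve_version() {
  if [ -n "$1" ]; then
    echo "${1#v}"
  else
    echo "${2#v}"
  fi
}

# Intended use (REQUESTED_VERSION and LATEST_VERSION are placeholder names):
#   CONTAINERD_VERSION=$(resolve_version "$REQUESTED_VERSION" "$LATEST_VERSION")
```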
| install_packages_with_retry nvidia-container-toolkit | ||
|
|
||
| # Configure container runtime | ||
| sudo nvidia-ctk runtime configure --runtime={{.ContainerRuntime}} --set-as-default |
One thing that should be mentioned is that if we add --enable-cdi as an argument here, then we don't NEED containerd v2.0. Would it be simpler to expose that as an argument instead of reworking the containerd installation?
Not that I don't think having access to containerd is valuable, just that it would simplify unblocking DRA testing.
There was a problem hiding this comment.
Yeah it would be a much smaller and simpler PR, do you think we are too far from agreeing on this one?
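The smaller alternative discussed here, exposing CDI enablement as an option, could be sketched as below. The `--enable-cdi` flag name is taken from the review comment above and has not been checked against a specific toolkit version; `build_ctk_configure_cmd` is a hypothetical helper that only prints the command so it can be inspected before being run.

```shell
# Sketch: build the nvidia-ctk configure command, optionally with CDI enabled.
# The --enable-cdi flag name comes from the review comment, not from
# verified toolkit documentation. build_ctk_configure_cmd is hypothetical.
# Usage: build_ctk_configure_cmd <runtime> <enable_cdi: true|false>
build_ctk_configure_cmd() {
  bc_cmd="sudo nvidia-ctk runtime configure --runtime=$1 --set-as-default"
  if [ "$2" = "true" ]; then
    bc_cmd="$bc_cmd --enable-cdi"
  fi
  echo "$bc_cmd"
}
```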
8a71442 to 0999e70
Pull Request Overview
This PR enables containerd installation via binary download and manual setup to support versions not available on DEB repos.
- Removed installation via apt-get and added logic to fetch and verify containerd, runc, and CNI plugins from GitHub releases.
- Updated container toolkit installation to use retry logic.
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| pkg/provisioner/templates/containerd.go | Transitioned from APT-based installation to binary download and checksum verification for containerd, runc, and CNI plugins. |
| examples/aws_kubeadm.yaml | Added an end-to-end test environment configuration example for AWS kubeadm. |
| pkg/provisioner/templates/container-toolkit.go | Modified installation of nvidia-container-toolkit to use a retry logic mechanism. |
Comments suppressed due to low confidence (1)
pkg/provisioner/templates/containerd.go:69
- [nitpick] Consider using tar's built-in -C flag instead of piping through cat for clarity and reliability; for example, 'sudo tar -xzvf ${CONTAINERD_TAR} -C /usr/local'.
cat ${CONTAINERD_TAR} | sudo tar Cxzvf /usr/local -
| image: | ||
| architecture: amd64 |
Not critical to this PR: Is the architecture not determined by the instance type?
| # Stream directly into tar to avoid saving the archive | ||
| cat ${CONTAINERD_TAR} | sudo tar Cxzvf /usr/local - |
We're already downloading the archive. This is only needed if we're using curl directly. Since we're verifying the SHA that's fine too.
Not a blocker though.
| echo "Runc ${RUNC_VERSION} installed successfully." | ||
|
|
||
| # Install CNI plugins | ||
| CNI_VERSION="1.1.1" |
Any particular reason that we don't handle this version in the same way as the other packages?
|
|
||
| # Configure containerd | ||
| sudo mkdir -p /etc/containerd | ||
| containerd config default | sudo tee /etc/containerd/config.toml |
Question: Does config default generate a "sane" config with respect to the SystemdCgroup?
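Whatever the generated default turns out to contain, the script can guard against a silent no-op: a `sed` substitution that matches nothing still exits successfully. The sketch below checks that the key exists before editing and confirms the result afterwards. `enable_systemd_cgroup` is a hypothetical helper, and `sed -i` is used in its GNU form, as in the PR.

```shell
# Sketch: switch SystemdCgroup to true and fail loudly if the key is absent,
# instead of relying on sed, which succeeds even when nothing matches.
# enable_systemd_cgroup is a hypothetical helper name.
# Usage: enable_systemd_cgroup <path/to/config.toml>
enable_systemd_cgroup() {
  ec_file="$1"
  if ! grep -q 'SystemdCgroup' "$ec_file"; then
    echo "No SystemdCgroup key in $ec_file; it must be added explicitly" >&2
    return 1
  fi
  sed -i 's/SystemdCgroup = false/SystemdCgroup = true/g' "$ec_file"
  # Confirm that the setting is now enabled.
  grep -q 'SystemdCgroup = true' "$ec_file"
}
```

In the provisioning script this would run with root privileges against /etc/containerd/config.toml.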
| sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/g' /etc/containerd/config.toml | ||
|
|
||
| # Set up systemd service for containerd | ||
| sudo curl -fsSL "https://raw.githubusercontent.com/containerd/containerd/main/containerd.service" -o /etc/systemd/system/containerd.service |
Should we not also download this from the versioned tag?
There was a problem hiding this comment.
(or is this not available in the tar file?)
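Pinning the unit file to the release being installed could be done by swapping `main` for the version tag in the raw URL. This sketch assumes containerd tags follow the `vX.Y.Z` form and that the unit file sits at the repository root for that tag, as it does on `main` in the diff. `containerd_service_url` is a hypothetical helper.

```shell
# Sketch: build the raw URL of containerd.service for a given release tag.
# Accepts the version with or without a leading "v".
# containerd_service_url is a hypothetical helper name.
containerd_service_url() {
  echo "https://raw.githubusercontent.com/containerd/containerd/v${1#v}/containerd.service"
}

# Intended use (needs network access and root):
#   sudo curl -fsSL "$(containerd_service_url "$CONTAINERD_VERSION")" \
#     -o /etc/systemd/system/containerd.service
```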
0999e70 to 9552c50
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
9552c50 to 0017e76
This patch enables installing containerd via binary download and manual setup, deprecating the APT (deb) package method.
This makes the setup more flexible and lets us install containerd versions that are not available in DEB repos.
Installation Process Improvements:
- pkg/provisioner/templates/containerd.go: Overhauled the containerd installation script to fetch the latest stable versions of containerd, runc, and CNI plugins, with added checksum verification for security.
- pkg/provisioner/templates/container-toolkit.go: Updated the installation command for the NVIDIA container toolkit to use the install_packages_with_retry function.