Enable containerd 2.0+ installation #315

Merged
ArangoGutierrez merged 3 commits into NVIDIA:main from ArangoGutierrez:containerd20 on May 23, 2025

ArangoGutierrez (Collaborator) commented Mar 18, 2025

This patch enables installing containerd via binary download and manual setup, deprecating the APT (deb) package method.
This makes us more flexible and lets us install containerd versions that are not available in the DEB repos.

Installation Process Improvements:

Copilot AI left a comment

Pull Request Overview

This PR updates the provisioning process for containerd installation by switching from the APT package to a binary download method, enabling support for containerd 2.0+ versions.

  • Replaces APT-based installation with binary downloads from GitHub for containerd, runc, and CNI plugins.
  • Updates the default containerd version from 1.6.27 to 2.0.4.
  • Adjusts the container-toolkit installation to use a retry mechanism.

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| pkg/provisioner/templates/containerd.go | Removes outdated APT installation and adds binary download, extraction, and service setup for containerd along with runc and CNI plugins installation. |
| examples/aws_kubeadm.yaml | Provides an updated example configuration selecting containerd as the runtime. |
| pkg/provisioner/templates/container-toolkit.go | Updates the installation method for the NVIDIA container toolkit to use a retry mechanism. |
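
The retry mechanism mentioned in this review shows up in the diff only as calls such as `with_retry 3 10s sudo apt-get update`; its implementation is not part of this conversation. The following is a minimal sketch of what such a helper might look like, assuming the first argument is the maximum number of attempts and the second is a delay handed to `sleep`. The real helper in the repository may behave differently.

```shell
#!/usr/bin/env bash
# Hypothetical retry helper; the real with_retry in the repo may differ.
# Usage: with_retry <max_attempts> <delay> <command> [args...]
with_retry() {
  local max_attempts="$1" delay="$2"
  shift 2
  local attempt=1
  while true; do
    # Run the command; stop as soon as it succeeds.
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "Command failed after ${attempt} attempts: $*" >&2
      return 1
    fi
    echo "Attempt ${attempt}/${max_attempts} failed, retrying in ${delay}..." >&2
    sleep "$delay"
    attempt=$((attempt + 1))
  done
}
```

With a helper of this shape, `with_retry 3 10s sudo apt-get update` would try the update up to three times, pausing ten seconds between attempts.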

"$(. /etc/os-release && echo "$VERSION_CODENAME")" stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
with_retry 3 10s sudo apt-get update
# Download containerd 2.0 from GitHub
Member

Nit: comment refers to a specific version.

Collaborator Author

Done


# Configure containerd and start service
mkdir -p /etc/containerd
echo "Downloading containerd from: $CONTAINERD_URL"
Member

Nit: We could curl and pipe it into tar to not have to remove the file.

Collaborator Author

Done
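
The suggestion in this thread, streaming the download straight into tar, can be sketched as below. This is an illustration, not the code that was merged: `extract_stream` is a hypothetical helper name, and the sketch reads the archive from stdin so it can be tried without network access.

```shell
#!/usr/bin/env bash
# Extract a gzip-compressed tar stream from stdin into a directory,
# so no archive file is left behind to clean up.
# Usage: <producer> | extract_stream <dest_dir>
extract_stream() {
  local dest="$1"
  mkdir -p "$dest"
  tar -xz -C "$dest" -f -
}

# Intended use in a provisioning script (not executed here; CONTAINERD_URL is
# assumed to be set earlier in the script):
#   curl -fsSL "$CONTAINERD_URL" | sudo tar -xz -C /usr/local -f -
```

One trade-off of streaming is that the archive never exists on disk, so it cannot be checked against a published checksum before it is unpacked.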

rm -f ${CONTAINERD_TAR}

# Install runc
RUNC_VERSION="1.2.5"
Member

This isn't really a dependency that I want to keep up to date. Is there a "latest"?

Collaborator Author

Done
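
One way to get a "latest" is GitHub's `releases/latest` API endpoint, which a later revision in this PR queries for containerd. The sketch below applies the same grep/cut/sed pattern to runc. `parse_latest_version` is a hypothetical helper name, and the parsing assumes the pretty-printed, one-key-per-line JSON that the API returns.

```shell
#!/usr/bin/env bash
# Extract the release version from a GitHub "releases/latest" JSON response
# read on stdin, dropping the leading "v" from the tag (v1.2.5 -> 1.2.5).
parse_latest_version() {
  grep '"tag_name":' | head -n 1 | cut -d '"' -f 4 | sed 's/^v//'
}

# Intended use (network call, not executed here):
#   RUNC_VERSION=$(curl -fsSL https://api.github.com/repos/opencontainers/runc/releases/latest | parse_latest_version)
```

Resolving the version at provisioning time trades reproducibility for lower maintenance, and unauthenticated API requests are subject to rate limits.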

sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
with_retry 3 10s sudo apt-get update
# Download containerd 2.0 from GitHub
ARCH=$(dpkg --print-architecture)
Member

Why not uname?

Collaborator Author

Done
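
One thing to keep in mind when switching: `dpkg --print-architecture` reports Debian-style names (`amd64`, `arm64`), while `uname -m` reports kernel machine names (`x86_64`, `aarch64`). The containerd, runc, and CNI release assets use the Debian-style suffixes, so a script based on `uname -m` needs a mapping step. A minimal sketch, with `normalize_arch` as a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Map a `uname -m` machine name to the architecture suffix used in release
# asset names. normalize_arch is a hypothetical helper name.
normalize_arch() {
  case "$1" in
    x86_64)        echo "amd64" ;;
    aarch64|arm64) echo "arm64" ;;
    *)
      echo "Unsupported architecture: $1" >&2
      return 1
      ;;
  esac
}

# Intended use: ARCH=$(normalize_arch "$(uname -m)")
```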

Copilot AI left a comment

Pull Request Overview

This PR enables installation of containerd via binary download and manual setup, deprecating the APT package method.

  • Replaces APT installation steps with manual download and extraction of containerd, runc, and CNI plugins
  • Updates the default containerd version in the configuration
  • Modifies the container-toolkit installation command and adds an AWS example configuration

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| pkg/provisioner/templates/containerd.go | Replaces APT installation with manual installation steps and updates default version |
| examples/aws_kubeadm.yaml | Introduces an example configuration for an AWS kubeadm environment |
| pkg/provisioner/templates/container-toolkit.go | Updates installation command to use the retry helper function |

Comments suppressed due to low confidence (1)

pkg/provisioner/templates/containerd.go:52

  • Duplicate assignment of ARCH and redefinition of CONTAINERD_TAR and CONTAINERD_URL observed starting at line 52, which may lead to confusion regarding variable values.
ARCH=$(uname -m)

Copilot AI left a comment

Pull Request Overview

This PR enables installation of containerd via binary downloads and manual setup, deprecating the old APT package method.

  • Switches containerd installation from APT to direct binary download with checksum verification.
  • Adds installation steps for runc and CNI plugins with similar verification.
  • Updates default containerd version in the provisioning script and adds an AWS kubeadm example.

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| pkg/provisioner/templates/containerd.go | Replaces APT-based installation with binary downloads and adds checksum verification for containerd, runc, and CNI plugins. |
| examples/aws_kubeadm.yaml | Provides an example configuration for an AWS environment using kubeadm with containerd. |
| pkg/provisioner/templates/container-toolkit.go | Updates installation command to use a retry-based package installer for nvidia-container-toolkit. |

Comments suppressed due to low confidence (1)

pkg/provisioner/templates/containerd.go:66

  • The tar extraction command uses an unconventional flag ordering (missing a leading dash) which may affect portability. Consider using a more standard format like 'sudo tar -xzvf - -C /usr/local'.
cat ${CONTAINERD_TAR} | sudo tar Cxzvf /usr/local -
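
For context, `tar Cxzvf /usr/local -` uses tar's traditional bundled option style, in which the arguments for `C` and `f` follow the bundle in the order the letters appear. The sketch below compares that form with the dash style on a throwaway archive. It assumes GNU tar, and all paths are temporary and purely illustrative.

```shell
#!/usr/bin/env bash
# Compare the old-style and dash-style tar invocations on a throwaway archive.
# Assumes GNU tar; paths are temporary and purely illustrative.
set -eu

workdir="$(mktemp -d)"
mkdir -p "$workdir/src/bin" "$workdir/old" "$workdir/new"
echo "demo" > "$workdir/src/bin/containerd-demo"
tar -czf "$workdir/demo.tar.gz" -C "$workdir/src" bin

# Old style, as in the PR: C takes the first following argument, f the second.
cat "$workdir/demo.tar.gz" | tar Cxzf "$workdir/old" -

# Dash style, reading the archive directly instead of piping through cat.
tar -xzf "$workdir/demo.tar.gz" -C "$workdir/new"

# Both directories should now hold identical trees.
diff -r "$workdir/old" "$workdir/new" && echo "identical"
```

When the archive has already been saved to disk, the second form avoids the extra `cat` process; the pipe form only matters when the data comes from another command such as curl.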

Copilot AI left a comment

Pull Request Overview

This PR enables installing containerd via binary download and manual setup, replacing the previous APT package method.

  • Replaces APT-based containerd installation with dynamic binary downloads and checksum verification.
  • Implements manual installation for containerd, runc, and CNI plugins with corresponding version fetching and verification.
  • Updates the default containerd version in the constructor and revises the container-toolkit installation command.

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| pkg/provisioner/templates/containerd.go | Updates installation process for containerd, runc, and CNI plugins; also updates default version. |
| examples/aws_kubeadm.yaml | Adds an example environment configuration YAML file. |
| pkg/provisioner/templates/container-toolkit.go | Revises installation command for nvidia-container-toolkit. |
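
The checksum verification these reviews refer to is not quoted in the conversation. A minimal sketch of the general technique, assuming the expected SHA-256 digest has already been obtained from a checksum file published next to the release asset; `verify_sha256` is a hypothetical helper name, not taken from the PR.

```shell
#!/usr/bin/env bash
# Verify a file against an expected SHA-256 digest before installing it.
# verify_sha256 is a hypothetical helper name, not taken from the PR.
verify_sha256() {
  local file="$1" expected="$2" actual
  actual="$(sha256sum "$file" | awk '{print $1}')"
  if [ "$actual" != "$expected" ]; then
    echo "Checksum mismatch for ${file}: expected ${expected}, got ${actual}" >&2
    return 1
  fi
}
```

A provisioning script would call this between the download and the extraction step and abort on a non-zero return.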

Copilot AI left a comment

Pull Request Overview

This PR enables containerd installation via a binary download and manual setup while deprecating the APT package method, providing more flexibility to install versions not available in DEB repositories.

  • Removed APT key and repository configuration for containerd and replaced it with GitHub-based binary downloads and checksum verifications.
  • Updated default containerd version from 1.6.27 to 2.0.4 and added logic to fetch/install runc and CNI plugins.
  • Modified the container toolkit installation command to use install_packages_with_retry.

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| pkg/provisioner/templates/containerd.go | Swapped APT installation with binary download, extraction, and checksum verification logic. |
| examples/aws_kubeadm.yaml | Added an example environment configuration leveraging the new containerd setup. |
| pkg/provisioner/templates/container-toolkit.go | Updated the container toolkit installation command for consistency. |


if env.Spec.ContainerRuntime.Version == "" {
- version = "1.6.27"
+ version = "2.0.4"
Member

Do we really want to change the default in this PR? This has implications on Operator testing for example where we may not be specifying the containerd version.

Collaborator Author

This change won't affect the Operator, as it has the Holodeck version pinned: https://github.com/NVIDIA/gpu-operator/blob/main/.github/workflows/ci.yaml#L337.
But if changing the default is too big a change, I can set the default version back to 1.6, or maybe to 1.7.26, which is the latest of the 1.x.y series. WDYT?

Member

Let's change the default in a separate PR.

Collaborator Author

ack

with_retry 3 10s sudo apt-get update
# Fetch latest stable Containerd version from GitHub
echo "Fetching latest stable containerd version..."
CONTAINERD_VERSION=$(curl -fsSL https://api.github.com/repos/containerd/containerd/releases/latest | grep '"tag_name":' | cut -d '"' -f 4 | sed 's/v//')
Member

Does this mean that we don't install the specified version?

Collaborator Author

Umm good point, checking

Member

Did you confirm this?

Collaborator Author

Changes made to ensure that the specified version is the one used.
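
The concern in this thread is that unconditionally querying the latest release would override whatever version the environment spec requests. The actual fix is not quoted here; the sketch below only illustrates the precedence being asked for, using plain shell arguments in place of the template values. `resolve_version` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Prefer an explicitly requested version; only fall back to a lookup command
# when none was given. resolve_version is a hypothetical helper name.
# Usage: resolve_version <requested_version_or_empty> <fallback_command> [args...]
resolve_version() {
  local requested="$1"
  shift
  if [ -n "$requested" ]; then
    # Strip an optional leading "v" so "v2.0.4" and "2.0.4" behave the same.
    echo "${requested#v}"
  else
    "$@"
  fi
}
```

Here the fallback command would be whatever fetches the latest tag, so the network lookup only happens when no version was specified.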

ArangoGutierrez requested a review from elezar on March 20, 2025 12:44

install_packages_with_retry nvidia-container-toolkit

# Configure container runtime
sudo nvidia-ctk runtime configure --runtime={{.ContainerRuntime}} --set-as-default
Member

One thing that should be mentioned is that if we add --enable-cdi as an argument here, then we don't NEED containerd v2.0. Would it be simpler to expose that as an argument instead of reworking the containerd installation?

Not that I don't think having access to containerd is valuable, just that it would simplify unblocking DRA testing.

Collaborator Author

Yeah it would be a much smaller and simpler PR, do you think we are too far from agreeing on this one?

Collaborator Author

I have created #319.
I will leave this PR for a later release and cut a release once we merge #319.
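
The simpler alternative discussed in this thread amounts to optionally appending `--enable-cdi` to the existing `nvidia-ctk runtime configure` call. The sketch below only builds and prints the command line. The `true`/`false` toggle argument is hypothetical, the flag name is taken from the review comment, and none of this is meant to describe how #319 implements it.

```shell
#!/usr/bin/env bash
# Build the nvidia-ctk command line, optionally adding --enable-cdi.
# The second argument is a hypothetical "true"/"false" toggle; the flag name
# is the one quoted in the review comment.
build_runtime_configure_cmd() {
  local runtime="$1" enable_cdi="${2:-false}"
  local cmd="sudo nvidia-ctk runtime configure --runtime=${runtime} --set-as-default"
  if [ "$enable_cdi" = "true" ]; then
    cmd="${cmd} --enable-cdi"
  fi
  # Print instead of executing so the sketch is safe to run anywhere.
  echo "$cmd"
}
```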

Copilot AI left a comment

Pull Request Overview

This PR enables containerd installation via binary download and manual setup to support versions not available on DEB repos.

  • Removed installation via apt-get and added logic to fetch and verify containerd, runc, and CNI plugins from GitHub releases.
  • Updated container toolkit installation to use retry logic.

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| pkg/provisioner/templates/containerd.go | Transitioned from APT-based installation to binary download and checksum verification for containerd, runc, and CNI plugins. |
| examples/aws_kubeadm.yaml | Added an end-to-end test environment configuration example for AWS kubeadm. |
| pkg/provisioner/templates/container-toolkit.go | Modified installation of nvidia-container-toolkit to use a retry logic mechanism. |

Comments suppressed due to low confidence (1)

pkg/provisioner/templates/containerd.go:69

  • [nitpick] Consider using tar's built-in -C flag instead of piping through cat for clarity and reliability; for example, 'sudo tar -xzvf ${CONTAINERD_TAR} -C /usr/local'.
cat ${CONTAINERD_TAR} | sudo tar Cxzvf /usr/local -

ArangoGutierrez added this to the v0.2.8 milestone on Mar 20, 2025

Comment on lines +16 to +17
image:
architecture: amd64
Member

Not critical to this PR: Is the architecture not determined by the instance type?

Comment on lines +68 to +69
# Stream directly into tar to avoid saving the archive
cat ${CONTAINERD_TAR} | sudo tar Cxzvf /usr/local -
Member

We're already downloading the archive. This is only needed if we're using curl directly. Since we're verifying the SHA that's fine too.

Not a blocker though.

echo "Runc ${RUNC_VERSION} installed successfully."

# Install CNI plugins
CNI_VERSION="1.1.1"
Member

Any particular reason that we don't handle this version in the same way as the other packages?
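
Handling all three components the same way could mean deriving each download URL from a component name, a version, and an architecture. The sketch below shows one way to do that. `release_url` is a hypothetical helper, and the URL patterns reflect the asset naming on each project's GitHub releases page as I understand it, so they should be checked against the releases actually targeted.

```shell
#!/usr/bin/env bash
# Build release download URLs from a component name, version, and arch.
# release_url is a hypothetical helper; the URL patterns follow the asset
# naming used on each project's GitHub releases page.
release_url() {
  local component="$1" version="$2" arch="$3"
  case "$component" in
    containerd)
      echo "https://github.com/containerd/containerd/releases/download/v${version}/containerd-${version}-linux-${arch}.tar.gz" ;;
    runc)
      echo "https://github.com/opencontainers/runc/releases/download/v${version}/runc.${arch}" ;;
    cni)
      echo "https://github.com/containernetworking/plugins/releases/download/v${version}/cni-plugins-linux-${arch}-v${version}.tgz" ;;
    *)
      echo "Unknown component: $component" >&2
      return 1 ;;
  esac
}
```

With a helper like this, the CNI version could be resolved by the same mechanism as containerd and runc rather than being hard-coded.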


# Configure containerd
sudo mkdir -p /etc/containerd
containerd config default | sudo tee /etc/containerd/config.toml
Member

Question: Does config default generate a "sane" config with respect to the SystemdCgroup?
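
The diff handles this by rewriting the generated file with sed, which silently does nothing if the key is absent or written differently. A minimal sketch that adds a check after the substitution, so a missing key is reported instead of ignored; `enable_systemd_cgroup` is a hypothetical helper name and GNU sed's `-i` is assumed.

```shell
#!/usr/bin/env bash
# Enable the systemd cgroup driver in a containerd config file and confirm
# the setting is present afterwards. enable_systemd_cgroup is a hypothetical
# helper; the sed expression is the one from the PR.
enable_systemd_cgroup() {
  local config="$1"
  sed -i 's/SystemdCgroup = false/SystemdCgroup = true/g' "$config"
  if ! grep -q 'SystemdCgroup = true' "$config"; then
    echo "SystemdCgroup = true not found in ${config}; the default config may not contain the key" >&2
    return 1
  fi
}

# Intended use (needs root for files under /etc, not executed here):
#   enable_systemd_cgroup /etc/containerd/config.toml
```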

sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/g' /etc/containerd/config.toml

# Set up systemd service for containerd
sudo curl -fsSL "https://raw.githubusercontent.com/containerd/containerd/main/containerd.service" -o /etc/systemd/system/containerd.service
Member

Should we not also download this from the versioned tag?

Member

(or is this not available in the tar file?)
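
Fetching the unit file from the versioned tag could look like the sketch below, which swaps `main` for the release tag in the raw URL. `service_unit_url` is a hypothetical helper name, and the version is assumed to be a bare number such as `2.0.4` with no leading `v`.

```shell
#!/usr/bin/env bash
# Build the URL of containerd.service at a specific release tag instead of
# the main branch. The version argument is assumed to be a bare version
# such as "2.0.4" (no leading "v").
service_unit_url() {
  local version="$1"
  echo "https://raw.githubusercontent.com/containerd/containerd/v${version}/containerd.service"
}

# Intended use (network call and sudo, not executed here):
#   sudo curl -fsSL "$(service_unit_url "$CONTAINERD_VERSION")" -o /etc/systemd/system/containerd.service
```

Pinning the unit file to the same tag as the binaries keeps the two in step if the service definition on `main` changes later.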

Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
ArangoGutierrez merged commit 765912e into NVIDIA:main on May 23, 2025
15 checks passed

3 participants