Skip to content

feat: drop nvidia-device-plugin feature#60

Merged
arnaldo2792 merged 1 commit intobottlerocket-os:developfrom
arnaldo2792:nvidia-settings-api
Sep 10, 2024
Merged

feat: drop nvidia-device-plugin feature#60
arnaldo2792 merged 1 commit intobottlerocket-os:developfrom
arnaldo2792:nvidia-settings-api

Conversation

@arnaldo2792
Copy link
Copy Markdown
Contributor

@arnaldo2792 arnaldo2792 commented Sep 6, 2024

Issue #, if available:
N / A

Description of changes:

Use a setings extension instead of a cargo feature for the kubelet device plugins API. See #57 (comment) for details.

Testing:

As part of bottlerocket-os/bottlerocket-core-kit#132 and bottlerocket-os/bottlerocket#4182

  1. Instance joined the cluster:
NAME                                           STATUS   ROLES    AGE    VERSION
ip-XXXX.us-west-2.compute.internal   Ready    <none>   22s    v1.30.1-eks-e564799
  1. Files were generated using the new values
bash-5.1# apiclient get settings.kubelet-device-plugin
{
  "settings": {
    "kubelet-device-plugin": {
      "nvidia": {
        "device-id-strategy": "index",
        "device-list-strategy": "volume-mounts",
        "pass-device-specs": true
      }
    }
  }
}

bash-5.1# cat /etc/systemd/system/nvidia-k8s-device-plugin.service.d/exec-start.conf
[Service]
ExecStart=
ExecStart=/usr/bin/nvidia-device-plugin --config-file=/etc/nvidia-k8s-device-plugin/settings.yaml
bash-5.1# cat /etc/nvidia-container-runtime/config.toml
### generated from the template file ###
accept-nvidia-visible-devices-as-volume-mounts = true
accept-nvidia-visible-devices-envvar-when-unprivileged = false

[nvidia-container-cli]
root = "/"
path = "/usr/bin/nvidia-container-cli"
environment = []
ldconfig = "@/sbin/ldconfig"
bash-5.1#

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Copy link
Copy Markdown
Contributor

@ginglis13 ginglis13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with some nits. Quick question - I noticed in reading your testing in the description, why are we setting two ExecStart=, the first of which is empty?

https://github.com/bottlerocket-os/bottlerocket-core-kit/blob/4b528d3ac4c9574befc053dc3f64c095234819c4/packages/nvidia-k8s-device-plugin/nvidia-k8s-device-plugin-exec-start-conf#L6-L7

@@ -0,0 +1,75 @@
//! Settings related to Amazon ECS
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this doc comment is not accurate

nvidia: NvidiaDevicePluginSettings,
}

//
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: should this just be a blank line?

@arnaldo2792
Copy link
Copy Markdown
Contributor Author

The first line is empty to reset the ExecStart commands. Otherwise, systemd will attempt to use the two lines and fail to start nvidia-k8s-device-plugin.service because the service type is simple which only accepts one single ExecStart.

use std::convert::Infallible;

#[model(impl_default = true)]
pub struct KubeletDevicePluginV1 {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to expose this as settings.kubelet-device-plugins for consistency with our other plural settings clusters. So the type name should also be plural.

Suggested change
pub struct KubeletDevicePluginV1 {
pub struct KubeletDevicePluginsV1 {

@arnaldo2792
Copy link
Copy Markdown
Contributor Author

Forced push includes:

  • Fix nits
  • Rename KubeletDevicePluginV1 to KubeletDevicePluginsV1

@@ -0,0 +1,18 @@
use bottlerocket_settings_sdk::{BottlerocketSetting, NullMigratorExtensionBuilder};
use settings_extension_kubelet_device_plugin::KubeletDevicePluginsV1;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
use settings_extension_kubelet_device_plugin::KubeletDevicePluginsV1;
use settings_extension_kubelet_device_plugins::KubeletDevicePluginsV1;

Use a setings extension instead of a cargo feature for the kubelet
device plugins API

Signed-off-by: Arnaldo Garcia Rincon <agarrcia@amazon.com>
@arnaldo2792
Copy link
Copy Markdown
Contributor Author

(forced push renames kubelet-device-plugin to kubelet-device-plugins and all the affected places)

@arnaldo2792
Copy link
Copy Markdown
Contributor Author

I confirmed in a local build that kubelet-device-plugins is working as expected:

bash-5.1# apiclient get settings.kubelet-device-plugins
{
  "settings": {
    "kubelet-device-plugins": {
      "nvidia": {
        "device-id-strategy": "index",
        "device-list-strategy": "envvar",
        "pass-device-specs": true
      }
    }
  }
}

@arnaldo2792 arnaldo2792 merged commit 4ad8f9e into bottlerocket-os:develop Sep 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants