Conversation
|
Pattern in the patch looks sketchy, since it would gladly accept something like Here are some alternative pattern suggestions: '/(nvidia|nvidia-current)\\.ko(\\.(zst|xz|gz|bz2))?[:]' # Be really specific
'/(nvidia|nvidia-current)\\.ko(\\.[[:alnum:]]+)?[:]' # Allow any alphanumeric extension
'/(nvidia|nvidia-current)\\.ko' # Ignore everything after module nameUPD: Double backslashes in the pattern are needed due |
| Environment=NVIDIA_CTK_CDI_OUTPUT_FILE_PATH=/var/run/cdi/nvidia.yaml | ||
| EnvironmentFile=-/etc/nvidia-container-toolkit/nvidia-cdi-refresh.env | ||
| ExecCondition=/usr/bin/grep -qE '/(nvidia|nvidia-current)\.ko[:]' /lib/modules/%v/modules.dep | ||
| ExecCondition=/usr/bin/grep -qE '/(nvidia|nvidia-current)\\.ko(\\..*)?[:]' /lib/modules/%v/modules.dep |
There was a problem hiding this comment.
As @Lioli7k points out, would it be simpler to remove the [:] instead? Although this was added in #1409 it may be too restrictive.
| ExecCondition=/usr/bin/grep -qE '/(nvidia|nvidia-current)\\.ko(\\..*)?[:]' /lib/modules/%v/modules.dep | |
| ExecCondition=/usr/bin/grep -qE '/(nvidia|nvidia-current)\\.ko' /lib/modules/%v/modules.dep |
There was a problem hiding this comment.
Current patch proposal will match lines in /lib/modules/%v/modules.dep that look like this:
kernel/drivers/video/nvidia.ko:
updates/dkms/nvidia-current.ko:
extra/nvidia.ko.gz: (Compressed module)
weak-updates/nvidia.ko.xz: (Different compression)
Reversing #1409 will also work, as grep will act on the .ko even if there is more after it .ko.eg, so I am ok with both options
|
/cherry-pick release-1.18 |
Commit 1f5ce73 ("Fix trigger of CDI refresh service") did not allow for compressed modules with suffixes such as .zst or .gz. This change relaxes the matching to not require that the kernel module (including extension) be followed by a colon. This allows compressed kernel modules to also be matched. Also fix systemd warning: /etc/systemd/system/nvidia-cdi-refresh.service:26: Ignoring unknown escape sequences: "/(nvidia|nvidia-current)\.ko[:]" Signed-off-by: Ed Swarthout <Ed.Swarthout@gmail.com> Signed-off-by: Evan Lezar <elezar@nvidia.com>
elezar
left a comment
There was a problem hiding this comment.
Thanks @EdSwarthout. I have rebased off main. This looks good now.
@ArangoGutierrez this doesn't reverse #1409. It only removes the check for a trailing |
Pull Request Test Coverage Report for Build 20022696837Details
💛 - Coveralls |
|
🤖 Backport PR created for |
I have not seen a specific list of compression methods and this is not future proof.
Removing the colon check works, but it is not clear why it was added in #1409. Why does this check need to be so specific? What is wrong with running the refresh service if nvidia is anywhere in the modules.dep? What about the day when the open source driver is in the kernel and can be builtin and not a module? |
The nvidia-container-toolkit v1.18.1 has a bug where the nvidia-cdi-refresh systemd service fails to detect compressed kernel modules (.ko.xz, .ko.zst) on Debian 13 and other distros. This causes GPU container startup failures after node reboots. Additionally, CUDA repo packages install Vulkan ICD files to a different path than Debian packages, requiring symlinks for compatibility. Fixes: - Add systemd drop-in to fix ExecCondition regex for compressed modules - Add symlinks for Vulkan ICD files from CUDA repo packages Upstream fix: NVIDIA/nvidia-container-toolkit#1518 Tracking issues: - #1119: Remove systemd workaround after v1.18.2 - #1120: Review Vulkan symlinks when packages update
The nvidia-container-toolkit v1.18.1 has a bug where the nvidia-cdi-refresh systemd service fails to detect compressed kernel modules (.ko.xz, .ko.zst) on Debian 13 and other distros. This causes GPU container startup failures after node reboots. Additionally, CUDA repo packages install Vulkan ICD files to a different path than Debian packages, requiring symlinks for compatibility. Fixes: - Add systemd drop-in to fix ExecCondition regex for compressed modules - Add symlinks for Vulkan ICD files from CUDA repo packages Upstream fix: NVIDIA/nvidia-container-toolkit#1518 Tracking issues: - #1119: Remove systemd workaround after v1.18.2 - #1120: Review Vulkan symlinks when packages update
Allow any module suffix.
Commit 1f5ce73 ("Fix trigger of CDI refresh service") did not allow for compressed modules with suffixes such as .zst or .gz.
Also fix systemd warning:
/etc/systemd/system/nvidia-cdi-refresh.service:26: Ignoring unknown escape sequences: "/(nvidia|nvidia-current).ko[:]"