The template below is mostly useful for bug reports and support questions. Feel free to remove anything which doesn't apply to you and add more information where it makes sense.
Important Note: NVIDIA AI Enterprise customers can get support from NVIDIA Enterprise support. Please open a case here.
1. Quick Debug Information
- OS/Version: RHEL8.8
- Kernel Version: 4.18.0-477.27.1.el8_8.x86_64
- Container Runtime Type/Version(e.g. Containerd, CRI-O, Docker): cri-o
- K8s Flavor/Version(e.g. K8s, OCP, Rancher, GKE, EKS): RedHat OpenShift on IBM Cloud
- GPU Operator Version: stable (23.9)
2. Issue or feature description
During the NVIDIA driver install, the installer reads os-release to decide which release to use when pulling the kernel headers. This causes problems when the kernel's major.minor release does not match the major.minor in os-release. Cloud providers often delay moving to the latest kernel until validation and testing show it is production-ready, so the mismatch persists until the new kernel is approved.
3. Steps to reproduce the issue
Provision a RHEL 8.9 machine, downgrade the kernel to the 8.8 level, and try to install the NVIDIA GPU Operator.
4. Information to attach (optional if deemed irrelevant)
+ dnf makecache --releasever=8.9
Updating Subscription Management repositories.
Unable to read consumer identity
Subscription Manager is operating in container mode.
Red Hat Enterprise Linux 8 for x86_64 - BaseOS 290 B/s | 14 B 00:00
Errors during downloading metadata for repository 'rhel-8-for-x86_64-baseos-rpms':
- Status code: 404 for https://rhha01.updates.us-south.iaas.service.networklayer.com/pulp/repos/customer/Library/content/dist/rhel8/8.9/x86_64/baseos/os/repodata/repomd.xml (IP: 161.26.112.28)
Error: Failed to download metadata for repo 'rhel-8-for-x86_64-baseos-rpms': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried
+ dnf config-manager --set-disabled rhel-8-for-x86_64-baseos-eus-rpms
Updating Subscription Management repositories.
Unable to read consumer identity
Subscription Manager is operating in container mode.
Installing Linux kernel headers...
+ echo 'Installing Linux kernel headers...'
+ dnf -q -y --releasever=8.9 install kernel-headers-4.18.0-477.27.1.el8_8.x86_64 kernel-devel-4.18.0-477.27.1.el8_8.x86_64
Error: Failed to download metadata for repo 'rhel-8-for-x86_64-baseos-rpms': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried
releasever should match the kernel we are using rather than what is gathered from os-release, which is what is being done today.
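As a rough illustration of the suggestion above, the release could be derived from the running kernel's release string instead of os-release. This is only a sketch, not the driver container's actual code: the function name `releasever_from_kernel` is hypothetical, and it assumes the kernel string carries an `el<major>_<minor>` tag (as in `4.18.0-477.27.1.el8_8.x86_64`). Kernels without a minor tag (for example `...el8.x86_64`) yield no result, in which case a caller would presumably fall back to the os-release value.

```shell
# Hypothetical helper: derive a dnf --releasever value from a RHEL kernel
# release string such as "4.18.0-477.27.1.el8_8.x86_64".
# Prints "<major>.<minor>" and returns 0 on success; prints nothing and
# returns non-zero when the string has no "el<major>_<minor>" tag.
releasever_from_kernel() {
    # Use the first argument if given, otherwise the running kernel.
    kernel="${1:-$(uname -r)}"

    # Extract the "el8_8"-style tag and rewrite it as "8.8".
    ver=$(printf '%s\n' "$kernel" \
        | sed -n 's/.*\.el\([0-9][0-9]*\)_\([0-9][0-9]*\).*/\1.\2/p')

    [ -n "$ver" ] && printf '%s\n' "$ver"
}
```

With the kernel from this report, `releasever_from_kernel 4.18.0-477.27.1.el8_8.x86_64` yields `8.8`, which could then be passed as `--releasever` to the `dnf` calls shown in the log instead of `8.9`.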
These issues are related: #616 and #358 (comment)