Do not look at os-release when downloading linux kernel headers #617

@KodieGlosserIBM

1. Quick Debug Information

  • OS/Version: RHEL8.8
  • Kernel Version: 4.18.0-477.27.1.el8_8.x86_64
  • Container Runtime Type/Version(e.g. Containerd, CRI-O, Docker): cri-o
  • K8s Flavor/Version(e.g. K8s, OCP, Rancher, GKE, EKS): RedHat OpenShift on IBM Cloud
  • GPU Operator Version: stable (23.9)

2. Issue or feature description

During the NVIDIA driver install, the installer reads os-release to decide which release version (`--releasever`) to use when pulling the kernel headers. This breaks when the major.minor release of the running kernel does not match the major.minor in os-release. Cloud providers often hold back the latest kernel until it has been validated and tested for production readiness, so the mismatch persists until the new kernel is ready.

3. Steps to reproduce the issue

Provision a RHEL 8.9 machine, downgrade the kernel to an 8.8 kernel, and try to install the NVIDIA GPU Operator.

4. Information to attach (optional if deemed irrelevant)

+ dnf makecache --releasever=8.9
Updating Subscription Management repositories.
Unable to read consumer identity
Subscription Manager is operating in container mode.
Red Hat Enterprise Linux 8 for x86_64 - BaseOS  290  B/s |  14  B     00:00    
Errors during downloading metadata for repository 'rhel-8-for-x86_64-baseos-rpms':
  - Status code: 404 for https://rhha01.updates.us-south.iaas.service.networklayer.com/pulp/repos/customer/Library/content/dist/rhel8/8.9/x86_64/baseos/os/repodata/repomd.xml (IP: 161.26.112.28)
Error: Failed to download metadata for repo 'rhel-8-for-x86_64-baseos-rpms': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried
+ dnf config-manager --set-disabled rhel-8-for-x86_64-baseos-eus-rpms
Updating Subscription Management repositories.
Unable to read consumer identity
Subscription Manager is operating in container mode.
Installing Linux kernel headers...
+ echo 'Installing Linux kernel headers...'
+ dnf -q -y --releasever=8.9 install kernel-headers-4.18.0-477.27.1.el8_8.x86_64 kernel-devel-4.18.0-477.27.1.el8_8.x86_64
Error: Failed to download metadata for repo 'rhel-8-for-x86_64-baseos-rpms': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried

`releasever` should match the kernel we are actually running rather than the value gathered from os-release, which is what happens today.
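As a rough illustration of the idea, the release version could be read from the dist tag in the kernel release string instead of from os-release. This is only a sketch, not the GPU Operator's actual install script: `releasever_from_kernel` is a hypothetical helper, and it assumes the kernel release carries a tag of the form `.el<major>_<minor>.`, as in the log above.

```shell
# Hypothetical helper (not part of the GPU Operator driver scripts).
# Derives a dnf --releasever value from a RHEL kernel release string,
# assuming a dist tag of the form ".el<major>_<minor>." is present.
releasever_from_kernel() {
    # sed -n ... p prints "<major>.<minor>" only when the tag matches.
    ver=$(printf '%s\n' "$1" | sed -n 's/.*\.el\([0-9][0-9]*\)_\([0-9][0-9]*\)\..*/\1.\2/p')
    # No minor in the tag: return non-zero so the caller can fall back
    # to another source (for example os-release).
    [ -n "$ver" ] || return 1
    printf '%s\n' "$ver"
}

# Example with the kernel from this report:
releasever_from_kernel "4.18.0-477.27.1.el8_8.x86_64"   # prints 8.8
```

On a live node the argument would be `"$(uname -r)"`, and the result would be passed to dnf as `--releasever=8.8` in place of the `8.9` shown in the log. Whether the 8.8 repository is reachable from a given mirror is a separate question that this sketch does not address.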

These issues are related: #616 and #358 (comment)
