Operator fails to render a valid daemonset on OCP when using 64K kernel page size. #1207

@mvazquezc

Description

The relevant code for the version we are using (v24.6.1):

https://github.com/NVIDIA/gpu-operator/blob/release-24.6/controllers/object_controls.go#L2809

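The function at that link renders the "sanitized" kernel version string as 5.14.0-427.37.1.el9.4..64k-rhcos4.16, which is not a valid name: the two consecutive dots produce an empty label, which the RFC 1123 subdomain rules do not allow.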

The resulting DaemonSet name is nvidia-driver-daemonset-5.14.0-427.37.1.el9.4..64k-rhcos4.16, and the API server rejects it when the controller tries to create the DaemonSet:

{"level":"info","ts":"2025-01-16T09:45:22Z","logger":"controllers.ClusterPolicy","msg":"DaemonSet not found, creating","DaemonSet":"nvidia-driver-daemonset","Namespace":"nvidia-gpu-operator","Name":"nvidia-driver-daemonset-5.14.0-427.37.1.el9.4..64k-rhcos4.16"}
.
.
.
{"level":"info","ts":"2025-01-16T09:45:00Z","logger":"controllers.ClusterPolicy","msg":"Couldn't create DaemonSet","DaemonSet":"nvidia-driver-daemonset","Namespace":"nvidia-gpu-operator","Name":"nvidia-driver-daemonset-5.14.0-427.37.1.el9.4..64k-rhcos4.16","Error":"DaemonSet.apps \"nvidia-driver-daemonset-5.14.0-427.37.1.el9.4..64k-rhcos4.16\" is invalid: metadata.name: Invalid value: \"nvidia-driver-daemonset-5.14.0-427.37.1.el9.4..64k-rhcos4.16\": a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')"}
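
The rejection can be reproduced outside the cluster with a minimal Go sketch. It uses the validation regex quoted in the error above (anchored with ^ and $) plus the 253-character limit for subdomain names. `isValidName` and `collapseDots` are hypothetical helpers written for this illustration, not functions from the operator, and collapsing repeated dots is only one possible way to sanitize the string, not necessarily how it should be fixed in object_controls.go.

```go
package main

import (
	"fmt"
	"regexp"
)

// rfc1123Subdomain is the regex from the API error message, anchored so the
// whole name has to match.
var rfc1123Subdomain = regexp.MustCompile(
	`^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$`)

// repeatedDots matches any run of two or more consecutive dots.
var repeatedDots = regexp.MustCompile(`\.{2,}`)

// isValidName (hypothetical helper) reports whether name is a lowercase
// RFC 1123 subdomain of at most 253 characters.
func isValidName(name string) bool {
	return len(name) <= 253 && rfc1123Subdomain.MatchString(name)
}

// collapseDots (hypothetical helper) replaces every run of consecutive dots
// with a single dot. This is an assumed workaround, not the operator's code.
func collapseDots(s string) string {
	return repeatedDots.ReplaceAllString(s, ".")
}

func main() {
	const prefix = "nvidia-driver-daemonset-"
	rendered := "5.14.0-427.37.1.el9.4..64k-rhcos4.16" // string from the logs above

	fmt.Println(isValidName(prefix + rendered))               // false
	fmt.Println(isValidName(prefix + collapseDots(rendered))) // true
}
```

With the dots collapsed, the name becomes nvidia-driver-daemonset-5.14.0-427.37.1.el9.4.64k-rhcos4.16, which passes the same check.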
