nvidia GPU drivers fail to install on arm64 (g5g.xlarge on AWS) #1649

@robszumski

Description

The NVIDIA driver auto-install fails on an ARM64 machine (g5g.xlarge) on AWS. It appears to be attempting to use the x86_64 driver installer: the log below shows it extracting NVIDIA-Linux-x86_64-535.104.05.

A quick investigation shows the ebuild is tagged arm64, so I am not sure where the wires are getting crossed. The docs don't say anything about this feature being x86-only.

Impact

nvidia.service fails to start, so the GPU drivers are not installed.

Environment and steps to reproduce

  1. Set-up: g5g.xlarge on AWS.
$ cat /usr/share/flatcar/nvidia-metadata
NVIDIA_DRIVER_VERSION=535.104.05
NVIDIA_PRODUCT_TYPE=tesla
  2. Error:
$ journalctl -u nvidia
Feb 13 14:39:35 localhost systemd[1]: Starting nvidia.service - NVIDIA Configure Service...
Feb 13 14:39:35 localhost setup-nvidia[2556]: Downloading Flatcar Container Linux Developer Container for version: 3941.1.0
Feb 13 14:39:35 ip-172-40-20-6 setup-nvidia[2748]:   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
Feb 13 14:39:35 ip-172-40-20-6 setup-nvidia[2748]:                                  Dload  Upload   Total   Spent    Left  Speed
Feb 13 14:39:54 ip-172-40-20-6 setup-nvidia[2748]: [1.6K blob data]
Feb 13 14:40:50 ip-172-40-20-6 setup-nvidia[2556]: Downloading NVIDIA 535.104.05 Driver
Feb 13 14:40:50 ip-172-40-20-6 setup-nvidia[3034]:   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
Feb 13 14:40:50 ip-172-40-20-6 setup-nvidia[3034]:                                  Dload  Upload   Total   Spent    Left  Speed
Feb 13 14:40:52 ip-172-40-20-6 setup-nvidia[3034]: [395B blob data]
Feb 13 14:40:52 ip-172-40-20-6 setup-nvidia[2556]: Extract the NVIDIA Driver Installer 535.104.05
Feb 13 14:40:52 ip-172-40-20-6 setup-nvidia[2556]: /opt/nvidia/workdir/nvidia-workdir /
Feb 13 14:40:52 ip-172-40-20-6 setup-nvidia[3037]: Creating directory NVIDIA-Linux-x86_64-535.104.05
Feb 13 14:40:52 ip-172-40-20-6 setup-nvidia[3037]: Verifying archive integrity... OK
Feb 13 14:40:53 ip-172-40-20-6 setup-nvidia[3037]: Uncompressing NVIDIA Accelerated Graphics Driver for Linux-x86_64 535.104.05
Feb 13 14:40:58 ip-172-40-20-6 setup-nvidia[3067]: ...................................................................................................>
Feb 13 14:40:58 ip-172-40-20-6 setup-nvidia[2556]: /
Feb 13 14:40:58 ip-172-40-20-6 setup-nvidia[2556]: Spawn system-nspawn container to install the NVIDIA drivers
Feb 13 14:40:58 ip-172-40-20-6 sudo[3084]:     root : PWD=/ ; USER=root ; COMMAND=/usr/bin/systemd-nspawn --read-only --volatile=overlay --image=/opt/>
Feb 13 14:40:58 ip-172-40-20-6 sudo[3084]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=0)
Feb 13 14:41:14 ip-172-40-20-6 setup-nvidia[3143]: cp: cannot stat '/opt/nvidia/workdir/nvidia-workdir/NVIDIA-Linux-x86_64-535.104.05/install-mod/*.ko>
Feb 13 14:41:14 ip-172-40-20-6 systemd[1]: nvidia.service: Main process exited, code=exited, status=1/FAILURE
Feb 13 14:41:14 ip-172-40-20-6 systemd[1]: nvidia.service: Failed with result 'exit-code'.
Feb 13 14:41:14 ip-172-40-20-6 systemd[1]: Failed to start nvidia.service - NVIDIA Configure Service.
Feb 13 14:41:14 ip-172-40-20-6 systemd[1]: nvidia.service: Consumed 1min 12.881s CPU time.
$ nvidia-smi
-bash: /opt/bin/nvidia-smi: cannot execute binary file: Exec format error
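
The mismatch in the log above (an x86_64 installer directory on an arm64 host) can be illustrated with a small sketch. This is not the actual setup-nvidia script; the function names `nvidia_installer_arch` and `nvidia_installer_name` are hypothetical, and the only assumption about NVIDIA's naming is that the arm64 installer uses `aarch64` in the same position where the log shows `x86_64`.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; not part of Flatcar's setup-nvidia.

# Map a machine name (as printed by `uname -m`) to the architecture
# string used in the NVIDIA installer name.
nvidia_installer_arch() {
    case "$1" in
        x86_64)        echo "x86_64" ;;
        aarch64|arm64) echo "aarch64" ;;
        *)             echo "unsupported machine: $1" >&2; return 1 ;;
    esac
}

# Build the installer directory name for a given machine and driver version.
nvidia_installer_name() {
    arch=$(nvidia_installer_arch "$1") || return 1
    echo "NVIDIA-Linux-${arch}-$2"
}

# Print the name this host would be expected to use for the driver
# version reported in /usr/share/flatcar/nvidia-metadata above.
nvidia_installer_name "$(uname -m)" 535.104.05
```

On the g5g.xlarge host, `uname -m` should report `aarch64`, so the printed name would differ from the `NVIDIA-Linux-x86_64-535.104.05` directory seen in the journal, which is consistent with the `Exec format error` from nvidia-smi.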

Expected behavior

The GPU drivers are installed and nvidia.service starts successfully on arm64.
