Automatic GPU arch detection #31581

@adamjstewart

Description

When a user builds a package, they don't need to tell Spack which platform/OS/target/CPU they want to build for. Spack detects this automatically using libraries like distro and archspec.

This is not the case for GPUs and other accelerators. For users of ML libraries, the CPU is barely used; most computation actually happens on the GPU. Currently, users need to manually figure out what GPU they are using (run a command like nvidia-smi), look up the corresponding CUDA arch from https://developer.nvidia.com/cuda-gpus, and add a section to their packages.yaml like:

packages:
  all:
    variants: +cuda cuda_arch=XY
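
For illustration only, the manual lookup described above could be approximated with a small script. This is a minimal sketch, not how Spack or archspec does (or will do) detection. It assumes an NVIDIA driver recent enough that nvidia-smi supports the compute_cap query field; the function names compute_cap_to_cuda_arch and detect_cuda_archs are hypothetical.

```python
import shutil
import subprocess


def compute_cap_to_cuda_arch(compute_cap):
    """Convert a compute capability such as '8.6' to a cuda_arch value such as '86'."""
    major, minor = compute_cap.strip().split(".")
    return f"{int(major)}{int(minor)}"


def detect_cuda_archs():
    """Return sorted, de-duplicated cuda_arch values for the visible NVIDIA GPUs.

    Hypothetical helper: returns an empty list if nvidia-smi is missing,
    fails, or does not understand the compute_cap query field.
    """
    if shutil.which("nvidia-smi") is None:
        return []  # no NVIDIA driver tools on PATH
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    archs = set()
    for line in result.stdout.splitlines():
        try:
            archs.add(compute_cap_to_cuda_arch(line))
        except ValueError:
            continue  # skip blank or unparsable lines such as "[N/A]"
    return sorted(archs, key=int)


if __name__ == "__main__":
    archs = detect_cuda_archs()
    if archs:
        # Prints a line suitable for the packages.yaml snippet above
        print(f"variants: +cuda cuda_arch={','.join(archs)}")
    else:
        print("no NVIDIA GPU detected")
```

On a machine with several different GPUs this prints a comma-separated list, since cuda_arch accepts multiple values. It does nothing for AMD GPUs or amdgpu_target.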

This is not documented anywhere as far as I know, so users need to figure this out by trial and error. If you don't set this under all:, or if you try to set it on the command line instead, you may end up with a DAG with different settings for each package. For some packages, you'll see a concretization error message telling you to set cuda_arch if you want to use +cuda. For others, Spack will simply build ~cuda if you don't tell it otherwise. This is not ideal.

We should allow Spack to automatically detect things like cuda_arch (NVIDIA) and amdgpu_target (AMD) and set them accordingly for all packages. The groundwork for this will need to be done in archspec; see archspec/archspec#25. Once this is done, we'll need to change Spack to remove variants like cuda_arch and amdgpu_target and include them directly in the spec architecture. This issue is to track progress on this for the new ML SIG.

Labels: epic (a high-level task that is broken down into smaller, more focused units of work)
