When a user builds a package, they don't need to tell Spack which platform/OS/target/CPU they want to build for; Spack detects this automatically using libraries like `distro` and `archspec`.
This is not the case for GPUs and other accelerators. For users of ML libraries, the CPU is barely used; most computation happens on the GPU. Currently, users need to figure out manually which GPU they are using (by running a command like `nvidia-smi`), look up the corresponding CUDA arch at https://developer.nvidia.com/cuda-gpus, and add a section like the following to their `packages.yaml`:
```yaml
packages:
  all:
    variants: +cuda cuda_arch=XY
```
As far as I know, this is not documented anywhere; users have to figure it out by trial and error. If you don't set this under `all:` (or if you try to set it on the command line instead), you may end up with a DAG in which each package has different settings. For some packages, concretization fails with an error message telling you to set `cuda_arch` if you want `+cuda`. For others, Spack will silently build `~cuda` unless told otherwise. This is not ideal.
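For illustration, the manual NVIDIA steps above could be scripted roughly as follows. This is a minimal sketch, not part of Spack or archspec: the function names are hypothetical, it assumes a driver recent enough that `nvidia-smi` supports the `compute_cap` query field, and it assumes a multi-GPU node can list several archs as a comma-separated `cuda_arch` value.

```python
import subprocess


def compute_cap_to_cuda_arch(compute_cap: str) -> str:
    """Convert a compute capability such as '8.6' into the cuda_arch form '86'."""
    major, minor = compute_cap.strip().split(".")
    return f"{int(major)}{int(minor)}"


def parse_compute_caps(nvidia_smi_output: str) -> list[str]:
    """Return sorted, de-duplicated cuda_arch values from nvidia-smi CSV output."""
    archs = {
        compute_cap_to_cuda_arch(line)
        for line in nvidia_smi_output.splitlines()
        if line.strip()
    }
    return sorted(archs, key=int)


def detect_cuda_archs() -> list[str]:
    """Query the local GPUs (assumes nvidia-smi supports --query-gpu=compute_cap)."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_compute_caps(out)


def packages_yaml(cuda_archs: list[str]) -> str:
    """Render the packages.yaml section shown above for the detected archs."""
    return (
        "packages:\n"
        "  all:\n"
        f"    variants: +cuda cuda_arch={','.join(cuda_archs)}\n"
    )
```

On a machine with a single A100, `print(packages_yaml(detect_cuda_archs()))` would be expected to print the section above with `cuda_arch=80`. Parsing is kept separate from the subprocess call so it can be tested without a GPU.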
We should allow Spack to automatically detect things like `cuda_arch` (NVIDIA) and `amdgpu_target` (AMD) and set them accordingly for all packages. The groundwork for this will need to be done in archspec; see archspec/archspec#25. Once that is done, Spack will need to drop variants like `cuda_arch` and `amdgpu_target` and encode them directly in the spec's architecture. This issue tracks progress on this for the new ML SIG.
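On the AMD side, detection might look something like the sketch below. It is only an assumption about how this could work, not the planned archspec design: it assumes `rocminfo` is installed and prints one `Name: gfxNNN` line per GPU agent, and the function names are hypothetical.

```python
import re
import subprocess

# Matches agent lines like "  Name:   gfx90a"; skips CPU agents, "Marketing Name:"
# lines, and ISA lines such as "Name: amdgcn-amd-amdhsa--gfx90a:...".
_GFX_NAME = re.compile(r"^[ \t]*Name:[ \t]+(gfx[0-9a-f]+)[ \t]*$", re.MULTILINE)


def parse_amdgpu_targets(rocminfo_output: str) -> list[str]:
    """Return sorted, de-duplicated gfx targets found in rocminfo output."""
    return sorted(set(_GFX_NAME.findall(rocminfo_output)))


def detect_amdgpu_targets() -> list[str]:
    """Run rocminfo and parse its output (assumes rocminfo is on PATH)."""
    out = subprocess.run(
        ["rocminfo"], capture_output=True, text=True, check=True
    ).stdout
    return parse_amdgpu_targets(out)


def amdgpu_variants(targets: list[str]) -> str:
    """Render the variant string a user would otherwise write by hand."""
    return f"+rocm amdgpu_target={','.join(targets)}"
```

Keeping vendor-specific parsing behind small functions like these mirrors how archspec already separates CPU detection from the rest of Spack, though the real interface is still to be decided in archspec/archspec#25.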