ROCmPackage: unify amdgpu_targets#30582
ROCmPackage: unify amdgpu_targets#30582haampie merged 3 commits intospack:developfrom albestro:alby/rocm_amdgpu_targets
Conversation
|
@cgmb you also looked at this, right? |
There was a problem hiding this comment.
In principle, I agree that there should be one source of truth for the list of valid amdgpu_targets. Unfortunately, rocBLAS is probably the worst library you could do this for, as Tensile imposes unique constraints on the build architectures. (Not for any particularly good reasons. It's just technical debt.)
In making this change, I think it's important to check that you don't end up with build errors or, worse, quietly lose the tuned assembly BLAS kernels if a user accidentally specifies gfx90a rather than the officially supported gfx90a:xnack-,gfx90a:xnack+.
At the moment, the amdgpu_targets on each ROCm library are just the architectures that have been built and tested by AMD (though, not necessarily all of them in every version of the library). There are some unsupported GPU architectures that would probably work when the libraries are built from source, so there is even user value in using a single central list of amdgpu_targets rather than a shortlist of known-good architectures. However, my concern is ensuring that we manage expectations for Spack users, so they aren't surprised when those unsupported configurations don't work.
| amdgpu_targets = ( | ||
| 'gfx701', 'gfx801', 'gfx802', 'gfx803', | ||
| 'gfx900', 'gfx906', 'gfx908', 'gfx90a', 'gfx1010', | ||
| 'gfx1011', 'gfx1012' | ||
| 'gfx701', 'gfx801', 'gfx802', 'gfx803', 'gfx900', | ||
| 'gfx906', 'gfx908', 'gfx90a', | ||
| 'gfx906:xnack-', 'gfx908:xnack-', 'gfx90a:xnack-', 'gfx90a:xnack+', | ||
| 'gfx1010', 'gfx1011', 'gfx1012', 'gfx1030', | ||
| ) |
There was a problem hiding this comment.
A couple notable amdgpu_targets you're missing are:
gfx900:xnack-which is used by pretty much all the ROCm math libraries except rocBLAS and rocSOLVERgfx1031which is not officially supported, but will probably work for some libraries
There was a problem hiding this comment.
IMO, the Right Thing™ to do is to create target_id.py somewhere and define a TargetId class rather than trying to enumerate all possible permutations of processors and target features as strings. The reference documentation for such a thing would be:
Target Id syntax
Table of processors and supported target features
There was a problem hiding this comment.
A couple notable amdgpu_targets you're missing are:
* `gfx900:xnack-` which is used by pretty much all the ROCm math libraries except rocBLAS and rocSOLVER * `gfx1031` which is not officially supported, but will probably work for some libraries
Thanks for pointing that out! After your comment I grepped spack to search for currently used architectures. Here's the output:
➜ spack git:(alby/rocm_amdgpu_targets) ✗ git grep -ho "gfx[0-9]\+[^ ,;'\`\".]*" | sort | uniq
gfx000
gfx1010
gfx1011
gfx1012
gfx1030
gfx1031
gfx700
gfx701
gfx801
gfx802
gfx803
gfx900
gfx900:xnack-
gfx902
gfx906
gfx906:xnack-
gfx908
gfx908:xnack-
gfx90a
gfx90a:xnack+
gfx90a:xnack-
It caught my eye gfx000, which is mentioned just in rocm-tensile package, but I cannot find it in the official list you mentioned. I'm not able to say if it is a typo or not.
About TargetID class you are proposing, it can be added in the same file where ROCmPackage is defined.
However, I would not be against having them explicitly listed. I don't know exactly how much complicated the logic for composing architectures can be. For instance, AFAICT, not all features makes sense for all targets, so a filtering may be needed.
There was a problem hiding this comment.
gfx000 is used as a null value by rocminfo. It has been repurposed to have various meanings in different contexts.
However, I would not be against having them explicitly listed. I don't know exactly how much complicated the logic for composing architectures can be. For instance, AFAICT, not all features makes sense for all targets, so a filtering may be needed.
My reasoning to use a class would be to model that logic (and other related properties of the GPU architecture). However, it seems the total number of possible target-id values is lower than I expected, as only sramecc and xnack can be specified via target-id. I've generated all valid values of target-id for gfx7, gfx8, gfx9, gfx10 and gfx11 based on the clang AMD CGN GPUInfo table (which matches that table of processors and target features I'd linked earlier):
gfx700
gfx701
gfx702
gfx703
gfx704
gfx705
gfx801
gfx801:xnack+
gfx801:xnack-
gfx802
gfx803
gfx805
gfx810
gfx810:xnack+
gfx810:xnack-
gfx900
gfx900:xnack+
gfx900:xnack-
gfx902
gfx902:xnack+
gfx902:xnack-
gfx904
gfx904:xnack+
gfx904:xnack-
gfx906
gfx906:xnack+
gfx906:xnack-
gfx906:sramecc+
gfx906:sramecc+:xnack+
gfx906:sramecc+:xnack-
gfx906:sramecc-
gfx906:sramecc-:xnack+
gfx906:sramecc-:xnack-
gfx908
gfx908:xnack+
gfx908:xnack-
gfx908:sramecc+
gfx908:sramecc+:xnack+
gfx908:sramecc+:xnack-
gfx908:sramecc-
gfx908:sramecc-:xnack+
gfx908:sramecc-:xnack-
gfx909
gfx909:xnack+
gfx909:xnack-
gfx90a
gfx90a:xnack+
gfx90a:xnack-
gfx90a:sramecc+
gfx90a:sramecc+:xnack+
gfx90a:sramecc+:xnack-
gfx90a:sramecc-
gfx90a:sramecc-:xnack+
gfx90a:sramecc-:xnack-
gfx90c
gfx90c:xnack+
gfx90c:xnack-
gfx940
gfx940:xnack+
gfx940:xnack-
gfx940:sramecc+
gfx940:sramecc+:xnack+
gfx940:sramecc+:xnack-
gfx940:sramecc-
gfx940:sramecc-:xnack+
gfx940:sramecc-:xnack-
gfx1010
gfx1010:xnack+
gfx1010:xnack-
gfx1011
gfx1011:xnack+
gfx1011:xnack-
gfx1012
gfx1012:xnack+
gfx1012:xnack-
gfx1013
gfx1013:xnack+
gfx1013:xnack-
gfx1030
gfx1031
gfx1032
gfx1033
gfx1034
gfx1035
gfx1036
gfx1100
gfx1101
gfx1102
gfx1103
There was a problem hiding this comment.
Wow! It's like you read my mind. I had to do a double-take, because that looks so similar to what I was prototyping myself. I'll give it a more thorough review in a bit, but that looks great.
There was a problem hiding this comment.
😄 nice to hear that!
In the meanwhile I added another change...just a proposal as the rest, but it can be helpful: given a target id with a minimal set of features, returns the list of all canonical target ids satisfying it.
e.g. gfx940:xnack+ -> ['gfx940:sramecc-:xnack+', 'gfx940:sramecc+:xnack+']
It can also be extended to address the case of the empty input string as a shortcut for listing all available combination of archs/features.
Looking forward to your comments and indications on how to carry on this PR. 😉
|
Please remember to click |
cgmb
left a comment
There was a problem hiding this comment.
@albestro, I really must apologize. I figured this would be a good opportunity to 'do target id right', but the scope of work is larger than I thought. It may be best to split that out into another PR (or several).
If you want to revert to 66c09db, add gfx900:xnack- and gfx1031, then make the same change to all the ROCm packages I'd be happy to approve.
Hey @cgmb, don't worry. I grepped the repository for
In addition to these official packages I found:
These latter twos manually manage ROCm at the moment, but I would contact maintainers and in case I will open a separate PR for them. Let me know if we are ok with this 😉 |
|
Your list of official rocm libraries is good. rocthrust and rocprim should also have an |
just for the records: rocprim is in the list above, so it has been adapted in this PR. rocthrust does not have |
cgmb
left a comment
There was a problem hiding this comment.
just for the records: rocprim is in the list above, so it has been adapted in this PR. rocthrust does not have amdgpu_target variant currently.
Oh. I now see that you've already done the work! LGTM.
|
@haampie, @srekolam, @arjun-raj-kuppala, I think this looks good. |
|
@haampie, this is ready to merge. |
|
@haampie, this is ready to merge. It's a very simple change, but it's extraordinarily valuable. Once this is merged, I'll be able to tell prospective ROCm users on gfx1010 (e.g., RX 5700 XT), gfx1011 (e.g., Radeon v520 Pro) and gfx1031 (e.g., RX 6700 XT) to use Spack. Those architectures are not officially supported, and thus are not included in the binaries that AMD distributes. However, many of the ROCm libraries will work on that hardware if built from source. Spack will make it easy enough that end users can do it themselves. That's a killer feature for Spack, IMO. |
|
Thanks Harmen! |
* unify amdgpu_targets for rocsolver and rocblas * add more archs * adapt all roc packages to unified amdgpu_targets
About ROCm packages, it is a good general practice to inherit from
ROCmPackage, which providesrocmandamdgpu_targetsvariant. For the latter one it also provides a defined set of valid AMD GPU targets values.In some packages like
rocblasandrocsolver, therocmvariant does not make much sense, so they manually specify the other variant together with a set of valid AMD GPU targets.This triggers a maintenance problem. Indeed, for these packages the set of valid architectures has to be maintained manually, which is costly and error-prone.
This PR is a proposal to do it in a slightly different way, so that the set of valid targets is defined in a single place. I started with just
rocblasandrocsolver, but if you agree on this style, I can also adapt others