Skip to content

ROCmPackage: unify amdgpu_targets#30582

Merged
haampie merged 3 commits intospack:developfrom
albestro:alby/rocm_amdgpu_targets
Jul 5, 2022
Merged

ROCmPackage: unify amdgpu_targets#30582
haampie merged 3 commits intospack:developfrom
albestro:alby/rocm_amdgpu_targets

Conversation

@albestro
Copy link
Copy Markdown
Contributor

About ROCm packages, it is a good general practice to inherit from ROCmPackage, which provides rocm and amdgpu_targets variant. For the latter one it also provides a defined set of valid AMD GPU targets values.

In some packages like rocblas and rocsolver, the rocm variant does not make much sense, so they manually specify the other variant together with a set of valid AMD GPU targets.

This triggers a maintenance problem. Indeed, for these packages the set of valid architectures has to be maintained manually, which is costly and error-prone.

This PR is a proposal to do it in a slightly different way, so that the set of valid targets is defined in a single place. I started with just rocblas and rocsolver, but if you agree on this style, I can also adapt others

@haampie
Copy link
Copy Markdown
Member

haampie commented May 10, 2022

@cgmb you also looked at this, right?

Copy link
Copy Markdown
Contributor

@cgmb cgmb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In principle, I agree that there should be one source of truth for the list of valid amdgpu_targets. Unfortunately, rocBLAS is probably the worst library you could do this for, as Tensile imposes unique constraints on the build architectures. (Not for any particularly good reasons. It's just technical debt.)

In making this change, I think it's important to check that you don't end up with build errors or, worse, quietly lose the tuned assembly BLAS kernels if a user accidentally specifies gfx90a rather than the officially supported gfx90a:xnack-,gfx90a:xnack+.

At the moment, the amdgpu_targets on each ROCm library are just the architectures that have been built and tested by AMD (though, not necessarily all of them in every version of the library). There are some unsupported GPU architectures that would probably work when the libraries are built from source, so there is even user value in using a single central list of amdgpu_targets rather than a shortlist of known-good architectures. However, my concern is ensuring that we manage expectations for Spack users, so they aren't surprised when those unsupported configurations don't work.

Comment on lines 92 to 97
amdgpu_targets = (
'gfx701', 'gfx801', 'gfx802', 'gfx803',
'gfx900', 'gfx906', 'gfx908', 'gfx90a', 'gfx1010',
'gfx1011', 'gfx1012'
'gfx701', 'gfx801', 'gfx802', 'gfx803', 'gfx900',
'gfx906', 'gfx908', 'gfx90a',
'gfx906:xnack-', 'gfx908:xnack-', 'gfx90a:xnack-', 'gfx90a:xnack+',
'gfx1010', 'gfx1011', 'gfx1012', 'gfx1030',
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple notable amdgpu_targets you're missing are:

  • gfx900:xnack- which is used by pretty much all the ROCm math libraries except rocBLAS and rocSOLVER
  • gfx1031 which is not officially supported, but will probably work for some libraries

Copy link
Copy Markdown
Contributor

@cgmb cgmb May 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, the Right Thing™ to do is to create target_id.py somewhere and define a TargetId class rather than trying to enumerate all possible permutations of processors and target features as strings. The reference documentation for such a thing would be:

Target Id syntax

Table of processors and supported target features

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple notable amdgpu_targets you're missing are:

* `gfx900:xnack-` which is used by pretty much all the ROCm math libraries except rocBLAS and rocSOLVER

* `gfx1031` which is not officially supported, but will probably work for some libraries

Thanks for pointing that out! After your comment I grepped spack to search for currently used architectures. Here's the output:

➜  spack git:(alby/rocm_amdgpu_targets) ✗ git grep -ho "gfx[0-9]\+[^ ,;'\`\".]*" | sort | uniq

gfx000
gfx1010
gfx1011
gfx1012
gfx1030
gfx1031
gfx700
gfx701
gfx801
gfx802
gfx803
gfx900
gfx900:xnack-
gfx902
gfx906
gfx906:xnack-
gfx908
gfx908:xnack-
gfx90a
gfx90a:xnack+
gfx90a:xnack-

It caught my eye gfx000, which is mentioned just in rocm-tensile package, but I cannot find it in the official list you mentioned. I'm not able to say if it is a typo or not.

About TargetID class you are proposing, it can be added in the same file where ROCmPackage is defined.
However, I would not be against having them explicitly listed. I don't know exactly how much complicated the logic for composing architectures can be. For instance, AFAICT, not all features makes sense for all targets, so a filtering may be needed.

Copy link
Copy Markdown
Contributor

@cgmb cgmb May 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gfx000 is used as a null value by rocminfo. It has been repurposed to have various meanings in different contexts.

However, I would not be against having them explicitly listed. I don't know exactly how much complicated the logic for composing architectures can be. For instance, AFAICT, not all features makes sense for all targets, so a filtering may be needed.

My reasoning to use a class would be to model that logic (and other related properties of the GPU architecture). However, it seems the total number of possible target-id values is lower than I expected, as only sramecc and xnack can be specified via target-id. I've generated all valid values of target-id for gfx7, gfx8, gfx9, gfx10 and gfx11 based on the clang AMD CGN GPUInfo table (which matches that table of processors and target features I'd linked earlier):

gfx700
gfx701
gfx702
gfx703
gfx704
gfx705
gfx801
gfx801:xnack+
gfx801:xnack-
gfx802
gfx803
gfx805
gfx810
gfx810:xnack+
gfx810:xnack-
gfx900
gfx900:xnack+
gfx900:xnack-
gfx902
gfx902:xnack+
gfx902:xnack-
gfx904
gfx904:xnack+
gfx904:xnack-
gfx906
gfx906:xnack+
gfx906:xnack-
gfx906:sramecc+
gfx906:sramecc+:xnack+
gfx906:sramecc+:xnack-
gfx906:sramecc-
gfx906:sramecc-:xnack+
gfx906:sramecc-:xnack-
gfx908
gfx908:xnack+
gfx908:xnack-
gfx908:sramecc+
gfx908:sramecc+:xnack+
gfx908:sramecc+:xnack-
gfx908:sramecc-
gfx908:sramecc-:xnack+
gfx908:sramecc-:xnack-
gfx909
gfx909:xnack+
gfx909:xnack-
gfx90a
gfx90a:xnack+
gfx90a:xnack-
gfx90a:sramecc+
gfx90a:sramecc+:xnack+
gfx90a:sramecc+:xnack-
gfx90a:sramecc-
gfx90a:sramecc-:xnack+
gfx90a:sramecc-:xnack-
gfx90c
gfx90c:xnack+
gfx90c:xnack-
gfx940
gfx940:xnack+
gfx940:xnack-
gfx940:sramecc+
gfx940:sramecc+:xnack+
gfx940:sramecc+:xnack-
gfx940:sramecc-
gfx940:sramecc-:xnack+
gfx940:sramecc-:xnack-
gfx1010
gfx1010:xnack+
gfx1010:xnack-
gfx1011
gfx1011:xnack+
gfx1011:xnack-
gfx1012
gfx1012:xnack+
gfx1012:xnack-
gfx1013
gfx1013:xnack+
gfx1013:xnack-
gfx1030
gfx1031
gfx1032
gfx1033
gfx1034
gfx1035
gfx1036
gfx1100
gfx1101
gfx1102
gfx1103

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cgmb can 023c5aa be a reasonable starting point?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow! It's like you read my mind. I had to do a double-take, because that looks so similar to what I was prototyping myself. I'll give it a more thorough review in a bit, but that looks great.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😄 nice to hear that!

In the meanwhile I added another change...just a proposal as the rest, but it can be helpful: given a target id with a minimal set of features, returns the list of all canonical target ids satisfying it.

e.g. gfx940:xnack+ -> ['gfx940:sramecc-:xnack+', 'gfx940:sramecc+:xnack+']

It can also be extended to address the case of the empty input string as a shortcut for listing all available combination of archs/features.

Looking forward to your comments and indications on how to carry on this PR. 😉

@tldahlgren
Copy link
Copy Markdown
Contributor

Please remember to click Ready for review when you're ready for this PR to be merged.

Copy link
Copy Markdown
Contributor

@cgmb cgmb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@albestro, I really must apologize. I figured this would be a good opportunity to 'do target id right', but the scope of work is larger than I thought. It may be best to split that out into another PR (or several).

If you want to revert to 66c09db, add gfx900:xnack- and gfx1031, then make the same change to all the ROCm packages I'd be happy to approve.

@albestro albestro marked this pull request as ready for review June 20, 2022 08:09
@albestro
Copy link
Copy Markdown
Contributor Author

@albestro, I really must apologize. I figured this would be a good opportunity to 'do target id right', but the scope of work is larger than I thought. It may be best to split that out into another PR (or several).

If you want to revert to 66c09db, add gfx900:xnack- and gfx1031, then make the same change to all the ROCm packages I'd be happy to approve.

Hey @cgmb, don't worry.

I grepped the repository for amdgpu_target and I found:

  • rccl
  • rocalution
  • rocblas
  • rocfft
  • rocprim
  • rocrand
  • rocsolver
  • rocsparse

In addition to these official packages I found:

  • dbcsr: already inherit ROCmPackage, so it is ok
  • sirius
  • spfft

These latter twos manually manage ROCm at the moment, but I would contact maintainers and in case I will open a separate PR for them.

Let me know if we are ok with this 😉

@cgmb
Copy link
Copy Markdown
Contributor

cgmb commented Jun 20, 2022

Your list of official rocm libraries is good. rocthrust and rocprim should also have an amdgpu_target variant, but since they don't, you could leave them for now.

@albestro
Copy link
Copy Markdown
Contributor Author

Your list of official rocm libraries is good. rocthrust and rocprim should also have an amdgpu_target variant, but since they don't, you could leave them for now.

just for the records: rocprim is in the list above, so it has been adapted in this PR. rocthrust does not have amdgpu_target variant currently.

Copy link
Copy Markdown
Contributor

@cgmb cgmb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just for the records: rocprim is in the list above, so it has been adapted in this PR. rocthrust does not have amdgpu_target variant currently.

Oh. I now see that you've already done the work! LGTM.

@cgmb
Copy link
Copy Markdown
Contributor

cgmb commented Jun 20, 2022

@haampie, @srekolam, @arjun-raj-kuppala, I think this looks good.

Copy link
Copy Markdown
Contributor

@srekolam srekolam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cgmb thanks for reviewing this. I am fine

@albestro
Copy link
Copy Markdown
Contributor Author

In addition to these official packages I found:

  • dbcsr: already inherit ROCmPackage, so it is ok
  • sirius
  • spfft

These latter twos manually manage ROCm at the moment, but I would contact maintainers and in case I will open a separate PR for them.

FIY, #31207 and #31224.

@cgmb
Copy link
Copy Markdown
Contributor

cgmb commented Jun 28, 2022

@haampie, this is ready to merge.

@cgmb
Copy link
Copy Markdown
Contributor

cgmb commented Jul 4, 2022

@haampie, this is ready to merge. It's a very simple change, but it's extraordinarily valuable.

Once this is merged, I'll be able to tell prospective ROCm users on gfx1010 (e.g., RX 5700 XT), gfx1011 (e.g., Radeon v520 Pro) and gfx1031 (e.g., RX 6700 XT) to use Spack. Those architectures are not officially supported, and thus are not included in the binaries that AMD distributes. However, many of the ROCm libraries will work on that hardware if built from source. Spack will make it easy enough that end users can do it themselves. That's a killer feature for Spack, IMO.

@haampie haampie merged commit bb3a663 into spack:develop Jul 5, 2022
@cgmb
Copy link
Copy Markdown
Contributor

cgmb commented Jul 5, 2022

Thanks Harmen!

@kwryankrattiger kwryankrattiger mentioned this pull request Jul 6, 2022
bhatiaharsh pushed a commit to bhatiaharsh/spack that referenced this pull request Aug 8, 2022
* unify amdgpu_targets for rocsolver and rocblas

* add more archs

* adapt all roc packages to unified amdgpu_targets
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants