Features/expand microarch for aarch64 #13780
* Add process to determine aarch64 microarchitecture
* Add optimization flags for gcc on the aarch64 family.
becker33
left a comment
This all looks good.
Can you add a test to lib/spack/spack/test/llnl/util/cpu.py? That will probably require adding a file to lib/spack/spack/test/data/targets named linux-<OS>-thunderx2 to test the target detection.
The tests in lib/spack/spack/test/llnl/util/cpu.py are mostly already parameterized by microarchitecture, so you shouldn't need to write new test code, just add parameter sets that test the new architectures.
@NickRF FYI
Thanks for your comments. I just implemented unit tests for this pull request. And
    },
    "thunderx2": {
      "from": "aarch64",
      "vendor": "0x43",
Apologies for arriving late to this. Are we keeping vendor names in a non-human-readable form? This will be displayed as 0x43 - aarch64 when people ask for:
$ spack arch --known-targets
Well... the CPU implementer field in /proc/cpuinfo only gives the vendor as a numeric code, so I implemented the vendor as a number. In fact, 0x43 is the code for Cavium, which is the vendor name. (I will check 0x46.)
I ran $ spack arch --known-targets and confirmed the following:
0x43 - aarch64
thunderx2
0x46 - aarch64
a64fx
Is there a way to change the vendor name displayed by this command?
Well... the CPU implementer field in /proc/cpuinfo only gives the vendor as a numeric code, so I implemented the vendor as a number.
No worries, I was wondering whether we want to map the code to the name. It probably makes sense to do that when we detect the raw information, in more or less the same way that we map a few instruction sets from Darwin to what we currently have in the JSON file:
spack/lib/spack/llnl/util/cpu/detect.py
Lines 114 to 123 in 66cf530
@t-karatsu Do you have a recent version of lscpu from util-linux installed on your system? I am wondering if they do this mapping...
Oh, thanks! We will consider implementing a vendor name mapping.
@t-karatsu Do you have a recent version of lscpu from util-linux installed on your system? I am wondering if they do this mapping...
Some of the ThunderX2 systems we have access to show the vendor name (Cavium) in the output of the lscpu command.
Thanks for confirming @t-karatsu! I'll look into their code and see if I can find where they keep the mapping. We might be able to reuse that.
@becker33 @t-karatsu @tgamblin This file can be interesting for us: lscpu-arm.c. The part that is most relevant to this discussion is:
static const struct hw_impl hw_implementer[] = {
    { 0x41, arm_part, "ARM" },
    { 0x42, brcm_part, "Broadcom" },
    { 0x43, cavium_part, "Cavium" },
    { 0x44, dec_part, "DEC" },
    { 0x48, hisi_part, "HiSilicon" },
    { 0x4e, nvidia_part, "Nvidia" },
    { 0x50, apm_part, "APM" },
    { 0x51, qcom_part, "Qualcomm" },
    { 0x53, samsung_part, "Samsung" },
    { 0x56, marvell_part, "Marvell" },
    { 0x66, faraday_part, "Faraday" },
    { 0x69, intel_part, "Intel" },
    { -1, unknown_part, "unknown" },
};
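For discussion, here is a minimal Python sketch of how such a mapping might be applied on the Spack side. It is not the code in detect.py: the names ARM_IMPLEMENTERS and vendor_name are hypothetical, and the table only repeats the hw_implementer entries quoted from lscpu-arm.c. Codes that the table does not list are returned unchanged.

```python
# Hypothetical helper, not Spack's actual code: translate the raw
# "CPU implementer" code from /proc/cpuinfo into a vendor name.
# The entries repeat the hw_implementer[] table quoted from lscpu-arm.c.
ARM_IMPLEMENTERS = {
    0x41: "ARM",
    0x42: "Broadcom",
    0x43: "Cavium",
    0x44: "DEC",
    0x48: "HiSilicon",
    0x4E: "Nvidia",
    0x50: "APM",
    0x51: "Qualcomm",
    0x53: "Samsung",
    0x56: "Marvell",
    0x66: "Faraday",
    0x69: "Intel",
}


def vendor_name(implementer):
    """Return a readable vendor name for a code such as "0x43".

    Unknown or malformed codes are returned unchanged, so a code the
    table does not cover keeps its current numeric display.
    """
    try:
        code = int(implementer, 16)
    except (TypeError, ValueError):
        return implementer
    return ARM_IMPLEMENTERS.get(code, implementer)
```

With this sketch, vendor_name("0x43") gives "Cavium", while vendor_name("0x46") stays "0x46" because that code is not in the quoted table.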
alalazo
left a comment
A few comments and possibly a couple of typos spotted.
    ],
    "clang": {
      "versions": ":",
      "flags": "-march=armv8-a -mcpu=generic"
Was this copied verbatim? Do we need to check the options for clang?
The "-march=armv8xx" setting seems to be correct on aarch64 machines, but the settings for each version are still being checked. Perhaps this file shows which values can be set:
https://github.com/llvm/llvm-project/blob/master/llvm/include/llvm/Support/AArch64TargetParser.def
I think we can get the same values displayed by:
$ llc -march=aarch64 -mcpu=help
Thanks. I ran the llc command from LLVM 9.0, and it shows the following:
[n0013@apollo13 bin]$ ./llc -march=aarch64 -mcpu=help
Available CPUs for this target:
apple-latest - Select the apple-latest processor.
cortex-a35 - Select the cortex-a35 processor.
cortex-a53 - Select the cortex-a53 processor.
cortex-a55 - Select the cortex-a55 processor.
cortex-a57 - Select the cortex-a57 processor.
cortex-a72 - Select the cortex-a72 processor.
cortex-a73 - Select the cortex-a73 processor.
cortex-a75 - Select the cortex-a75 processor.
cortex-a76 - Select the cortex-a76 processor.
cortex-a76ae - Select the cortex-a76ae processor.
cyclone - Select the cyclone processor.
exynos-m1 - Select the exynos-m1 processor.
exynos-m2 - Select the exynos-m2 processor.
exynos-m3 - Select the exynos-m3 processor.
exynos-m4 - Select the exynos-m4 processor.
exynos-m5 - Select the exynos-m5 processor.
falkor - Select the falkor processor.
generic - Select the generic processor.
kryo - Select the kryo processor.
saphira - Select the saphira processor.
thunderx - Select the thunderx processor.
thunderx2t99 - Select the thunderx2t99 processor.
thunderxt81 - Select the thunderxt81 processor.
thunderxt83 - Select the thunderxt83 processor.
thunderxt88 - Select the thunderxt88 processor.
tsv110 - Select the tsv110 processor.
Available features for this target:
many features...
Use +feature to enable a feature, or -feature to disable it.
For example, llc -mcpu=mycpu -mattr=+feature1,-feature2
[n0013@apollo13 bin]$
When the corresponding CPU can be specified, using the -mcpu option seems best. In this case, I will set -mcpu=thunderx2t99.
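As an illustration of that check, the sketch below parses captured `llc -march=aarch64 -mcpu=help` text and picks an -mcpu value only when the CPU is listed. The helper names, the SAMPLE text, and the fallback to generic are assumptions made for this example. The sketch does not run llc itself and is not part of the pull request.

```python
def parse_llc_cpus(help_text):
    """Collect CPU names from the 'Available CPUs' part of the listing."""
    cpus = []
    in_cpu_section = False
    for line in help_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Available CPUs"):
            in_cpu_section = True
            continue
        if stripped.startswith("Available features"):
            break  # the CPU list ends where the feature list begins
        if in_cpu_section and " - " in stripped:
            cpus.append(stripped.split(" - ", 1)[0].strip())
    return cpus


def mcpu_flag(preferred, help_text, fallback="generic"):
    """Return -mcpu=<preferred> if it is listed, else -mcpu=<fallback>."""
    name = preferred if preferred in parse_llc_cpus(help_text) else fallback
    return "-mcpu=" + name


# Shortened, illustrative copy of the listing format shown above.
SAMPLE = """Available CPUs for this target:
  generic      - Select the generic processor.
  thunderx2t99 - Select the thunderx2t99 processor.
Available features for this target:
"""
```

Here mcpu_flag("thunderx2t99", SAMPLE) returns "-mcpu=thunderx2t99", and a name missing from the listing falls back to "-mcpu=generic".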
    },
    {
      "versions": "7:7.9",
      "flags": "-arch=armv8.2a+crc+crypt+fp16"
Is -arch a typo (instead of -march) here and below?
Same question for +crypt instead of +crypto
I'm sorry... I did not check carefully enough. I will send a new pull request to fix these typos.
@t-karatsu No worries, these typos are easy to miss.
    },
    {
      "versions": "8:",
      "flags": "-arch=armv8.2a+crc+aes+sh2+fp16+sve -msve-vector-bits=512"
Curious why crypto was omitted here but not above.
From the GCC 8 changes (https://gcc.gnu.org/gcc-8/changes.html):
- The Armv8-A +crypto extension has now been split into two extensions for finer grained control:
- +aes which contains the Armv8-A AES cryptographic instructions.
- +sha2 which contains the Armv8-A SHA2 and SHA1 cryptographic instructions.
From what I understand from reading the GCC 8 man pages and what you linked, crypto is still available, and it is equivalent to setting +aes+sha2+simd.
Pasting the snippet I read below for ease of reference:
crypto
    Enable Crypto extension. This also enables Advanced SIMD and floating-point instructions.
fp
    Enable floating-point instructions. This is on by default for all possible values for options -march and -mcpu.
simd
    Enable Advanced SIMD instructions. This also enables floating-point instructions. This is on by default for all possible values for options -march and -mcpu.
sve
    Enable Scalable Vector Extension instructions. This also enables Advanced SIMD and floating-point instructions.
lse
    Enable Large System Extension instructions. This is on by default for -march=armv8.1-a.
rdma
    Enable Round Double Multiply Accumulate instructions. This is on by default for -march=armv8.1-a.
fp16
    Enable FP16 extension. This also enables floating-point instructions.
fp16fml
    Enable FP16 fmla extension. This also enables FP16 extensions and floating-point instructions. This option is enabled by default for -march=armv8.4-a. Use of this option with architectures prior to Armv8.2-A is not supported.
rcpc
    Enable the RcPc extension. This does not change code generation from GCC, but is passed on to the assembler, enabling inline asm statements to use instructions from the RcPc extension.
dotprod
    Enable the Dot Product extension. This also enables Advanced SIMD instructions.
aes
    Enable the Armv8-a aes and pmull crypto extension. This also enables Advanced SIMD instructions.
sha2
    Enable the Armv8-a sha2 crypto extension. This also enables Advanced SIMD instructions.
sha3
    Enable the sha512 and sha3 crypto extension. This also enables Advanced SIMD instructions. Use of this option with architectures prior to Armv8.2-A is not supported.
sm4
    Enable the sm3 and sm4 crypto extension. This also enables Advanced SIMD instructions. Use of this option with architectures prior to Armv8.2-A is not supported.
Feature crypto implies aes, sha2, and simd, which implies fp. Conversely, nofp implies nosimd, which implies nocrypto, noaes and nosha2.
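To make the implication chain concrete, here is a small Python sketch that expands a set of feature modifiers using only the relationships stated in the excerpt above. The IMPLIES table and the expand_features name are hypothetical, and the sketch models the man page text rather than GCC's actual option handling.

```python
# Implication table transcribed from the man page excerpt above;
# the table and function names are hypothetical.
IMPLIES = {
    "crypto": {"aes", "sha2", "simd"},
    "aes": {"simd"},
    "sha2": {"simd"},
    "sve": {"simd", "fp"},
    "fp16": {"fp"},
    "simd": {"fp"},
}


def expand_features(requested):
    """Return the requested features plus everything they imply."""
    enabled = set()
    pending = list(requested)
    while pending:
        feature = pending.pop()
        if feature in enabled:
            continue
        enabled.add(feature)
        # Queue the features implied by this one, if any.
        pending.extend(IMPLIES.get(feature, ()))
    return enabled
```

Under this model, expand_features(["crypto"]) and expand_features(["aes", "sha2"]) differ only by the crypto name itself, which matches the reading that +crypto is equivalent to +aes+sha2+simd.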
- "flags": "-arch=armv8.2a+crc+aes+sh2+fp16+sve -msve-vector-bits=512"
+ "flags": "-arch=armv8.2a+crc+aes+sha2+fp16+sve -msve-vector-bits=512"
aarch64
Add thunderx2 microarchitecture and a64fx microarchitecture.
aarch64 microarchitecture
The information fields obtained from /proc/cpuinfo differ between aarch64 and x86_64.
Feature information is taken from Features in /proc/cpuinfo on aarch64.
Vendor information is taken from CPU implementer in /proc/cpuinfo on aarch64.
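The following Python sketch shows one way the two fields could be read from the text of an aarch64 /proc/cpuinfo. It is an illustration under stated assumptions, not the detection code from this pull request: the function name parse_cpuinfo, the "generic" default for a missing vendor, and the shortened SAMPLE text are all made up for the example.

```python
def parse_cpuinfo(text):
    """Return (vendor, features) from the first processor block."""
    info = {}
    for line in text.splitlines():
        if not line.strip():
            if info:
                break  # a blank line ends the first processor block
            continue
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    # "generic" as the default vendor is an assumption of this sketch.
    vendor = info.get("CPU implementer", "generic")
    features = set(info.get("Features", "").split())
    return vendor, features


# Shortened, illustrative input; a real file lists many more fields.
SAMPLE = """processor : 0
Features : fp asimd aes
CPU implementer : 0x43

processor : 1
Features : fp asimd aes
CPU implementer : 0x43
"""
```

For SAMPLE, parse_cpuinfo returns the vendor "0x43" and the feature set {"fp", "asimd", "aes"}.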