Features/expand microarch for aarch64 #13780

Merged
becker33 merged 5 commits into spack:develop from t-karatsu:features/expand_microarch_for_aarch64
Nov 20, 2019

@t-karatsu
Contributor

  • Add microarchitectures derived from aarch64
     Add the thunderx2 and a64fx microarchitectures.
  • Add a process to determine the aarch64 microarchitecture
     The fields in /proc/cpuinfo differ between aarch64 and x86_64.
     On aarch64, feature information is read from the Features field of /proc/cpuinfo.
     On aarch64, vendor information is read from the CPU implementer field of /proc/cpuinfo.

* Add process to determine aarch64 microarchitecture
* Add optimization flags for gcc on the aarch64 family.
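
The detection step described above can be sketched roughly as follows. This is a minimal illustration and not the code from this pull request: the function name `parse_aarch64_cpuinfo`, the `"generic"` fallback, and the sample text are all assumptions made for the example.

```python
def parse_aarch64_cpuinfo(text):
    """Collect the 'CPU implementer' and 'Features' fields from cpuinfo text.

    Hypothetical helper: the name and the returned layout are assumptions.
    """
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank lines separate the per-processor sections
        key = key.strip()
        if key in ("CPU implementer", "Features"):
            # Fields repeat once per core; keep the first occurrence.
            info.setdefault(key, value.strip())
    return {
        # "generic" as a fallback is an assumption of this sketch.
        "vendor": info.get("CPU implementer", "generic"),
        "features": set(info.get("Features", "").split()),
    }


# Illustrative sample only, not captured from a real machine.
SAMPLE = """\
processor\t: 0
Features\t: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics cpuid asimdrdm
CPU implementer\t: 0x43
CPU architecture: 8
"""

result = parse_aarch64_cpuinfo(SAMPLE)
```

On a real aarch64 Linux host the text would come from reading /proc/cpuinfo; passing it in as a string keeps the sketch testable on any machine.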
Member

@becker33 left a comment

This all looks good.

Can you add a test to lib/spack/spack/test/llnl/util/cpu.py? That will probably require adding a file to lib/spack/spack/test/data/targets named linux-<OS>-thunderx2 to test the target detection.

The tests in lib/spack/spack/test/llnl/util/cpu.py are mostly already parameterized by microarchitecture, so you shouldn't need to write new test code, just add parameter sets that test the new architectures.

@tgamblin
Member

@NickRF FYI

@t-karatsu
Contributor Author

Thanks for your comments. I have just added unit tests for this pull request, along with linux-centos7-thunderx2 as test data. Could you take a look?

@becker33 merged commit 513fe55 into spack:develop on Nov 20, 2019
},
"thunderx2": {
"from": "aarch64",
"vendor": "0x43",
Member

Apologies for arriving late to this. Are we keeping vendor names in a non-human-readable form? This will be displayed as 0x43 - aarch64 when people run:

$ spack arch --known-targets

Contributor Author

Well... the CPU implementer field in /proc/cpuinfo only shows a numeric code as the vendor information, so I implemented the vendor as a number. In fact, 0x43 means Cavium, which is the vendor name. (I will check what 0x46 is.)

I ran $ spack arch --known-targets and confirmed the following:

0x43 - aarch64
    thunderx2

0x46 - aarch64
    a64fx

Is there a way to change the vendor name displayed by this command?

Member

Well... the CPU implementer field in /proc/cpuinfo only shows a numeric code as the vendor information, so I implemented the vendor as a number.

No worries, I was wondering whether we want to map the code to the name. It probably makes sense to do it when we detect the raw information, in more or less the same way that we map a few instruction sets from Darwin to what we currently have in the JSON file:

if 'sse4.1' in info['flags']:
    info['flags'] += ' sse4_1'
if 'sse4.2' in info['flags']:
    info['flags'] += ' sse4_2'
if 'avx1.0' in info['flags']:
    info['flags'] += ' avx'
if 'clfsopt' in info['flags']:
    info['flags'] += ' clflushopt'
if 'xsave' in info['flags']:
    info['flags'] += ' xsavec xsaveopt'

Member

@t-karatsu Do you have a recent version of lscpu from util-linux installed on your system? I am wondering if they do this mapping...

Contributor Author

Oh, thanks! We will consider implementing a vendor-name mapping.

@t-karatsu Do you have a recent version of lscpu from util-linux installed on your system? I am wondering if they do this mapping...

Some of the ThunderX2 systems we have access to show the vendor name (Cavium) in the output of the lscpu command.

Member

Thanks for confirming @t-karatsu! I'll look through their code and see if I can find where they keep the mapping. We might be able to reuse that.

Member

@becker33 @t-karatsu @tgamblin This file may be interesting for us: lscpu-arm.c. The part that is most relevant to this discussion is:

static const struct hw_impl hw_implementer[] = {
    { 0x41, arm_part,     "ARM" },
    { 0x42, brcm_part,    "Broadcom" },
    { 0x43, cavium_part,  "Cavium" },
    { 0x44, dec_part,     "DEC" },
    { 0x48, hisi_part,    "HiSilicon" },
    { 0x4e, nvidia_part,  "Nvidia" },
    { 0x50, apm_part,     "APM" },
    { 0x51, qcom_part,    "Qualcomm" },
    { 0x53, samsung_part, "Samsung" },
    { 0x56, marvell_part, "Marvell" },
    { 0x66, faraday_part, "Faraday" },
    { 0x69, intel_part,   "Intel" },
    { -1,   unknown_part, "unknown" },
};
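
If Spack were to reuse this table, the mapping could look roughly like the sketch below. The codes and names are copied from the lscpu-arm.c excerpt above; the names `ARM_IMPLEMENTERS` and `vendor_name`, and the choice to return unknown codes unchanged, are assumptions made for illustration.

```python
# Implementer codes and names copied from the lscpu-arm.c excerpt above.
ARM_IMPLEMENTERS = {
    0x41: "ARM",
    0x42: "Broadcom",
    0x43: "Cavium",
    0x44: "DEC",
    0x48: "HiSilicon",
    0x4E: "Nvidia",
    0x50: "APM",
    0x51: "Qualcomm",
    0x53: "Samsung",
    0x56: "Marvell",
    0x66: "Faraday",
    0x69: "Intel",
}


def vendor_name(implementer):
    """Translate a 'CPU implementer' string such as '0x43' into a name.

    Hypothetical helper: codes missing from the table, or values that do
    not parse as hexadecimal, are returned unchanged.
    """
    try:
        code = int(implementer, 16)
    except (TypeError, ValueError):
        return implementer
    return ARM_IMPLEMENTERS.get(code, implementer)
```

With this sketch, `vendor_name("0x43")` gives `"Cavium"`, while `"0x46"` (used for a64fx in this pull request) is not in the quoted excerpt and would still be shown as the raw code.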

Member

@alalazo left a comment

A few comments and possibly a couple of typos spotted.

],
"clang": {
"versions": ":",
"flags": "-march=armv8-a -mcpu=generic"
Member

Was this copied verbatim? Do we need to check the options for clang?

Contributor Author

The setting "-march=armv8xx" seems to be correct on aarch64 machines, but the settings for each version are still being investigated. This file may help to see which values can be set:
https://github.com/llvm/llvm-project/blob/master/llvm/include/llvm/Support/AArch64TargetParser.def

Member

I think we can have the same values displayed by:

$ llc -march=aarch64 -mcpu=help

Contributor Author

Thanks. I ran the llc command from LLVM 9.0, and it prints the following:

[n0013@apollo13 bin]$ ./llc -march=aarch64 -mcpu=help
Available CPUs for this target:

  apple-latest - Select the apple-latest processor.
  cortex-a35   - Select the cortex-a35 processor.
  cortex-a53   - Select the cortex-a53 processor.
  cortex-a55   - Select the cortex-a55 processor.
  cortex-a57   - Select the cortex-a57 processor.
  cortex-a72   - Select the cortex-a72 processor.
  cortex-a73   - Select the cortex-a73 processor.
  cortex-a75   - Select the cortex-a75 processor.
  cortex-a76   - Select the cortex-a76 processor.
  cortex-a76ae - Select the cortex-a76ae processor.
  cyclone      - Select the cyclone processor.
  exynos-m1    - Select the exynos-m1 processor.
  exynos-m2    - Select the exynos-m2 processor.
  exynos-m3    - Select the exynos-m3 processor.
  exynos-m4    - Select the exynos-m4 processor.
  exynos-m5    - Select the exynos-m5 processor.
  falkor       - Select the falkor processor.
  generic      - Select the generic processor.
  kryo         - Select the kryo processor.
  saphira      - Select the saphira processor.
  thunderx     - Select the thunderx processor.
  thunderx2t99 - Select the thunderx2t99 processor.
  thunderxt81  - Select the thunderxt81 processor.
  thunderxt83  - Select the thunderxt83 processor.
  thunderxt88  - Select the thunderxt88 processor.
  tsv110       - Select the tsv110 processor.

Available features for this target:
  many features...

Use +feature to enable a feature, or -feature to disable it.
For example, llc -mcpu=mycpu -mattr=+feature1,-feature2
[n0013@apollo13 bin]$

If the corresponding CPU can be specified, using the -mcpu option seems like a good choice. In this case, I will set -mcpu=thunderx2t99.

},
{
"versions": "7:7.9",
"flags": "-arch=armv8.2a+crc+crypt+fp16"
Member

Is -arch a typo (instead of -march) here and below?

Member

@alalazo Nov 20, 2019

Same question for +crypt instead of +crypto

Contributor Author

I'm sorry, I should have checked more carefully. I will send a new pull request to fix these typos.

Member

@t-karatsu No worries, these typos are easy to miss.
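
A small check along the following lines might catch this kind of typo before review. It is only a sketch: `check_flags` and `KNOWN_EXTENSIONS` are hypothetical names, the extension list is limited to the modifiers named in the GCC 8 man page excerpt quoted in this conversation plus crc, and neither the architecture name nor `no`-prefixed modifiers are validated.

```python
# Modifiers named in the GCC 8 man page excerpt quoted in this conversation,
# plus "crc". Not an exhaustive list (assumption of this sketch).
KNOWN_EXTENSIONS = {
    "crc", "crypto", "fp", "simd", "sve", "lse", "rdma", "fp16",
    "fp16fml", "rcpc", "dotprod", "aes", "sha2", "sha3", "sm4",
}


def check_flags(flags):
    """Return a list of human-readable problems found in a flags string."""
    problems = []
    for token in flags.split():
        name, sep, value = token.partition("=")
        if name == "-arch":
            problems.append("unknown option -arch (did you mean -march?)")
        if name in ("-arch", "-march") and sep:
            # value looks like '<architecture>+ext1+ext2'; skip the first part
            for ext in value.split("+")[1:]:
                if ext not in KNOWN_EXTENSIONS:
                    problems.append("unknown extension +" + ext)
    return problems
```

Run against the line under review, `check_flags("-arch=armv8.2a+crc+crypt+fp16")` reports both the `-arch` option and the `+crypt` extension.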

},
{
"versions": "8:",
"flags": "-arch=armv8.2a+crc+aes+sh2+fp16+sve -msve-vector-bits=512"
Member

Curious why crypto was omitted here but not above.

Contributor

In gcc 8 changes (https://gcc.gnu.org/gcc-8/changes.html):

  • The Armv8-A +crypto extension has now been split into two extensions for finer grained control:
    • +aes which contains the Armv8-A AES cryptographic instructions.
    • +sha2 which contains the Armv8-A SHA2 and SHA1 cryptographic instructions.

Member

From what I understand from the GCC 8 man pages and from what you linked, crypto is still available, and it is equivalent to setting +aes+sha2+simd.

Pasting the snippet I read below for ease of reference:

       crypto
           Enable Crypto extension.  This also enables Advanced SIMD and floating-point instructions.

       fp  Enable floating-point instructions.  This is on by default for all possible values for options -march and -mcpu.

       simd
           Enable Advanced SIMD instructions.  This also enables floating-point instructions.  This is on by default for all possible values
           for options -march and -mcpu.

       sve Enable Scalable Vector Extension instructions.  This also enables Advanced SIMD and floating-point instructions.

       lse Enable Large System Extension instructions.  This is on by default for -march=armv8.1-a.

       rdma
           Enable Round Double Multiply Accumulate instructions.  This is on by default for -march=armv8.1-a.

       fp16
           Enable FP16 extension.  This also enables floating-point instructions.

       fp16fml
           Enable FP16 fmla extension.  This also enables FP16 extensions and floating-point instructions. This option is enabled by default
           for -march=armv8.4-a. Use of this option with architectures prior to Armv8.2-A is not supported.

       rcpc
           Enable the RcPc extension.  This does not change code generation from GCC, but is passed on to the assembler, enabling inline asm
           statements to use instructions from the RcPc extension.

       dotprod
           Enable the Dot Product extension.  This also enables Advanced SIMD instructions.

       aes Enable the Armv8-a aes and pmull crypto extension.  This also enables Advanced SIMD instructions.

       sha2
           Enable the Armv8-a sha2 crypto extension.  This also enables Advanced SIMD instructions.

       sha3
           Enable the sha512 and sha3 crypto extension.  This also enables Advanced SIMD instructions. Use of this option with architectures
           prior to Armv8.2-A is not supported.

       sm4 Enable the sm3 and sm4 crypto extension.  This also enables Advanced SIMD instructions.  Use of this option with architectures
           prior to Armv8.2-A is not supported.

       Feature crypto implies aes, sha2, and simd, which implies fp.  Conversely, nofp implies nosimd, which implies nocrypto, noaes and
       nosha2.
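
The implication chain in the last paragraph of the excerpt can be made concrete with a short sketch. The table below encodes only relationships stated in the quoted text; the names `IMPLIES` and `expand_features` are hypothetical, and the negative forms (nofp, nosimd, and so on) are left out.

```python
# Direct implications taken from the man page excerpt quoted above.
IMPLIES = {
    "crypto": {"aes", "sha2", "simd"},
    "aes": {"simd"},
    "sha2": {"simd"},
    "sve": {"simd"},
    "dotprod": {"simd"},
    "fp16": {"fp"},
    "simd": {"fp"},
}


def expand_features(requested):
    """Return the requested features plus everything they imply."""
    expanded = set()
    pending = list(requested)
    while pending:
        feature = pending.pop()
        if feature not in expanded:
            expanded.add(feature)
            # Follow implications transitively, e.g. crypto -> simd -> fp.
            pending.extend(IMPLIES.get(feature, ()))
    return expanded


with_crypto = expand_features(["crypto"])
with_split = expand_features(["aes", "sha2"])
```

Under these assumptions the two sets differ only by the name `crypto` itself, which matches the reading that +crypto and +aes+sha2 enable the same instructions.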

Member

Suggested change
- "flags": "-arch=armv8.2a+crc+aes+sh2+fp16+sve -msve-vector-bits=512"
+ "flags": "-arch=armv8.2a+crc+aes+sha2+fp16+sve -msve-vector-bits=512"

@t-karatsu deleted the features/expand_microarch_for_aarch64 branch November 21, 2019 07:54