proposal: cmd/compile: add GOARM=8 for further optimization on armv7/aarch32 #29373

@benshi001


Currently, an ARM program built by Go calls runtime.udiv() for a division. That routine detects at run time whether a hardware divider is available and falls back to software division if it is not.

The main reason is that a hardware divider is an optional component on ARMv7 machines. In practice, though, most ARMv7 SoCs have one, such as the one in the Raspberry Pi 2.

GOARM=8 would imply that the program runs in the AArch32 mode (ARMv7 compatible) of an arm64 machine, where a hardware divider is mandatory. So:

  1. the compiler will directly generate SDIV/UDIV for div and mod operations.
  2. a program built with GOARM=8 can also run on an ARMv7 machine that has a hardware divider, such as the Raspberry Pi 2.
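
To make the difference concrete, here is a rough Go sketch of the two strategies. It is an illustration only: the real runtime.udiv() is an assembly routine, and the names `hasHWDiv`, `softUdiv`, `udivDispatch`, and `udivDirect` are hypothetical, not taken from the runtime. The software path assumes a plain shift-and-subtract division, which may differ from what the runtime actually does.

```go
package main

import "fmt"

// hasHWDiv is a hypothetical stand-in for the runtime's check for a
// hardware divider. It is false here so the software path is exercised.
var hasHWDiv = false

// softUdiv divides n by d using shift-and-subtract (restoring) division.
// d must be non-zero. This is an assumed fallback, not the runtime's code.
func softUdiv(n, d uint32) (q, r uint32) {
	var rem uint64 // wider than 32 bits so the shift below cannot overflow
	for i := 31; i >= 0; i-- {
		// Bring down the next bit of the dividend.
		rem = rem<<1 | uint64(n>>uint(i)&1)
		if rem >= uint64(d) {
			rem -= uint64(d)
			q |= 1 << uint(i)
		}
	}
	return q, uint32(rem)
}

// udivDispatch models the current behavior: every division pays for a
// call and a check before the actual divide happens.
func udivDispatch(n, d uint32) (q, r uint32) {
	if hasHWDiv {
		return n / d, n % d
	}
	return softUdiv(n, d)
}

// udivDirect models the proposed GOARM=8 behavior: the division is
// emitted inline, with no call and no check.
func udivDirect(n, d uint32) (q, r uint32) {
	return n / d, n % d
}

func main() {
	q, r := udivDispatch(100, 7)
	fmt.Println(q, r) // prints "14 2"
	q, r = udivDirect(100, 7)
	fmt.Println(q, r) // prints "14 2"
}
```

Both paths return the same quotient and remainder. The proposal only removes the call and the check in front of each division.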

The go1 benchmarks do show some improvement from generating SDIV/UDIV directly compared with runtime detection.

name                     old time/op    new time/op    delta
BinaryTree17-4              20.6s ± 0%     20.5s ± 1%   -0.30%  (p=0.000 n=40+40)
Fannkuch11-4                9.31s ± 0%     9.27s ± 0%   -0.42%  (p=0.000 n=40+39)
FmtFprintfEmpty-4           297ns ± 0%     298ns ± 0%   +0.34%  (p=0.000 n=38+40)
FmtFprintfString-4          588ns ± 0%     599ns ± 0%   +1.81%  (p=0.000 n=36+40)
FmtFprintfInt-4             633ns ± 0%     637ns ± 0%   +0.56%  (p=0.000 n=40+29)
FmtFprintfIntInt-4          960ns ± 0%     953ns ± 0%   -0.71%  (p=0.000 n=40+40)
FmtFprintfPrefixedInt-4    1.05µs ± 0%    1.05µs ± 0%     ~     (p=0.194 n=35+38)
FmtFprintfFloat-4          1.95µs ± 0%    1.75µs ± 0%  -10.12%  (p=0.000 n=38+40)
FmtManyArgs-4              3.55µs ± 0%    3.42µs ± 0%   -3.68%  (p=0.000 n=40+40)
GobDecode-4                37.4ms ± 1%    37.4ms ± 1%     ~     (p=0.320 n=37+39)
GobEncode-4                34.7ms ± 1%    34.4ms ± 1%   -0.80%  (p=0.000 n=40+40)
Gzip-4                      2.06s ± 1%     2.07s ± 1%   +0.44%  (p=0.000 n=39+38)
Gunzip-4                    254ms ± 0%     254ms ± 0%   +0.16%  (p=0.000 n=40+38)
HTTPClientServer-4          823µs ± 2%     817µs ± 2%   -0.70%  (p=0.008 n=37+37)
JSONEncode-4               79.4ms ± 0%    76.0ms ± 1%   -4.23%  (p=0.000 n=32+40)
JSONDecode-4                308ms ± 0%     304ms ± 0%   -1.06%  (p=0.000 n=40+39)
Mandelbrot200-4            17.6ms ± 0%    17.6ms ± 0%     ~     (p=0.210 n=34+38)
GoParse-4                  18.9ms ± 1%    18.7ms ± 1%   -1.10%  (p=0.000 n=39+40)
RegexpMatchEasy0_32-4       500ns ± 0%     502ns ± 2%   +0.35%  (p=0.014 n=39+40)
RegexpMatchEasy0_1K-4      3.82µs ± 0%    3.82µs ± 0%   +0.15%  (p=0.000 n=40+40)
RegexpMatchEasy1_32-4       546ns ± 0%     548ns ± 1%   +0.44%  (p=0.000 n=40+40)
RegexpMatchEasy1_1K-4      4.78µs ± 0%    4.79µs ± 0%   +0.16%  (p=0.000 n=38+37)
RegexpMatchMedium_32-4      737ns ± 2%     741ns ± 3%   +0.53%  (p=0.026 n=40+40)
RegexpMatchMedium_1K-4      164µs ± 0%     162µs ± 0%   -0.72%  (p=0.000 n=39+35)
RegexpMatchHard_32-4       10.6µs ± 0%    10.5µs ± 0%   -0.86%  (p=0.000 n=40+40)
RegexpMatchHard_1K-4        316µs ± 0%     312µs ± 0%   -1.13%  (p=0.000 n=38+40)
Revcomp-4                  40.5ms ± 3%    40.8ms ± 2%   +0.85%  (p=0.001 n=40+39)
Template-4                  395ms ± 0%     387ms ± 0%   -2.07%  (p=0.000 n=40+39)
TimeParse-4                2.68µs ± 0%    2.65µs ± 0%   -1.12%  (p=0.000 n=40+40)
TimeFormat-4               5.42µs ± 0%    5.29µs ± 0%   -2.30%  (p=0.000 n=38+37)
[Geo mean]                  304µs          302µs        -0.88%

name                     old speed      new speed      delta
GobDecode-4              20.5MB/s ± 1%  20.5MB/s ± 1%     ~     (p=0.284 n=37+39)
GobEncode-4              22.1MB/s ± 1%  22.3MB/s ± 1%   +0.81%  (p=0.000 n=40+38)
Gzip-4                   9.41MB/s ± 1%  9.37MB/s ± 1%   -0.41%  (p=0.000 n=38+38)
Gunzip-4                 76.5MB/s ± 0%  76.4MB/s ± 0%   -0.16%  (p=0.000 n=40+38)
JSONEncode-4             24.4MB/s ± 0%  25.5MB/s ± 1%   +4.42%  (p=0.000 n=32+40)
JSONDecode-4             6.31MB/s ± 0%  6.37MB/s ± 0%   +1.02%  (p=0.000 n=40+35)
GoParse-4                3.06MB/s ± 1%  3.10MB/s ± 1%   +1.13%  (p=0.000 n=39+40)
RegexpMatchEasy0_32-4    63.9MB/s ± 0%  63.7MB/s ± 2%     ~     (p=0.070 n=39+40)
RegexpMatchEasy0_1K-4     268MB/s ± 0%   268MB/s ± 0%   -0.15%  (p=0.000 n=40+40)
RegexpMatchEasy1_32-4    58.6MB/s ± 0%  58.3MB/s ± 1%   -0.44%  (p=0.000 n=40+40)
RegexpMatchEasy1_1K-4     214MB/s ± 0%   214MB/s ± 0%   -0.16%  (p=0.000 n=38+37)
RegexpMatchMedium_32-4   1.36MB/s ± 3%  1.35MB/s ± 3%   -0.70%  (p=0.022 n=40+40)
RegexpMatchMedium_1K-4   6.26MB/s ± 0%  6.31MB/s ± 0%   +0.73%  (p=0.000 n=37+40)
RegexpMatchHard_32-4     3.01MB/s ± 1%  3.04MB/s ± 0%   +1.03%  (p=0.000 n=40+40)
RegexpMatchHard_1K-4     3.25MB/s ± 0%  3.28MB/s ± 0%   +1.05%  (p=0.000 n=40+40)
Revcomp-4                62.8MB/s ± 4%  62.3MB/s ± 2%   -0.86%  (p=0.001 n=40+39)
Template-4               4.91MB/s ± 0%  5.01MB/s ± 0%   +2.11%  (p=0.000 n=40+39)
[Geo mean]               17.0MB/s       17.1MB/s        +0.53%
