Currently, an ARM program built by Go calls runtime.udiv() for every division. That routine detects whether a hardware divider is available and falls back to software division if it is not.
The main reason is that a hardware divider is an optional component on ARMv7 machines. In practice, though, most ARMv7 SoCs have one, for example the one in the Raspberry Pi 2.
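The dispatch described above can be sketched in Go as follows. This is only an illustration under stated assumptions, not the runtime's actual code: hwDividerPresent, softUdiv and the udiv wrapper shown here are hypothetical names, and the shift-and-subtract loop is an assumed stand-in for the software path.

```go
package main

import "fmt"

// hwDividerPresent is a hypothetical flag standing in for the runtime's
// hardware-divider detection; it is not the runtime's real variable.
var hwDividerPresent = false

// softUdiv divides n by d with a shift-and-subtract loop, an assumed
// stand-in for a software division routine.
func softUdiv(n, d uint32) (q, r uint32) {
	if d == 0 {
		panic("division by zero")
	}
	var rem uint64 // 64 bits wide so the shift below cannot overflow
	for i := 31; i >= 0; i-- {
		// Bring down the next bit of the dividend.
		rem = rem<<1 | uint64(n>>uint(i)&1)
		if rem >= uint64(d) {
			rem -= uint64(d)
			q |= 1 << uint(i)
		}
	}
	return q, uint32(rem)
}

// udiv picks a path at run time: the hardware path when the flag is set,
// otherwise the software fallback.
func udiv(n, d uint32) (q, r uint32) {
	if hwDividerPresent {
		return n / d, n % d // stands in for a hardware divide instruction
	}
	return softUdiv(n, d)
}

func main() {
	q, r := udiv(100, 7)
	fmt.Println(q, r) // prints "14 2"
}
```

Every division pays for the call and the flag check in this scheme, which is the overhead that emitting the divide instruction directly avoids.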
GOARM=8 implies that the program will run in AArch32 mode (ARMv7 compatible) on an arm64 machine, where a hardware divider is mandatory. So:
- the compiler will directly generate SDIV/UDIV for div and mod operations.
- a program built with GOARM=8 can also run on an ARMv7 machine with a hardware divider, such as the Raspberry Pi 2.
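Since AArch32 has no integer remainder instruction, a mod is typically derived from the quotient of a single divide. A minimal sketch of that identity in Go, assuming truncating division as SDIV provides; modViaDiv is a hypothetical helper for illustration, and the exact instruction sequence the compiler emits is not specified here.

```go
package main

import "fmt"

// modViaDiv computes a % b from one truncating division followed by a
// multiply and a subtract.
func modViaDiv(a, b int32) int32 {
	q := a / b // the single divide
	return a - q*b
}

func main() {
	fmt.Println(modViaDiv(17, 5), modViaDiv(-17, 5)) // prints "2 -2"
}
```

With this change applied, such a program would presumably be built with GOARCH=arm GOARM=8 so that the division no longer goes through the runtime check.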
The go1 benchmarks show a small improvement (geometric mean of time/op down 0.88%) from generating SDIV/UDIV directly instead of detecting the divider at run time.
name                      old time/op  new time/op  delta
BinaryTree17-4             20.6s ± 0%   20.5s ± 1%   -0.30%  (p=0.000 n=40+40)
Fannkuch11-4               9.31s ± 0%   9.27s ± 0%   -0.42%  (p=0.000 n=40+39)
FmtFprintfEmpty-4          297ns ± 0%   298ns ± 0%   +0.34%  (p=0.000 n=38+40)
FmtFprintfString-4         588ns ± 0%   599ns ± 0%   +1.81%  (p=0.000 n=36+40)
FmtFprintfInt-4            633ns ± 0%   637ns ± 0%   +0.56%  (p=0.000 n=40+29)
FmtFprintfIntInt-4         960ns ± 0%   953ns ± 0%   -0.71%  (p=0.000 n=40+40)
FmtFprintfPrefixedInt-4   1.05µs ± 0%  1.05µs ± 0%        ~  (p=0.194 n=35+38)
FmtFprintfFloat-4         1.95µs ± 0%  1.75µs ± 0%  -10.12%  (p=0.000 n=38+40)
FmtManyArgs-4             3.55µs ± 0%  3.42µs ± 0%   -3.68%  (p=0.000 n=40+40)
GobDecode-4               37.4ms ± 1%  37.4ms ± 1%        ~  (p=0.320 n=37+39)
GobEncode-4               34.7ms ± 1%  34.4ms ± 1%   -0.80%  (p=0.000 n=40+40)
Gzip-4                     2.06s ± 1%   2.07s ± 1%   +0.44%  (p=0.000 n=39+38)
Gunzip-4                   254ms ± 0%   254ms ± 0%   +0.16%  (p=0.000 n=40+38)
HTTPClientServer-4         823µs ± 2%   817µs ± 2%   -0.70%  (p=0.008 n=37+37)
JSONEncode-4              79.4ms ± 0%  76.0ms ± 1%   -4.23%  (p=0.000 n=32+40)
JSONDecode-4               308ms ± 0%   304ms ± 0%   -1.06%  (p=0.000 n=40+39)
Mandelbrot200-4           17.6ms ± 0%  17.6ms ± 0%        ~  (p=0.210 n=34+38)
GoParse-4                 18.9ms ± 1%  18.7ms ± 1%   -1.10%  (p=0.000 n=39+40)
RegexpMatchEasy0_32-4      500ns ± 0%   502ns ± 2%   +0.35%  (p=0.014 n=39+40)
RegexpMatchEasy0_1K-4     3.82µs ± 0%  3.82µs ± 0%   +0.15%  (p=0.000 n=40+40)
RegexpMatchEasy1_32-4      546ns ± 0%   548ns ± 1%   +0.44%  (p=0.000 n=40+40)
RegexpMatchEasy1_1K-4     4.78µs ± 0%  4.79µs ± 0%   +0.16%  (p=0.000 n=38+37)
RegexpMatchMedium_32-4     737ns ± 2%   741ns ± 3%   +0.53%  (p=0.026 n=40+40)
RegexpMatchMedium_1K-4     164µs ± 0%   162µs ± 0%   -0.72%  (p=0.000 n=39+35)
RegexpMatchHard_32-4      10.6µs ± 0%  10.5µs ± 0%   -0.86%  (p=0.000 n=40+40)
RegexpMatchHard_1K-4       316µs ± 0%   312µs ± 0%   -1.13%  (p=0.000 n=38+40)
Revcomp-4                 40.5ms ± 3%  40.8ms ± 2%   +0.85%  (p=0.001 n=40+39)
Template-4                 395ms ± 0%   387ms ± 0%   -2.07%  (p=0.000 n=40+39)
TimeParse-4               2.68µs ± 0%  2.65µs ± 0%   -1.12%  (p=0.000 n=40+40)
TimeFormat-4              5.42µs ± 0%  5.29µs ± 0%   -2.30%  (p=0.000 n=38+37)
[Geo mean]                 304µs        302µs        -0.88%

name                      old speed      new speed      delta
GobDecode-4               20.5MB/s ± 1%  20.5MB/s ± 1%        ~  (p=0.284 n=37+39)
GobEncode-4               22.1MB/s ± 1%  22.3MB/s ± 1%   +0.81%  (p=0.000 n=40+38)
Gzip-4                    9.41MB/s ± 1%  9.37MB/s ± 1%   -0.41%  (p=0.000 n=38+38)
Gunzip-4                  76.5MB/s ± 0%  76.4MB/s ± 0%   -0.16%  (p=0.000 n=40+38)
JSONEncode-4              24.4MB/s ± 0%  25.5MB/s ± 1%   +4.42%  (p=0.000 n=32+40)
JSONDecode-4              6.31MB/s ± 0%  6.37MB/s ± 0%   +1.02%  (p=0.000 n=40+35)
GoParse-4                 3.06MB/s ± 1%  3.10MB/s ± 1%   +1.13%  (p=0.000 n=39+40)
RegexpMatchEasy0_32-4     63.9MB/s ± 0%  63.7MB/s ± 2%        ~  (p=0.070 n=39+40)
RegexpMatchEasy0_1K-4      268MB/s ± 0%   268MB/s ± 0%   -0.15%  (p=0.000 n=40+40)
RegexpMatchEasy1_32-4     58.6MB/s ± 0%  58.3MB/s ± 1%   -0.44%  (p=0.000 n=40+40)
RegexpMatchEasy1_1K-4      214MB/s ± 0%   214MB/s ± 0%   -0.16%  (p=0.000 n=38+37)
RegexpMatchMedium_32-4    1.36MB/s ± 3%  1.35MB/s ± 3%   -0.70%  (p=0.022 n=40+40)
RegexpMatchMedium_1K-4    6.26MB/s ± 0%  6.31MB/s ± 0%   +0.73%  (p=0.000 n=37+40)
RegexpMatchHard_32-4      3.01MB/s ± 1%  3.04MB/s ± 0%   +1.03%  (p=0.000 n=40+40)
RegexpMatchHard_1K-4      3.25MB/s ± 0%  3.28MB/s ± 0%   +1.05%  (p=0.000 n=40+40)
Revcomp-4                 62.8MB/s ± 4%  62.3MB/s ± 2%   -0.86%  (p=0.001 n=40+39)
Template-4                4.91MB/s ± 0%  5.01MB/s ± 0%   +2.11%  (p=0.000 n=40+39)
[Geo mean]                17.0MB/s       17.1MB/s        +0.53%