Centralize RYU_32_BIT_PLATFORM and optimize shiftright128(). by StephanTLavavej · Pull Request #77 · ulfjack/ryu

StephanTLavavej · 2018-08-18T01:26:32Z

common.h: Centralize RYU_32_BIT_PLATFORM. This indicates whether we should avoid uint64_t operations for performance, and is activated by the MSVC platform macros _M_IX86 and _M_ARM (also defined by Clang on Windows). This centralization will make it easier to detect additional platforms as 32-bit. Note that if a 32-bit platform isn't detected here, the consequences are limited to performance, as Ryu will use correct-but-slower uint64_t operations.
d2s_intrinsics.h: Include common.h for RYU_32_BIT_PLATFORM.
Fix comment, as the shift value can be 49. (Example: r = 0xba6fa7161a4d6e0c with ieeeMantissa = 0x000fa7161a4d6e0c and ieeeExponent = 0x000003a6.) I've experimentally observed that the upper bound might be 57 instead of 58, but I can't prove that theoretically.
Optimize shiftright128() for 32-bit platforms. If we can assume dist >= 32 (which the optimizer can't see), then we can grab lo's most significant 32 bits with (uint32_t)(lo >> 32) (32-bit compilers can do this efficiently), and then perform a 32-bit shift for the remaining (dist - 32). Additionally, uint64_t | uint32_t is less work. I observe that this improves Clang 7 RC1 perf by 2.8% (80.382 ns improved to 78.193 ns) which seems nice for such a small, localized change. I also observe that this degrades MSVC perf by 0.6% (106.155 ns degraded to 106.759 ns), but (1) that's very small and (2) Clang is an existence proof that MSVC can do better (tracked by VSO#634761).
Extend #if defined(_M_IX86) to #ifdef RYU_32_BIT_PLATFORM and update comments accordingly. This extends the division-by-64-bit-constant workaround to ARM32 (for which LLVM#37932 was originally filed).
f2s.c: This already includes common.h, so we can use RYU_32_BIT_PLATFORM with no behavioral changes.

* common.h: Centralize RYU_32_BIT_PLATFORM. This indicates whether we should avoid uint64_t operations for performance, and is activated by the MSVC platform macros _M_IX86 and _M_ARM (also defined by Clang on Windows). This centralization will make it easier to detect additional platforms as 32-bit. Note that if a 32-bit platform isn't detected here, the consequences are limited to performance, as Ryu will use correct-but-slower uint64_t operations. * d2s_intrinsics.h: Include common.h for RYU_32_BIT_PLATFORM. * Fix comment, as the shift value can be 49. (Example: `r = 0xba6fa7161a4d6e0c` with `ieeeMantissa = 0x000fa7161a4d6e0c` and `ieeeExponent = 0x000003a6`.) I've experimentally observed that the upper bound might be 57 instead of 58, but I can't prove that theoretically. * Optimize shiftright128() for 32-bit platforms. If we can assume `dist >= 32` (which the optimizer can't see), then we can grab lo's most significant 32 bits with `(uint32_t)(lo >> 32)` (32-bit compilers can do this efficiently), and then perform a 32-bit shift for the remaining `(dist - 32)`. Additionally, `uint64_t | uint32_t` is less work. I observe that this improves Clang 7 RC1 perf by 2.8% (80.382 ns improved to 78.193 ns) which seems nice for such a small, localized change. I also observe that this degrades MSVC perf by 0.6% (106.155 ns degraded to 106.759 ns), but (1) that's very small and (2) Clang is an existence proof that MSVC can do better (tracked by VSO#634761). * Extend `#if defined(_M_IX86)` to `#ifdef RYU_32_BIT_PLATFORM` and update comments accordingly. This extends the division-by-64-bit-constant workaround to ARM32 (for which LLVM#37932 was originally filed). * f2s.c: This already includes common.h, so we can use RYU_32_BIT_PLATFORM with no behavioral changes.

ulfjack merged commit 8514fb3 into ulfjack:master Aug 19, 2018

StephanTLavavej deleted the shiftright128 branch August 19, 2018 22:44

StephanTLavavej mentioned this pull request Jan 10, 2024

<xcharconv_ryu.h>: Fix __ryu_shiftright128 for ARM64 microsoft/STL#4304

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Centralize RYU_32_BIT_PLATFORM and optimize shiftright128().#77

Centralize RYU_32_BIT_PLATFORM and optimize shiftright128().#77
ulfjack merged 1 commit intoulfjack:masterfrom
StephanTLavavej:shiftright128

StephanTLavavej commented Aug 18, 2018

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

StephanTLavavej commented Aug 18, 2018

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants