| | |
| --- | --- |
| Bugzilla Link | 37983 |
| Version | trunk |
| OS | All |
| Attachments | Test case, Clang codegen for workaround, Clang codegen for modulo (this is the bug), MSVC codegen for workaround, MSVC codegen for modulo (this is fine) |
| CC | @topperc,@efriedma-quic,@LebedevRI,@RKSimon,@nico,@rotateright,@Trass3r |
Extended Description
This affects the Ryu algorithm for printing floating-point numbers (https://github.com/ulfjack/ryu) and therefore affects C++17 floating-point std::to_chars(). This is possibly the same bug as #23480 "Division followed by modulo generates longer machine code than vice versa".
I observe that MSVC's codegen is unaffected by USE_MODULO, while Clang/LLVM generates more assembly code (which is slower when profiled in the real algorithm) for USE_MODULO.
C:\Temp\TESTING_X64>cl
Microsoft (R) C/C++ Optimizing Compiler Version 19.15.26504 for x64
Copyright (C) Microsoft Corporation. All rights reserved.
usage: cl [ option... ] filename... [ /link linkoption... ]
C:\Temp\TESTING_X64>clang-cl -m64 -v
clang version 6.0.0 (tags/RELEASE_600/final)
Target: x86_64-pc-windows-msvc
Thread model: posix
InstalledDir: S:\msvc\src\vctools\NonShip\ClangLLVM\bin
C:\Temp\TESTING_X64>type meow.cpp
unsigned long long ryu(unsigned long long vp, unsigned long long vm) {
    bool vmIsTrailingZeros = true;
    while (vp / 10 > vm / 10) {
#ifdef USE_MODULO
        vmIsTrailingZeros &= vm % 10 == 0;
#else
        // The compiler does not realize that vm % 10 can be computed from vm / 10
        // as vm - (vm / 10) * 10.
        vmIsTrailingZeros &= vm - (vm / 10) * 10 == 0; // vm % 10 == 0;
#endif
        vp /= 10;
        vm /= 10;
    }
    return vmIsTrailingZeros ? vp : vm;
}
C:\Temp\TESTING_X64>cl /EHsc /nologo /W4 /MT /O2 /c /FAsc /Famsvc_workaround.cod meow.cpp
meow.cpp
C:\Temp\TESTING_X64>cl /EHsc /nologo /W4 /MT /O2 /c /FAsc /Famsvc_modulo.cod /DUSE_MODULO meow.cpp
meow.cpp
C:\Temp\TESTING_X64>clang-cl -m64 /EHsc /nologo /W4 /MT /O2 /c /FA /Faclang_workaround.asm meow.cpp
C:\Temp\TESTING_X64>clang-cl -m64 /EHsc /nologo /W4 /MT /O2 /c /FA /Faclang_modulo.asm /DUSE_MODULO meow.cpp
C:\Temp\TESTING_X64>
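For reference, the two preprocessor branches of the test case can be written as separate functions, which makes it easier to confirm that they compute the same result and differ only in how the remainder is spelled. This is a minimal sketch, not part of the attached test case: the names `ryu_modulo`, `ryu_workaround`, and `variants_agree` are hypothetical, and the workaround variant reuses the quotient explicitly, which is an assumption about how one might write it by hand.

```cpp
#include <cassert>
#include <cstdint>

// Loop from the test case with USE_MODULO defined: uses vm % 10 directly.
std::uint64_t ryu_modulo(std::uint64_t vp, std::uint64_t vm) {
    bool vmIsTrailingZeros = true;
    while (vp / 10 > vm / 10) {
        vmIsTrailingZeros &= vm % 10 == 0;
        vp /= 10;
        vm /= 10;
    }
    return vmIsTrailingZeros ? vp : vm;
}

// Same loop without USE_MODULO: the remainder is rebuilt from the quotient,
// relying on the identity vm % 10 == vm - (vm / 10) * 10 for unsigned vm.
std::uint64_t ryu_workaround(std::uint64_t vp, std::uint64_t vm) {
    bool vmIsTrailingZeros = true;
    while (vp / 10 > vm / 10) {
        const std::uint64_t q = vm / 10;        // quotient, reused below
        vmIsTrailingZeros &= vm - q * 10 == 0;  // equivalent to vm % 10 == 0
        vp /= 10;
        vm = q;
    }
    return vmIsTrailingZeros ? vp : vm;
}

// Hypothetical helper: checks that both variants agree for every vm in
// [0, limit), using vp = vm + delta as the upper bound.
bool variants_agree(std::uint64_t limit, std::uint64_t delta) {
    for (std::uint64_t vm = 0; vm < limit; ++vm) {
        if (ryu_modulo(vm + delta, vm) != ryu_workaround(vm + delta, vm)) {
            return false;
        }
    }
    return true;
}
```

Since the two functions are semantically identical, any difference in the generated assembly between them comes from the optimizer rather than from the source.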