Created attachment 20488 [details]
Test case

This affects the Ryu algorithm for printing floating-point numbers (https://github.com/ulfjack/ryu ) and therefore affects C++17 floating-point std::to_chars(). This is possibly the same bug as https://bugs.llvm.org/show_bug.cgi?id=23106 "Division followed by modulo generates longer machine code than vice versa".

I observe that MSVC's codegen is unaffected by USE_MODULO, while Clang/LLVM generates more assembly code (which is slower when profiled in the real algorithm) for USE_MODULO.

C:\Temp\TESTING_X64>cl
Microsoft (R) C/C++ Optimizing Compiler Version 19.15.26504 for x64
Copyright (C) Microsoft Corporation. All rights reserved.

usage: cl [ option... ] filename... [ /link linkoption... ]

C:\Temp\TESTING_X64>clang-cl -m64 -v
clang version 6.0.0 (tags/RELEASE_600/final)
Target: x86_64-pc-windows-msvc
Thread model: posix
InstalledDir: S:\msvc\src\vctools\NonShip\ClangLLVM\bin

C:\Temp\TESTING_X64>type meow.cpp
unsigned long long ryu(unsigned long long vp, unsigned long long vm) {
    bool vmIsTrailingZeros = true;

    while (vp / 10 > vm / 10) {
#ifdef USE_MODULO
        vmIsTrailingZeros &= vm % 10 == 0;
#else
        // The compiler does not realize that vm % 10 can be computed from vm / 10
        // as vm - (vm / 10) * 10.
        vmIsTrailingZeros &= vm - (vm / 10) * 10 == 0; // vm % 10 == 0;
#endif

        vp /= 10;
        vm /= 10;
    }

    return vmIsTrailingZeros ? vp : vm;
}

C:\Temp\TESTING_X64>cl /EHsc /nologo /W4 /MT /O2 /c /FAsc /Famsvc_workaround.cod meow.cpp
meow.cpp

C:\Temp\TESTING_X64>cl /EHsc /nologo /W4 /MT /O2 /c /FAsc /Famsvc_modulo.cod /DUSE_MODULO meow.cpp
meow.cpp

C:\Temp\TESTING_X64>clang-cl -m64 /EHsc /nologo /W4 /MT /O2 /c /FA /Faclang_workaround.asm meow.cpp

C:\Temp\TESTING_X64>clang-cl -m64 /EHsc /nologo /W4 /MT /O2 /c /FA /Faclang_modulo.asm /DUSE_MODULO meow.cpp

C:\Temp\TESTING_X64>
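For anyone who wants to compare the two variants without toggling the macro, here is a minimal sketch that splits the test case into two functions. The names ryu_modulo, ryu_workaround, and check_identity are hypothetical and not part of Ryu or the attached test case; the loop bodies are copied from the two #ifdef branches. The two functions should return the same values for all inputs, since for unsigned v the identity v % 10 == v - (v / 10) * 10 always holds; only the generated code is expected to differ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical name: the USE_MODULO branch of the test case.
std::uint64_t ryu_modulo(std::uint64_t vp, std::uint64_t vm) {
    bool vmIsTrailingZeros = true;
    while (vp / 10 > vm / 10) {
        vmIsTrailingZeros &= vm % 10 == 0;
        vp /= 10;
        vm /= 10;
    }
    return vmIsTrailingZeros ? vp : vm;
}

// Hypothetical name: the workaround branch, which derives the
// remainder from the quotient by hand.
std::uint64_t ryu_workaround(std::uint64_t vp, std::uint64_t vm) {
    bool vmIsTrailingZeros = true;
    while (vp / 10 > vm / 10) {
        vmIsTrailingZeros &= vm - (vm / 10) * 10 == 0;
        vp /= 10;
        vm /= 10;
    }
    return vmIsTrailingZeros ? vp : vm;
}

// Hypothetical helper: asserts the identity the workaround relies on.
void check_identity(std::uint64_t v) {
    assert(v % 10 == v - (v / 10) * 10);
}
```

Compiling both functions in one translation unit at /O2 (or -O2) and diffing their assembly should show the same difference as the attached listings, assuming the compiler version is one affected by this bug.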
Created attachment 20489 [details] Clang codegen for workaround
Created attachment 20490 [details] Clang codegen for modulo (this is the bug)
Created attachment 20491 [details] MSVC codegen for workaround
Created attachment 20492 [details] MSVC codegen for modulo (this is fine)
Here's a Godbolt link demonstrating the codegen difference (this isn't Windows-specific): https://godbolt.org/g/RsUT4a
Relevant: https://bugs.llvm.org/show_bug.cgi?id=35479
https://reviews.llvm.org/rL364600 seems to fix this. Can you confirm?
Looks like that landed in https://reviews.llvm.org/rL367374 . Is this done now?
Sorry, wrong bug. Please ignore comment 8 (but not comment 7).
Looks like this got into the 9.x release branch; I'll try to verify this soon. Thanks!