Ban use of Math.fma across the entire codebase#12014
Conversation
When FMA is not supported by the hardware, these methods fall back to BigDecimal usage which causes them to be 2500x slower. While most hardware in the last 10 years may have the support, out of box both VirtualBox and QEMU don't pass thru FMA support (for the latter at least you can tweak it with e.g. -cpu host or similar to fix this). This creates a terrible undocumented performance trap. Prevent it from sneaking into our codebase.
|
+1, seems reasonable to me. We can always remove this ban in the future if there's a good reason, but seems reasonable to put this in place to prevent it sneaking in for now. |
|
Yeah, I think if the fallback java code was 2x, 4x, or 8x slower (like you would expect from these intrinsics), we wouldn't be having this conversation :) |
|
Holy crap, creating +1 on banning its use in the code base. |
|
I honestly don't know who can use this method without any provided cpuid check... We actually use fma in our code but do so by detecting the performance difference between a naive implementation on primitive types and Math.fma (during bootstrap). It's ugly like hell but the difference is so vast that it works. I'm not sure who'd ever gain from using the bigdecimal-based implementation... |
|
I looked at what e.g. glibc does here as a fallback out of curiousity, for floats it is very simple (using Dekker algorithm), but requires changing the FP rounding mode, which you cant do in java. For doubles it is more complicated but still no bigdecimal. |
This comment was marked as duplicate.
This comment was marked as duplicate.
When FMA is not supported by the hardware, these methods fall back to BigDecimal usage which causes them to be 2500x slower. While most hardware in the last 10 years may have the support, out of box both VirtualBox and QEMU don't pass thru FMA support (for the latter at least you can tweak it with e.g. -cpu host or similar to fix this). This creates a terrible undocumented performance trap. Prevent it from sneaking into our codebase.
When FMA is not supported by the hardware, these methods fall back to
BigDecimalusage [1] which causes them to be 2500x slower [2].While most hardware in the last 10 years may have the support, out of box both VirtualBox and QEMU don't pass thru FMA support (for the latter at least you can tweak it with e.g. -cpu host or similar to fix this).
This creates a terrible undocumented performance trap, see [3] for an example of a 30x slowdown of an entire application. In my experience, developers are often too far detached from the production reality, and that reality is: we're not deploying to macbook pros in production, instead we are almost all using virtualization: we can't afford such performance traps.
Practically it would be an issue too: e.g. Policeman jenkins instance that runs our tests currently uses virtualbox. It would be bad for vector-search tests to suddenly get 30x slower.
We can't safely use this method anywhere, as we don't have access to check CPUID or anything to see if it will be insanely slow or not. Let's ban it completely: I'm concerned it will sneak into our codebase otherwise... it almost happened before: #10718
[1] Math.java source code
[2] Comment on JIRA issue for x86 intrinsic mentioning 2500x speedup
[3] VirtualBox bug for lack of FMA support