Support AVX2 in windows better#381
Conversation
|
Thanks Mike! I had tried AVX2 and it worked, but was extremely slow. The JFK.wav sample took 17 seconds, where with AVX it takes 4.5 seconds. Using Visual Studio with AVX2 set under C/C++ Code Generation -> Enable Enhanced Instruction Set -> Advanced Vector Extensions 2 (/arch:AVX2) time went from 4.5 to about 3.6 seconds. Nice improvement. |
4faae25 to
8f4f7d9
Compare
|
@RndyP https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html For example, the case that you refer to I think actually depends on wether FMA is supported or not: So it is strange that AVX2 flags make any difference at all on Windows. My understanding is that AVX2 should not make a difference, but I might be missing something. |
|
I think what's happening is the Microsoft compiler is smart enough to convert the _mm256_add_ps(_mm256_mul_ps(b, c), a) to the fused assembly version, when the compiler option is set. The code above goes from 16 down to 12 vector instructions. 4.5 to 3.6 seconds is a nice improvement. |


I spent quite a while trying to figure out why AVX2 wasn't enabled when I built this on Windows in my build system. Adding
/arch:AVX2to the C-compiler seems to fix it. I'm not building through Visual Studio but via a more customized build system (leveraging MSBuild), so that's likely why this isn't required for other people building on Windows, but since ggml.c is a C file (not C++), it seems like this should be correct and necessary in some cicrumstances.