ggml-cpu: cmake add arm64 cpu feature check for macos#10487
ggerganov merged 2 commits into ggml-org:master
ggml/src/ggml-cpu/CMakeLists.txt
Outdated
add_compile_definitions(__ARM_FEATURE_DOTPROD)
endif ()

check_cxx_source_compiles("#include <arm_neon.h>\nint main() { int8x16_t _a, _b; int32x4_t _s = vmlaq_f32(_s, _a, _b); return 0; }" GGML_COMPILER_SUPPORT_MATMUL_INT8)
How come the presence of vmlaq_f32 confirms __ARM_FEATURE_MATMUL_INT8?
Thanks for your question. The current check using vmlaq_f32 to confirm __ARM_FEATURE_MATMUL_INT8 is incorrect because vmlaq_f32 operates on floating-point values (float32x4_t), while __ARM_FEATURE_MATMUL_INT8 is specifically related to integer matrix multiplication using INT8 (8-bit signed integers).
I'll update the patch to use an intrinsic that operates on INT8 data, such as vmmlaq_s32.
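To make the point of this exchange concrete, here is a minimal sketch of what an INT8 matrix-multiply intrinsic computes, and why a float multiply-accumulate such as vmlaq_f32 says nothing about it. The function names smmla_reference and smmla_intrinsic are hypothetical and not part of ggml. The intrinsic variant is only compiled when the compiler defines __ARM_FEATURE_MATMUL_INT8, and uses vmmlaq_s32, the intrinsic named in this PR's second commit.

```cpp
#include <cstdint>

#if defined(__ARM_FEATURE_MATMUL_INT8)
#include <arm_neon.h>
#endif

// Scalar reference for one SMMLA operation (what vmmlaq_s32 computes):
// acc (2x2, int32) += A (2x8, int8) * B^T, where B is also stored as 2x8 rows.
// Hypothetical helper for illustration only.
void smmla_reference(int32_t acc[4], const int8_t a[16], const int8_t b[16]) {
    for (int i = 0; i < 2; ++i) {
        for (int j = 0; j < 2; ++j) {
            int32_t sum = 0;
            for (int k = 0; k < 8; ++k) {
                sum += int32_t(a[8 * i + k]) * int32_t(b[8 * j + k]);
            }
            acc[2 * i + j] += sum;
        }
    }
}

#if defined(__ARM_FEATURE_MATMUL_INT8)
// Same operation through the intrinsic. This compiles only when the compiler
// targets i8mm; executing it still requires a CPU that implements the
// instruction.
void smmla_intrinsic(int32_t acc[4], const int8_t a[16], const int8_t b[16]) {
    int32x4_t r = vld1q_s32(acc);
    r = vmmlaq_s32(r, vld1q_s8(a), vld1q_s8(b));
    vst1q_s32(acc, r);
}
#endif
```

A compile probe for i8mm would need to reference an intrinsic of this kind with int8x16_t operands and an int32x4_t accumulator, so that it only compiles when the toolchain really supports the extension.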
This change breaks the following:
make -j && ./bin/test-backend-ops -o MUL_MAT -b CPU
On M1 Pro it crashes; on M2 Ultra, the tests fail.
The problem is that even though the sources compile successfully, they do not run:
g++ -march=armv8.2a+i8mm test.cpp && ./a.out
Illegal instruction: 4
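The failure above shows that a successful compile only proves the toolchain can emit i8mm instructions, not that the host CPU can execute them. One way to close that gap is to ask the operating system at run time. The following is a minimal sketch, assuming macOS exposes the feature under the sysctl key hw.optional.arm.FEAT_I8MM; the helper names sysctl_flag_enabled and cpu_has_i8mm are hypothetical, and on non-Apple platforms the sketch simply reports false.

```cpp
#include <cstddef>

#if defined(__APPLE__)
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

// Returns true only if the named sysctl exists and reports a non-zero value.
// Hypothetical helper; on non-Apple platforms it always returns false.
bool sysctl_flag_enabled(const char * name) {
#if defined(__APPLE__)
    int value = 0;
    size_t size = sizeof(value);
    if (sysctlbyname(name, &value, &size, nullptr, 0) != 0) {
        return false; // unknown key or query failure: treat as unsupported
    }
    return value != 0;
#else
    (void) name;
    return false;
#endif
}

// Assumption: the i8mm feature is reported under this key on Apple silicon.
bool cpu_has_i8mm() {
    return sysctl_flag_enabled("hw.optional.arm.FEAT_I8MM");
}
```

A build script could use the same information (for example from the output of the sysctl command) to decide whether to enable i8mm code paths for a native build, instead of relying on a compile-only probe.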
I was able to reproduce the failing cases on an M3. The issue appears to be related to test-backend-ops, which is a special case that doesn't require model loading. As a result, the Q4_0 optimized kernel was not activated even though the hardware supports i8mm. Meanwhile, the macro GGML_USE_LLAMAFILE was undefined as a result of __ARM_FEATURE_MATMUL_INT8 being defined. This mismatch caused the kernel selection logic to skip both the optimized Q4_0 path and the GGML_USE_LLAMAFILE path. I applied a temporary patch to prevent GGML_USE_LLAMAFILE from being undefined and
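The mismatch described in this comment can be modeled with a small toy selector. This is only a sketch of the reasoning, not ggml's actual dispatch code: the enum, struct, and function names are hypothetical, and it assumes (as the comment suggests) that the optimized Q4_0 kernel only becomes active after a model has been loaded.

```cpp
// Toy model of the kernel selection gap; all names are hypothetical.
enum class MulMatPath { OptimizedQ4_0, Llamafile, Generic };

struct BuildFlags {
    bool i8mm_macro_defined;   // stands in for __ARM_FEATURE_MATMUL_INT8
    bool llamafile_requested;  // stands in for GGML_USE_LLAMAFILE before any #undef
};

MulMatPath select_path(const BuildFlags & flags, bool model_loaded) {
    // Assumption from the comment: defining the i8mm macro undefines the
    // llamafile macro, so the two paths are mutually exclusive at build time.
    const bool llamafile_enabled =
        flags.llamafile_requested && !flags.i8mm_macro_defined;

    // Assumption: the optimized Q4_0 kernel is only set up during model loading.
    if (flags.i8mm_macro_defined && model_loaded) {
        return MulMatPath::OptimizedQ4_0;
    }
    if (llamafile_enabled) {
        return MulMatPath::Llamafile;
    }
    return MulMatPath::Generic; // both fast paths skipped
}
```

With the i8mm macro defined and no model loaded, which is the test-backend-ops situation described above, this toy selector skips both fast paths.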
test-backend-ops didn't seem to use the I8MM optimized kernel in my test on an M3, even though __ARM_FEATURE_MATMUL_INT8 was enabled.
* ggml-cpu: cmake add arm64 cpu feature check for macos
* use vmmlaq_s32 for compile option i8mm check
Add fix for #10435