|
I am not sure that I understand what is happening. Why does the ARM feature detection in ggml report that i8mm is supported if it is not? Why does the repacked Q4_0 work with i8mm enabled? |
So far I have 3 datapoints:
I don't understand yet why M2 Ultra does not work. |
|
I wonder if OS support is also necessary. I get this on M3 Max:
$ sysctl -a | grep hw.optional.arm.
hw.optional.arm.FEAT_FlagM: 1
hw.optional.arm.FEAT_FlagM2: 1
hw.optional.arm.FEAT_FHM: 1
hw.optional.arm.FEAT_DotProd: 1
hw.optional.arm.FEAT_SHA3: 1
hw.optional.arm.FEAT_RDM: 1
hw.optional.arm.FEAT_LSE: 1
hw.optional.arm.FEAT_SHA256: 1
hw.optional.arm.FEAT_SHA512: 1
hw.optional.arm.FEAT_SHA1: 1
hw.optional.arm.FEAT_AES: 1
hw.optional.arm.FEAT_PMULL: 1
hw.optional.arm.FEAT_SPECRES: 0
hw.optional.arm.FEAT_SB: 1
hw.optional.arm.FEAT_FRINTTS: 1
hw.optional.arm.FEAT_LRCPC: 1
hw.optional.arm.FEAT_LRCPC2: 1
hw.optional.arm.FEAT_FCMA: 1
hw.optional.arm.FEAT_JSCVT: 1
hw.optional.arm.FEAT_PAuth: 1
hw.optional.arm.FEAT_PAuth2: 1
hw.optional.arm.FEAT_FPAC: 1
hw.optional.arm.FEAT_DPB: 1
hw.optional.arm.FEAT_DPB2: 1
hw.optional.arm.FEAT_BF16: 1
>>hw.optional.arm.FEAT_I8MM: 1
hw.optional.arm.FEAT_WFxT: 0
hw.optional.arm.FEAT_RPRES: 1
hw.optional.arm.FEAT_ECV: 1
hw.optional.arm.FEAT_AFP: 1
hw.optional.arm.FEAT_LSE2: 1
hw.optional.arm.FEAT_CSV2: 1
hw.optional.arm.FEAT_CSV3: 1
hw.optional.arm.FEAT_DIT: 1
hw.optional.arm.FEAT_FP16: 1
hw.optional.arm.FEAT_SSBS: 1
hw.optional.arm.FEAT_BTI: 1
hw.optional.arm.FEAT_SME: 0
hw.optional.arm.FEAT_SME2: 0
hw.optional.arm.SME_F32F32: 0
hw.optional.arm.SME_BI32I32: 0
hw.optional.arm.SME_B16F32: 0
hw.optional.arm.SME_F16F32: 0
hw.optional.arm.SME_I8I32: 0
hw.optional.arm.SME_I16I32: 0
hw.optional.arm.FEAT_SME_F64F64: 0
hw.optional.arm.FEAT_SME_I16I64: 0
hw.optional.arm.FP_SyncExceptions: 1
hw.optional.armv8_1_atomics: 1
hw.optional.armv8_2_fhm: 1
hw.optional.armv8_2_sha512: 1
hw.optional.armv8_2_sha3: 1
hw.optional.armv8_3_compnum: 1
hw.optional.armv8_crc32: 1
hw.optional.armv8_gpi: 1
hw.optional.arm64: 1 |
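The flag marked above can also be read from code rather than from the `sysctl` command line. The following is a minimal sketch, assuming the macOS `sysctlbyname` call and the key name exactly as printed in the output above. The helper name `query_arm_feature` is hypothetical, and this is not claimed to be how ggml performs its own detection. Note that it only tells you what the OS reports, which is the very thing in question here.

```cpp
#include <cstddef>
#if defined(__APPLE__)
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

// Hypothetical helper (not part of ggml).
// Returns 1 if the OS reports the feature as present, 0 if it reports it
// as absent, and -1 if the key cannot be queried. On non-Apple platforms
// this sketch does nothing and always returns -1.
int query_arm_feature(const char * key) {
#if defined(__APPLE__)
    int value = 0;
    size_t size = sizeof(value);
    // sysctlbyname returns 0 on success; a missing key is reported as an error.
    if (sysctlbyname(key, &value, &size, nullptr, 0) != 0) {
        return -1;
    }
    return value != 0 ? 1 : 0;
#else
    (void) key;
    return -1;
#endif
}
```

Calling `query_arm_feature("hw.optional.arm.FEAT_I8MM")` on the machines discussed here should mirror the `sysctl` line, so it cannot by itself explain a difference between M2 Ultra and M3 Max when both report 1.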
|
M2 Ultra also reports
28c28
< hw.optional.arm.FEAT_RPRES: 1
---
> hw.optional.arm.FEAT_RPRES: 0
30c30
< hw.optional.arm.FEAT_AFP: 1
---
> hw.optional.arm.FEAT_AFP: 0
48a49
> hw.optional.arm.caps: 868632327146696703 |
|
Btw, it's weird because:
# this passes all tests on M2 Ultra
make -j && ./bin/test-backend-ops -o MUL_MAT -b Metal
# this fails the 2 Q4_0 tests as shown earlier
make -j && ./bin/test-backend-ops -o MUL_MAT -b CPU
But both commands would run the CPU backend, correct? |
Correction, the
rm -rf build-mm
mkdir build-mm
cd build-mm
cmake ..
make -j && ./bin/test-backend-ops -o MUL_MAT -b CPU |
|
It also fails for me. The only way I can see this happening is if the second matrix multiplication on the CPU produces different results, and I don't understand how that could happen. |
|
Almost positive that it is related to the compile flags somehow because it does not fail before 25669aa. Also, using the
cd llama.cpp
make -j tests && ./tests/test-backend-ops -o MUL_MAT -b CPU
And the only difference is that with the make, we don't pass
Edit: Ah, but he |
|
In case it isn't clear, a sentinel mismatch means that there is a buffer overflow. I tried allocating a different buffer per tensor to see where it happens, and I got this:
To reproduce:
diff --git a/tests/test-backend-ops.cpp b/tests/test-backend-ops.cpp
index da66ed85..31fe4c33 100644
--- a/tests/test-backend-ops.cpp
+++ b/tests/test-backend-ops.cpp
@@ -473,6 +473,10 @@ struct test_case {
             return false;
         }
+        for (ggml_tensor * t = ggml_get_first_tensor(ctx); t != NULL; t = ggml_get_next_tensor(ctx, t)) {
+            t->data = malloc(ggml_nbytes(t));
+        }
+
         // build graph
         ggml_build_forward_expand(gf, out); |
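To illustrate why a sentinel mismatch implies a write past the end of a buffer, here is a minimal sketch of the general guard-byte technique. It is not the actual test-backend-ops implementation: the `guarded_buffer` type, the guard pattern `0xA5`, and the 64-byte guard size are all hypothetical choices made for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical guard pattern and size, chosen for illustration only.
static const uint8_t SENTINEL_BYTE = 0xA5;
static const size_t  SENTINEL_SIZE = 64;

// A payload of n bytes followed by SENTINEL_SIZE guard bytes.
struct guarded_buffer {
    std::vector<uint8_t> storage; // layout: [payload | sentinel]
    size_t payload_size;

    explicit guarded_buffer(size_t n)
        : storage(n + SENTINEL_SIZE, 0), payload_size(n) {
        // Fill the region after the payload with the guard pattern.
        std::memset(storage.data() + n, SENTINEL_BYTE, SENTINEL_SIZE);
    }

    uint8_t * data() { return storage.data(); }

    // True if every guard byte still holds the pattern, i.e. nothing
    // wrote past the end of the payload into the guard region.
    bool sentinel_intact() const {
        for (size_t i = 0; i < SENTINEL_SIZE; ++i) {
            if (storage[payload_size + i] != SENTINEL_BYTE) {
                return false;
            }
        }
        return true;
    }
};
```

A kernel that writes even one byte beyond `payload_size` flips `sentinel_intact()` to false. The check only says that an overflow happened somewhere in that buffer, which is why giving each tensor its own allocation, as in the diff above, helps narrow down which tensor is being overrun.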
|
The changes look good to me. I'll approve if requested or once more feedback is gathered. |
cont #10487
Fix MSVC I8MM feature detection + add logs.