Conversation
For AVX2/AVX/scalar, we might want to keep
I'm actually surprised that they're worth using on ARM NEON, as the alternative is simply subtracting 8 from the Q4 quants.
@sw there is no noticeable difference between the two. Still, changed to use
I guess it's not finished? You're using
Wow - this is difficult 😄 I keep messing up something |
Looks good now; I think it's very slightly slower for Q4_0 and Q4_2 because we're now missing the SIMD optimizations for
Ok, will merge now and we can finish the AVX stuff from
8-bit integer quantization support
Perplexity: 5.9563