Add SSE3 and fp16 conversion lookup table #368
Drafting as I am unsure what value to put for |
|
A quick test seems to show that 32 leads to better performance than 16 or 64 |
Yes, that's what I do - trial and error to find the best value :) This is a great contribution. I'm very curious to see if this F16 LUT will speed up the WASM examples, because WASM does not have an intrinsic for FP16 <-> FP32 conversion, so it falls back to the naive conversion method. |
|
Leaving as a draft for now as I want to see if I can get rid of some of the memcpy calls in the ggml_lookup_fp16_to_fp32 function. A review would be appreciated, though, as I am almost done with this. |
|
Turns out the memcpy calls are optimised out by the compiler anyways :) Marking this as ready. |
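For readers following along, here is a minimal sketch of the lookup-table idea discussed above, not the code from this PR. The names `fp16_t`, `table_f16_to_f32`, `fp16_to_fp32_slow`, `init_table` and `lookup_fp16_to_fp32` are hypothetical, and it assumes the half-precision value is stored as a raw 16-bit pattern. The `memcpy` calls are the portable way to reinterpret bits in C, and compilers typically reduce a fixed-size `memcpy` like this to a plain register move, which matches the observation above.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 16-bit half-precision storage type. */
typedef uint16_t fp16_t;

/* One float per possible 16-bit pattern: 65536 * 4 bytes = 256 KiB. */
static float table_f16_to_f32[1 << 16];

/* Slow, bit-level conversion from IEEE 754 binary16 to binary32.
   Used only once per bit pattern, to fill the table. */
static float fp16_to_fp32_slow(uint16_t h) {
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t man  = h & 0x3FFu;
    uint32_t bits;

    if (exp == 0) {
        if (man == 0) {
            bits = sign;                       /* signed zero */
        } else {
            /* Subnormal half: shift the mantissa until the implicit
               leading 1 appears, adjusting the exponent to match. */
            int shift = 0;
            while ((man & 0x400u) == 0) {
                man <<= 1;
                shift++;
            }
            man &= 0x3FFu;
            bits = sign | ((uint32_t)(127 - 15 - shift + 1) << 23) | (man << 13);
        }
    } else if (exp == 0x1Fu) {
        bits = sign | 0x7F800000u | (man << 13);   /* infinity or NaN */
    } else {
        bits = sign | ((exp + 127 - 15) << 23) | (man << 13);
    }

    float f;
    memcpy(&f, &bits, sizeof f);   /* bit cast without aliasing issues */
    return f;
}

/* Fill the table once at startup. */
static void init_table(void) {
    for (uint32_t i = 0; i < (1u << 16); i++) {
        table_f16_to_f32[i] = fp16_to_fp32_slow((uint16_t)i);
    }
}

/* Fast path: the half's bit pattern is the table index. */
static inline float lookup_fp16_to_fp32(fp16_t h) {
    uint16_t idx;
    memcpy(&idx, &h, sizeof idx);
    return table_f16_to_f32[idx];
}
```

Calling `init_table()` once and then `lookup_fp16_to_fp32(0x3C00)` yields `1.0f`, since `0x3C00` is the binary16 encoding of one. The table trades 256 KiB of memory for a single indexed load per conversion.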
|
@abitofevrything |
* Improves WASM performance: On MacBook M1 Pro, I observe 25% faster using Firefox and 35% faster using Chrome
* Add support for SSE3 SIMD
* Add SSE3 to system information
* Add Imath support for fp16-fp32 conversions
* Add Imath to system information
* Wrap Imath calls to avoid static function warnings
* Drop Imath; Add lookup table for f16 -> f32 conversions
* Remove TODO comments
* Update SSE3 to new macro arguments
* Correct updated macro definitions
* Prefer static inline where possible
* ggml : static inlines + add public f16 <-> f32 conversions

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Adds SSE3 support for SIMD and support for using Imath for fp16-fp32 conversions. Because Imath uses a lookup table, it can be faster on systems where whisper.cpp doesn't already have a native method for the conversion; on my system this gives a ~3.5x speed increase.