Add whisper.cpp (server) support to llamafile #517
Conversation
OK, the good news is that whisperfile works outstandingly well with the "large" model on CPU. However, I'm going to disable GPU support by default because it isn't working reliably. Here's CUDA. Here's Apple Metal: I'm not sure what's wrong. It's possible the problem will solve itself the next time I synchronize with llama.cpp upstream. I also checked, and it definitely wasn't my performance optimization in the last change. The tiny
Thank you @jart!! I must've missed porting some of the CUDA/Metal code from the whisperfile repo; I appreciate you taking such quick care of it. Love the performance improvements on CPU as well! It's possible it could be something with the encoding; I've run into some issues with the bins before. Also, when I synced the whisper.cpp code I did not sanity-check it (and I don't know if they do upstream), so it may be an underlying issue? If you don't get to it first, I'll take a closer look toward the end of the week / over the weekend.



This PR adds whisper.cpp support to llamafile, addressing #17 in part. Only the server binary has been ported in this PR.
Most of the work to support this was initially done on my fork of llamafile: whisperfile. This PR ports the code over and structures it similarly to the stable-diffusion.cpp support.
The whisper.cpp code was taken from whisper.cpp commit 6739eb8.
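For anyone wanting to try this out, here is a rough usage sketch. It assumes the ported server keeps upstream whisper.cpp's server interface (the `-m`/`--port` flags and the `/inference` HTTP endpoint that accepts a multipart WAV upload); the binary name `whisper-server` and the file paths below are placeholders, and the actual names in this port may differ.

```shell
# Start the ported whisper.cpp server (flag names follow upstream whisper.cpp;
# the binary name and paths here are illustrative, not from this PR)
./whisper-server -m models/ggml-large-v3.bin --host 127.0.0.1 --port 8080

# In another terminal: transcribe a 16 kHz WAV file via the
# upstream-style /inference endpoint
curl http://127.0.0.1:8080/inference \
  -F file=@samples/jfk.wav \
  -F response_format=json
```

Note that upstream whisper.cpp expects 16 kHz mono WAV input, so audio in other formats needs to be converted (e.g. with ffmpeg) before uploading.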