New PR in llama.cpp for full CUDA GPU acceleration!
github.com/ggerganov/llam…
This is huge! For the first time GGML is beating GPTQ speed.
On a 4090 + i9-13900K I'm getting 109.29 tokens/s on 7B and 29.11 tokens/s on 30B.
AutoGPTQ is: 98 t/s for 7B, 35 t/s for 30B.
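As a quick check on the figures above, here is a small Python sketch that works out the GGML-to-AutoGPTQ throughput ratio per model size. The numbers are the ones quoted in the post; the dictionary and function names are only illustrative.

```python
# Throughput figures quoted in the post (tokens/s on a 4090 + i9-13900K).
ggml = {"7B": 109.29, "30B": 29.11}
autogptq = {"7B": 98.0, "30B": 35.0}


def speed_ratio(a: float, b: float) -> float:
    """Return a divided by b: above 1 means a is faster, below 1 means slower."""
    return a / b


# Ratio of GGML speed to AutoGPTQ speed for each model size.
ratios = {size: speed_ratio(ggml[size], autogptq[size]) for size in ggml}

for size, ratio in ratios.items():
    print(f"{size}: GGML runs at {ratio:.2f}x the AutoGPTQ speed")
```

By these numbers GGML is ahead on 7B (about 1.12x) and behind on 30B (about 0.83x).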
Tom Jobbins
My Hugging Face repos: huggingface.co/TheBloke
Discord server: discord.gg/theblokeai
Patreon: patreon.com/TheBlokeAI
- New StarCoder coding model from @WizardLM_AI "WizardCoder-15B-v1.0 model achieves 57.3 pass@1 on the HumanEval Benchmarks .. 22.3 points higher than the SOTA open-source Code LLMs." My quants: huggingface.co/TheBloke/Wizar… huggingface.co/TheBloke/Wizar… Original: huggingface.co/WizardLM/Wizar…
- Meta's CodeLlama is here! ai.meta.com/blog/code-llam… 7B, 7B-Instruct, 7B-Python, 13B, 13B-Instruct, 13B-Python, 34B, 34B-Instruct, 34B-Python. First time we've seen the 34B model! I've got a couple of fp16s up: huggingface.co/TheBloke/CodeL… huggingface.co/TheBloke/CodeL… More coming soon, obvs.
- Llama 2 70B GGML support is here! Use this llama.cpp release: github.com/ggerganov/llam… My first repo is at: huggingface.co/TheBloke/Llama… Note: at this time it's only possible to convert the base Llama 2 models, not any fine tunes. This is being worked on.
- Oh my, LLaMA 2! 7B, 13B, 70B, 2T tokens, 4K context, commercial license! huggingface.co/meta-llama But why, Meta, why no 33B or similar size? You missed out the sweet spot? :( Unless with 2T tokens and 4K context, 13B proves more than good enough.. could be!
- I've uploaded merged/quantised versions of all of @Tim_Dettmers' Guanaco models: huggingface.co/TheBloke/guana… huggingface.co/TheBloke/guana… huggingface.co/TheBloke/guana… huggingface.co/TheBloke/guana… huggingface.co/TheBloke/guana… huggingface.co/TheBloke/guana… huggingface.co/TheBloke/guana… huggingface.co/TheBloke/guana… phew!
- Transformers 4.32.0 now supports GPTQ models natively! Over the last couple of days I have updated 296 of my GPTQ repos to provide automatic support for this. It's awesome you can now load a GPTQ model directly in Transformers with only two lines of code! Quoted post: "LLMs just got faster and lighter with 🤗 Transformers x AutoGPTQ! You can now load your models from @huggingface with GPTQ quantization. Enjoy faster inference speed and lower memory usage than existing supported quantization schemes 🚀 Blogpost: huggingface.co/blog/gptq-inte…"
- I've just quantised my largest ever models! BLOOMZ 176B and BLOOMChat 176B! huggingface.co/TheBloke/bloom… huggingface.co/TheBloke/BLOOM… Took a month before I found a system big enough. But thanks to @latitudesh and their beast 4xH100 80GB, EPYC 9354 750GB, I did each model in <4 hours! 🚀
- Thanks again to @latitudesh for the loan of a beast 8xH100 server this week. I uploaded over 550 new repos, maybe my busiest week yet! Quanting is really resource intensive. Needs not only fast GPUs, but many CPUs, lots of disk, and 🚀 network. A server that ✅ all is v. rare!
- The other day I discovered a little environment variable buried in the @huggingface Hub Python docs: HF_HUB_ENABLE_HF_TRANSFER It has changed my life! Docs say 2x faster, but in my testing it's 3-5x faster 🚀😍 (and it's just as fast for uploads!)
- I have reached quantisation nirvana.. making 9 GPTQs at once! This @latitudesh server is a monster, and it is always hungry! 👹
- New models from Allen AI: Tulu 30B, 13B, 7B. LLaMA models tuned on a mix of datasets, e.g. FLAN V2, CoT, Dolly, OASST1, GPT4-Alpaca, ShareGPT huggingface.co/TheBloke/tulu-… huggingface.co/TheBloke/tulu-… huggingface.co/TheBloke/tulu-… huggingface.co/TheBloke/tulu-… huggingface.co/TheBloke/tulu-… huggingface.co/TheBloke/tulu-…
- New WizardLM model, now in 13B! Trained on 250k 'evolved instructions' from ShareGPT and reported as matching or beating GPT4 on multiple benchmarks (not all, of course :) ) I've merged and quantised here: huggingface.co/TheBloke/wizar… huggingface.co/TheBloke/wizar… huggingface.co/TheBloke/wizar…
- An interesting new special model! Gorilla enables LLMs to use tools by invoking APIs. Project website: shishirpatil.github.io/gorilla/ My uploads: huggingface.co/TheBloke/goril… huggingface.co/TheBloke/goril… huggingface.co/TheBloke/goril…
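
Two of the tips above lend themselves to a short Python sketch: switching on HF_HUB_ENABLE_HF_TRANSFER for faster Hub transfers, and loading a GPTQ repo directly in Transformers. This assumes `huggingface_hub`, `hf_transfer`, `transformers` (4.32.0 or later), `optimum`, `auto-gptq` and `accelerate` are installed and a CUDA GPU is available. The function names `download_repo` and `load_gptq_model` are hypothetical helpers, not something taken from the posts, and no specific repo id is implied.

```python
import os

# The variable has to be set before huggingface_hub is imported,
# because the library reads it once at import time.
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"


def download_repo(repo_id: str) -> str:
    """Download every file of a Hub repo and return the local folder path."""
    # Imported here so the environment variable above is already in place.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id)


def load_gptq_model(repo_id: str):
    """Load a GPTQ-quantised model and its tokenizer straight from the Hub."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    # The quantisation settings are read from the repo's own config,
    # so no GPTQ-specific arguments are passed here.
    model = AutoModelForCausalLM.from_pretrained(repo_id, device_map="auto")
    return model, tokenizer
```

Calling `load_gptq_model` with the id of any GPTQ repo should return a model ready for `generate`; whether the download speed-up matches the 3-5x mentioned above will depend on the network and disk.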




