Tom Jobbins (@TheBlokeAI) / X

My Hugging Face repos: huggingface.co/TheBloke
Discord server: discord.gg/theblokeai
Patreon: patreon.com/TheBlokeAI

Joined July 2010

Tom Jobbins
@TheBlokeAI
Jun 12, 2023
New PR is in at llama.cpp for full CUDA GPU acceleration! github.com/ggerganov/llam… This is huge! For the first time, GGML is beating GPTQ on speed, at least at 7B. On a 4090 + i9-13900K I'm getting 109.29 tokens/s on 7B and 29.11 tokens/s on 30B; AutoGPTQ gets 98 t/s on 7B and 35 t/s on 30B.
CUDA full GPU acceleration, KV cache in VRAM by JohannesGaessler · Pull Request #1827 · ggml-org/...
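To put the throughput numbers above side by side, here is a small Python sketch that converts them into per-token latency and a GGML-to-AutoGPTQ speed ratio. The figures are the ones quoted in the post; the dictionary layout and function names are illustrative only.

```python
# Throughput figures quoted in the post (tokens/second), RTX 4090 + i9-13900K.
GGML_CUDA = {"7B": 109.29, "30B": 29.11}
AUTOGPTQ = {"7B": 98.0, "30B": 35.0}


def ms_per_token(tokens_per_second: float) -> float:
    """Convert throughput (tokens/s) into average milliseconds per generated token."""
    return 1000.0 / tokens_per_second


def speed_ratio(ggml_tps: float, gptq_tps: float) -> float:
    """A ratio above 1.0 means GGML generated tokens faster than AutoGPTQ."""
    return ggml_tps / gptq_tps


for size in ("7B", "30B"):
    print(
        f"{size}: GGML {ms_per_token(GGML_CUDA[size]):.1f} ms/token, "
        f"AutoGPTQ {ms_per_token(AUTOGPTQ[size]):.1f} ms/token, "
        f"GGML/AutoGPTQ ratio {speed_ratio(GGML_CUDA[size], AUTOGPTQ[size]):.2f}"
    )
```

With these numbers the ratio works out to roughly 1.12 at 7B and 0.83 at 30B.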
Tom Jobbins
@TheBlokeAI
Jun 14, 2023
New StarCoder-based coding model from @WizardLM_AI: "WizardCoder-15B-v1.0 model achieves 57.3 pass@1 on the HumanEval Benchmarks… 22.3 points higher than the SOTA open-source Code LLMs." My quants: huggingface.co/TheBloke/Wizar… huggingface.co/TheBloke/Wizar… Original: huggingface.co/WizardLM/Wizar…
TheBloke/WizardCoder-15B-1.0-GGML · Hugging Face
Tom Jobbins
@TheBlokeAI
Aug 24, 2023
Meta's CodeLlama is here! ai.meta.com/blog/code-llam… Sizes: 7B, 7B-Instruct, 7B-Python, 13B, 13B-Instruct, 13B-Python, 34B, 34B-Instruct, 34B-Python. It's the first time we've seen the 34B model. I've got a couple of fp16s up: huggingface.co/TheBloke/CodeL… huggingface.co/TheBloke/CodeL… More coming soon, obvs.
Introducing Code Llama, a state-of-the-art large language model for coding
Code Llama, which is built on top of Llama 2, is free for research and commercial use.
Tom Jobbins
@TheBlokeAI
Jul 23, 2023
Llama 2 70B GGML support is here! Use this llama.cpp release: github.com/ggerganov/llam… My first repo is at: huggingface.co/TheBloke/Llama… Note: at this time it's only possible to convert the base Llama 2 models, not fine-tunes. This is being worked on.
Release master-e76d630 · ggml-org/llama.cpp
Tom Jobbins
@TheBlokeAI
Jul 18, 2023
Oh my, LLaMA 2! 7B, 13B and 70B, 2T tokens, 4K context, commercial license! huggingface.co/meta-llama But why, Meta, why no 33B or similar size? You missed the sweet spot? :( Unless, with 2T tokens and 4K context, 13B proves more than good enough… could be!
meta-llama (Meta Llama)
Tom Jobbins
@TheBlokeAI
May 25, 2023
I've uploaded merged/quantised versions of all of @Tim_Dettmers' Guanaco models: huggingface.co/TheBloke/guana… huggingface.co/TheBloke/guana… huggingface.co/TheBloke/guana… huggingface.co/TheBloke/guana… huggingface.co/TheBloke/guana… huggingface.co/TheBloke/guana… huggingface.co/TheBloke/guana… huggingface.co/TheBloke/guana… Phew!
TheBloke/guanaco-7B-GPTQ · Hugging Face
Tom Jobbins
@TheBlokeAI
Aug 23, 2023
Transformers 4.32.0 now supports GPTQ models natively! Over the last couple of days I have updated 296 of my GPTQ repos to support this automatically. It's awesome that you can now load a GPTQ model directly in Transformers with only two lines of code!
Marc Sun
@_marcsun
Aug 23, 2023
LLMs just got faster and lighter with 🤗 Transformers x AutoGPTQ! You can now load your models from @huggingface with GPTQ quantization. Enjoy faster inference speed and lower memory usage than existing supported quantization schemes 🚀 Blogpost: huggingface.co/blog/gptq-inte…
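The "two lines" refer to loading a GPTQ repo through the ordinary `AutoModelForCausalLM.from_pretrained` call. Below is a minimal sketch, assuming transformers >= 4.32.0 with the optimum and auto-gptq packages installed (plus accelerate for `device_map="auto"`) and a CUDA GPU. The helper names are hypothetical, and the default repo id is simply one of the GPTQ repos linked on this page, assumed (not confirmed) to be among those updated for native support. Heavy imports are deferred so the dependency check runs even where those libraries are absent.

```python
import importlib.util

# Assumed dependency set for the Transformers GPTQ integration around v4.32;
# accelerate is only needed because device_map="auto" is used below.
GPTQ_DEPENDENCIES = ("transformers", "optimum", "auto_gptq", "accelerate")


def missing_gptq_dependencies() -> list[str]:
    """Return the import names of any assumed GPTQ dependencies that are not installed."""
    return [name for name in GPTQ_DEPENDENCIES if importlib.util.find_spec(name) is None]


def load_gptq_model(repo_id: str = "TheBloke/guanaco-7B-GPTQ"):
    """Hypothetical helper: load tokenizer and GPTQ model; the quantisation config comes from the repo."""
    from transformers import AutoModelForCausalLM, AutoTokenizer  # deferred heavy import

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id, device_map="auto")
    return tokenizer, model


if __name__ == "__main__":
    # Only report what is missing; loading would trigger a multi-GB download.
    print("Missing:", missing_gptq_dependencies() or "none")
```

Keeping the check separate from the load makes it cheap to see what still needs installing before committing to a large download.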
Tom Jobbins
@TheBlokeAI
Jul 6, 2023
I've just quantised my largest ever models: BLOOMZ 176B and BLOOMChat 176B! huggingface.co/TheBloke/bloom… huggingface.co/TheBloke/BLOOM… It took a month to find a system big enough, but thanks to @latitudesh and their beast (4xH100 80GB, EPYC 9354, 750GB), I did each model in under 4 hours! 🚀
TheBloke/bloomz-176B-GPTQ · Hugging Face
Tom Jobbins
@TheBlokeAI
Sep 24, 2023
Thanks again to @latitudesh for the loan of a beast 8xH100 server this week. I uploaded over 550 new repos, maybe my busiest week yet! Quanting is really resource-intensive: it needs not only fast GPUs but also many CPUs, lots of disk, and 🚀 network. A server that ✅ all of those is v. rare!
Tom Jobbins
@TheBlokeAI
Jul 7, 2023
The other day I discovered a little environment variable buried in the @huggingface Hub Python docs: HF_HUB_ENABLE_HF_TRANSFER. It has changed my life! The docs say 2x faster, but in my testing it's 3-5x faster 🚀😍 (and it's just as fast for uploads!)
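A minimal sketch of switching the variable on from Python. Two assumptions are worth flagging: the separate hf_transfer package must be installed (pip install hf_transfer), and huggingface_hub reads the variable when it is first imported, so it has to be set before that import (or exported in the shell beforehand). The helper names are hypothetical; `snapshot_download` is huggingface_hub's standard whole-repo download function.

```python
import importlib.util
import os


def enable_hf_transfer() -> bool:
    """Hypothetical helper: turn on hf_transfer if the package is installed.

    Must run before huggingface_hub is first imported, because the library
    reads HF_HUB_ENABLE_HF_TRANSFER at import time.
    """
    if importlib.util.find_spec("hf_transfer") is None:
        # Leave the flag unset: huggingface_hub errors on download if the flag
        # is on but the hf_transfer package is missing.
        return False
    os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"
    return True


def download_repo(repo_id: str) -> str:
    """Download a whole Hub repo and return the local snapshot path."""
    from huggingface_hub import snapshot_download  # imported after the env var is set

    return snapshot_download(repo_id=repo_id)


if __name__ == "__main__":
    if enable_hf_transfer():
        print("hf_transfer enabled")
    else:
        print("hf_transfer not installed; run: pip install hf_transfer")
```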
Tom Jobbins
@TheBlokeAI
Jul 11, 2023
I have reached quantisation nirvana: making 9 GPTQs at once! This @latitudesh server is a monster, and it is always hungry! 👹
Tom Jobbins
@TheBlokeAI
Jun 11, 2023
New models from Allen AI: Tulu 30B, 13B and 7B, LLaMA models tuned on a mix of datasets, e.g. FLAN V2, CoT, Dolly, OASST, GPT4-Alpaca and ShareGPT. huggingface.co/TheBloke/tulu-… huggingface.co/TheBloke/tulu-… huggingface.co/TheBloke/tulu-… huggingface.co/TheBloke/tulu-… huggingface.co/TheBloke/tulu-… huggingface.co/TheBloke/tulu-…
TheBloke/tulu-30B-GPTQ · Hugging Face
Tom Jobbins
@TheBlokeAI
May 27, 2023
New WizardLM model, now in 13B! Trained on 250k 'evolved instructions' from ShareGPT and reported as matching or beating GPT-4 on multiple benchmarks (not all, of course :)). I've merged and quantised it here: huggingface.co/TheBloke/wizar… huggingface.co/TheBloke/wizar… huggingface.co/TheBloke/wizar…
TheBloke/WizardLM-13B-1.0-GGML · Hugging Face
Tom Jobbins
@TheBlokeAI
May 28, 2023
An interesting new specialised model! Gorilla enables LLMs to use tools by invoking APIs. Project website: shishirpatil.github.io/gorilla/ My uploads: huggingface.co/TheBloke/goril… huggingface.co/TheBloke/goril… huggingface.co/TheBloke/goril…