Zach Mueller (@TheZachMueller) / X

Zach Mueller

24.1K posts


@TheZachMueller

Head of Dev Rel at @LambdaAPI. Hardware nerd. Usually yelling at NCCL over things. Posts are my own. twitch.tv/muellerzr

In the PCIe lanes (help)

Joined April 2016

Pinned
Zach Mueller
@TheZachMueller
Jun 2
You! Yes, you! Do you like keeping up with the latest and greatest in AI research? Do you also like explaining and educating others in what it means? Through articles, video, and more? We're hiring an AI Research Marketing Intern! In this role, you'll help us lay the
Zach Mueller
@TheZachMueller
Oct 18, 2025
Suddenly 5k people learn the bitter pill of how badly NVIDIA throttles consumer cards for ML
Zach Mueller
@TheZachMueller
Oct 17, 2025
Made a table of the most common/supported BF16 GPUs and their non-sparse TFLOPs. What's the best way to publish this? As a wiki on my blog? A PyPI package to import?
Zach Mueller
@TheZachMueller
Nov 12, 2021
Happy to announce that starting Monday, I will begin my role as a Machine Learning Engineer at @huggingface! While there, I'll be helping write user-centric APIs for the Hugging Face ecosystem, with the aim of empowering folks to use the libraries as best they can
Zach Mueller
@TheZachMueller
Jul 6, 2022
New article on #python decorators is out! Specifically, it shows how decorators are written, what they do, and the power they give you. I even show an example of when you'd use the strange "nonlocal" keyword 1/3 muellerzr.github.io/fastblog/pytho…
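A minimal sketch of where `nonlocal` shows up in a decorator, assuming a call-counting use case. The article's own example may differ, and `count_calls` and `call_count` are hypothetical names chosen for illustration.

```python
import functools


def count_calls(func):
    """Hypothetical decorator that tracks how many times `func` is called."""
    calls = 0  # lives in the enclosing scope, shared by every call of `wrapper`

    @functools.wraps(func)  # keep func's __name__ and __doc__ on the wrapper
    def wrapper(*args, **kwargs):
        # Without `nonlocal`, `calls += 1` would make `calls` local to
        # `wrapper` and raise UnboundLocalError on the first call.
        nonlocal calls
        calls += 1
        return func(*args, **kwargs)

    def call_count():
        # Reading an enclosing variable needs no `nonlocal`; only rebinding does.
        return calls

    wrapper.call_count = call_count
    return wrapper


@count_calls
def add(a, b):
    return a + b


add(1, 2)
add(3, 4)
print(add.call_count())  # prints 2
```

The counter stays private to the closure rather than living in a global, which is the usual reason to reach for `nonlocal` here.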
Zach Mueller
@TheZachMueller
May 17, 2021
Today I officially begin my journey as a Machine Learning Engineer 🎉🎉🎉 It’s been a crazy road these last two years, and I’m thankful to all the mentors and friends I’ve made along the way. I’m extremely excited for the road ahead 😊
Zach Mueller
@TheZachMueller
Nov 8, 2025
So clean.
- 128TB of usable storage (plus ~16-20TB on NVMe; eventually 48TB usable)
- 432GB of VRAM
- 40GbE connections between the data and AI nodes (speed test coming soon)
Workhorse Mark 2 is alive!
Zach Mueller
@TheZachMueller
Sep 2, 2022
You may know that @huggingface Accelerate has big-model inference capabilities, but how does that work? With the help of #manim, let's dig in! Step 1: Load an empty model into memory using @PyTorch's `meta` device, so it uses a *super* tiny amount of RAM
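A minimal sketch of "Step 1" in plain PyTorch, assuming PyTorch 2.x (where `torch.device` works as a context manager). Accelerate offers its own helper for this (`init_empty_weights`); the toy model below is hypothetical and stands in for a real checkpoint architecture.

```python
import torch
from torch import nn


def build_model() -> nn.Module:
    # Hypothetical toy model: 8 stacked Linear(4096, 4096) layers.
    return nn.Sequential(*[nn.Linear(4096, 4096) for _ in range(8)])


# Tensors created on the "meta" device record shape and dtype but hold no data,
# so building the model allocates essentially no memory for its weights.
with torch.device("meta"):
    model = build_model()

num_params = sum(p.numel() for p in model.parameters())
all_meta = all(p.is_meta for p in model.parameters())
print(f"{num_params:,} parameters, all on meta: {all_meta}")
```

The real weights still have to be loaded onto actual devices afterward; the meta step only gives you the model's structure cheaply.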
Zach Mueller
@TheZachMueller
Apr 23, 2023
So much compute, so little time. This should be fun 🤩
Zach Mueller
@TheZachMueller
Jul 24, 2025
Distributed training has its own dialect. I made a pocket dictionary so you don’t open 50 browser tabs every time a paper mentions “ZeRO-offload.” 49 terms, crisp definitions, diagrams where they actually help. Grab it, skim it, get back to training. distributedlexicon(.)com
Zach Mueller
@TheZachMueller
Aug 28, 2023
Excited to announce a new @huggingface Space to help with one of machine learning's biggest questions: How much VRAM does {X} model take? And most importantly: when using `device_map="auto"` huggingface.co/spaces/hf-acce…
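A back-of-the-envelope sketch of the core arithmetic behind questions like this, assuming weights dominate: parameter count × bytes per parameter for the chosen dtype. This is not the Space's implementation; real usage also includes activations, KV cache, optimizer state, and framework overhead. The 7B parameter count is a hypothetical example.

```python
# Bytes needed to store one parameter in each dtype.
BYTES_PER_PARAM = {
    "float32": 4,
    "float16": 2,
    "bfloat16": 2,
    "int8": 1,
    "int4": 0.5,
}


def weight_memory_gib(num_params: int, dtype: str) -> float:
    """Memory for the weights alone, in GiB (1 GiB = 1024**3 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


for dtype in BYTES_PER_PARAM:
    print(f"{dtype:>8}: {weight_memory_gib(7_000_000_000, dtype):6.2f} GiB")
```

For a 7B-parameter model in bfloat16 this gives 14 GB (about 13.04 GiB) for the weights alone, before anything else is counted.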
Zach Mueller
@TheZachMueller
Jun 18, 2025
Stop wasting time guessing why your AI fails. The most valuable skill I learned recently: error analysis maven.com/parlance-labs/… Hamel & Shreya teach you how to diagnose what's going wrong with your pipeline, and build evals you can trust at scale. Error analysis is just the
Zach Mueller
@TheZachMueller
Oct 7, 2025
I spent a day optimizing a small MoE training loop (0.5B params, Qwen3-style) as far as it could go, so you don't have to: time to Chinchilla-optimal went from 61+ hours down to ~13. Come learn my tricks (oh, and it's free)
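A rough sketch of what that speedup implies, assuming the common Chinchilla rule of thumb of ~20 training tokens per parameter and treating 0.5B as the parameter count. The tweet does not say which token budget or parameter count (total vs. active, for an MoE) was used, so both are assumptions here.

```python
TOKENS_PER_PARAM = 20  # Chinchilla rule of thumb (assumption, not from the tweet)


def chinchilla_tokens(num_params: float) -> float:
    """Approximate compute-optimal token budget for a model of this size."""
    return TOKENS_PER_PARAM * num_params


def required_throughput(num_tokens: float, hours: float) -> float:
    """Tokens per second needed to process `num_tokens` within `hours`."""
    return num_tokens / (hours * 3600)


tokens = chinchilla_tokens(0.5e9)  # ~10B tokens under the assumption above
for hours in (61, 13):
    print(f"{hours:>2} h -> {required_throughput(tokens, hours):,.0f} tokens/s")
```

Under these assumptions, going from 61 to 13 hours means sustaining roughly 4.7× the training throughput.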
Zach Mueller
@TheZachMueller
Oct 19, 2023
I've created a small knowledge repository on @huggingface transformers here: github.com/muellerzr/mini… It contains all of the `task` notebooks converted to scripts, each showcasing end-to-end usage in under 150 lines of code (while staying readable!)