Zach Mueller
Lambda
@TheZachMueller
Head of Dev Rel at @LambdaAPI. Hardware nerd. Usually yelling at NCCL over things. Posts are my own. twitch.tv/muellerzr
In the PCIe lanes (help)
Joined April 2016
  • Pinned
    You! Yes, you! Do you like keeping up with the latest and greatest in AI research? Do you also like explaining and educating others in what it means? Through articles, video, and more? We're hiring an AI Research Marketing Intern! In this role, you'll help us lay the
  • Suddenly 5k people learn the bitter pill of how badly NVIDIA throttles consumer cards for ML
    Made a table of the most common/supported BF16 GPUs and their non-sparse TFLOPs. What's the best way to publish this? As a wiki on my blog? A pypi package to import?
  • Happy to announce that starting Monday, I will be beginning my role as a Machine Learning Engineer at @huggingface! While there, I'll be helping write user-centric APIs for the HuggingFace ecosystem, with the aim of empowering folks to use the libraries as best they can
  • New article on #python decorators is out! Specifically, this shows you how decorators are written, what they do, and how powerful they can be. I even show an example of when you'd use the strange "nonlocal" 1/3 muellerzr.github.io/fastblog/pytho…
  • Today I officially begin my journey as a Machine Learning Engineer 🎉🎉🎉 It’s been a crazy road these last two years, and I’m thankful to all the mentors and friends I’ve made along the way. I’m extremely excited for the road ahead 😊
  • So clean.
    - 128TB of usable storage (plus ~16-20 being NVMe; eventually 48TB usable)
    - 432GB of VRAM
    - 40GbE connections between the data and AI node (speed test to come soon)
    Workhorse Mark 2 is alive!
  • You may know that @huggingface Accelerate has big-model inference capabilities, but how does that work? With the help of #manim, let's dig in! Step 1: Load an empty model into memory using @PyTorch's `meta` device, so it uses a *super* tiny amount of RAM
  • So much compute, so little time. This should be fun 🤩
  • Distributed training has its own dialect. I made a pocket dictionary so you don’t open 50 browser tabs every time a paper mentions “ZeRO-offload.” 49 terms, crisp definitions, diagrams where they actually help. Grab it, skim it, get back to training. distributedlexicon(.)com
  • Excited to announce a new @huggingface space to help with one of machine learning's biggest questions: How much space does {X} model take in vRAM? And most importantly: when using `device_map="auto"` huggingface.co/spaces/hf-acce…
  • Stop wasting time guessing why your AI fails. The most valuable skill I learned recently: error analysis maven.com/parlance-labs/… Hamel & Shreya teach you how to diagnose what's going wrong with your pipeline, and build evals you can trust at scale. Error analysis is just the
  • I spent a day optimizing a small MoE (0.5B param Qwen3 style) training loop as far as it could go, so you don't have to. 61+ hours to Chinchilla down to ~13. Come learn my tricks (oh btw it's free)
  • I've created a small knowledge repository on @huggingface transformers here: github.com/muellerzr/mini… Essentially, it contains all the `task` notebooks converted to scripts, showcasing end-to-end usage in under 150 lines of code (but still readable!)
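
For the "how much space does {X} model take in vRAM?" question raised in the Space announcement above, a back-of-the-envelope sketch may help. It assumes memory is dominated by the weights alone (parameter count times bytes per parameter) and ignores activations, KV cache, optimizer state, and framework overhead. The helper name `estimate_weight_gib` and the dtype table are hypothetical, written for illustration only; this is not how the Space itself computes its numbers.

```python
# Bytes per parameter for a few common dtypes (standard element sizes).
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2, "int8": 1}


def estimate_weight_gib(num_params: int, dtype: str = "bfloat16") -> float:
    """Rough size of the model weights alone, in GiB.

    Hypothetical helper: ignores activations, KV cache, and runtime overhead,
    so real usage will be higher than this figure.
    """
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


if __name__ == "__main__":
    # Example: a model with 7 billion parameters in a few dtypes.
    for dtype in ("float32", "bfloat16", "int8"):
        print(f"{dtype}: {estimate_weight_gib(7_000_000_000, dtype):.1f} GiB")
```

Treat the result as a lower bound when deciding whether a model fits on a card; actual loading and inference need headroom on top of it.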