Abhi Venigalla (@ml_hardware) / X

Abhi Venigalla

942 posts

@ml_hardware

Researcher @Databricks. Former @MosaicML, @CerebrasSystems. Addicted to all things compute.

San Francisco, CA

Joined October 2018

Abhi Venigalla
@ml_hardware
Jun 30, 2023
Ready for GPU independence weekend? PyTorch 2.0 and LLM Foundry now work out of the box on **AMD GPUs!** We profiled MPT 1B-13B models on AMD MI250 and saw perf within 80% of A100-40GB, which could go up to 94% with better software. It. Just. Works.
Training LLMs with AMD MI250 GPUs and MosaicML | Databricks Blog
From databricks.com
228K
Abhi Venigalla
@ml_hardware
May 17, 2023
CNBC leaks PaLM2-L training config, says it is:
* 340B params
* 3.6T tokens
* 7.3e24 FLOPs using the (6*N*D) approx
Google's newest A.I. model uses nearly five times more text data for training than its predecessor
From cnbc.com
266K
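Note: the FLOPs figure in the post above follows from the 6*N*D rule of thumb, where N is the parameter count and D is the number of training tokens. A minimal sketch of that arithmetic, assuming only the two numbers quoted in the post (the function name `approx_training_flops` is hypothetical, chosen for illustration):

```python
def approx_training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute with the 6 * N * D rule of thumb.

    n_params: model parameter count (N)
    n_tokens: number of training tokens (D)
    """
    return 6.0 * n_params * n_tokens


# Figures quoted in the post: 340B params, 3.6T tokens.
palm2_flops = approx_training_flops(340e9, 3.6e12)
print(f"{palm2_flops:.2e}")  # about 7.3e24, matching the post
```

This is only an estimate of total training compute; it says nothing about hardware or wall-clock time.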
Abhi Venigalla
@ml_hardware
Jun 30, 2023
Replying to @ml_hardware
And yes, you can switch back and forth between NVIDIA and AMD, even within a single training run. It's Christmas in July!🎄
241K
Abhi Venigalla
@ml_hardware
Oct 31, 2023
Back in June we @MosaicML showed that our LLM Foundry training stack runs seamlessly on @AMD MI250 GPUs. Today, I'm happy to share that we've scaled up to 128xMI250, with great multi-node performance!
119K
Abhi Venigalla
@ml_hardware
Mar 27, 2024
We built a new model! 🧱 It's called DBRX 🧱
* mixture of experts
* 16 choose 4 experts
* 36B active, 132B total
* trained on 12T tokens
* built e2e in 2 months
* using 3072xH100
* served up to 150 tok/s on @Databricks
* open weights :)
47K
Abhi Venigalla
@ml_hardware
Mar 29, 2024
This is literally my new LK-99 🙏🙏🙏
Aaron Defazio
@aaron_defazio
Mar 29, 2024
Update: more experimental results rolling in. Here it is against SGD with both the step-wise and cosine schedule (both baselines heavily tuned, no cheating) This is something special indeed!
81K
Abhi Venigalla
@ml_hardware
Jan 25, 2023
We're coming for all the models! This week our Vision team profiled Stable Diffusion on @MosaicML Cloud and found that training from scratch costs <$160k, and can be done in under 2 weeks. mosaicml.com/blog/training-…
50K
Abhi Venigalla
@ml_hardware
Feb 4, 2023
Replying to @karpathy
The @MosaicML perf team just tried this out and... totally confirmed 🤯 GPT-1.3B MFU went from 49% -> 53%
127K
Abhi Venigalla
@ml_hardware
Mar 29, 2024
If you have apple silicon and > 70GB of RAM, you can run DBRX on your laptop!! Kudos to @awnihannun :)
mlx-community/dbrx-instruct-4bit · Hugging Face
From huggingface.co
20K
Abhi Venigalla
@ml_hardware
Apr 26, 2023
Our Vision team is insane. The original Stable Diffusion reportedly cost $600k... and now we've reproduced it for $50k🤯 and it took <1 week to train! All the training code is open-source! And we make it super fast + easy to customize on your own private data @MosaicML
Jonathan Frankle
@jefrankle
Apr 26, 2023
And now it's < $50k. 🖼️Announcing @MosaicML's diffusion offering 📷We replicated Stable Diffusion 2.0, training from scratch with huge speedup, and we can do it on your data too. Human eval showed the model to be indistinguishable from the original. Blog: mosaicml.com/blog/training-…
22K
Abhi Venigalla
@ml_hardware
Mar 19, 2024
Replying to @francoisfleuret
The 30x is real and comes from this technical brief, page 15: nvdam.widen.net/s/xqt56dflgh/n… How is 30x possible given GB200 has only ~2.3x increase in memBW and FLOP/s over H100? It involves comparing per-chip generation throughput = output_tokens/s/chip. The two systems compared are
nvdam.widen.net
nvidia-blackwell-architecture-technical-brief.pdf
29K
Abhi Venigalla
@ml_hardware
Sep 5, 2023
Replying to @julien_c
Why is the training so slow? Your screenshot shows 25% MFU. Our users on MosaicML get 40%+ for the same workload on H100s.
Screenshot MFU = 6 * 30e9 * 600e9 / 500 / 10 / 3600 / 24 / 1e15 = 0.25
Time to train on HF: 10 days
Time to train on MosaicML: **6.25 days**
97K
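Note: the MFU arithmetic in the reply above can be reproduced in a few lines. This is a sketch under stated assumptions: the post does not label its factors, so reading 30e9 as parameters, 600e9 as tokens, 500 as the GPU count, 10 as days, and 1e15 as peak FLOP/s per GPU is an interpretation, and the helper names `mfu` and `days_at_mfu` are hypothetical.

```python
SECONDS_PER_DAY = 24 * 3600


def mfu(n_params, n_tokens, n_gpus, days, peak_flops_per_gpu):
    """Model FLOPs utilization: achieved FLOP/s per GPU over peak FLOP/s per GPU."""
    total_flops = 6.0 * n_params * n_tokens  # 6 * N * D approximation
    achieved_per_gpu = total_flops / (n_gpus * days * SECONDS_PER_DAY)
    return achieved_per_gpu / peak_flops_per_gpu


def days_at_mfu(n_params, n_tokens, n_gpus, target_mfu, peak_flops_per_gpu):
    """Wall-clock days for the same workload at a given MFU."""
    total_flops = 6.0 * n_params * n_tokens
    cluster_flops_per_sec = n_gpus * target_mfu * peak_flops_per_gpu
    return total_flops / cluster_flops_per_sec / SECONDS_PER_DAY


# Assumed reading of the numbers in the post (see the note above).
screenshot_mfu = mfu(30e9, 600e9, 500, 10, 1e15)
days_at_40_percent = days_at_mfu(30e9, 600e9, 500, 0.40, 1e15)

print(f"MFU: {screenshot_mfu:.2f}")
print(f"Days at 40% MFU: {days_at_40_percent:.2f}")
```

Under these assumptions the 6.25-day figure is the 10-day run scaled by 0.25 / 0.40.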
Abhi Venigalla
@ml_hardware
Jan 4, 2024
New year, new MME 🎉 @dskhudia and I profiled @Intel Gaudi2 accelerators for LLM training and inference, and found great performance and perf/$!
LLM Training and Inference with Intel Gaudi 2 AI Accelerators | Databricks Blog
From databricks.com
45K
Abhi Venigalla
@ml_hardware
Nov 18, 2023
i love you all = ilya
Sam Altman
@sama
Nov 18, 2023
i love you all. today was a weird experience in many ways. but one unexpected one is that it has been sorta like reading your own eulogy while you’re still alive. the outpouring of love is awesome. one takeaway: go tell your friends how great you think they are.
41K