Trevor Gale (@Tgale96) / X

Trevor Gale

@Tgale96

Research Scientist @ Google DeepMind | Former Stanford CS

Maine, USA

Joined April 2013

Pinned
Trevor Gale
@Tgale96
Mar 28, 2024
Hi all, a few updates on MegaBlocks 🧵
github.com
GitHub - databricks/megablocks
Trevor Gale
@Tgale96
Dec 8, 2023
I woke up to an interesting PR in MegaBlocks this morning... 😅
Mistral AI
@MistralAI
Dec 8, 2023
magnet:?xt=urn:btih:5546272da9065eddeb6fcd7ffddeef5b75be79a7&dn=mixtral-8x7b-32kseqlen&tr=udp%3A%2F%2Fopentracker.i2p.rocks%3A6969%2Fannounce&tr=http%3A%2F%https://t.co/g0m9cEUz0T%3A80%2Fannounce RELEASE a6bbd9affe0c2725c1b7410d66833e24
Support new model by pierrestock · Pull Request #45 · databricks/megablocks
From github.com
Trevor Gale
@Tgale96
Sep 30, 2020
Want to run a sparse neural network at warp speed?🔥 The code from our paper is now open-source! We’ve released our sparse models, tuned code and our dataset of sparse matrices: github.com/google-researc…
Trevor Gale
@Tgale96
Jan 27, 2023
🧵We’re excited to announce MegaBlocks, our system for efficient “dropless” MoE training! 🤖 MegaBlocks outperforms Tutel by up to 40% by reformulating MoEs as block-sparse operations, which allows us to avoid token dropping without sacrificing hardware efficiency 🚀.
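A minimal sketch of the difference the thread points at, assuming top-1 routing and a token-dropping baseline whose per-expert capacity is ceil(capacity_factor * num_tokens / num_experts). The function names, `capacity_factor`, and `block_size` are hypothetical illustrations, not the MegaBlocks API. The first function drops tokens that overflow an expert's capacity; the second keeps every token and instead pads each expert's variable-size group up to whole blocks, the shape a block-sparse formulation can work with.

```python
import math
from collections import Counter


def capacity_routing(expert_ids, num_experts, capacity_factor=1.0):
    """Hypothetical fixed-capacity routing: each expert keeps at most
    `capacity` tokens, in arrival order; any overflow is dropped."""
    capacity = math.ceil(capacity_factor * len(expert_ids) / num_experts)
    counts = [0] * num_experts
    kept, dropped = [], []
    for token, expert in enumerate(expert_ids):
        if counts[expert] < capacity:
            counts[expert] += 1
            kept.append(token)
        else:
            dropped.append(token)
    return kept, dropped


def dropless_blocks(expert_ids, num_experts, block_size):
    """Hypothetical dropless layout: every token is kept, and each expert's
    group is padded up to a whole number of block_size-row blocks."""
    counts = Counter(expert_ids)
    return [math.ceil(counts.get(e, 0) / block_size) for e in range(num_experts)]


# Eight tokens, top-1 routed to 4 experts; expert 0 is heavily overloaded.
routing = [0, 0, 0, 0, 0, 1, 2, 0]
kept, dropped = capacity_routing(routing, num_experts=4)
print(kept, dropped)  # [0, 1, 5, 6] [2, 3, 4, 7]
print(dropless_blocks(routing, num_experts=4, block_size=4))  # [2, 1, 1, 0]
```

With a capacity factor of 1.0, half the tokens are dropped because expert 0 is overloaded; the dropless layout keeps all eight at the cost of some padding inside partially filled blocks.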
Trevor Gale
@Tgale96
Mar 28, 2024
Replying to @Tgale96
I’m not done with MegaBlocks 😁 Yesterday @apaszke @epiqueras1 @sharadvikram and I dropped something we’ve been working on for a bit. MegaBlocks + JAX + TPU = MegaBlox 🔥
Add MegaBlox grouped matrix multiplication kernels for TPU. by copybara-service[bot] · Pull Request...
From github.com
Trevor Gale
@Tgale96
Mar 27, 2024
Look how much fun we're all having together! Come MegaBlock with us! 🥰 github.com/stanford-futur…
Julien Chaumond
@julien_c
Mar 27, 2024
Open source AI is NOT a zero-sum game and some leading contributors like @Tgale96 show it 🥰⤵️
Trevor Gale
@Tgale96
Mar 27, 2024
Replying to @jefrankle and @arthurmensch
Don't fight, guys, we can all use MegaBlocks together 🥹
Trevor Gale
@Tgale96
Dec 8, 2020
What stands between us and widespread use of sparsity in deep learning? I tried to organize some of my thoughts for this @sigarch blog post!
The Future of Sparsity in Deep Neural Networks
From sigarch.org
Trevor Gale
@Tgale96
Jun 23, 2020
Excited to share something I've been working on for a while! Fast GPU kernels for sparse linear ops with @erich_elsen, Cliff Young and @matei_zaharia! With some fancy tricks, sparse ops can be faster than dense at as low as 71% sparsity 🔥 arxiv.org/abs/2006.10901
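For readers unfamiliar with the ops in question, here is a minimal reference sketch of one of them, sparse-matrix times dense-matrix (SpMM), assuming the sparse operand is stored in CSR format. The function name `csr_spmm` is hypothetical, and this is plain Python for clarity, not the tuned GPU kernels from the paper. It only shows that the work scales with the number of nonzeros rather than the full matrix size, which is what gives sparse kernels room to beat dense ones at high enough sparsity.

```python
def csr_spmm(values, col_indices, row_offsets, dense):
    """Compute A @ B, where A is sparse in CSR form (values, col_indices,
    row_offsets) and B is a dense matrix given as a list of rows."""
    n_cols = len(dense[0]) if dense else 0
    out = []
    for row in range(len(row_offsets) - 1):
        acc = [0.0] * n_cols
        # Only the stored nonzeros of this row are visited.
        for idx in range(row_offsets[row], row_offsets[row + 1]):
            a = values[idx]
            b_row = dense[col_indices[idx]]
            for j in range(n_cols):
                acc[j] += a * b_row[j]
        out.append(acc)
    return out


# A = [[1, 0, 2], [0, 0, 3]] in CSR form; B is 3x2 and dense.
A_values = [1.0, 2.0, 3.0]
A_cols = [0, 2, 2]
A_offsets = [0, 2, 3]
B = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(csr_spmm(A_values, A_cols, A_offsets, B))  # [[11.0, 14.0], [15.0, 18.0]]
```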
Trevor Gale
@Tgale96
May 13, 2024
“But to us a ‘register’ is a 16x16 tile of data.” Sounds like you guys might like TPUs 😁 Very fun post to read!
Benjamin F Spector
@bfspector
May 12, 2024
(1/7) Happy mother’s day! We think what the mothers of America really want is a Flash Attention implementation that’s just 100 lines of code and 30% faster, and we’re happy to provide. We're excited to introduce ThunderKittens (TK), a simple DSL embedded within CUDA that makes
Trevor Gale
@Tgale96
Oct 11, 2021
Seems like pretty marginal quality gains from scaling parameter count by ~3x. 35 days on 3360 A100s, so maybe between $3M and $8M to train? Not sure this model makes sense to train, at least for these applications... developer.nvidia.com/blog/using-dee…
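The back-of-envelope estimate can be reproduced as follows, assuming a flat A100 price of roughly $1 to $3 per GPU-hour. The price range and the helper names are assumptions for illustration; the tweet only gives the final dollar range.

```python
def gpu_hours(days: int, num_gpus: int) -> int:
    """Total accelerator time: wall-clock hours times the number of GPUs."""
    return days * 24 * num_gpus


def training_cost_usd(days: int, num_gpus: int, usd_per_gpu_hour: int) -> int:
    """Cost at a flat, assumed hourly rate per GPU."""
    return gpu_hours(days, num_gpus) * usd_per_gpu_hour


hours = gpu_hours(35, 3360)            # 35 days on 3360 A100s
low = training_cost_usd(35, 3360, 1)   # assumed $1 per GPU-hour
high = training_cost_usd(35, 3360, 3)  # assumed $3 per GPU-hour
print(f"{hours:,} GPU-hours, ${low:,} to ${high:,}")
# 2,822,400 GPU-hours, $2,822,400 to $8,467,200
```

That is roughly $2.8M to $8.5M, in line with the "$3M and $8M" range in the tweet.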
Trevor Gale
@Tgale96
May 19, 2021
Submit your work to the all new “Sparsity in Neural Networks” workshop! We have an excellent speaker lineup and attendance is free. Hope to see you all there 😁
Jonathan Frankle
@jefrankle
May 18, 2021
NEW WORKSHOP: Sparsity in Neural Networks: Advancing Understanding and Practice (July 8-9, 2021). This workshop will bring together members of the many communities working on neural network sparsity to share their perspectives and the latest cutting-edge research (Deadline: 6/15)
Trevor Gale
@Tgale96
Feb 4, 2023
Replying to @ml_hardware @abhi_venigalla and 2 others
The Megatron paper did tell us to do this (5.1). Probably not the only trick we should steal from Megatron-LM 😁
arxiv.org
Megatron-LM: Training Multi-Billion Parameter Language Models...
Recent work in language modeling demonstrates that training large transformer models advances the state of the art in Natural Language Processing applications. However, very large models can be...
Trevor Gale
@Tgale96
Dec 8, 2023
Replying to @Tgale96
Oh, and also a text from Mihir with this screenshot 😂
Mihir Patel
@mvpatel2000
Dec 8, 2023
"pierrestock changed the title 'Mixtral-8x7B support' to 'Support new model' 6 hours ago"