Tired: Catching imposter syndrome by reading PhD applications from students way smarter than you.
Wired: Getting excited about talking them into building cool things with you ✨
We wrote a blog post about our work on Task-level Mixture-of-Experts (TaskMoE), and why it's a great way to efficiently serve large models (vs. more common approaches like training → compression via distillation).
Read all about Task-level Mixture-of-Experts (TaskMoE), a promising step towards efficiently training and deploying large models, with no loss in quality and with significantly reduced inference latency ↓ goo.gle/3I5ulXj
Late tweet, but thank you ENLSP #NeurIPS2023 for the best paper award, and @Devvrit_Khatri for the excellent presentation on behalf of the team @adityakusupati!
Excited to push further on conditional computation for tiny fast flexible models 🚀
Announcing MatFormer - a nested🪆(Matryoshka) Transformer that offers elasticity across deployment constraints.
MatFormer is an architecture that lets us use 100s of accurate smaller models that we never actually trained for!
arxiv.org/abs/2310.07707 1/9
Huge thanks to my collaborators at @GoogleAI, without whom this work would not have been possible. This work was done as a part of the Google AI Residency - applications open soon, so definitely check it out!
g.co/airesidency 8/8
I'm at #NeurIPS2023 today presenting MADLAD-400 with @BZhangGo and @adityakusupati at 5:15pm in Hall B1/B2 #314! Come by and chat w/ us about creating *massive* datasets, making sure they're not garbage, and multilingual LMs :D