Meet DBRX, a new sota open llm from @databricks. It's a 132B MoE with 36B active params trained from scratch on 12T tokens. It sets a new bar on all the standard benchmarks, and - as an MoE - inference is blazingly fast. Simply put, it's the model your data has been waiting for.
Jonathan Frankle
Joined December 2013
- I just open-sourced my codebase for research on neural network pruning, the Lottery Ticket Hypothesis, and other topics in deep learning. It's written in PyTorch and designed to make it easy to add new models, datasets, and experiments. Check it out:
- The hardest part about finetuning LLMs is that people generally don't have high-quality labeled data. Today, @databricks introduced TAO, a new finetuning method that only needs inputs, no labels necessary. Best of all, it actually beats supervised finetuning on labeled data.
- MPT is here! Check out our shiny new LLMs, open-source w/commercial license. The base MPT-7B model is 7B params trained on 1T tokens and reaches LLaMA-7B quality. We also created Instruct (commercial), Chat, and (my favorite) StoryWriter-65k+ variants. 🧵
- MPT-30B is here! Same MPT architecture, 30B parameters, > 1T tokens, 8k context window, trained on H100s, great perf (esp on coding), single-GPU inference, commercially usable, and massively upgraded instruct and chat datasets. Take it for a spin! huggingface.co/spaces/mosaicm…
- I defended today, and @mcarbin was kind enough to pass me. My favorite part of the thesis is a ground-up rewrite of the original Lottery Ticket Hypothesis paper with fresh data and a narrative that benefits from four years of hindsight/maturity. Coming soon to an arxiv near you!
- 72 hrs ago, @togethercompute released the RedPajama dataset. Like everyone, we at @MosaicML were very excited about the idea of a fully open-source Llama. So excited, in fact, that we've already trained a 1B model on 200B tokens! It's on HF (Apache2) here:
- I'm absolutely thrilled that @MosaicML has agreed to join @databricks as we continue on our journey to make the latest advances in deep learning efficient and accessible for everyone. The best of MosaicML is yet to come 🎉🎉🎉
  > Big news: we've agreed to acquire @MosaicML, a leading generative AI platform. I couldn’t be more excited to join forces once the deal closes. databricks.com/mosaic-news
- Five years ago, @NaveenGRao cold emailed me about starting a company. I knew nothing about startups, VC, products, or customers. My first PhD was in AI with @mcarbin. My second PhD was in startups with Naveen. I couldn't have asked for a better adviser on that journey.
  > Today is my last day at @databricks. ~2.5 years ago @alighodsi told me his goal was to build a $100B company. Databricks was at a $38B valuation when MosaicML was acquired in July 2023 and just broke the $100B valuation number. It’s amazing to be part of this growth! And now AI
- For those interested, my dissertation is now available. The highlight is that I re-did the original Lottery Ticket Hypothesis paper from scratch (Chapter 3). It follows the same path as the original, but with years of context/maturity + a new experiment 🧵 jfrankle.com/jfrankle-disse…
- 1/21 Banner year for Harvard CS! New hires include Sham Kakade @ShamKakade6 and Fernanda Viegas @viegasf (joining @wattenberg), as well as David Alvarez-Melis, Anurag Anshu @AnuragAnshu4, Sitan Chen, and Jonathan Frankle @jefrankle seas.harvard.edu/news/2021/10/s…
- TLDR: Announcing 🌟COMPOSER🌟, a PyTorch trainer for efficient training *algorithmically*. Train 2x-4x faster on standard ML tasks, a taste of what's coming from @MosaicML. Star it, 𝚙𝚒𝚙 𝚒𝚗𝚜𝚝𝚊𝚕𝚕 𝚖𝚘𝚜𝚊𝚒𝚌𝚖𝚕, contribute, be efficient! github.com/mosaicml/compo… Thread: