Michael Carbin
@mcarbin
Associate Professor in EECS at @MIT | Co-Founder at @unconvai | Founding Advisor at @mosaicml | Programming Systems | Neural Networks | Unconventional Computing
Cambridge, MA
Joined September 2007
Posts
  • Pinned
    If you're interested in emerging modeling approaches (like these and others), then reach out!
    At Unconventional, we’re building the computational substrate for the AI era. Scientists and SWEs interested in dynamical systems (Diffusion, Neural ODEs, Deep Equilibrium Models, and Energy-based models) DM or email [email protected] (subject: dynamics). Things are getting really
  • “How’s your sabbatical?” Well…DBRX is GREAT at RAG! If you’ve been using Mixtral/Llama2/GPT3.5, then try DBRX! The combination of RAG with its SoTA capabilities on knowledge/code/reasoning will unlock new CompoundAI opportunities. databricks.com/blog/introduci…
  • Boom! We at @MosaicML plan to unite with an amazing group of colleagues at @Databricks! And don’t worry, still the same great @MosaicML taste: our brand, products, and mission remain. But, going bigger, much bigger. So watch out for more from a truly amazing team! Bravo team!
    Big news: we've agreed to acquire @MosaicML, a leading generative AI platform. I couldn’t be more excited to join forces once the deal closes. databricks.com/mosaic-news
  • That is indeed me. :-) #SloanFellow. Thank you, mentors. But, importantly, the award reflects my work with fantastic students: @jfrankle, @alex_renda_, Eric Atkinson, Ben Sherman, Cambridge Yang, @charith_mendis, @TomChen17, @JesseMMichel, James Gilles sloan.org/fellowships/20…
  • Proud of the @MosaicML team’s continued push for open science and efficient ML with another Composer release: github.com/mosaicml/compo…... 🧵
  • Finally! I'm incredibly proud of the amazing team we have at @mosaicml. Together, we are out to reduce the costs of ML training with openly released tools and methodologies. I also desire our work to make strong, reproducible baselines more accessible to the research community.
    Hello World! Today we come out of stealth to make ML training more efficient with a mosaic of methods that modify training to improve speed, reduce cost, and boost quality. Read our founders' blog by @NaveenGRao @hanlintang @mcarbin @jefrankle mosaicml.com/blog/founders-… (1/4)
  • We and the community are still on the hunt for why lottery tickets exist. Our (@jefrankle, @KDziugaite, @roydanroy) work here develops a powerful microscope to assess neural net training behavior, particularly, when lottery tickets emerge. One more step towards an understanding.
    At ICML next week, @KDziugaite @roydanroy @mcarbin and I will present Linear Mode Connectivity and the Lottery Ticket Hypothesis. We study the effect of SGD noise (like data order) on neural net optimization. Those results shed new light on lottery tickets arxiv.org/abs/1912.05671
  • Our new work demonstrating there's still a ways to go on pruning at initialization: the community seems to only know which layers -- but not which individual weights -- to prune. With a flurry of activity around these ideas, I look forward to other teams' findings as well!
    Several methods have recently been proposed for pruning neural networks at initialization. In our new paper (@KDziugaite, @roydanroy, @mcarbin), we rigorously study these methods to determine why they "miss the mark" and underperform pruning after training arxiv.org/abs/2009.08576
  • Congrats @jefrankle! I'm very privileged to work with fantastic students like Jonathan.
    Best Paper Award 1: The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks Jonathan Frankle · Michael Carbin
  • Last year we showed an NN can learn to model the performance of code on a CPU. But, the NN was opaque. Now, we (@alex_renda_ @TomChen17 @charith_mendis) show how to learn 11k parameters of an otherwise hand-configured 10kLOC simulator to get an accurate *and* interpretable model
    New paper at MICRO: “DiffTune: Optimizing CPU Simulator Parameters with Learned Differentiable Surrogates”. DiffTune learns CPU simulator parameters from scratch, leading the simulator to higher accuracy than with expert-provided parameters. arxiv.org/abs/2010.04017. 🧵1/12
  • I'm always briefly puzzled when people ask, "Your group is doing ML now?" because our work is still just Approximate Computing to me. Here are my thoughts on the connections.
    Want to address the issues with overparameterization in deep learning? The PL/Systems/Architecture communities exploring Approximate Computing have some answers, says @mcarbin in his PL Perspectives post. blog.sigplan.org/2019/10/03/mac…
  • It’s finally here 🎉🥳 In case you missed us, MosaicML/Databricks is back at it, with a new best-in-class open-weight LLM named DBRX. An MoE with 132B total parameters and 32B active, 32k context length, trained for 12T tokens 🤯
  • Learn about our work on Lottery Tickets -- small neural networks that train from scratch on big problems -- from @jefrankle today @ 3:45pm (#ICLR2019).
  • 🚨MLSys 2023 paper deadline in 1 week🚨 @tqchenml and I look forward to your submissions! Key Dates: - Paper submission and co-author registration: October 28, 2022, 4pm ET - Author response: Jan 16 to Jan 20, 2023 - Author notification: Feb 17, 2023