Michael Poli (@MichaelPoli6) / X

433 posts

Michael Poli

@MichaelPoli6

AI, numerics and systems. Co-founder & Chief AI Scientist @RadicalNumerics.

Joined August 2018

Michael Poli
@MichaelPoli6
Mar 7, 2023
Attention is great. Are there other operators that scale? Excited to share our work on Hyena, an alternative to attn that can learn on sequences *10x longer*, up to *100x faster* than optimized attn, by using implicit long convolutions & gating 📜arxiv.org/abs/2302.10866 1/
Michael Poli
@MichaelPoli6
Feb 19, 2025
[1/7] Introducing Evo 2, a new foundation model for biology. 🚀 Evo 2 is the largest-scale, fully open-source AI model ever released: 40 billion parameters, over 9 trillion tokens, and a 1 million context length. All the details are public: weights, data, training infrastructure,
Michael Poli
@MichaelPoli6
Mar 28, 2024
📢New research on mechanistic architecture design and scaling laws.
- We perform the largest scaling laws analysis (500+ models, up to 7B) of beyond-Transformer architectures to date
- For the first time, we show that architecture performance on a set of isolated token
Michael Poli
@MichaelPoli6
Apr 27, 2020
[1/4] Excited to share the first experimental release of *torchdyn* github.com/DiffEqML/torch…, a PyTorch library for all things neural differential equations! torchdyn is developed by the core DiffEqML team. @Massastrello @Diffeq_ml
Michael Poli
@MichaelPoli6
Aug 25, 2025
Life update: I started Radical Numerics with Stefano Massaroli, Armin Thomas, Eric Nguyen, and a fantastic team of engineers and researchers. We are building the engine for recursive self‑improvement (RSI): AI that designs and refines AI, accelerating discovery across science and
Michael Poli
@MichaelPoli6
Sep 30, 2024
This is what happens when a world-class team sits down and rethinks the way things are done, from architecture design to post-training. Today, we release three language models pushing the boundaries of quality and efficiency, with SOTA performance, minimal memory footprint, and
Liquid AI
@liquidai
Sep 30, 2024
Today we introduce Liquid Foundation Models (LFMs) to the world with the first series of our Language LFMs: A 1B, 3B, and a 40B model. (/n)
Michael Poli
@MichaelPoli6
Nov 14, 2024
An absolute privilege to see our work on Evo🧬 highlighted on the cover of the latest issue of Science. Thank you to all the friends and collaborators at Stanford (@StanfordAILab) and the Arc Institute (@arcinstitute) @exnx @BrianHie @pdhsu @HazyResearch @StefanoErmon and more.
Science Magazine
@ScienceMagazine
Nov 14, 2024
A new Science study presents “Evo”—a machine learning model capable of decoding and designing DNA, RNA, and protein sequences, from molecular to genome scale, with unparalleled accuracy. Evo’s ability to predict, generate, and engineer entire genomic sequences could change the
Michael Poli
@MichaelPoli6
Dec 8, 2023
We've been hard at work pushing the frontiers of efficient architecture design and optimization. StripedHyena-7B is the result: the first alternative architecture truly competitive with the best Transformers of its size or larger. And it's very fast.
Together AI
@togethercompute
Dec 8, 2023
Announcing StripedHyena 7B, an open source model using an architecture that goes beyond Transformers, achieving faster performance and longer context. It builds on the lessons learned in the past year designing efficient sequence modeling architectures. together.ai/blog/stripedhy…
Michael Poli
@MichaelPoli6
Jun 11, 2022
Let us embark on a fractal journey about dynamical systems and neural implicit representations... 1/
Michael Poli
@MichaelPoli6
Jun 8, 2023
Hungry for more content on efficient long context models after @srush_nlp's awesome keynote? We put together some of our perspectives in a short note:
Sasha Rush
@srush_nlp
Jun 4, 2023
Do we need Attention? (v0 github.com/srush/do-we-ne…): Slides for a survey talk summarizing recent Linear RNN models with a focus on NLP. Tries to cover a lot of different S4-related models (as well as RWKV/MEGA) in a digestible way.
hazyresearch.stanford.edu
The Safari of Deep Signal Processing: Hyena and Beyond
Hyena is a large language model that uses long convolutions and gating to reach attention quality with lower time complexity.
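The "long convolutions and gating" idea in the note above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the filter `h` is a fixed decaying oscillation chosen purely for illustration (in Hyena the filter is implicit, that is, produced by a small learned network), and the names `causal_fft_conv` and `gated_long_conv` are hypothetical. The sketch shows only why the cost is lower than attention: a convolution whose filter is as long as the sequence can be computed with FFTs in O(L log L) rather than O(L^2), followed by an elementwise gate.

```python
import numpy as np

def causal_fft_conv(u, h):
    """Causal convolution of u with a filter h of the same length L, in O(L log L)."""
    L = u.shape[-1]
    n = 2 * L  # zero-pad so the circular FFT product equals the linear convolution
    y = np.fft.irfft(np.fft.rfft(u, n=n) * np.fft.rfft(h, n=n), n=n)
    return y[..., :L]  # keep the first L outputs: position t sees inputs 0..t only

def gated_long_conv(u, gate, h):
    """Elementwise gate applied to the output of a long convolution."""
    return gate * causal_fft_conv(u, h)

rng = np.random.default_rng(0)
L = 1024
u = rng.standard_normal(L)      # input sequence (single channel)
gate = rng.standard_normal(L)   # stand-in for a learned projection of the input
t = np.arange(L)
h = np.exp(-0.01 * t) * np.cos(0.1 * t)  # assumed stand-in for an implicit filter

y = gated_long_conv(u, gate, h)

# Cross-check against the direct O(L^2) convolution.
direct = gate * np.convolve(u, h)[:L]
assert np.allclose(y, direct)
```

The zero-padding to length `2 * L` is the design choice that matters here: without it the FFT product would compute a circular convolution, letting late inputs leak into early outputs and breaking causality.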
Michael Poli
@MichaelPoli6
Dec 12, 2021
Join us Dec 14th (EST) for the NeurIPS workshop "The Symbiosis of Deep Learning and Differential Equations": dl-de.github.io This is also your chance to submit questions to our great lineup of panelists, via: forms.gle/6seK279g4AxpeM…
Michael Poli
@MichaelPoli6
Mar 5, 2025
A new version of the StripedHyena 2 paper is out on arXiv. To learn about how we trained large (40 billion parameters) convolutional language models efficiently at one million sequence length, with custom context parallelism: 👇 All code is available
Michael Poli
@MichaelPoli6
Jul 25, 2020
[1/n] The community has been hard at work to speed up Neural ODEs, e.g. regularization strategies @DavidDuvenaud @chuckberryfinn to keep the ODE easy to solve. We've also been thinking about the same problem, and we propose a different (compatible!) direction. @Massastrello
Michael Poli
@MichaelPoli6
Dec 10, 2023
I'm going to be at NeurIPS to present work on efficient model architecture and inference (with @exnx @Massastrello and others) HyenaDNA: arxiv.org/abs/2306.15794 Laughing Hyena: arxiv.org/abs/2310.18780 Excited to catch up with old friends and make some new ones - DM if you'd
arxiv.org
HyenaDNA: Long-Range Genomic Sequence Modeling at Single...
Genomic (DNA) sequences encode an enormous amount of information for gene regulation and protein synthesis. Similar to natural language models, researchers have proposed foundation models in...