Attention is great. Are there other operators that scale?
Excited to share our work on Hyena, an alternative to attn that can learn on sequences *10x longer*, up to *100x faster* than optimized attn, by using implicit long convolutions & gating
📜arxiv.org/abs/2302.10866 1/
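For readers new to the two ingredients named above, here is a minimal NumPy sketch of a gated long convolution. It is an illustration under stated assumptions, not the Hyena implementation. The function names (`fft_long_conv`, `implicit_filter`, `gated_long_conv`) and the sinusoid-plus-decay filter are hypothetical. The sketch only shows that a filter as long as the sequence can be generated from a few parameters, applied in O(L log L) time with an FFT, and modulated by an elementwise gate.

```python
import numpy as np

def fft_long_conv(u, h):
    """Causal convolution of u with an equally long filter h, computed via FFT."""
    L = u.shape[-1]
    n = 2 * L  # zero-pad so the circular convolution matches the linear one
    y = np.fft.irfft(np.fft.rfft(u, n=n) * np.fft.rfft(h, n=n), n=n)
    return y[..., :L]  # keep the first L outputs (the causal part)

def implicit_filter(L, freqs, weights, decay=0.02):
    """Hypothetical implicit filter: h[t] is a function of position t.

    Assumption for illustration: a weighted sum of sinusoids with an
    exponential decay. The parameter count depends on len(freqs), not on L.
    """
    t = np.arange(L)
    features = np.sin(np.outer(t, freqs))  # shape (L, F)
    return (features @ weights) * np.exp(-decay * t)

def gated_long_conv(v, gate, h):
    """Elementwise gating applied to the long-convolution output."""
    return gate * fft_long_conv(v, h)

# Usage: a length-64 sequence filtered by a length-64 implicit filter.
rng = np.random.default_rng(0)
u = rng.standard_normal(64)
gate = rng.standard_normal(64)
h = implicit_filter(64, freqs=np.array([0.1, 0.3]), weights=np.array([1.0, -0.5]))
y = gated_long_conv(u, gate, h)
```

The FFT path gives the same result as a direct causal convolution while avoiding the quadratic cost in sequence length.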
Michael Poli
- [1/7] Introducing Evo 2, a new foundation model for biology. 🚀 Evo 2 is the largest-scale, fully open-source AI model ever released: 40 billion parameters, over 9 trillion tokens, and a 1 million context length. All the details are public: weights, data, training infrastructure,
- 📢New research on mechanistic architecture design and scaling laws.
  - We perform the largest scaling laws analysis (500+ models, up to 7B) of beyond-Transformer architectures to date
  - For the first time, we show that architecture performance on a set of isolated token
- [1/4] Excited to share the first experimental release of *torchdyn* github.com/DiffEqML/torch…, a PyTorch library for all things neural differential equations! torchdyn is developed by the core DiffEqML team. @Massastrello @Diffeq_ml
- Life update: I started Radical Numerics with Stefano Massaroli, Armin Thomas, Eric Nguyen, and a fantastic team of engineers and researchers. We are building the engine for recursive self‑improvement (RSI): AI that designs and refines AI, accelerating discovery across science and
- This is what happens when a world-class team sits down and rethinks the way things are done, from architecture design to post-training. Today, we release three language models pushing the boundaries of quality and efficiency, with SOTA performance, minimal memory footprint, and
  Quoted post: Today we introduce Liquid Foundation Models (LFMs) to the world with the first series of our Language LFMs: A 1B, 3B, and a 40B model. (/n)
- An absolute privilege to see our work on Evo🧬 highlighted on the cover of the latest issue of Science. Thank you to all the friends and collaborators at Stanford (@StanfordAILab) and the Arc Institute (@arcinstitute) @exnx @BrianHie @pdhsu @HazyResearch @StefanoErmon and more.
  Quoted post: A new Science study presents “Evo”, a machine learning model capable of decoding and designing DNA, RNA, and protein sequences, from molecular to genome scale, with unparalleled accuracy. Evo’s ability to predict, generate, and engineer entire genomic sequences could change the
- We've been hard at work pushing the frontiers of efficient architecture design and optimization. StripedHyena-7B is the result: the first alternative architecture truly competitive with the best Transformers of its size or larger. And it's very fast.
  Quoted post: Announcing StripedHyena 7B, an open source model using an architecture that goes beyond Transformers, achieving faster performance and longer context. It builds on the lessons learned in the past year designing efficient sequence modeling architectures. together.ai/blog/stripedhy…
- Let us embark on a fractal journey about dynamical systems and neural implicit representations... 1/
- Hungry for more content on efficient long context models after @srush_nlp's awesome keynote? We put together some of our perspectives in a short note:
  Quoted post: Do we need Attention? (v0 github.com/srush/do-we-ne…): Slides for a survey talk summarizing recent Linear RNN models with a focus on NLP. Tries to cover a lot of different S4-related models (as well as RWKV/MEGA) in a digestible way.
  Link: hazyresearch.stanford.edu, "The Safari of Deep Signal Processing: Hyena and Beyond". Hyena is a large language model that uses long convolutions and gating to reach attention quality with lower time complexity.
- Join us Dec 14th (EST) for the NeurIPS workshop "The Symbiosis of Deep Learning and Differential Equations": dl-de.github.io. This is also your chance to submit questions to our great lineup of panelists, via: forms.gle/6seK279g4AxpeM…
- A new version of the StripedHyena 2 paper is out on arXiv. To learn how we trained large (40 billion parameter) convolutional language models efficiently at one million sequence length, with custom context parallelism: 👇 All code is available
- [1/n] The community has been hard at work to speed up Neural ODEs, e.g. regularization strategies @DavidDuvenaud @chuckberryfinn to keep the ODE easy to solve. We've also been thinking about the same problem, and we propose a different (compatible!) direction. @Massastrello
- I'm going to be at NeurIPS to present work on efficient model architecture and inference (with @exnx @Massastrello and others). HyenaDNA: arxiv.org/abs/2306.15794 Laughing Hyena: arxiv.org/abs/2310.18780 Excited to catch up with old friends and make some new ones - DM if you'd