Sparsity is one of the most promising areas in deep learning (tokens follow different routes in the model). However, these discrete decisions are messy to handle & optimize. Today we introduce Soft-MoE. The idea is simple: Don't route tokens, route linear combinations of them.
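A minimal sketch of what "route linear combinations of tokens" could look like, in NumPy. The post does not spell out the mechanics, so the details here are assumptions: each expert gets a single slot, every slot is a softmax-weighted average of all tokens, and every output token is a softmax-weighted average of all slot outputs. The names `soft_moe_layer`, `phi`, and the toy `tanh` experts are hypothetical and used only for illustration.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def soft_moe_layer(tokens, phi, experts):
    """Soft routing sketch (assumes one slot per expert).

    tokens:  (n, d) input tokens
    phi:     (d, s) learnable slot parameters, s == len(experts)
    experts: list of s callables mapping a (d,) vector to a (d,) vector
    """
    logits = tokens @ phi                 # (n, s) token-slot affinities
    dispatch = softmax(logits, axis=0)    # per slot: weights over tokens sum to 1
    combine = softmax(logits, axis=1)     # per token: weights over slots sum to 1
    slots = dispatch.T @ tokens           # (s, d) each slot mixes all tokens
    slot_out = np.stack([f(slot) for f, slot in zip(experts, slots)])
    return combine @ slot_out             # (n, d) each token mixes all slot outputs


# Toy usage with random weights (illustrative only).
rng = np.random.default_rng(0)
n, d, s = 6, 4, 3
tokens = rng.normal(size=(n, d))
phi = rng.normal(size=(d, s))
expert_ws = [rng.normal(size=(d, d)) for _ in range(s)]
experts = [lambda x, w=w: np.tanh(x @ w) for w in expert_ws]
out = soft_moe_layer(tokens, phi, experts)
print(out.shape)
```

Because both weightings are softmaxes, there is no discrete assignment anywhere in the layer, so it stays fully differentiable, which is the property the post contrasts with hard routing.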
Carlos Riquelme
- We're growing at @MicrosoftAI Madrid, Spain! We're looking for research engineers and scientists to train Microsoft's largest AI foundation models. A unique opportunity in Spain to join a world-class AI lab 🚀 DM if you're interested for more info!
- We present LIMoE, the first large multimodal sparse model! By training routers to find good paths for image & text tokens, it's able to apply the *same* layers to both (vs standard approach of independent image & text encoders) Exciting progress on multimodal multitask networks!
  Read all about LIMoE (the Language Image Mixture of Experts), the first large-scale architecture that processes both images and text using a sparse mixture of experts, which achieves high performance with much less compute than other top methods. → goo.gle/3mwj9KC
- Just released our code and data for a number of contextual bandits algorithms based on deep neural networks and Thompson sampling 😀 with @latentjasper and @georgejtucker 🙏 reach out to us if you wanna add your new Bayesian NN model! github.com/tensorflow/mod… @GoogleAI
- Below you see the best open small language models right now. While it hasn't gotten the visibility it deserves, StableLM2 1.6B reigns here (and note Gemma 2B is 56% larger!). Released in Jan & today we share its report. Data + training disclosed in detail, unlike all others but TinyLlama.
- Today we announce StableLM2 12B, a very powerful and affordable open-source language model for Spanish. Trained from scratch on public data, it supports RAG, function calling and external tools. Excellent for summarizing and extracting data from any text. DM if you're interested!
  Stable LM 2 12B is a pair of powerful 12 billion parameter language models trained on multilingual data in English, Spanish, German, Italian, French, Portuguese, and Dutch, featuring a base and instruction-tuned model. You can now try the model here: huggingface.co/stabilityai/st…
- We release Stable Code Instruct 3B today. A super fast & light model that helps you code in the most popular programming languages such as Python, JavaScript or SQL. Awesome work by the language team at Stability! See video below & try it out here:
  Introducing Stable Code Instruct 3B, our new instruction tuned LLM based on Stable Code 3B. With natural language prompting, this model can handle a variety of tasks such as code generation, math and other software engineering related outputs. This model’s performance rivals
- Vision Transformers can drop many tokens (say, background, redundant, noise, etc.) & still succeed at image classification. We propose an extremely simple way to merge tokens and save up to 40-50% training cost in huge models while keeping their performance arxiv.org/abs/2202.12015
- Next Tuesday & Wednesday we'll be hosting a workshop on Sparsity & Adaptive Computation! More than 30 speakers from Google, other industry labs & many universities will share their views on scaling language & vision models. Keynotes by @JeffDean & Dave Patterson! See agenda below
- The recordings from the Google Workshop on Sparsity and Adaptive Computation are now available at: Day 1: youtube.com/watch?v=QBqRnQ… (includes the keynotes from Dave Patterson and Jeff Dean) Day 2: youtube.com/watch?v=I39Rl7… The agenda for the talks covered in the videos is below.
- Today we release StableLM2 1.6B, a small open language model that is strikingly fluent in English, Spanish, German, Italian, French, Portuguese & Dutch. Beyond its strong metrics (those age fast), hope it helps push what's possible with tiny models. Confident it's even a lot more!
  Today, we’re releasing Stable LM 2 1.6B, a state-of-the-art 1.6 billion parameter small language model trained on multilingual data in English, Spanish, German, Italian, French, Portuguese, and Dutch. This model’s size and speed reduce hardware limitations, allowing all to easily
- Today we announce the largest vision model to date, ViT-22B. Larger models do better, while training & inference costs increase accordingly. This may rule out their direct use in practical scenarios. Fortunately, we show smaller models also benefit from large foundation ones.
- We just released our newest & most powerful language model: StableLM2 12B. Get large-size model performance while only requiring medium-size model memory & latency! Multilingual, function calling & tool usage, RAG-friendly, safety tuned. Great work by a great team 🚀 check it out!
  Stable LM 2 12B is a pair of powerful 12 billion parameter language models trained on multilingual data in English, Spanish, German, Italian, French, Portuguese, and Dutch, featuring a base and instruction-tuned model. You can now try the model here: huggingface.co/stabilityai/st…
- Can we scale deep vision models to billions of parameters? Yes! By only activating the relevant parts of the network for each input. We present the Vision Mixture of Experts & train a 15B-parameter model with 24 routers; transfers to ImageNet w. 90.35% acc
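
For contrast with soft routing, here is a rough sketch of the hard, sparse routing the Vision Mixture of Experts post alludes to ("only activating the relevant parts of the network for each input"). It assumes a plain top-k softmax router with no capacity limits, noise, or load balancing; `top_k_route`, `router_w`, and the toy `tanh` experts are hypothetical names, not the released implementation.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def top_k_route(tokens, router_w, experts, k=2):
    """Send each token to its k highest-scoring experts only.

    tokens:   (n, d) input tokens
    router_w: (d, E) router weights, E == len(experts)
    experts:  list of E callables mapping a (d,) vector to a (d,) vector
    Returns the (n, d) outputs and the (n, k) chosen expert indices.
    """
    gates = softmax(tokens @ router_w, axis=1)      # (n, E) routing probabilities
    chosen = np.argsort(-gates, axis=1)[:, :k]      # (n, k) top-k experts per token
    out = np.zeros_like(tokens)
    for i, token in enumerate(tokens):
        for e in chosen[i]:
            # Only the selected experts run; the rest are skipped entirely.
            out[i] += gates[i, e] * experts[e](token)
    return out, chosen


# Toy usage with random weights (illustrative only).
rng = np.random.default_rng(0)
n, d, num_experts = 5, 4, 8
tokens = rng.normal(size=(n, d))
router_w = rng.normal(size=(d, num_experts))
expert_ws = [rng.normal(size=(d, d)) for _ in range(num_experts)]
experts = [lambda x, w=w: np.tanh(x @ w) for w in expert_ws]
out, chosen = top_k_route(tokens, router_w, experts, k=2)
print(out.shape, chosen.shape)
```

The `argsort` selection is the discrete decision: compute per token scales with `k` rather than with the number of experts, but the choice of experts itself is not differentiable.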











