Carlos Riquelme
@rikelhood
principal researcher @MicrosoftAI previously @StabilityAI @GoogleBrain @Stanford
Madrid, Spain
Joined October 2012
  •
    Sparsity is one of the most promising areas in deep learning (tokens follow different routes in the model). However, these discrete decisions are messy to handle & optimize. Today we introduce Soft-MoE. The idea is simple: Don't route tokens, route linear combinations of them.
    https://arxiv.org/abs/2308.00951
  •
    We are growing at @MicrosoftAI Madrid, Spain! We are looking for research engineers and scientists to train Microsoft's largest AI foundation models. A unique opportunity in Spain to be part of a world-class AI lab 🚀 DM if you're interested for more info!
  •
    We present LIMoE, the first large multimodal sparse model! By training routers to find good paths for image & text tokens, it's able to apply the *same* layers to both (vs standard approach of independent image & text encoders) Exciting progress on multimodal multitask networks!
    Read all about LIMoE (the Language Image Mixture of Experts), the first large-scale architecture that processes both images and text using a sparse mixture of experts, which achieves high performance with much less compute than other top methods. → goo.gle/3mwj9KC
  •
    Just released our code and data for a number of contextual bandits algorithms based on deep neural networks and Thompson sampling 😀 with @latentjasper and @georgejtucker 🙏 reach out to us if you wanna add your new Bayesian NN model! github.com/tensorflow/mod… @GoogleAI
  •
    Below are the best open small language models right now. Although it hasn't gotten matching visibility, StableLM2 1.6B leads here (and note Gemma 2B is 56% larger!). Released in Jan; today we share its report. Data + training are disclosed in detail, unlike all the others except TinyLlama.
  •
    Today we announce StableLM2 12B, a very powerful and affordable open-source language model in Spanish. Trained from scratch on public data, it supports RAG, function calling, and external tools. Excellent for summarizing and extracting data from any text. DM if you're interested!
    Stable LM 2 12B is a pair of powerful 12 billion parameter language models trained on multilingual data in English, Spanish, German, Italian, French, Portuguese, and Dutch, featuring a base and instruction-tuned model. You can now try the model here: huggingface.co/stabilityai/st…
  •
    We release Stable Code Instruct 3B today. A super fast & light model that helps you code in the most popular programming languages such as Python, Javascript or SQL. Awesome work by the language team at Stability! See video below & try it out here:
    Introducing Stable Code Instruct 3B, our new instruction tuned LLM based on Stable Code 3B. With natural language prompting, this model can handle a variety of tasks such as code generation, math and other software engineering related outputs. This model’s performance rivals
  •
    Vision Transformers can drop many tokens (say, background, redundant, noise, etc) & still succeed at image classification. We propose an extremely simple way to merge tokens and save up to 40-50% training cost in huge models while keeping their performance arxiv.org/abs/2202.12015
  •
    Next Tuesday & Wednesday we'll be hosting a workshop on Sparsity & Adaptive Computation! More than 30 speakers from Google, other industry labs & many universities will share their views on scaling language & vision models. Keynotes by @JeffDean & Dave Patterson! See agenda below
    Workshop Agenda https://rsvp.withgoogle.com/events/googleworkshopsparsityadaptivecomputation-2022/agenda
  •
    The recordings from the Google Workshop on Sparsity and Adaptive Computation are now available at: Day 1: youtube.com/watch?v=QBqRnQ… (includes the keynotes from Dave Patterson and Jeff Dean) Day 2: youtube.com/watch?v=I39Rl7… The agenda for the talks covered in the videos is below.
    Agenda for the Google Workshop on Sparsity and Adaptive Computation.
  •
    Today we release StableLM2 1.6B, a small open language model that is strikingly fluent in English, Spanish, German, Italian, French, Portuguese & Dutch. Beyond its strong metrics (those age fast), hope it helps push what's possible with tiny models. Confident it's even a lot more!
    Today, we’re releasing Stable LM 2 1.6B, a state-of-the-art 1.6 billion parameter small language model trained on multilingual data in English, Spanish, German, Italian, French, Portuguese, and Dutch. This model’s size and speed reduce hardware limitations, allowing all to easily
  •
    Today we announce the largest vision model to date, ViT-22B. Larger models do better, while training & inference costs increase accordingly. This may rule out their direct use in practical scenarios. Fortunately, we show smaller models also benefit from large foundation ones.
  •
    We just released our newest & most powerful language model: StableLM2 12B. Get large-size model performance while only requiring medium-size model memory & latency! Multilingual, function calling & tool usage, RAG-friendly, safety tuned. Great work by a great team 🚀check it out!
    Stable LM 2 12B is a pair of powerful 12 billion parameter language models trained on multilingual data in English, Spanish, German, Italian, French, Portuguese, and Dutch, featuring a base and instruction-tuned model. You can now try the model here: huggingface.co/stabilityai/st…
  •
    Can we scale deep vision models to billions of parameters? Yes! By only activating the relevant parts of the network for each input. We present the Vision Mixture of Experts & train a 15B-parameter model with 24 routers; transfers to ImageNet w. 90.35% acc
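
The idea in the last post, activating only the relevant parts of the network for each input, can be illustrated with a toy sparse mixture-of-experts layer. This is a minimal sketch and not the Vision Mixture of Experts implementation: the function names (`top_k_route`, `sparse_moe_layer`), the two toy experts, the router weights, and the choice of `k=1` are all assumptions made for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def top_k_route(token, router_weights, k=1):
    """Return [(expert_index, gate)] for the k experts the router scores highest.

    router_weights holds one weight vector per expert (hypothetical layout).
    """
    logits = [dot(token, w) for w in router_weights]
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(i, probs[i]) for i in ranked[:k]]


def sparse_moe_layer(tokens, router_weights, experts, k=1):
    """Run each token through only its top-k experts, weighted by the gate."""
    outputs = []
    for token in tokens:
        out = [0.0] * len(token)
        for idx, gate in top_k_route(token, router_weights, k):
            expert_out = experts[idx](token)  # only selected experts are evaluated
            out = [o + gate * e for o, e in zip(out, expert_out)]
        outputs.append(out)
    return outputs


# Toy setup (assumed values): two experts and a router that prefers
# expert 0 for tokens along the first axis and expert 1 for the second.
experts = [
    lambda t: [2.0 * x for x in t],  # expert 0: scale by 2
    lambda t: [-x for x in t],       # expert 1: negate
]
router_weights = [[1.0, 0.0], [0.0, 1.0]]
tokens = [[3.0, 0.0], [0.0, 3.0]]

routes = [top_k_route(t, router_weights, k=1)[0][0] for t in tokens]
outputs = sparse_moe_layer(tokens, router_weights, experts, k=1)
print(routes)
print(outputs)
```

With `k=1`, each token triggers a single expert call, so per-token compute stays roughly constant as more experts (and parameters) are added. The selection itself is a discrete decision, which is the part the Soft-MoE post describes as messy to optimize.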