Carlos Riquelme (@rikelhood) / X

234 posts

Carlos Riquelme

@rikelhood

Principal researcher @MicrosoftAI. Previously @StabilityAI, @GoogleBrain, @Stanford.

Madrid, Spain

Joined October 2012

Carlos Riquelme
@rikelhood
Aug 3, 2023
Sparsity is one of the most promising areas in deep learning (tokens follow different routes in the model). However, these discrete decisions are messy to handle & optimize. Today we introduce Soft-MoE. The idea is simple: Don't route tokens, route linear combinations of them.
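A minimal sketch of the idea in the post above, assuming the usual Soft-MoE formulation: each expert slot receives a softmax-weighted average of all tokens, and each output token is a softmax-weighted average of all slot outputs. The function names, the `slot_params` matrix, and the way experts are passed in as plain callables are hypothetical choices for illustration, not the authors' implementation.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def soft_moe_layer(tokens, slot_params, experts, slots_per_expert=1):
    """Illustrative Soft-MoE layer (assumed formulation).

    tokens:      n x d list of token vectors
    slot_params: d x m matrix, with m = len(experts) * slots_per_expert
    experts:     list of callables mapping a d-vector to a d-vector
    """
    logits = matmul(tokens, slot_params)  # n x m token-slot scores

    # Dispatch: for each slot, softmax over tokens (one row per slot, m x n).
    dispatch_t = [softmax(col) for col in transpose(logits)]
    # Each slot input is a linear combination of all tokens (m x d).
    slot_inputs = matmul(dispatch_t, tokens)

    # Each expert processes only its own slots; no discrete routing decision.
    slot_outputs = [
        experts[i // slots_per_expert](slot) for i, slot in enumerate(slot_inputs)
    ]

    # Combine: for each token, softmax over slots (n x m).
    combine = [softmax(row) for row in logits]
    return matmul(combine, slot_outputs)  # n x d
```

Because both weightings are softmaxes, every step is differentiable, which is the contrast with discrete token routing that the post draws.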
Carlos Riquelme
@rikelhood
Jun 29, 2025
We're growing at @MicrosoftAI in Madrid, Spain! We're looking for research engineers and scientists to train Microsoft's largest AI foundation models. A unique opportunity in Spain to be part of a world-class AI lab 🚀 DM if you're interested for more info!
Carlos Riquelme
@rikelhood
Jun 9, 2022
We present LIMoE, the first large multimodal sparse model! By training routers to find good paths for image & text tokens, it's able to apply the *same* layers to both (vs standard approach of independent image & text encoders) Exciting progress on multimodal multitask networks!
Google AI
@GoogleAI
Jun 9, 2022
Read all about LIMoE (the Language Image Mixture of Experts), the first large-scale architecture that processes both images and text using a sparse mixture of experts, which achieves high performance with much less compute than other top methods. → goo.gle/3mwj9KC
Carlos Riquelme
@rikelhood
Jul 24, 2018
Just released our code and data for a number of contextual bandits algorithms based on deep neural networks and Thompson sampling 😀 with @latentjasper and @georgejtucker 🙏 reach out to us if you wanna add your new Bayesian NN model! github.com/tensorflow/mod… @GoogleAI
Carlos Riquelme
@rikelhood
Feb 29, 2024
Below you see the best open small language models at the moment. Despite not getting the visibility it deserves, StableLM2 1.6B reigns here (and note Gemma 2B is 56% larger!). Released in Jan; today we share its report. Data & training are disclosed in detail, unlike all others except TinyLlama.
Carlos Riquelme
@rikelhood
Apr 9, 2024
Today we announce StableLM2 12B, a very powerful and affordable open-source language model for Spanish. Trained from scratch on public data, it supports RAG, functions, and external tools. Excellent for summarizing and extracting data from any text. DM if you're interested!
Stability AI
@StabilityAI
Apr 8, 2024
Stable LM 2 12B is a pair of powerful 12 billion parameter language models trained on multilingual data in English, Spanish, German, Italian, French, Portuguese, and Dutch, featuring a base and instruction-tuned model. You can now try the model here: huggingface.co/stabilityai/st…
Carlos Riquelme
@rikelhood
Mar 25, 2024
We release Stable Code Instruct 3B today. A super fast & light model that helps you code in the most popular programming languages, such as Python, JavaScript, or SQL. Awesome work by the language team at Stability! See video below & try it out here:
Stability AI
@StabilityAI
Mar 25, 2024
Introducing Stable Code Instruct 3B, our new instruction tuned LLM based on Stable Code 3B. With natural language prompting, this model can handle a variety of tasks such as code generation, math and other software engineering related outputs. This model’s performance rivals
Stable Code Instruct 3b - a Hugging Face Space by stabilityai
From huggingface.co
Carlos Riquelme
@rikelhood
Mar 1, 2022
Vision Transformers can drop many tokens (say, background, redundant, noise, etc) & still succeed at image classification. We propose an extremely simple way to merge tokens and save up to 40-50% training cost in huge models while keeping their performance arxiv.org/abs/2202.12015
Carlos Riquelme
@rikelhood
Oct 5, 2022
Next Tuesday & Wednesday we'll be hosting a workshop on Sparsity & Adaptive Computation! More than 30 speakers from Google, other industry labs & many universities will share their views on scaling language & vision models. Keynotes by @JeffDean & Dave Patterson! See agenda below
Carlos Riquelme
@rikelhood
Oct 20, 2022
The recordings from the Google Workshop on Sparsity and Adaptive Computation are now available at: Day 1: youtube.com/watch?v=QBqRnQ… (includes the keynotes from Dave Patterson and Jeff Dean) Day 2: youtube.com/watch?v=I39Rl7… The agenda for the talks covered in the videos is below.
Carlos Riquelme
@rikelhood
Jan 20, 2024
Today we release StableLM2 1.6B, a small open language model that is strikingly fluid in English, Spanish, German, Italian, French, Portuguese & Dutch. Beyond its strong metrics (those age fast), hope it helps push what's possible with tiny models. Confident it's even a lot more!
Stability AI
@StabilityAI
Jan 19, 2024
Today, we’re releasing Stable LM 2 1.6B, a state-of-the-art 1.6 billion parameter small language model trained on multilingual data in English, Spanish, German, Italian, French, Portuguese, and Dutch. This model’s size and speed reduce hardware limitations, allowing all to easily
Carlos Riquelme
@rikelhood
Feb 13, 2023
Today we announce the largest vision model to date, ViT-22B. Larger models do better, while training & inference costs increase accordingly. This may rule out their direct use in practical scenarios. Fortunately, we show smaller models also benefit from large foundation ones.
Carlos Riquelme
@rikelhood
Apr 9, 2024
We just released our newest & most powerful language model: StableLM2 12B. Get large-size model performance while only requiring medium-size model memory & latency! Multilingual, function calling & tool usage, RAG-friendly, safety tuned. Great work by a great team 🚀check it out!
Stability AI
@StabilityAI
Apr 8, 2024
Stable LM 2 12B is a pair of powerful 12 billion parameter language models trained on multilingual data in English, Spanish, German, Italian, French, Portuguese, and Dutch, featuring a base and instruction-tuned model. You can now try the model here: huggingface.co/stabilityai/st…
Carlos Riquelme
@rikelhood
Jun 14, 2021
Can we scale deep vision models to billions of parameters? Yes! By only activating the relevant parts of the network for each input. We present the Vision Mixture of Experts & train a 15B-parameter model with 24 routers; transfers to ImageNet w. 90.35% acc
arxiv.org
Scaling Vision with Sparse Mixture of Experts
Sparsely-gated Mixture of Experts networks (MoEs) have demonstrated excellent scalability in Natural Language Processing. In Computer Vision, however, almost all performant networks are "dense",...
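The phrase "only activating the relevant parts of the network for each input" can be illustrated with a sparse top-k router. This is a rough sketch under assumptions: the linear router, the softmax-then-top-k gating, and all names (`top_k_route`, `router_weights`) are hypothetical and not taken from the paper's code.

```python
import math


def top_k_route(token, router_weights, experts, k=2):
    """Run a token through only its k highest-scoring experts (illustrative).

    token:          d-vector (list of floats)
    router_weights: one d-vector of router weights per expert (assumed linear router)
    experts:        list of callables mapping a d-vector to a d-vector
    Returns the gated sum of the chosen experts' outputs and the chosen indices.
    """
    # Router score for each expert: dot product with the token.
    logits = [sum(w * x for w, x in zip(weights, token)) for weights in router_weights]

    # Softmax over experts gives the gate values.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    gates = [e / total for e in exps]

    # Keep only the k experts with the largest gates; the rest are never evaluated.
    chosen = sorted(range(len(experts)), key=lambda i: gates[i], reverse=True)[:k]

    output = [0.0] * len(token)
    for i in chosen:
        expert_out = experts[i](token)
        output = [o + gates[i] * y for o, y in zip(output, expert_out)]
    return output, chosen
```

With many experts and a small `k`, the parameter count grows with the number of experts while the compute per token stays roughly constant, which is the scaling argument the post makes.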