Aäron van den Oord
@avdnoord
GenMedia lead at DeepMind: Gemini Omni, Veo, Genie, Nano Banana. Research Scientist
London, England
Joined January 2013
Posts
  • VQVAE-2 finally out! Powerful autoregressive models in a hierarchical compressed latent space. No modes were collapsed in the creation of these samples ;) Arxiv: arxiv.org/abs/1906.00446 With @catamorphist and @vinyals More samples and details 👇 [thread]
  • Unsupervised pre-training now outperforms supervised learning on ImageNet for any data regime (see figure) and also for transfer learning to Pascal VOC object detection arxiv.org/abs/1905.09272…
  • Our latest work is out! Representation Learning with Contrastive Predictive Coding (CPC). Autoregressive modeling meets contrastive losses in the latent space. Learn useful representations in an unsupervised way. -> On Audio, Vision, NLP and RL. Arxiv: arxiv.org/abs/1807.03748
  • Introducing Parallel WaveNet, or how to generate 500,000 audio samples per second :). This is our generative Text-To-Speech model that made it into the #Google Assistant. deepmind.com/blog/high-fide…
  • VQ-VAE (arxiv.org/abs/1711.00937 and avdnoord.github.io/homepage/vqvae/) is now open source in DM-Sonnet! Here's an example iPython notebook on how to use it for images: github.com/deepmind/sonne…
  • Honored to have received the MIT TR 35 innovators award. Very grateful to have been able to work with amazing colleagues on this research! @techreview
  • We updated our Imagen 4 models and Ultra is tied for #1 on the lmarena leaderboard! The models are available in Google AI Studio and the Gemini API - try them out and let us know what you think.
    Exciting Text-to-Image leaderboard update! Two new Imagen 4.0 models from @GoogleDeepMind just dropped: 🥇 Imagen 4.0 Ultra (v2) ties at #1 with @OpenAI’s GPT-Image-1 🥉 Imagen 4.0 (v2) lands strong at #3 Congrats to the Google Imagen team!
  • Excited to share our latest results on Contrastive Predictive Coding! -A linear classifier on CPC features yields 61% ACC, outperforming the original AlexNet result with unsupervised learning. -New state of the art in semi-supervised learning w 1% labels. arxiv.org/abs/1905.09272
  • Our image model is on LMSYS : ) It's been an amazing effort by the team, I'm very proud of what we achieved over the last year! Try it out now on ImageFX, and soon available on AI Studio
    Breaking news from Text-to-Image Arena! 🖼️✨ @GoogleDeepMind’s Imagen 3 debuts at #1, surpassing Recraft-v3 with a remarkable +70-point lead! Congrats to the Google Imagen team for setting a new bar! Try the best text2image at LMArena and cast your vote! More analysis👇
  • VQ-VAE: our paper on learning discrete representations! Unsupervisedly discovers phonemes and voice style transfer arxiv.org/abs/1711.00937
  • WaveNet is now on your phone :). We have made it 1000x faster since the original paper one year ago. deepmind.com/blog/wavenet-l…
  • Slides from my SANE 2017 talk "Neural Discrete Representation Learning". avdnoord.github.io/homepage/slide…
  • Excited to announce our #ICML2019 Workshop on Self-Supervised Learning! Covering: Vision, NLP, Audio, Robotics, RL ... sites.google.com/view/self-supe… Submissions now open - deadline April 25! Speakers: @ylecun, @chelseabfinn, Andrew Zisserman, Alexei Efros, Jacob Devlin, Abhinav Gupta
  • After Veo 3, Genie 3, Imagen 4, ... we present nano-banana*! 🍌🚀 (*aka Gemini-2.5-Flash-Image-Preview)
    Image generation with Gemini just got a bananas upgrade and is the new state-of-the-art image generation and editing model. 🤯 From photorealistic masterpieces to mind-bending fantasy worlds, you can now natively produce, edit and refine visuals with new levels of reasoning,