Neil Zeghidour
@neilzegh
CEO @GradiumAI. Founder of @kyutai_labs. Invented neural codecs and audio LLMs. Prev. Google DeepMind/Brain, Meta, Toha Heavy Industries.
Paris
Joined January 2020
Posts
  • Pinned
    Announcing Gradium. After 10 years of pushing audio research at Meta, Google and Kyutai, I'm joining the start-up arena with my day 1s to take our models from the lab to every voice product out there. Game on.
    Gradium is out of stealth to solve voice. We raised $70M and after only 3 months we’re releasing our transcription and synthesis products to power the next generation of voice AI.
  • "AudioLM: a language modeling approach to audio generation", w/ @zalanborsos and colleagues. AudioLM achieves high-quality and long-term consistency, from audio only. Meaningful speech w/o text, consistent piano w/o MIDI. 📃: arxiv.org/abs/2209.03143 🔊: google-research.github.io/seanet/audiolm…
  • Excited to finally share that I left Google DeepMind to create Kyutai w/ talented colleagues, a non-profit lab based in Paris and dedicated to open-science, w/ ~300M€ of funding so far. We are starting with multimodal LLMs, for everyone, for free. No distraction, just science.
    Announcing Kyutai: a non-profit AI lab dedicated to open science. Thanks to Xavier Niel (@GroupeIliad), Rodolphe Saadé (@cmacgm) and Eric Schmidt (@SchmidtFutures ), we are starting with almost 300M€ of philanthropic support. Meet the team ⬇️
  • When I was looking for a PhD position, @ylecun opened the Paris office and I had the chance to join as a permanent PhD student in the first batch. Most important moment of my career, and this lab gave birth to the whole Paris ecosystem.
    Hats off to @ylecun! FAIR shaped my career, period. I truly thank @AIatMeta and FAIR for providing such a nice place for independent exploration and open research! End of an era, forever remembered. ft.com/content/c586eb…
  • Today we release Hibiki, real-time speech translation that runs on your phone. Adaptive flow without fancy policy, simple temperature sampling of a multistream audio-text LM. Very proud of @tom_labiausse's work as an intern.
    Meet Hibiki, our simultaneous speech-to-speech translation model, currently supporting 🇫🇷➡️🇬🇧. Hibiki produces spoken and text translations of the input speech in real-time, while preserving the speaker’s voice and optimally adapting its pace based on the semantic content of the
  • Wavesplit is out! arxiv.org/abs/2002.08933 Wavesplit is a speech separation system that jointly learns to identify and separate speakers from a single mic. Significant improvement of sota on clean, noisy and reverberant mixtures of 2 or 3 speakers. Demo below! 1/2
  • Our work on learning strides of convolutional networks by backpropagation has received an Outstanding Paper Award from ICLR 2022! 🎉🥳 blog.iclr.cc/2022/04/20/ann…
    Tired of cross-validating strides in CNNs? Learn them! Our ICLR spotlight "Learning strides in convolutional neural networks" introduces DiffStride, the first pooling layer that learns its strides by backpropagation. Paper: arxiv.org/abs/2202.01653 Code: github.com/google-researc…
  • Interesting to attribute this to Meta, when it's really that François Fleuret has been one of the most creative "neural architects" for a while.
    🚨 Holy shit...Meta just rewrote how Transformers think. They built something called The Free Transformer and it breaks the core rule every GPT model has lived by since 2017. For 8 years, Transformers have been blindfolded forced to guess the next token one at a time, no inner
  • Our team at Google DeepMind in Paris is hiring a Research Scientist to work on audio generative models (AudioLM, MusicLM) and more broadly on large-scale sequence and signal modelling. Join us!
  • AudioLM generates high-quality speech (SPEAR-TTS) and music (MusicLM, SingSong), but is quite slow. SoundStorm speeds it up by 100x, generating 30s of audio in 0.5s on TPU, unlocking long-form generation (e.g. dialogues). Work led by @zalanborsos! 📜🎶: google-research.github.io/seanet/soundst…
  • Voice AIs handle speaker turns & interruptions with Voice Activity Detection. VAD is brittle and will trigger due to background noise, creating frequent hiccups. Moshi gets rid of it completely, so you can use it in the most chaotic settings. I myself couldn't hear Moshi here 😅
    Today, we release several Moshi artifacts: a long technical report with all the details behind our model, weights for Moshi and its Mimi codec, along with streaming inference code in PyTorch, Rust and MLX. More details below 🧵 ⬇️ Paper: kyutai.org/Moshi.pdf Repo:
  • Looking for a learnable alternative to mel-filterbanks? We just released the code for our ICLR paper LEAF. * Incl. LEAF, Mel-fbanks, SincNet, SincNet+, TFBanks * Keras models for PCEN, SpecAugment, PANN * An example training loop