Alexis Conneau
438 posts
Co-founder and CEO waveforms.ai (@WaveFormsAI) - Ex @OpenAI GPT-4o/AVM Audio Research Lead - #Her #TARS - Ex @AIatMeta, @Polytechnique (X11)
San Francisco
Joined September 2016
- Just released our new XLM/mBERT pytorch model in 100 languages. Significantly outperforms the TensorFlow mBERT OSS model while trained on the same Wikipedia data. bit.ly/2KItiC4 @GuillaumeLample @Thom_Wolf @PyTorch
- DATASET RELEASE: "CC100", the CommonCrawl dataset of 2.5TB of clean unsupervised text from 100 languages (used to train XLM-R) is now publicly available. You can find below the Data: data.statmt.org/cc-100/ Script: bit.ly/3oC6aXy By @VishravC et al.
- 👨‍🔬 Life update: Happy to share that I recently joined @GoogleAI Language as a research scientist 👨‍🏫 I will continue my research on building neural networks that can learn with little to no supervision
- @OpenAI #GPT4o #Audio Extremely excited to share the results of what I've been working on for 2 years GPT models now natively understand audio: you can talk to the Transformer itself! The feeling is hard to describe so I can't wait for people to speak to it #HearTheAGI 🧵1/N
  Quoted post: Introducing GPT-4o, our new model which can reason across text, audio, and video in real time. It's extremely versatile, fun to play with, and is a step towards a much more natural form of human-computer interaction (and even human-computer-computer interaction):
- Happy to share our latest paper: "Self-training Improves Pretraining for Natural Language Understanding" We show that self-training is complementary to strong unsupervised pretraining (RoBERTa) on a variety of tasks. Paper: arxiv.org/abs/2010.02194 Code: github.com/facebookresear…
- New work: "Unsupervised speech recognition" TL;DR: it's possible for a neural network to transcribe speech into text with very strong performance, without being given any labeled data. Paper: ai.facebook.com/research/publi… Blog: ai.facebook.com/blog/wav2vec-u… Code: github.com/pytorch/fairse…
  Quoted post: Today we are announcing our work on building speech recognition models without any labeled data! wav2vec-U rivals some of the best supervised systems from only two years ago. Paper: ai.facebook.com/research/publi… Blog: ai.facebook.com/blog/wav2vec-u… Code: github.com/pytorch/fairse…
- Excited to announce the creation of WaveForms AI (waveforms.ai) – an Audio LLM company aiming to solve the Speech Turing Test and bring Emotional Intelligence to AI @WaveFormsAI
- Our new paper: Unsupervised Cross-lingual Representation Learning at Scale arxiv.org/pdf/1911.02116… We release XLM-R, a Transformer MLM trained in 100 langs on 2.5 TB of text data. Double digit gains on XLU benchmarks + strong per-language performance (~XLNet on GLUE). [1/6]
- [XLSR-53: Multilingual Self-Supervised Speech Transformer] We're happy to release XLSR-53: a wav2vec 2.0 model pre-trained on 56k hours of speech in 53 languages from MLS, CommonVoice and BABEL datasets! Model: github.com/pytorch/fairse… Updated paper: arxiv.org/abs/2006.13979 1/N
- Career update: A month ago, I re-joined FAIR at @MetaAI as a research scientist. I am continuing my work on self-supervised learning for Language.
- This video clip should appear at the beginning of any AI movie in the classic flashbacks
  Quoted post: A demo from 1993 of 32-year-old Yann LeCun showing off the world's first convolutional network for text recognition. #tbt #ML #neuralnetworks #CNNs #MachineLearning