Anne Wu

I'm a PhD student in Computer Science at Cornell University advised by Yoav Artzi. I work on interactive learning, spanning natural language processing, multimodality, machine learning, and reinforcement learning.

In particular, my research focuses on models that interact with humans and environments across different modalities (e.g., text, audio, vision), and learn from those interactions. I’m especially interested in settings where the model can improve under realistic constraints.

Prior to Cornell, I worked on speech translation at Facebook AI Research with Jiatao Gu, Changhan Wang, and Juan Pino. Before that, I graduated from CentraleSupélec and the University of Cambridge, and spent some time in finance.

Email / Twitter

Publications

Google Scholar / Semantic Scholar

Aligning Spoken Dialogue Models from User Interactions
Anne Wu, Laurent Mazaré, Neil Zeghidour, and Alexandre Défossez
ICML 2025

Imitation Learning from a Single Temporally Misaligned Video
William Huey*, Huaxiaoyue Wang*, Anne Wu, Yoav Artzi, and Sanjiban Choudhury
ICML 2025
pdf | code

Retrospective Learning from Interactions
Zizhao Chen, Mustafa Omer Gul, Yiwei Chen, Gloria Geng, Anne Wu, and Yoav Artzi
ACL 2025 (long paper) Oral
pdf | code | website

Time Your Rewards: Learning Temporally Consistent Rewards from a Single Video Demonstration
Huaxiaoyue Wang*, William Huey*, Anne Wu, Yoav Artzi, and Sanjiban Choudhury
CoRL 2024 Workshop on Whole-body Control and Bimanual Manipulation
paper

A Surprising Failure? Multimodal LLMs and the NLVR Challenge
Anne Wu, Kianté Brantley, and Yoav Artzi
arXiv 2024 (technical report)
pdf

lilGym: Natural Language Visual Reasoning with Reinforcement Learning
Anne Wu, Kianté Brantley, Noriyuki Kojima, and Yoav Artzi
ACL 2023 (long paper)
pdf | website | code & data | baselines | poster
Also presented at NeurIPS 2022 Language and Reinforcement Learning (LaReL)

CoVoST 2 and Massively Multilingual Speech-to-Text Translation
Changhan Wang*, Anne Wu*, Jiatao Gu, and Juan Pino*
Interspeech 2021
pdf | data | code | blog

Large-Scale Self- and Semi-Supervised Learning for Speech Translation
Changhan Wang*, Anne Wu*, Juan Pino*, Alexei Baevski, Michael Auli, and Alexis Conneau
Interspeech 2021
pdf

VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation
Changhan Wang, Morgane Riviere, Ann Lee, Anne Wu, Chaitanya Talnikar, Daniel Haziza, Mary Williamson, Juan Pino, and Emmanuel Dupoux
ACL 2021
pdf | code & data

Self-Supervised Representations Improve End-to-End Speech Translation
Anne Wu, Changhan Wang, Juan Pino, and Jiatao Gu
Interspeech 2020
pdf

fairseq S2T: Fast Speech-to-Text Modeling with fairseq
Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Dmytro Okhonko, and Juan Pino
AACL 2020: System Demonstrations
pdf | code

CoVoST: A Diverse Multilingual Speech-To-Text Translation Corpus
Changhan Wang, Juan Pino, Anne Wu, and Jiatao Gu
LREC 2020
pdf