About

Hi! I am a fourth-year PhD student at the Language Technologies Institute at Carnegie Mellon University. I am advised by Graham Neubig and have wonderful friends and collaborators at NeuLab :)

My research focuses on cultural inclusivity and diversity within multimodal (vision-text) generation and understanding, but I also explore image and video generation more broadly. In my PhD, I'm revisiting the age-old problem of translation and exploring how it extends to multiple modalities, especially the visual one. Check out this GitHub repo where I've been collecting resources for cultural NLP!

I've been fortunate to have my work recognized through fellowships and awards including MIT EECS Rising Star, Rising Star in AI (UMich), BITS 30 Under 30 (Research), CMU Waibel Presidential Fellowship, and two Best Paper Awards at EMNLP 2024 and SLT 2022.

I'm deeply grateful to the brilliant researchers whose mentorship has shaped my growth: Graham Neubig (CMU), Partha Talukdar (Google DeepMind), Sebastian Ruder (Google DeepMind), Alexis Conneau (Google DeepMind), Sunayana Sitaram (Microsoft Research), Monojit Choudhury (Microsoft Research), and Dr. Sreejith V (BITS Pilani).

For more information, check out my CV or reach out via email :)

Updates

Jan 2026 Our CAIRE paper is accepted to EACL 2026 (Main)
Jan 2026 Honored to be selected for BITS 30 Under 30 - Research Leaders!
Oct 2025 Presenting at the MIT EECS Rising Star Workshop! (poster)
Oct 2025 Invited keynote at the CEGIS workshop at ICCV '25 (slides)
Oct 2025 Invited talk as Rising Star in AI at AI for Science Symposium at UMich! (slides, video)
June 2025 Won the Imminent Translated Grant for our Human-AI Image Localization Platform!
June 2025 Invited keynote at Demographic Diversity in CV workshop at CVPR '25 (slides, video)
Mar 2025 Interning at Google DeepMind this summer with Lun Wang
Mar 2025 Invited talk at UT Austin NLL Reading Group
Jan 2025 Paper on automatic evaluation for image transcreation accepted at NAACL 2025
Jan 2025 Pangea accepted at ICLR 2025
Dec 2024 Won best paper runner-up at IEEE Big Data 2024 for our image-editing platform!
Nov 2024 Won best paper at EMNLP 2024 for our work on image transcreation!
Oct 2024 Released Pangea-7B, an open-source multi-(lingual, modal, cultural) model
Sept 2024 Image transcreation paper accepted at EMNLP (Main) '24!
Sept 2024 Invited talk on image transcreation at Pinterest
June 2024 Invited keynote at AmericasNLP workshop at NAACL '24
Apr 2024 Supported by Waibel Presidential Fellowship for 2024-25!
Jan 2023 Received best paper award for FLEURS at SLT 2022!
Aug 2022 Started my PhD at CMU LTI!
Aug 2022 Presented our research and its application to Google Assistant at Decode with Google 2022!
Oct 2021 Attending ALPS 2022!
Aug 2021 Conducted a hands-on TensorFlow Tutorial at the 5th CVIT IIIT Summer School
Aug 2021 Hosted the NLP networking session at IKDD 2021 with Dr. Monojit Choudhury as guest speaker
May 2021 Work on merging multiple pre-trained LMs to appear in ACL 2021 Findings
Mar 2021 MuRIL technical write-up available on arXiv and model on Hugging Face
Nov 2020 Open-sourced MuRIL, a multilingual model for Indian languages on TFHub!
Sep 2020 Hosted a Fireside Chat with Jeff Dean on his virtual Google India visit!
Aug 2020 Joined Google Research India as a Pre-Doctoral Researcher with Dr. Partha Talukdar!
Aug 2020 Graduated from BITS Pilani Goa with a dual degree in Computer Science and Economics
July 2020 GLUECoS code and leaderboard are now open-sourced!
Apr 2020 Paper on building a benchmark for code-switched language processing at ACL 2020!
Mar 2020 Created a new dataset for code-mixed conversational NLI! Paper at CALCS, LREC 2020
Jul 2019 Started bachelor thesis at Microsoft Research India with Dr. Sunayana Sitaram!
Jun 2019 Work on generating code-mixed text to appear at TLT SyntaxFest 2019
Apr 2018 Summer internship at MT-NLP lab, IIIT Hyderabad with Dr. Dipti Misra Sharma

Awards & Honors

🏆 BITS 30 Under 30 - Research Leaders
BITS Pilani Alumni Association recognition for outstanding achievements
2026
🏆 MIT EECS Rising Star
Selected for the prestigious MIT EECS Rising Stars workshop
2025
🏆 Rising Star in AI - University of Michigan
Invited speaker at the AI for Science Symposium
2025
🏅 Best Paper Award - EMNLP 2024
For "An image speaks a thousand words, but can everyone listen?" on image transcreation
2024
🏆 Waibel Presidential Fellowship
Carnegie Mellon University endowed fellowship
2024-2025
🏅 Best Paper Award - SLT 2022
For FLEURS: Few-Shot Learning Evaluation of Universal Representations of Speech
2022
🏆 ICSE National Rank 1
All India Topper, St. Mary's School, Pune
2013

Publications

Steering LLMs for Culturally Localized Generation
Simran Khanuja, Hongbin Liu, Shujian Zhang, John Lambert, Mingqing Chen, Rajiv Mathews, Lun Wang
Preprint | Under Conference Submission
HILITe: Human-AI Collaborative Framework for Image Transcreation
Simran Khanuja, Yutong Zhang, Aayush Bheemaiah, Jainish Patel, Arya Pasumarthi, Armaan Sharma, Sophia Li, Yueqi Song, Michael Saxon, Diyi Yang, Graham Neubig
HCI+NLP@EMNLP '25 | Under Conference Submission
CAIRE: Cultural Attribution of Images by Retrieval-Augmented Evaluation
Arnav Yayavaram*, Siddharth Yayavaram*, Simran Khanuja*, Michael Saxon, Graham Neubig
EACL 2026 | Conference of the European Chapter of the ACL (also presented at CEGIS@ICCV '25)
Towards Automatic Evaluation for Image Transcreation
Simran Khanuja*, Vivek Iyer*, Claire He, Graham Neubig
NAACL 2025 | Annual Conference of the Nations of the Americas Chapter of the ACL
Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages
Xiang Yue*, Yueqi Song*, Akari Asai, Seungone Kim, Jean de Dieu Nyandwi, Simran Khanuja, Anjali Kantharuban, Lintang Sutawika, Sathyanarayanan Ramamoorthy, Graham Neubig
ICLR 2025 | International Conference on Learning Representations
🏆 Best Paper Runner-Up HILITE: Human-in-the-loop Interactive Tool for Image Editing
Arya Pasumarthi, Armaan Sharma, Jainish H. Patel, ..., Diyi Yang, Graham Neubig, Simran Khanuja
IEEE BigData 2024 | IEEE International Conference on Big Data (Undergraduate Symposium)
🏆 Best Paper An image speaks a thousand words, but can everyone listen? On translating images for cultural relevance
Simran Khanuja, Sathyanarayanan Ramamoorthy, Yueqi Song, Graham Neubig
EMNLP '24 | Conference on Empirical Methods in Natural Language Processing
DeMuX: Data-efficient Multilingual Learning
Simran Khanuja, Srinivas Gowriraj, Lucio Dery, Graham Neubig
NAACL '24 | Conference of the North American Chapter of the ACL
GlobalBench: A Benchmark for Global Progress in Natural Language Processing
Yueqi Song, Catherine Cui, Simran Khanuja, Pengfei Liu, ..., Graham Neubig
EMNLP '23 | Conference on Empirical Methods in Natural Language Processing
Multi-lingual and Multi-cultural Figurative Language Understanding
Anubha Kabra*, Emmy Liu*, Simran Khanuja*, Alham Fikri Aji, Genta Indra Winata, Samuel Cahyawijaya, Anuoluwapo Aremu, Perez Ogayo, Graham Neubig
ACL '23 Findings | Annual Meeting of the ACL
🏆 Best Paper FLEURS: Few-Shot Learning Evaluation of Universal Representations of Speech
Alexis Conneau*, Min Ma*, Simran Khanuja*, Yu Zhang, Vera Axelrod, Siddharth Dalmia, Jason Riesa, Clara Rivera, Ankur Bapna
SLT '22 | IEEE Spoken Language Technology Workshop
MergeDistill: Merging Pre-trained Language Models using Distillation
Simran Khanuja, Melvin Johnson, Partha Talukdar
Findings of ACL'21 | Annual Meeting of the ACL
📰 Media Coverage MuRIL: Multilingual Representations for Indian Languages
Simran Khanuja, Diksha Bansal, Sarvesh Mehtani, Savya Khosla, Atreyee Dey, Balaji Gopalan, Dilip Kumar Margam, Pooja Aggarwal, Rajiv Teja Nagipogu, Shachi Dave, Shruti Gupta, Subhash Chandra Bose Gali, Vish Subramanian, Partha Talukdar

Coverage: Economic Times | Indian Express | Google AI Blog

GLUECoS: An Evaluation Benchmark for Code-Switched NLP
Simran Khanuja, Sandipan Dandapat, Anirudh Srinivasan, Sunayana Sitaram, Monojit Choudhury
ACL'20 | Annual Meeting of the ACL
A New Dataset for Natural Language Inference from Code-mixed Conversations
Simran Khanuja, Sandipan Dandapat, Sunayana Sitaram, Monojit Choudhury
CALCS, LREC'20 | International Conference on Language Resources and Evaluation

Invited Talks

Keynote: Cultural Inclusivity in Multimodal AI
CEGIS Workshop @ ICCV 2025
October 2025
Rising Star in AI: Research Overview
AI for Science Symposium, University of Michigan
October 2025
Keynote: Image Transcreation for Cultural Diversity
Demographic Diversity in CV Workshop @ CVPR 2025
June 2025
Keynote: Cultural NLP and Image Transcreation
AmericasNLP Workshop @ NAACL 2024
June 2024
Image Transcreation for Cultural Relevance
Google Research, Microsoft Research, IISc, Microsoft IDC, University of Edinburgh, Pinterest
Jan - Sept 2024
Decode With Google 2022
Google Research India
August 2022

Grants & Funding

Imminent Translated Research Grant
Translated | Human-AI Image Localization Platform
2025
Waibel Presidential Fellowship
Carnegie Mellon University
2024-2025