Hi! I am a fourth-year PhD student at the Language Technologies Institute at Carnegie Mellon University. I am advised by Graham Neubig and have wonderful friends and collaborators at Neulab :)
My research focuses on cultural inclusivity and diversity in multimodal (vision-text) generation and understanding, but I also explore image and video generation more broadly. In my PhD, I'm revisiting the age-old problem of translation and exploring how it extends to multiple modalities, especially the visual modality. Check out this GitHub repo where I've been collecting resources for cultural NLP!
I've been fortunate to have my work recognized through fellowships and awards including MIT EECS Rising Star, Rising Star in AI (UMich), BITS 30 Under 30 (Research), CMU Waibel Presidential Fellowship, and two Best Paper Awards at EMNLP 2024 and SLT 2022.
I'm deeply grateful to the brilliant researchers whose mentorship has shaped my growth: Graham Neubig (CMU), Partha Talukdar (Google DeepMind), Sebastian Ruder (Google DeepMind), Alexis Conneau (Google DeepMind), Sunayana Sitaram (Microsoft Research), Monojit Choudhury (Microsoft Research), and Dr. Sreejith V (BITS Pilani).
For more information, check out my CV or reach out via email :)
Steering LLMs for Culturally Localized Generation
Simran Khanuja, Hongbin Liu, Shujian Zhang, John Lambert, Mingqing Chen, Rajiv Mathews, Lun Wang
Preprint | Under Conference Submission
HILITe: Human-AI Collaborative Framework for Image Transcreation
Simran Khanuja, Yutong Zhang, Aayush Bheemaiah, Jainish Patel, Arya Pasumarthi, Armaan Sharma, Sophia Li, Yueqi Song, Michael Saxon, Diyi Yang, Graham Neubig
HCI+NLP@EMNLP '25 | Under Conference Submission
CAIRE: Cultural Attribution of Images by Retrieval-Augmented Evaluation
Arnav Yayavaram*, Siddharth Yayavaram*, Simran Khanuja*, Michael Saxon, Graham Neubig
EACL 2026 | European Chapter of the ACL
Also presented at: CEGIS@ICCV '25
Towards Automatic Evaluation for Image Transcreation
Simran Khanuja*, Vivek Iyer*, Claire He, Graham Neubig
NAACL 2025 | Annual Conference of the Nations of the Americas Chapter of the ACL
Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages
Xiang Yue*, Yueqi Song*, Akari Asai, Seungone Kim, Jean de Dieu Nyandwi, Simran Khanuja, Anjali Kantharuban, Lintang Sutawika, Sathyanarayanan Ramamoorthy, Graham Neubig
ICLR 2025 | International Conference on Learning Representations
HILITE: Human-in-the-loop Interactive Tool for Image Editing
Arya Pasumarthi, Armaan Sharma, Jainish H. Patel, ..., Diyi Yang, Graham Neubig, Simran Khanuja
IEEE BigData 2024 | IEEE International Conference on Big Data (Undergraduate Symposium)
🏆 Best Paper Runner-Up
An image speaks a thousand words, but can everyone listen? On translating images for cultural relevance
Simran Khanuja, Sathyanarayanan Ramamoorthy, Yueqi Song, Graham Neubig
EMNLP '24 | Conference on Empirical Methods in Natural Language Processing
🏆 Best Paper
DeMuX: Data-efficient Multilingual Learning
Simran Khanuja, Srinivas Gowriraj, Lucio Dery, Graham Neubig
NAACL '24 | Conference of the North American Chapter of the ACL
GlobalBench: A Benchmark for Global Progress in Natural Language Processing
Yueqi Song, Catherine Cui, Simran Khanuja, Pengfei Liu, ..., Graham Neubig
EMNLP '23 | Conference on Empirical Methods in Natural Language Processing
Multi-lingual and Multi-cultural Figurative Language Understanding
Anubha Kabra*, Emmy Liu*, Simran Khanuja*, Alham Fikri Aji, Genta Indra Winata, Samuel Cahyawijaya, Anuoluwapo Aremu, Perez Ogayo, Graham Neubig
ACL '23 Findings | Annual Meeting of the ACL
FLEURS: Few-Shot Learning Evaluation of Universal Representations of Speech
Alexis Conneau*, Min Ma*, Simran Khanuja*, Yu Zhang, Vera Axelrod, Siddharth Dalmia, Jason Riesa, Clara Rivera, Ankur Bapna
SLT '22 | IEEE Spoken Language Technology Workshop
🏆 Best Paper
MergeDistill: Merging Pre-trained Language Models using Distillation
Simran Khanuja, Melvin Johnson, Partha Talukdar
Findings of ACL '21 | Annual Meeting of the ACL
MuRIL: Multilingual Representations for Indian Languages
Simran Khanuja, Diksha Bansal, Sarvesh Mehtani, Savya Khosla, Atreyee Dey, Balaji Gopalan, Dilip Kumar Margam, Pooja Aggarwal, Rajiv Teja Nagipogu, Shachi Dave, Shruti Gupta, Subhash Chandra Bose Gali, Vish Subramanian, Partha Talukdar
📰 Media Coverage: Economic Times | Indian Express | Google AI Blog
GLUECoS: An Evaluation Benchmark for Code-Switched NLP
Simran Khanuja, Sandipan Dandapat, Anirudh Srinivasan, Sunayana Sitaram, Monojit Choudhury
ACL '20 | Annual Meeting of the ACL
A New Dataset for Natural Language Inference from Code-mixed Conversations
Simran Khanuja, Sandipan Dandapat, Sunayana Sitaram, Monojit Choudhury
CALCS, LREC'20 | International Conference on Language Resources and Evaluation