|
Adhiraj Ghosh
I am a first-year ELLIS PhD candidate working with Dr. Matthias Bethge at the University of Tübingen. I am also affiliated with the International Max Planck Research School for Intelligent Systems. My interests centre on developing data-centric approaches to improve machine learning models across several modalities (text, image, video, audio and 3D), as well as exposing the failure points of these models by creating better evaluation sets and benchmarking strategies.
I completed my MSc in Machine Learning at the University of Tübingen in 2024, during which I worked on visualising figurative speech at the Computer Graphics Group, which led to an Outstanding Paper award at EMNLP 2023.
Before starting my master's, I was a Computer Vision Researcher at the Center of Artificial Intelligence, ZHAW, working on domain adaptation in Optical Music Recognition.
I have also worked with Dr. Daniel Lin Wen-Yan at SMU on feature correspondence-based object tracking. I completed my BSc in Electrical and Electronics Engineering in Manipal/Singapore.
I am very eager to collaborate on relevant projects, so please reach out if you are interested!
Email  / 
CV  / 
Google Scholar  / 
Github  / 
Twitter  / 
Bluesky  / 
LinkedIn  / 
YouTube
|
|
Recent News
- Nov 2025 : Work on concept-aware online batch sampling out on arXiv!
- Jun 2025 : Started my PhD!
- May 2025 : ONEBench accepted at ACL 2025 as a poster!
- Nov 2024 : Defended my MSc thesis!
- Sep 2024 : No Zero-shot was accepted at NeurIPS as a poster! Check out coverage by Computerphile and AI 'N Stuff!
- Dec 2023 : ViPE awarded outstanding paper at EMNLP 2023!
- Sep 2023 : Work on Real World Music Object Recognition published in TISMIR.
- Oct 2022 : Moved to Germany! Started my MSc at the University of Tübingen.
- Aug 2022 : RPTM accepted for oral presentation at WACV 2023. Check out the paper and SOTA comparisons!
|
|
Work Experience
Mar 2023 - Sep 2023: Research Assistant at the Computer Graphics group, Tübingen AI Centre.
May 2021 - Aug 2022: Computer Vision Researcher, Zürich University of Applied Sciences.
Jan 2020 - Dec 2020: Visiting Researcher, Singapore Management University.
Jun 2018 - Aug 2019: Undergraduate Research Intern, Jadavpur University.
|
|
|
Concept-Aware Batch Sampling Improves Language-Image Pretraining
Adhiraj Ghosh, Vishaal Udandarao*, Thao Nguyen*, Matteo Farina*, Mehdi Cherti, Jenia Jitsev, Sewoong Oh, Elisa Ricci, Ludwig Schmidt, Matthias Bethge.
arXiv:2511.20643, 2025
Paper
In this work, we show that concept-aware data curation and online batch sampling improve the downstream performance of contrastive vision-language models. We introduce DataConcept, a collection of 128M image-text pairs annotated with concept-centric information, and Concept-Aware Batch Sampling (CABS), a framework that uses concept information to curate batches online instead of relying on static curation.
|
|
|
ONEBench to Test Them All: Sample-Level Benchmarking Over Open-Ended Capabilities
Adhiraj Ghosh*, Sebastian Dziadzio*, Ameya Prabhu, Vishaal Udandarao, Samuel Albanie, Matthias Bethge.
ACL 2025 (Main)
Paper
To evaluate the vast capabilities of foundation models, we introduce ONEBench – a benchmark that unifies individual test sets into a vast pool of individual data-measurement samples. We shift the focus from singular test-sets to sample-level evaluations, re-structuring static benchmarks to accommodate an ever-expanding pool of datasets and models.
|
|
|
No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance
Vishaal Udandarao*, Ameya Prabhu*, Adhiraj Ghosh, Yash Sharma, Philip H.S. Torr, Adel Bibi, Samuel Albanie, Matthias Bethge.
NeurIPS 2024
Paper /
Code /
Let It Wag! Benchmark
The impressive empirical performance of VLMs is attributable to test concepts appearing in their pretraining datasets, and therefore does not demonstrate "zero-shot" generalization. Instead, these models need exponentially more data on a concept to improve performance linearly.
|
|
|
ViPE: Visualise Pretty-much Everything
Hassan Shahmohammadi, Adhiraj Ghosh, Hendrik Lensch.
EMNLP 2023 (Outstanding Paper Award)
Paper /
Code /
Dataset /
HuggingFace /
Music Videos
ViPE is the first automated model for translating arbitrary text into a visualisable prompt. It helps any text-to-image model visualise figurative or non-lexical language.
|
|
|
Real World Music Object Recognition
Adhiraj Ghosh*, Lukas Tuggener*, Raphael Emberger*, Pascal Sager*, et al.
TISMIR 2023
Paper /
Code
We present solutions to improve recognition accuracy in Music Object Recognition on low-quality, real-world music sheet data and provide confidence-rated model outputs to enable efficient human post-processing.
|
|
|
Relation Preserving Triplet Mining for Stabilising the Triplet Loss in Re-identification Systems
Adhiraj Ghosh, Kuruparan Shanmugalingam, Wen-Yan Lin
WACV 2023
Paper /
Code /
Video /
Poster
We propose a new, feature-guided triplet mining scheme for understanding intrinsic pose to solve the intra-class variance problem in re-identification datasets.
|
|
|
Irony Detection in Bengali Tweets: A New Dataset, Experimentation and Results
Adhiraj Ghosh, Kamal Sarkar
ICCIDS 2020
Paper /
Dataset
This paper describes a Bengali irony detection dataset that we developed and reports results obtained on it using SOTA machine learning methods.
|
|