LingGym is a new benchmark that evaluates LLMs' capacity for metalinguistic reasoning using Interlinear Glossed Text (IGT) and grammatical descriptions extracted from 18 typologically diverse reference grammars. Our work is presented in "LingGym: How Far Are LLMs from Thinking Like Field Linguists?"
In this GitHub repo, we release three types of data:

- Benchmark data (Main): the complete multiple-choice dataset used for benchmark evaluation.
- CSV files: examples and explanations extracted from the grammar books.
- IGT files: all IGT-format data extracted from the grammar books.
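For readers unfamiliar with IGT, an entry pairs a morpheme-segmented source line with a morpheme-by-morpheme gloss line and a free translation. The sketch below is purely illustrative of this general structure; the `parse_igt` helper and the example sentence are hypothetical and do not reflect the exact file layout used in this repo.

```python
# Illustrative sketch of the IGT structure: a segmented transcription line,
# an aligned gloss line, and a free translation. Not the repo's actual format.

def parse_igt(transcription: str, gloss: str, translation: str) -> dict:
    """Pair each word/morpheme group with its gloss; token counts must match."""
    morphemes = transcription.split()
    glosses = gloss.split()
    if len(morphemes) != len(glosses):
        raise ValueError("transcription and gloss lines are misaligned")
    return {
        "pairs": list(zip(morphemes, glosses)),
        "translation": translation,
    }

# Hypothetical Swahili-style example (not taken from the dataset):
entry = parse_igt(
    "ni-ka-soma kitabu",
    "1SG-PST-read book",
    "I read a book",
)
print(entry["pairs"])  # [('ni-ka-soma', '1SG-PST-read'), ('kitabu', 'book')]
```

The alignment check matters in practice: a gloss line with a different token count than its transcription usually signals an extraction error.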
The benchmark data is also available on Hugging Face: LINK
If you find our work useful, please consider citing our paper:
```bibtex
@inproceedings{yang-etal-2025-linggym,
    title = "{L}ing{G}ym: How Far Are {LLM}s from Thinking Like Field Linguists?",
    author = "Yang, Changbing and
      Ma, Franklin and
      Shi, Freda and
      Zhu, Jian",
    editor = "Christodoulopoulos, Christos and
      Chakraborty, Tanmoy and
      Rose, Carolyn and
      Peng, Violet",
    booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2025",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.emnlp-main.69/",
    doi = "10.18653/v1/2025.emnlp-main.69",
    pages = "1314--1340",
    ISBN = "979-8-89176-332-6"
}
```