dativelm

Code and data for the COLM 2025 paper "Both Direct and Indirect Evidence Contribute to Dative Alternation Preferences in Language Models".

For experiments and analyses, see the analysis folder.

All datasets and models are available on the Hugging Face Hub:

  • Model path: qing-yao/{x}_seed-{21,42,63}_{1e-3}
  • Dataset path: datasets/qing-yao/datives-{x}

where x ∈ {strict_default, loose_default, strict_balanced, loose_balanced, swapped-datives, no-datives, no-2postverbal, short-first, random-first, long-first, long-first-headfinal}.
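For example, a trained model and its corresponding dataset can be loaded with the standard transformers and datasets libraries. This is a minimal sketch: the choice of x = strict_default and seed 21 is purely illustrative, instantiating the naming scheme above.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from datasets import load_dataset

# Illustrative instantiation of the naming scheme above:
# x = strict_default, seed = 21, learning rate = 1e-3.
model_name = "qing-yao/strict_default_seed-21_1e-3"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# The matching training dataset follows the datives-{x} scheme.
dataset = load_dataset("qing-yao/datives-strict_default")
```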

The models can be retrained from the datasets with

bash scripts/train_autoreg.sh DATASET BASE_MODEL MODEL_NAME LR SEED EPOCHS

Make sure to modify scripts/train_autoreg.sh to specify your GPU and Hugging Face token.
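An illustrative invocation is shown below. Only the learning rate and seed echo the naming scheme above; the dataset, base model, and epoch count are placeholders, not values prescribed by this repository.

bash scripts/train_autoreg.sh qing-yao/datives-strict_default gpt2 strict_default_seed-21_1e-3 1e-3 21 10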

To detect datives and generate training sets from scratch:

  1. Download the BabyLM corpus (without QED subtitles) with
bash scripts/get_babylm.sh
  2. Detect datives, non-datives, and non-ditransitives using
bash scripts/detect_datives.sh
  3. Generate length-manipulated versions of the non-ditransitives using
bash scripts/length_manipulations.sh
  4. Create unattested alternants of the detected datives with
bash scripts/create_alternants.sh data/train/datives
  5. Write the training sets for each model with
bash scripts/write_train.sh
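Once a model is trained (or pulled from the Hugging Face paths above), its preference between a pair of dative alternants can be probed by comparing sentence log-probabilities. The following is only a minimal sketch of that idea, not the repository's analysis code (see the analysis folder for the actual experiments); the model name and example sentences are invented for illustration.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "qing-yao/strict_default_seed-21_1e-3"  # illustrative choice
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def sentence_logprob(sentence: str) -> float:
    """Sum of token log-probabilities of a sentence under the model."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    # Score each token given its preceding context (the first token is unscored).
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = ids[0, 1:]
    return log_probs.gather(1, targets.unsqueeze(1)).sum().item()

# Invented double-object (DO) and prepositional-object (PO) alternants.
do = "The girl gave the boy the ball."
po = "The girl gave the ball to the boy."
print("prefers DO" if sentence_logprob(do) > sentence_logprob(po) else "prefers PO")
```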
