BabyAlone

BabyLM Scale Models and Let-Alone Experiments (to appear at EMNLP 2025).

Installation:

This was written using Python version 3.13.5.

Scripts:

preprocess_babyLM.py: segments the BabyLM data into sentences
filter_babyLM.py: filters the pretraining data to remove the relevant constructions
pretrain_babyLM.py: pretrains a model on the specified data split
make_templates.py: generates the test set templates
perplexity_eval.py: runs evaluations on the template datasets
unigramlm.py: copied from https://github.com/kanishkamisra/aannalysis; builds a unigram LM from unigram frequencies (used for SLOR)
unigrams.py: copied from https://github.com/kanishkamisra/aannalysis; computes the unigram frequencies for the unigram LM
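
For readers unfamiliar with SLOR, the sketch below illustrates how a unigram LM enters the score: SLOR subtracts the sentence's unigram log-probability from the model's log-probability and divides by the sentence length. This is only an illustration, not the code in unigramlm.py or unigrams.py. The function names, the whitespace tokenization, and the add-alpha smoothing are all assumptions made for the example.

```python
import math
from collections import Counter


def build_unigram_logprobs(sentences, alpha=1.0):
    """Estimate add-alpha smoothed unigram log-probabilities.

    Assumption: tokens are whitespace-separated; the real scripts may
    use the model's tokenizer instead.
    """
    counts = Counter(tok for s in sentences for tok in s.split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot reserved for unseen tokens
    denom = total + alpha * vocab
    logprobs = {tok: math.log((c + alpha) / denom) for tok, c in counts.items()}
    unk_logprob = math.log(alpha / denom)
    return logprobs, unk_logprob


def slor(lm_logprob, tokens, logprobs, unk_logprob):
    """SLOR = (log p_LM(s) - log p_unigram(s)) / |s|."""
    unigram_logprob = sum(logprobs.get(t, unk_logprob) for t in tokens)
    return (lm_logprob - unigram_logprob) / len(tokens)


# Hypothetical toy corpus and a made-up LM log-probability, for illustration only.
corpus = ["the cat sat", "the dog sat"]
logprobs, unk_logprob = build_unigram_logprobs(corpus)
score = slor(-4.2, "the cat sat".split(), logprobs, unk_logprob)
```

Normalizing by the unigram probability keeps a sentence from being penalized merely for containing rare words, and dividing by length makes scores comparable across sentences of different lengths.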
