
WesScivetti/BabyAlone


BabyAlone

BabyLM Scale Models and Let-Alone Experiments (to appear at EMNLP 2025).

Installation:

The code was written using Python 3.13.5.

Scripts:

  • preprocess_babyLM.py: takes the BabyLM data and segments it into sentences
  • filter_babyLM.py: filters the pretraining data to remove the relevant constructions
  • pretrain_babyLM.py: pretrains a model on the specified data split
  • make_templates.py: generates the test-set templates
  • perplexity_eval.py: runs perplexity-based evaluations on the template datasets
  • unigramlm.py: takes unigram frequencies and creates a unigram LM (used for SLOR); copied from https://github.com/kanishkamisra/aannalysis
  • unigrams.py: calculates the unigram frequencies used by the unigram LM; copied from https://github.com/kanishkamisra/aannalysis
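
SLOR (syntactic log-odds ratio) normalizes a sentence's language-model log-probability by subtracting its unigram log-probability and dividing by its length: SLOR(s) = (log p_M(s) - log p_U(s)) / |s|. The sketch below shows how a unigram LM, perplexity, and SLOR fit together. It is illustrative only and is not the code in unigramlm.py or perplexity_eval.py: the names UnigramLM, perplexity, and slor are hypothetical, the add-one smoothing is an assumption made here, tokens are plain strings, and the per-token log-probabilities (natural log) are assumed to come from the pretrained model.

```python
import math
from collections import Counter


class UnigramLM:
    """Hypothetical add-one (Laplace) smoothed unigram LM built from token counts."""

    def __init__(self, counts: Counter):
        self.counts = counts
        self.total = sum(counts.values())
        # One extra vocabulary slot so that unseen tokens still get probability mass.
        self.vocab_size = len(counts) + 1

    def logprob(self, token: str) -> float:
        # Natural-log probability with add-one smoothing.
        return math.log((self.counts[token] + 1) / (self.total + self.vocab_size))

    def sentence_logprob(self, tokens: list[str]) -> float:
        # A unigram model treats tokens independently, so log-probs simply add.
        return sum(self.logprob(t) for t in tokens)


def perplexity(token_logprobs: list[float]) -> float:
    """Exponentiated mean negative log-likelihood over a sentence's tokens."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def slor(lm_logprob: float, tokens: list[str], unigram: UnigramLM) -> float:
    """(log p_M(s) - log p_U(s)) / |s|, with |s| counted in tokens."""
    return (lm_logprob - unigram.sentence_logprob(tokens)) / len(tokens)


if __name__ == "__main__":
    unigram = UnigramLM(Counter("the cat sat on the mat".split()))
    tokens = ["the", "cat", "sat"]
    # Hypothetical per-token log-probs that a pretrained LM might assign.
    lm_token_logprobs = [-2.0, -3.0, -1.5]
    print(perplexity(lm_token_logprobs))
    print(slor(sum(lm_token_logprobs), tokens, unigram))
```

Subtracting the unigram term is what makes SLOR less sensitive to word frequency than raw log-probability, and dividing by length makes sentences of different lengths comparable.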
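
A filtering step of the kind filter_babyLM.py describes could look like the minimal sketch below. It assumes the input is already segmented into one sentence per item and that the construction to remove is matched by the literal phrase "let alone"; the constructions and matching rules the actual script uses may differ. LET_ALONE and filter_sentences are hypothetical names.

```python
import re
from collections.abc import Iterable, Iterator

# Hypothetical pattern: assumes the target construction is the phrase "let alone",
# allowing any whitespace between the words and ignoring case.
LET_ALONE = re.compile(r"\blet\s+alone\b", re.IGNORECASE)


def filter_sentences(
    sentences: Iterable[str], pattern: re.Pattern[str] = LET_ALONE
) -> Iterator[str]:
    """Yield only the sentences that do not contain a match for the pattern."""
    for sentence in sentences:
        if not pattern.search(sentence):
            yield sentence


if __name__ == "__main__":
    corpus = ["I can't swim, let alone dive.", "The dog barked.", "Let  alone the rest."]
    print(list(filter_sentences(corpus)))
```

Using a generator keeps memory use flat, which matters when streaming a pretraining corpus line by line.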
