Jesse Dodge
@JesseDodge
Research Scientist at Meta. 10-yr test-of-time ACL 22, Best Demo ACL 25, Best Resource Paper ACL 24, Best Theme Paper ACL 24, Best Student Paper NAACL 15 🏳️‍🌈
Joined March 2009
Posts
  •
    Today Google released Gemini with a 60-page report in which they repeatedly say the training data is key ("We find that data quality is critical to a highly-performing model"), while providing almost no information about how it was made, how it was filtered, or its contents.
  •
    GPT-3 won a best paper award at #NeurIPS2020! Congratulations to that team, it truly is an incredible piece of work, and has changed the way many of us think about what massive LMs can do. But we should also talk about inequality in the research community -- that work couldn't...
  •
    Today Google announced PaLM 2. In their 91 page paper they repeatedly say the training data is key ("we find that the data mixture is a critical component of the final model") while providing almost no information about how it was constructed, how it was sourced, or its contents.
  •
    Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping arxiv.org/abs/2002.06305 We found surprisingly large variance just from random seeds when fine-tuning BERT. Both weight inits and the order of the training data have big impact. 1/n
  •
    Personal update: I'm excited to be joining @Meta! I'm deeply grateful for the opportunities I've had at @allen_ai over the past 6 years (including three paper awards in the last two years). Onward to the next chapter! 🥳
  •
    WE WON THE ACL 10-YEAR TEST-OF-TIME AWARD!! Ten thousand congratulations to @mmitchell_ai and our co-authors Amit Goyal, @kotymg, @karlstratos, Xufeng Han, Alyssa Mensch, Alex Berg, Tamara Berg, and @haldaume3!
    The second of the #acl2022nlp 10-year test of time awards goes to @mmitchell_ai et al. for their work on generating image descriptions published at EACL 2012 #NLProc aclanthology.org/E12-1076/
  •
    Replying to @JesseDodge
    This follows the trend of white papers that are written to read like research papers which don't actually contain the necessary information for basic science. This is a product, and they are purposely obscuring the most important information that makes the models work.
  •
    How much CO2 is emitted from training common AI models? New FAccT paper! *Partially* training a 6 B. param. transformer emits about as much as the average US home in a year! Smaller models? Only as much as charging a phone. What can you do? A 🧵: arxiv.org/pdf/2206.05229…
  •
    Congrats to our team for winning two paper awards at #ACL2024! OLMo won the Best Theme Paper award, and Dolma won a Best Resource Paper award! All the credit goes to the whole team for the massive group effort 🎉🎉
  •
    Successfully defended my Ph.D. under quarantine!
  •
    Ever wonder about the web-scale data massive LMs train on? We wrote some docs for C4! cs.cmu.edu/~jessed/data_h… And we indexed it, and built an interactive demo, so you can search too: c4-search.apps.allenai.org find something cool? report it or discuss here: github.com/allenai/c4-doc…
  •
    The best way to understand large language models is to understand what they were trained on. Most pretraining datasets have *zero* documentation of their contents! We worked with @nitashatiku and the other WaPo journalists on this piece, check it out!
    Replying to @nitashatiku
    Here's our analysis of the 15 million websites in just one highly-filtered CommonCrawl web scrape, used to train models like Google's T5 & Facebook's LLaMA:
    - copyright symbol appears >200M times
    - pirated sites, 1 for e-books
    - half the top 10 = news sites
    washingtonpost.com/technology/int…
  •
    Replying to @JesseDodge
    Now that LLMs are products (not just research), we are at a turning point: for-profit companies will become less and less transparent *specifically* about the components that are most important. Only if the open source community can organize together can we keep up!
  •
    could not be more proud of @MaartenSap, who just *successfully defended* one of the best PhD theses I've seen. he's already had a successful career, and he's only getting started!