(1/n) Social science research often relies on scans of documents such as statistical tables, newspapers, firm-level reports, etc. #EconTwitter
Melissa Dell
Economics Professor @Harvard. Development economics, political economy, economic history, deep learning methods for data curation.
Cambridge, MA
Joined March 2021
- I’m excited to share American Stories, a new billion-scale dataset of structured texts/layouts from public domain newspapers (1780-1960) that we’ve built using our deep learning packages. #EconTwitter (1/13) Paper: arxiv.org/abs/2308.12477 Dataset: huggingface.co/datasets/dell-…
- Replying to @MelissaLDell (3/n) We are releasing an open-source, deep-learning-powered library, Layout Parser, that provides a variety of tools for automatically processing document image data at scale. Webpage: layout-parser.github.io Arxiv: arxiv.org/abs/2103.15348 Github: github.com/Layout-Parser/…
- I've posted a review article on deep learning for economists arxiv.org/pdf/2407.15339. The DL literature is vast, and I initially found it pretty daunting to find the parts of it that are most relevant to economic applications. I hope people will find this useful! #econtwitter
- I’m excited to share News Déjà Vu (newsdejavu.github.io), which uses a custom large language model to retrieve historical news articles that are the most similar to modern news articles. (1/4)
- (1/2) Knowledge base on deep learning methods for data curation is up: dell-research-harvard.github.io/blog.html Covers methods from computer vision and NLP. I found it overwhelming at first to tackle the vast DL lit, hope links to resources for getting started will be of potential use to others
- We've released a new dataset, Newswire (arxiv.org/abs/2406.09490): 2.7M unique newswires reproduced 32M+ times over a century (1878-1977). Articles have location and topic tags and person ids (from Wikipedia). Fun fact: see the Prohibition-related crime spike in the 1920s
- Interesting tool to turn research papers into AI generated podcast discussions that provide a high level overview: illuminate.google.com/home?pli=1 Fed it some of my papers and it did a very reasonable job, especially with the ML pubs. #econtwitter
- The Harvard Economics department has an opening for a tenured position in development economics: aeaweb.org/joe/listing.ph… This is a senior search, specific to development economics, that requires application through JOE. Please spread the word! #EconTwitter
- Neural networks (e.g., LLMs) often make imperfect predictions, introducing biases into analyses that rely on them. Common empirical economics scenarios fall outside the existing literature on debiasing “black-box AI”. Our paper (with @J_S_Carlson) on robust and efficient
- Introducing LinkTransformer: LT brings the advantages of AI to standard data frame manipulation tasks like merges, deduplication, and clustering, making it easy to use large language models in a standard data wrangling workflow. linktransformer.github.io #EconTwitter (1/10)
- I'm hiring summer undergrad RAs; build deep learning pipelines for econ dev/pol econ (no DL experience required). $15/hr; can be remote; US work auth. required; undergrads only. Send [email protected] CV/transcript to apply. Specify FT/PT interest. #EconTwitter
- Replying to @MelissaLDell (18/n) If Layout-Parser seems relevant to your work, please consider taking less than a minute to visit our website: layout-parser.github.io. If you are on Github, take two seconds to star our repo: github.com/Layout-Parser/…. This will help us demonstrate crucial community support.
- Replying to @MelissaLDell (15/n) No background in deep learning? I’m teaching a new course this semester on deep learning for data curation at scale. I’ll be putting the course material into a public knowledge base. I’ll post here when this is released (sometime in the next 1-2 months).




