Life update: in June I successfully defended my PhD dissertation "Towards a Theory and Practice of Open-Ended Reasoning with Generative Models".
Now, I'm so excited to be joining Google DeepMind to continue exploring the limits of AI discovery and creativity.
Alex Havrilla
251 posts
Joined August 2021
- Excited to announce I am a recipient of @StabilityAI PhD fellowship! Excited to continue working on open source machine learning research with their support.
- 🚨🚨🚨Paper #2 from my time at Meta! In this work, we set out to understand how different algorithms fare at improving LLM reasoning from feedback. We compare expert iteration, PPO, and return-conditioned RL using Llama-2 as the base model.
- Excited to announce the final paper of my PhD!📢 A crucial piece of SFT/RL training is the availability of high-quality problem-solution data (Q, A). But what to do for difficult tasks where such data is scarce/hard to generate with SOTA models? Read on to find out
- In my humble opinion the recent Stream of Search paper (arxiv.org/abs/2404.03683) is truly outstanding. Everyone should give it a thorough read.
- How important is the quality, diversity, and complexity (QDC) of synthetic data for LLM performance? What effect does QDC data composition have on self-improvement? We just released a comprehensive survey discussing these questions (and many more) 🧵
- New paper alert🚨🚨🚨 How to bootstrap the reasoning refinement capabilities of LLMs using synthetic data? Introducing "GLoRe: When, Where, and How to Improve LLM Reasoning via Global and Local Refinements". Applied to GSM8K, we can improve a strong RL-finetuned Llama-2 13B by 12%
- I'll be at this year's NeurIPS presenting my new work: "Understanding Scaling Laws with Statistical and Approximation Theory for Transformer Neural Networks on Intrinsically Low-dimensional Data"
- I'm at ICML presenting GLoRe (arxiv.org/abs/2402.10963) and Teaching Reasoning with RL (arxiv.org/abs/2403.04642)! If you'd like to chat about synthetic data, process-based rewards, open-endedness, or theoretical foundations of scaling laws (or anything else) my DMs are open!
- New paper! What should you do when high resolution data is not available for training? Introducing Dual Fourier UNet (DFU): scale-robust diffusion model for zero-shot super-resolution image generation arxiv.org/abs/2401.06144
- Had a great time during our discussion, thanks again for having me! Quoted post: "Today we're joined by @Dahoas1 from @GeorgiaTech to discuss the reasoning capability of language models and the potential to improve it with traditional RL methods 🎧 / 🎥 Listen to the episode at: twimlai.com/go/680. 📖 CHAPTERS 00:00 - Introduction 02:19 - RL vs RLHF"
- I grew a lot as a researcher over the last four years. So many thanks to my amazing advisor, internship mentors, and everyone else I met along the way!










