Tom Zahavy
@TZahavy
Building creative agents @GoogleDeepMind. AlphaProof, AlphaZero_db, PuzzleGen, Convex RL, meta gradients. Staff research scientist, discovery team
London, England
Joined December 2018
Posts
  • Pinned
    I am excited to share work we did in the Discovery team at @GoogleDeepMind using RL and generative models to discover creative chess puzzles 🔊♟️♟️ #neurips2025 🎨 While strong chess players intuitively recognize the beauty of a position, articulating the precise elements that
  • I am looking to hire a student researcher to work with AlphaProof on a project at the intersection of AI, math, computation, and creativity. A background in AI for math and/or Lean is desired. If interested, please get in touch. The position will be based in London.
  • I'm super excited to share AlphaZeroᵈᵇ, a team of diverse #AlphaZero agents that collaborate to solve #Chess puzzles and demonstrate increased creativity. Check out our paper to learn more! arxiv.org/abs/2308.09175 A quick 🧵 (1/n)
  • Excited to announce our recent @GoogleDeepMind paper, AlphaProof, out in @Nature today! It has been over a year since AlphaProof achieved a silver-medal standard in solving International Mathematical Olympiad (IMO) problems by teaching itself mathematics in Lean (@leanprover).
  • We are looking for brilliant and creative candidates with strong programming skills to join us at the Discovery team at @GoogleDeepMind 🧙 We build AI agents that discover new knowledge using RL, planning and LLMs. DM me if you have specific questions about working with us 🙏
  • We are looking for brilliant and creative candidates with strong programming skills to join us at the Discovery team at @GoogleDeepMind 🧙 We are building AI agents that create new knowledge using RL, planning and LLMs in domains like Mathematics, chess and more. Please apply
  • In our #Neurips2021 spotlight, we study RL problems where the goal is to minimize a cost over the state occupancy. When this cost is linear, we get the standard RL problem. When it is non-linear, we get apprenticeship learning, pure exploration, diversity and more. [1/7]
  • Excited to share DOMiNO, a method to discover qualitatively diverse policies using a single latent-conditioned architecture and the "reward is enough" principle. Read more about it here: arxiv.org/pdf/2205.13521… DOMiNO's 🍕 in Walker walk:
  • Super excited to share that our Bootstrapped Meta Learning paper, led by @flennerhag, received an Outstanding Paper Award from #iclr2022. Better meta learning -> doubled the performance of STACX in Atari to a new SOTA. Come talk with us at the poster session! blog.iclr.cc/2022/04/20/ann…
    What should a meta-learner optimize? What if we make it chase its own future outputs? Turns out, it can improve meta-optimization, set new SOTAs, and lead to new types of meta-learning. arxiv.org/pdf/2109.04504… w. Y. Schroecker, @tomzhavy, @hado, D. Silver, S. Singh. 🧵👇
  • We are hiring students in the Discovery team. If you are interested in creativity and RL, consider applying ❤️
    📣 Hiring Alert: Student Researcher - 2026. @vivek_veeriah and I are looking for a PhD Student Researcher to join the GDM Discovery team in London 🇬🇧! We will be investigating how creativity in LLMs generalizes, with application to scientific discovery 🔭 Apply below! ⬇️
  • Very excited to share AlphaProof, an agent that taught itself Mathematics in Lean and achieved a silver-medal standard in the International Math Olympiad 🥈🥈🥈🥈 @leanprover is a functional programming language for formal Mathematics and a theorem prover. It enables you to
    We’re presenting the first AI to solve International Mathematical Olympiad problems at a silver medalist level.🥈 It combines AlphaProof, a new breakthrough model for formal reasoning, and AlphaGeometry 2, an improved version of our previous system. 🧵 dpmd.ai/imo-silver
  • A rejection story with a happy ending. A paper from my #Phd was accepted to #ICML2021 after 4-5 rejections (I lost count, honestly). Each time, we had reviewers that liked it and some that didn't. Believing in it and continuing to improve it over time eventually got it in. Don't lose hope!
  • Replying to @TZahavy
    Read more about it: ♟️ @chesscom blogpost: chess.com/news/view/ai-l… 💻 Booklet & Review: arxiv.org/abs/2510.23772 📃 Paper: arxiv.org/abs/2510.23881
  • Late on arXiv (oral @CoLLAs_Conf): @jelennal_, who did a fantastic internship with us at the Discovery team @DeepMind, studies how adding context to meta gradients can help agents adapt when the environment changes. Thanks for sharing @_akhaliq
    Meta-Gradients in Non-Stationary Environments abs: arxiv.org/abs/2209.06159