Fabian Schaipp
532 posts
@FSchaipp
working on optimization for machine learning. currently postdoc @inria_paris.
Paris, France
fabian-sp.github.io
Joined July 2020
759 Following
1,285 Followers

  • Pinned
    Fabian Schaipp
    @FSchaipp
    Feb 5, 2025
    Learning rate schedules seem mysterious? Turns out that their behaviour can be described with a bound from *convex, nonsmooth* optimization. Short thread on our latest paper 🚇 arxiv.org/abs/2501.18965
    Aaron Defazio
    @aaron_defazio
    Feb 3, 2025
    The sudden loss drop when annealing the learning rate at the end of a WSD (warmup-stable-decay) schedule can be explained without relying on non-convexity or even smoothness. A new paper shows that it can be precisely predicted by theory in the convex, non-smooth setting! 1/2
    32K
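For readers unfamiliar with the term, here is a minimal sketch of what a WSD (warmup-stable-decay) schedule can look like. The linear warmup, the linear decay to zero, and the names `wsd_schedule`, `warmup_frac` and `decay_frac` are illustrative assumptions, not taken from the posts or the paper.

```python
def wsd_schedule(step, total_steps, base_lr=1.0, warmup_frac=0.1, decay_frac=0.2):
    """Learning rate at `step` (0-indexed) for a warmup-stable-decay schedule.

    Assumed shape (hypothetical choice): linear warmup, constant plateau,
    then linear decay towards zero over the final steps.
    """
    warmup_steps = int(warmup_frac * total_steps)
    decay_steps = int(decay_frac * total_steps)
    decay_start = total_steps - decay_steps

    if step < warmup_steps:
        # Warmup: ramp linearly up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    if step < decay_start:
        # Stable phase: constant learning rate.
        return base_lr
    # Decay (cooldown) phase: anneal linearly towards zero.
    return base_lr * (total_steps - step) / decay_steps


if __name__ == "__main__":
    schedule = [wsd_schedule(t, total_steps=100) for t in range(100)]
    print(schedule[0], schedule[50], schedule[99])
```

The sudden loss drop discussed in the posts refers to the final decay phase of such a schedule.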
  • Fabian Schaipp
    @FSchaipp
    Aug 6, 2024
    Personal news: I successfully defended my PhD. Next stop: I will start a postdoc with @umutsimsekli, @TaylorAdrien, and @BachFrancis at Inria Paris.
    23K
  • Fabian Schaipp
    @FSchaipp
    Feb 19, 2024
    New blog post: AdamW is often considered to "decouple learning rate and weight decay". But this is not entirely true if you use AdamW from PyTorch. 🗞️ Full post: fabian-sp.github.io/posts/2024/02/…
    29K
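A minimal sketch of the coupling mentioned in the post above, in plain Python with no PyTorch dependency. It only models the weight-decay part of the step: `torch.optim.AdamW` multiplies each parameter by `1 - lr * weight_decay`, so the shrinkage depends on the product of the two hyperparameters. The function names and the "fully decoupled" variant are hypothetical illustrations and not necessarily what the blog post proposes.

```python
def pytorch_style_decay(w, lr, weight_decay):
    """Weight-decay part of one step as applied by torch.optim.AdamW:
    the parameter is scaled by (1 - lr * weight_decay)."""
    return w * (1.0 - lr * weight_decay)


def fully_decoupled_decay(w, weight_decay):
    """Hypothetical variant (an assumption for illustration): the decay
    factor does not involve the learning rate at all."""
    return w * (1.0 - weight_decay)


if __name__ == "__main__":
    w = 1.0
    for lr in (1e-3, 2e-3):
        coupled = w - pytorch_style_decay(w, lr, weight_decay=0.1)
        decoupled = w - fully_decoupled_decay(w, weight_decay=1e-4)
        # Doubling lr doubles the coupled shrinkage; the decoupled one is unchanged.
        print(f"lr={lr}: coupled shrinkage={coupled:.2e}, decoupled shrinkage={decoupled:.2e}")
```

In this sketch, retuning the learning rate also changes the effective regularization strength unless `weight_decay` is rescaled to keep `lr * weight_decay` fixed.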
  • Fabian Schaipp
    @FSchaipp
    Jul 17, 2024
    I will be at a big ML conference for the first time next week! #ICML2024 Hit me up if you want to chat about optimization, or join for a chill evening run ☕🏃
    24K
  • Fabian Schaipp
    @FSchaipp
    Feb 4, 2024
    (Possibly) dumb question: are people in ML aware that AdamW *in #PyTorch* does not actually fully decouple learning rate and weight decay?
    40K
  • Fabian Schaipp
    @FSchaipp
    Dec 8, 2023
    Replying to @francoisfleuret
    x = x + tau/max(tau,|z-x|) * (z-x) where z is the latest sample. This is stochastic proximal point for the median problem (tau is a step size). openreview.net/pdf?id=C6PiH9F…
    7K
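The update rule in the reply above can be sketched in a few lines of Python. It is the closed-form proximal step for the loss |x - z|: if the new sample z is within tau of x, the estimate jumps to z, otherwise it moves by exactly tau towards z. The function names, the constant step size, and the Gaussian test data are assumptions made for this illustration.

```python
import random
import statistics


def spp_median_step(x, z, tau):
    """One stochastic proximal point step for min_x E|x - z|,
    using the formula from the post: x + tau / max(tau, |z - x|) * (z - x)."""
    return x + tau / max(tau, abs(z - x)) * (z - x)


def streaming_median(samples, tau=0.01, x0=0.0):
    """Run the update over a stream of samples with a constant step size tau."""
    x = x0
    for z in samples:
        x = spp_median_step(x, z, tau)
    return x


if __name__ == "__main__":
    random.seed(0)
    data = [random.gauss(3.0, 1.0) for _ in range(5000)]
    print("streaming estimate:", streaming_median(data))
    print("exact sample median:", statistics.median(data))
```

With a constant tau the estimate keeps fluctuating around the median at a scale that shrinks with tau; a decreasing step size would be one way to reduce that noise.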
  • Fabian Schaipp
    @FSchaipp
    Jun 13, 2023
    A recent trend in optimization for ML seems to be designing algorithms that need less learning rate tuning or are entirely hyper-parameter free. 🎛️ This is a sketched overview of four of those methods (3 of 4 came out in 2023).
    17K
  • Fabian Schaipp
    @FSchaipp
    Mar 29, 2022
    Bring some new colours into your #python plots beyond the #matplotlib defaults - here are some tools I use quite a lot:
  • Fabian Schaipp
    @FSchaipp
    Oct 15, 2023
    An observation on training transformers with adaptive learning rates: It is known that Adam is better for training transformers than SGD-M (momentum). This is also true for the simple example of training a ViT on CIFAR10. Left: final train loss over a grid of learning rates.
    26K
  • Fabian Schaipp
    @FSchaipp
    Jan 5, 2024
    I recently created one single bib file with all papers from ICML, ICLR and NeurIPS. No more browsing through the labyrinth of conference websites. This also lets you do some "data science", for example counting the number of papers with a certain keyword in the title.
    12K
  • Fabian Schaipp
    @FSchaipp
    May 2, 2024
    Two pieces of good news 🌀 (i) Our paper on adaptive learning rates for momentum methods (called MoMo) got accepted at #ICML2024. (ii) MoMo is now available in @GoogleDeepMind's optimization library optax.readthedocs.io/en/latest/api/…
    8.8K
  • Fabian Schaipp
    @FSchaipp
    May 20, 2024
    Together with @phschiele1, we wrote a package to solve constrained optimization problems, where all functions are arbitrary @PyTorch modules. This is mainly intended for optimization with pre-trained NNs as objective/constraints.
    GitHub - fabian-sp/ncOPT: Constrained optimization for Pytorch using the SQP-GS algorithm
    From github.com
    7.2K
  • Fabian Schaipp
    @FSchaipp
    May 8, 2023
    Polyak's step size for SGD sparked lots of interest recently because it needs less tuning. One open question was how to handle regularization, as the Polyak step size involves objective function values. We gave an answer to this in our paper, which is now published at TMLR.
    Accepted papers at TMLR
    @TmlrPub
    May 3, 2023
    A Stochastic Proximal Polyak Step Size Fabian Schaipp, Robert M. Gower, Michael Ulbrich. Action editor: Stephen Becker. openreview.net/forum?id=jWr41… #regularization #proxsps #regularizer
    4.7K
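To make concrete why the Polyak step size "involves objective function values", here is a minimal sketch of the classical (unregularized) rule: the step length is (f(x) - f*) / ||g||^2, where f* is the optimal value. This is not the proximal variant from the paper; the helper name `polyak_step`, the assumption that `f_star` is known, and the toy quadratic are illustrative choices.

```python
def polyak_step(x, f, grad, f_star=0.0, eps=1e-12):
    """One (sub)gradient step with the classical Polyak step size.

    x      : list of floats, current iterate
    f      : callable returning the objective value at x
    grad   : callable returning a (sub)gradient at x as a list
    f_star : assumed-known optimal value of f (a strong assumption in practice)
    """
    g = grad(x)
    g_norm_sq = sum(gi * gi for gi in g)
    if g_norm_sq <= eps:
        # (Near-)zero gradient: nothing to do.
        return list(x)
    # The step size needs the function value, not only the gradient.
    gamma = max(f(x) - f_star, 0.0) / g_norm_sq
    return [xi - gamma * gi for xi, gi in zip(x, g)]


if __name__ == "__main__":
    target = [3.0, -1.0]

    def f(x):
        return 0.5 * sum((xi - ti) ** 2 for xi, ti in zip(x, target))

    def grad(x):
        return [xi - ti for xi, ti in zip(x, target)]

    x = [0.0, 0.0]
    for _ in range(50):
        x = polyak_step(x, f, grad)
    print(x)
```

Adding a regularizer to the objective changes both f(x) and the usually unknown f*, which is the difficulty the post refers to.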
  • Fabian Schaipp
    @FSchaipp
    Nov 15, 2023
    Looking for a thesis title. Current fav: Harder, Better, Faster, Stronger Optimization
    3.9K