Schedule-free optimizers (x.com/aaron_defazio/…) are surreal.
I've read the paper, looked into the math, and tried to understand what's happening. It all seems like an incremental improvement at best (like LaProp (arxiv.org/abs/2002.04839) or Adam-Atan2