Robert M. Gower
572 posts
Robert M. Gower
@gowerrobert
Often found scribbling down math with intermittent bursts of bashing out code.
New York City, USA
gowerrobert.github.io
Joined June 2011
348 Following
1,675 Followers
  • Pinned
    Robert M. Gower
    @gowerrobert
    Nov 19, 2024
    Do you want to do a postdoc developing new methods/theory in optimization for deep learning/ML? Do you enjoy blue-sky open research and discussions on blackboards? Then apply to the Flatiron Fellowship in the Center for Computational Mathematics simonsfoundation.org/flatiron/caree… 1/3
    5.6K
  • Robert M. Gower
    @gowerrobert
    May 1, 2025
    Today I had my paper rejected by ICML. I don’t like to complain about the free voluntary work that the AC and reviewers do, but this was one of the most careless reviews and AC we've had. Our rejection was based on this:
    80K
  • Robert M. Gower
    @gowerrobert
    Jun 11, 2023
    Question for optimizers. When minimizing a positive function f(x), we can minimize the logarithm of f(x) instead. Applying gradient descent gives a scale-invariant method. Do we have a name for this? We could apply a mirror descent method instead, but I like this for its invariance.
    64K
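The scale invariance described in the post above can be checked numerically. The following is a minimal sketch, assuming a one-dimensional test function f(x) = (x - 3)^2 + 1 and helper names (`log_gd`, `plain_gd`) that are hypothetical and not from the post. Gradient descent on log f(x) steps along f'(x) / f(x), so multiplying f by a positive constant should leave the iterates unchanged, whereas plain gradient descent with the same step size is affected by the scaling.

```python
import math

def f(x):
    # Assumed positive test function with minimizer x = 3.
    return (x - 3.0) ** 2 + 1.0

def grad_f(x):
    return 2.0 * (x - 3.0)

def log_gd(func, grad, x0, lr, steps):
    # Gradient descent on log(func): the gradient of log f is grad f / f.
    x = x0
    for _ in range(steps):
        x -= lr * grad(x) / func(x)
    return x

def plain_gd(grad, x0, lr, steps):
    # Ordinary gradient descent, shown for comparison only.
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

c = 1000.0  # arbitrary positive rescaling of the objective

def f_scaled(x):
    return c * f(x)

def grad_scaled(x):
    return c * grad_f(x)

x_log = log_gd(f, grad_f, 0.0, 0.5, 20)
x_log_scaled = log_gd(f_scaled, grad_scaled, 0.0, 0.5, 20)
x_plain_scaled = plain_gd(grad_scaled, 0.0, 0.5, 20)

print(x_log, x_log_scaled, x_plain_scaled)
```

In this toy setting the two log-space runs agree up to floating-point rounding, while plain gradient descent on the rescaled function diverges with the same step size.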
  • Robert M. Gower
    @gowerrobert
    Jun 5, 2025
    Adam(W) is great at optimizing LLMs and most neural networks, but it's still not well understood. Optimizers try to explain Adam from their perspective of 1st and 2nd order methods. But maybe Adam has a more statistical motivation? Let me show you a mean/variance view ... 1/x
    26K
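For readers who want the update rule in front of them, here is a minimal sketch of the standard Adam step for a scalar parameter. It does not reproduce the mean/variance derivation from the thread, which is not included in this page, and it omits the decoupled weight decay of AdamW. The function name `adam_minimize` and the quadratic test gradient are assumptions made for illustration. The buffers `m` and `v` are exponential moving averages of the gradient and the squared gradient, which is one reason a statistical reading of the method is natural.

```python
import math

def adam_minimize(grad, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=1):
    # Standard Adam for a scalar parameter (no weight decay).
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g        # moving average of gradients
        v = beta2 * v + (1 - beta2) * g * g    # moving average of squared gradients
        m_hat = m / (1 - beta1 ** t)           # bias correction
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

def grad(x):
    # Gradient of the assumed test objective (x - 2)^2.
    return 2.0 * (x - 2.0)

def grad_times_100(x):
    # Same objective scaled by 100.
    return 100.0 * grad(x)

x1 = adam_minimize(grad, 0.0, steps=1)
x1_scaled = adam_minimize(grad_times_100, 0.0, steps=1)
print(x1, x1_scaled)
```

After one step the update has magnitude close to `lr` regardless of the gradient scale, because the bias-corrected ratio `m_hat / sqrt(v_hat)` is the sign of the gradient at `t = 1`.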
  • Robert M. Gower
    @gowerrobert
    Jun 4, 2025
    Are you interested in the new Muon/Scion/Gluon method for training LLMs? To run Muon, you need to approximate the matrix sign (or polar factor) of the momentum matrix. We've developed an optimal method *The PolarExpress* just for this! If you're interested, climb aboard 1/x
    24K
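To make "approximating the polar factor" concrete, the sketch below uses the classical cubic Newton-Schulz iteration X ← 1.5 X − 0.5 X XᵀX. This is a swapped-in textbook method, not the PolarExpress algorithm from the post, whose details are not given here. The helper names and the 2×2 test matrix are assumptions for illustration, and the input is assumed to be nonzero and full rank. Dividing by the Frobenius norm first keeps all singular values in (0, 1], where the iteration is known to converge.

```python
def matmul(A, B):
    # Plain-Python matrix product for small dense matrices.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def polar_factor_newton_schulz(M, iters=30):
    # Classical Newton-Schulz iteration for the orthogonal polar factor.
    # Assumes M is nonzero and full rank.
    fro = sum(x * x for row in M for x in row) ** 0.5
    X = [[x / fro for x in row] for row in M]  # singular values now in (0, 1]
    for _ in range(iters):
        XtX = matmul(transpose(X), X)
        XXtX = matmul(X, XtX)
        X = [[1.5 * x - 0.5 * y for x, y in zip(rx, ry)]
             for rx, ry in zip(X, XXtX)]
    return X

# Symmetric positive definite test matrix: its polar factor is the identity.
M = [[2.0, 1.0], [1.0, 3.0]]
Q = polar_factor_newton_schulz(M)
print(Q)
```

Each iteration only needs matrix products, which is why iterations of this family are attractive on accelerators compared with computing a full SVD.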
  • Robert M. Gower
    @gowerrobert
    Mar 15, 2024
    New optimization paper alert! "Directional Smoothness and Gradient Methods: Convergence and Adaptivity". We develop new sub-optimality bounds for gradient descent that depend on the local smoothness (curvature) along the optimization path.
    23K
  • Robert M. Gower
    @gowerrobert
    Mar 4, 2023
    Quick question on lifting optimization problems into a measure space. What do you call the approach that uses a multivariate normal to approximate the measure?
    35K
  • Robert M. Gower
    @gowerrobert
    Dec 7, 2024
    Fitting a multimodal target distribution with variational inference is hard. The fit and sampling often need costly iterative methods. Check out our spotlight NeurIPS paper where we use polynomials, score matching, and a direct solution for the fit. 1/x openreview.net/forum?id=ad97I…
    9.3K
  • Robert M. Gower
    @gowerrobert
    Mar 23, 2022
    Getting second-order methods to work in finite-sum optimization has been tough. The sum-of-terms structure doesn't really fit. You could apply Newton's method to the stationarity conditions, but then you'd need full batch or concentration results with minibatches. What to do? (AISTATS 2022) 1/5:
  • Robert M. Gower
    @gowerrobert
    Mar 23, 2022
    Ever wanted to know how to analyze policy gradient methods? Got lost in the literature with all the different assumptions on the policy space, on the MDP, and reward function? Then I have a paper for you (AISTATS 2022) 1/3
  • Robert M. Gower
    @gowerrobert
    Oct 6, 2023
    I need some optimization experts! What do you call this method below? I want to call it stochastic proximal point with iterate averaging. Why am I interested in this? Because in the convex case, I think this should give good rates on the last iterate, as opposed to averages.
    34K
  • Robert M. Gower
    @gowerrobert
    Mar 13, 2024
    When training a DNN model (especially LLMs) we often clip the gradients. Why is that? Something to do with exploding gradients? We @FSchaipp @GuillaumeG_ @umutsimsekli have a story to tell on how clipping is computing the geometric median of gradients. arxiv.org/pdf/2402.12828…
    13K
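As a reminder of what "clipping the gradients" usually means in practice, here is a minimal sketch of standard norm clipping. It does not reproduce the geometric-median argument of the linked paper. The function name `clip_by_norm` and the example vectors are assumptions for illustration.

```python
import math

def clip_by_norm(g, max_norm):
    # Rescale g so that its Euclidean norm is at most max_norm.
    norm = math.sqrt(sum(x * x for x in g))
    if norm <= max_norm:
        return list(g)          # small gradients pass through unchanged
    scale = max_norm / norm
    return [scale * x for x in g]

clipped = clip_by_norm([3.0, 4.0], 1.0)    # norm 5, rescaled to norm 1
unchanged = clip_by_norm([0.3, 0.4], 1.0)  # norm 0.5, left as is
print(clipped, unchanged)
```

Clipping keeps the direction of the gradient and only caps its length, so a single very large gradient cannot dominate an update.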
  • Robert M. Gower
    @gowerrobert
    Oct 11, 2021
    Life update: Starting on November 1st, I will be a Research Scientist at the Flatiron Institute: simonsfoundation.org/flatiron/cente…, a research division developing applied mathematics to advance our understanding of science! If you want to work with me, there are internships and fellowships.
    Center for Computational Mathematics
    From simonsfoundation.org
  • Robert M. Gower
    @gowerrobert
    Nov 2, 2022
    Looking for a Research Scientist position in the heart of Manhattan, where you can lead your own research and collaborate with amazing colleagues to crack long-standing problems in the sciences? Then you should apply: simonsfoundation.wd1.myworkdayjobs.com/simonsfoundati… 1/