Enhance the numerical stability of the Cautious Optimizer by Yuan-Jinghui · Pull Request #2658 · huggingface/pytorch-image-models

Yuan-Jinghui · 2026-01-29T07:42:15Z

Hi rwightman, this PR aims to fix potential numerical errors.

Motivation:
Applying a coordinate-wise mask to an update that lies in the tangent space can push the masked vector slightly out of that space. Following AdamP’s design philosophy, this PR reprojects the masked update so that it stays strictly within the tangent space, which avoids potential numerical instability. Since
$\langle \xi, g^\bot \rangle = \langle \xi^\bot, g^\bot \rangle$,
where $\xi$ denotes the masked update, $\xi^\bot$ its reprojection onto the tangent space, and $g^\bot$ the projected gradient, the reprojection leaves the inner product with $g^\bot$ unchanged. The update therefore remains a descent direction, which is consistent with the design philosophy of cautious optimizers.
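
The identity above can be checked in a few lines. This is a sketch under assumed notation that does not appear in the PR text: $n$ is the unit vector along the parameter, and $\Pi(v) = v - \langle n, v \rangle\, n$ is the projection onto the tangent space, so that $\xi^\bot = \Pi(\xi)$ and $\langle n, g^\bot \rangle = 0$.

$$
\begin{aligned}
\langle \xi, g^\bot \rangle
&= \langle \xi^\bot + \langle n, \xi \rangle\, n,\; g^\bot \rangle \\
&= \langle \xi^\bot, g^\bot \rangle + \langle n, \xi \rangle \langle n, g^\bot \rangle \\
&= \langle \xi^\bot, g^\bot \rangle .
\end{aligned}
$$

The normal component removed by the reprojection is orthogonal to $g^\bot$, so dropping it cannot change the sign of the inner product.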

Modifications:
Only one additional line of code is introduced, reusing an existing operation:
`perturb -= p_n * view_func(p_n * perturb).sum(dim=1).reshape(expand_size)`
This is simply an extra reprojection step, but it helps prevent numerical instability.
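
To make the effect of the reprojection concrete, here is a minimal pure-Python sketch on a single 3-dimensional vector. It is not the code from this PR: the helper `project_to_tangent`, the toy values, and the masking rule (keep coordinates whose sign agrees with the projected gradient) are all assumptions chosen for illustration, and the actual optimizer may apply the mask differently.

```python
import math

def dot(a, b):
    """Euclidean inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))

def project_to_tangent(v, unit_normal):
    """Remove the component of v along unit_normal (hypothetical helper)."""
    c = dot(v, unit_normal)
    return [x - c * n for x, n in zip(v, unit_normal)]

# Toy values, chosen only for illustration.
p = [1.0, 2.0, 2.0]    # parameter vector
g = [0.5, -1.0, 0.3]   # gradient
u = [0.3, 0.6, -0.9]   # raw update (for example, a momentum-based step)

norm = math.sqrt(dot(p, p))
p_n = [x / norm for x in p]  # unit normal to the tangent space

g_perp = project_to_tangent(g, p_n)  # projected gradient
u_perp = project_to_tangent(u, p_n)  # projected update

# Assumed cautious mask: zero out coordinates whose sign disagrees
# with the projected gradient.
masked = [x if x * gg > 0 else 0.0 for x, gg in zip(u_perp, g_perp)]

# The masked vector has picked up a component along p_n again.
drift = dot(masked, p_n)

# Extra reprojection step: the same operation as the first projection.
reprojected = project_to_tangent(masked, p_n)

print("normal component after masking:     ", drift)
print("normal component after reprojection:", dot(reprojected, p_n))
print("<masked, g_perp>      =", dot(masked, g_perp))
print("<reprojected, g_perp> =", dot(reprojected, g_perp))
```

In this toy case the mask leaves a nonzero component along `p_n`, the second projection removes it, and the inner product with `g_perp` is the same before and after, up to floating-point rounding.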

Enhance numerical stability of the Cautious Optimizer.

Enhance the numerical stability of the Cautious Optimizer

20e2f45

rwightman merged commit 9171d82 into huggingface:main Jan 29, 2026
