Enhance the numerical stability of the Cautious Optimizer#2658

Merged
rwightman merged 1 commit into huggingface:main from Yuan-Jinghui:main
Jan 29, 2026

Conversation

@Yuan-Jinghui
Contributor

Hi rwightman, this PR aims to fix potential numerical errors.

Motivation:
Applying a coordinate-wise mask to a vector in the tangent space can push the masked vector slightly out of that space. Following AdamP's design philosophy, this PR reprojects the masked update so that it stays strictly within the tangent space, which avoids potential numerical instability. Since
$\langle \xi, g^\bot \rangle = \langle \xi^\bot, g^\bot \rangle$,
where $\xi$ denotes the masked update, $\xi^\bot$ its projection onto the tangent space, and $g^\bot$ the tangential component of the gradient, the reprojected update is still a descent direction, which is consistent with the design philosophy of cautious optimizers.
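
The identity above can be checked in one step. As a sketch, assume (this notation is not in the PR text) that $\hat p = p / \lVert p \rVert$ is the unit radial direction of the parameter and that $v^\bot = v - \langle v, \hat p \rangle \hat p$ is the tangential component of any vector $v$:

$$
\langle \xi^\bot, g^\bot \rangle
= \langle \xi, g^\bot \rangle - \langle \xi, \hat p \rangle \langle \hat p, g^\bot \rangle
= \langle \xi, g^\bot \rangle,
\qquad \text{since } \langle \hat p, g^\bot \rangle = \langle \hat p, g \rangle - \langle g, \hat p \rangle \langle \hat p, \hat p \rangle = 0 .
$$

So removing the radial component of the masked update leaves its inner product with $g^\bot$ unchanged.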

Modifications:
Only one additional line of code is introduced, reusing an existing operation:
perturb -= p_n * view_func(p_n * perturb).sum(dim=1).reshape(expand_size)
This is simply an extra reprojection step, but it helps prevent numerical instability.
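
To illustrate the effect of the reprojection, here is a minimal sketch in plain Python on a single flat vector. It is not the timm implementation: the helper names (`project_to_tangent`, `cautious_mask`), the sign-agreement masking rule, the order of operations, and the example values are all assumptions made for illustration.

```python
import math


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def project_to_tangent(v, p, eps=1e-8):
    """Remove from v its component along p (hypothetical helper).

    Mirrors the idea of `perturb -= p_n * (p_n * perturb).sum()`
    for a single flat vector.
    """
    norm = math.sqrt(dot(p, p)) + eps
    p_n = [x / norm for x in p]
    radial = dot(p_n, v)
    return [x - radial * y for x, y in zip(v, p_n)]


def cautious_mask(update, grad):
    """Zero the coordinates where update and grad disagree in sign
    (assumed masking rule, for illustration only)."""
    return [u if u * g > 0 else 0.0 for u, g in zip(update, grad)]


# Made-up example values.
p = [1.0, 2.0, -1.0, 0.5]   # parameter
g = [0.3, -0.4, 0.2, 0.9]   # gradient
u = [0.5, 0.1, -0.3, 0.7]   # raw update

g_perp = project_to_tangent(g, p)
u_perp = project_to_tangent(u, p)

# Masking a tangent vector coordinate-wise can reintroduce a radial part.
xi = cautious_mask(u_perp, g_perp)
radial_before = dot(xi, p)

# The extra reprojection step removes it again.
xi_perp = project_to_tangent(xi, p)
radial_after = dot(xi_perp, p)

print("radial component after masking:     ", radial_before)
print("radial component after reprojection:", radial_after)
print("<xi, g_perp>      =", dot(xi, g_perp))
print("<xi_perp, g_perp> =", dot(xi_perp, g_perp))
```

In this example the masked vector picks up a nonzero radial component, the reprojection removes it up to floating-point error, and the two inner products with `g_perp` agree.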

Enhance numerical stability of the Cautious Optimizer.
@rwightman rwightman merged commit 9171d82 into huggingface:main Jan 29, 2026