Remove our AdamW implementation #36177
Conversation
Force-pushed 4da47cc to 6338b0c
Ready for core maintainer review @ArthurZucker @Cyrilvallez |
src/transformers/optimization.py
Outdated
- class AdamW(Optimizer):
+ class AdamW(TorchAdamW):
as long as you are sure all params are the same for init, let's go! 🤗
we can also deprecate this one?
Good point, actually - the old one was already deprecated! Maybe we should just remove it entirely, since we've been showing a deprecation warning for a long time now?
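For context, the "basic wrapper class" approach shown in the diff above amounts to roughly the following. This is a hedged sketch rather than the exact code from the PR; the warning text and warning category are assumptions:

```python
# Sketch of the thin wrapper discussed above (not the exact PR code):
# transformers' AdamW becomes a subclass of torch.optim.AdamW that only adds
# a deprecation warning, so existing imports keep working with identical behavior.
import warnings

from torch.optim import AdamW as TorchAdamW


class AdamW(TorchAdamW):
    def __init__(self, *args, **kwargs):
        # Warning text is illustrative; the old class was already deprecated.
        warnings.warn(
            "transformers.AdamW is deprecated; use torch.optim.AdamW instead.",
            FutureWarning,
        )
        super().__init__(*args, **kwargs)
```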
Force-pushed 90dabf5 to d99ad13
cc @ArthurZucker I cut it out entirely (it was raising a deprecation warning every time it was being used anyway). I refactored the references to it to use torch.optim.AdamW.

cc @ArthurZucker @Cyrilvallez I think this should be ready to go, but I'd like a core maintainer approval first! The plan is to just totally remove our AdamW class and redirect all the legacy references to it to use torch.optim.AdamW.
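A minimal sketch of what redirecting the legacy references can look like in optimization.py, assuming a plain re-export is enough to keep the old import path working (illustrative, not necessarily the exact line in the PR):

```python
# Re-export so `from transformers.optimization import AdamW` still resolves,
# now pointing at the PyTorch implementation (illustrative sketch).
from torch.optim import AdamW  # noqa: F401
```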
Cyrilvallez left a comment
Hey @Rocketknight1! Thanks for cleaning up! 🙏 LGTM, except that we should not keep both adamw_hf and adamw_torch as if they were 2 separate optimizers!
src/transformers/trainer.py
Outdated
- elif args.optim in [OptimizerNames.ADAMW_TORCH, OptimizerNames.ADAMW_TORCH_FUSED]:
+ elif args.optim in [OptimizerNames.ADAMW_TORCH, OptimizerNames.ADAMW_TORCH_FUSED, OptimizerNames.ADAMW_HF]:
IMO it is very confusing to keep adamw_hf and adamw_torch if they are now the same. Since our version of AdamW was raising warnings for a long time, I think it should be safe to remove adamw_hf entirely (let's remove it from the docstrings as well to avoid any confusion -- it is present in 2 docstrings).
Of course, the best would be to remove the torch part of all the optimizers in OptimizerNames, since they are now all torch, but that would be breaking. Maybe something to do in a separate PR: change the names and do a whole deprecation cycle for them.
Done! There should be no more references to adamw_hf in the code.
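With adamw_hf removed, downstream configuration simply selects the PyTorch optimizer through the existing adamw_torch name. An illustrative usage sketch (output_dir and the other arguments are placeholders):

```python
# Illustrative: selecting PyTorch's AdamW via the Trainer API now that the
# separate adamw_hf name has been removed. Arguments are placeholders.
from transformers import TrainingArguments

args = TrainingArguments(output_dir="out", optim="adamw_torch")
```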
Force-pushed 2d582e7 to 8caee1d
Force-pushed 8caee1d to 9913bff
Cyrilvallez left a comment
All right, LGTM! Thanks a lot!
* Just import torch AdamW instead
* Update docs too
* Make AdamW undocumented
* make fixup
* Add a basic wrapper class
* Add it back to the docs
* Just remove AdamW entirely
* Remove some AdamW references
* Drop AdamW from the public init
* make fix-copies
* Cleanup some references
* make fixup
* Delete lots of transformers.AdamW references
* Remove extra references to adamw_hf
Transformers added an AdamW implementation before Torch supported it. However, Torch supports it now, so there's not really much point in maintaining our own version! This PR deletes our AdamW class, but imports torch.optim.AdamW in the same file, to ensure that imports that depended on it still work.

Fixes #35504
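For downstream code that previously imported AdamW from transformers, migration is essentially a one-line import change. A minimal sketch with a placeholder model and illustrative hyperparameters:

```python
# Before: `from transformers import AdamW`
# After: import the optimizer directly from PyTorch (same constructor arguments).
import torch.nn as nn
from torch.optim import AdamW

model = nn.Linear(8, 2)  # stand-in model for illustration
optimizer = AdamW(model.parameters(), lr=5e-5, weight_decay=0.01)
```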