Remove our AdamW implementation #36177
Conversation
Force-pushed 4da47cc to 6338b0c
Ready for core maintainer review @ArthurZucker @Cyrilvallez |
src/transformers/optimization.py
Outdated
- class AdamW(Optimizer):
+ class AdamW(TorchAdamW):
as long as you are sure all params are the same for init, let's go! 🤗
we can also deprecate this one?
Good point, actually - the old one was already deprecated! Maybe we should just remove it entirely, since we've been showing a deprecation warning for a long time now?
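For context, the "basic wrapper class" approach shown in the diff above amounts to roughly the following. This is a hedged sketch rather than the exact code from the PR; the warning text and warning category are assumptions:

```python
# Sketch of the thin wrapper discussed above (not the exact PR code):
# transformers' AdamW becomes a subclass of torch.optim.AdamW that only adds
# a deprecation warning, so existing imports keep working with identical behavior.
import warnings

from torch.optim import AdamW as TorchAdamW


class AdamW(TorchAdamW):
    def __init__(self, *args, **kwargs):
        # Warning text is illustrative; the old class was already deprecated.
        warnings.warn(
            "transformers.AdamW is deprecated; use torch.optim.AdamW instead.",
            FutureWarning,
        )
        super().__init__(*args, **kwargs)
```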
Force-pushed 90dabf5 to d99ad13
cc @ArthurZucker I cut it out entirely (it was raising a deprecation warning every time it was being used anyway). I refactored the references to it to use torch.optim.AdamW.

cc @ArthurZucker @Cyrilvallez I think this should be ready to go, but I'd like a core maintainer approval first! The plan is to just totally remove our AdamW class and redirect all the legacy references to it to use torch.optim.AdamW.
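A minimal sketch of what redirecting the legacy references can look like in optimization.py, assuming a plain re-export is enough to keep the old import path working (illustrative, not necessarily the exact line in the PR):

```python
# Re-export so `from transformers.optimization import AdamW` still resolves,
# now pointing at the PyTorch implementation (illustrative sketch).
from torch.optim import AdamW  # noqa: F401
```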
Cyrilvallez left a comment
Hey @Rocketknight1! Thanks for cleaning up! 🙏 LGTM, except that we should not keep both adamw_hf and adamw_torch as if they were 2 separate optimizers!
src/transformers/trainer.py
Outdated
- elif args.optim in [OptimizerNames.ADAMW_TORCH, OptimizerNames.ADAMW_TORCH_FUSED]:
+ elif args.optim in [OptimizerNames.ADAMW_TORCH, OptimizerNames.ADAMW_TORCH_FUSED, OptimizerNames.ADAMW_HF]:
IMO it is very confusing to keep adamw_hf and adamw_torch if they are now the same. Since our version of AdamW was raising warnings for a long time, I think it should be safe to remove adamw_hf entirely (let's remove it from the docstrings as well to avoid any confusion -- it is present in 2 docstrings).
Of course, the best would be to remove the torch part of all the optimizers in OptimizerNames, since they are now all torch, but that would be breaking. Maybe something to do in a separate PR: change the names and do a whole deprecation cycle for them.
Done! There should be no more references to adamw_hf in the code.
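With adamw_hf removed, downstream configuration simply selects the PyTorch optimizer through the existing adamw_torch name. An illustrative usage sketch (output_dir and the other arguments are placeholders):

```python
# Illustrative: selecting PyTorch's AdamW via the Trainer API now that the
# separate adamw_hf name has been removed. Arguments are placeholders.
from transformers import TrainingArguments

args = TrainingArguments(output_dir="out", optim="adamw_torch")
```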
Force-pushed 2d582e7 to 8caee1d
Force-pushed 8caee1d to 9913bff
Cyrilvallez left a comment
All right, LGTM! Thanks a lot!
* Just import torch AdamW instead
* Update docs too
* Make AdamW undocumented
* make fixup
* Add a basic wrapper class
* Add it back to the docs
* Just remove AdamW entirely
* Remove some AdamW references
* Drop AdamW from the public init
* make fix-copies
* Cleanup some references
* make fixup
* Delete lots of transformers.AdamW references
* Remove extra references to adamw_hf
Transformers added an AdamW implementation before Torch supported it. However, Torch supports it now, so there's not really much point in maintaining our own version! This PR deletes our AdamW class, but imports torch.optim.AdamW in the same file, to ensure that imports that depended on it still work.

Fixes #35504
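For downstream code that previously imported AdamW from transformers, migration is essentially a one-line import change. A minimal sketch with a placeholder model and illustrative hyperparameters:

```python
# Before: `from transformers import AdamW`
# After: import the optimizer directly from PyTorch (same constructor arguments).
import torch.nn as nn
from torch.optim import AdamW

model = nn.Linear(8, 2)  # stand-in model for illustration
optimizer = AdamW(model.parameters(), lr=5e-5, weight_decay=0.01)
```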