
To fix the chainability at epoch zero for some schedulers#63457

Closed
iramazanli wants to merge 1 commit into pytorch:master from iramazanli:group_lr_base_lr

Conversation

@iramazanli
Contributor

@iramazanli iramazanli commented Aug 18, 2021

It has been discussed in #60836 (comment) that we have observed an obstacle to chaining some types of learning rate schedulers. In particular, we observed that:

  • some learning rate schedulers return their initial learning rates at epoch 0, as in
       return self.base_lrs
  • this can be a problem when two schedulers are chained as
     scheduler1.step()
     scheduler2.step()

In particular, we completely ignore the effect of scheduler1 at epoch 0. This would not be an issue if scheduler1 were ineffective at epoch 0, as many schedulers are; however, for schedulers such as warm-up schedulers, whose multiplicative factor at epoch 0 is smaller than 1, this can lead to undesired behavior.

The following code snippet illustrates the problem:

Reproducing the bug

import torch
from torch.nn import Parameter
from torch.optim import SGD
from torch.optim.lr_scheduler import WarmUpLR, ExponentialLR

model = [Parameter(torch.randn(2, 2, requires_grad=True))]
optimizer = SGD(model, 1.0)
scheduler1 = WarmUpLR(optimizer, warmup_factor=0.1, warmup_iters=5, warmup_method="constant")
scheduler2 = ExponentialLR(optimizer, gamma=0.9)

for epoch in range(10):
    print(epoch, scheduler2.get_last_lr()[0])
    optimizer.step()
    scheduler1.step()
    scheduler2.step()

Current Result

0 1.0
1 0.9
2 0.81
3 0.7290000000000001
4 0.6561000000000001
5 5.904900000000001
6 5.314410000000001
7 4.782969000000001
8 4.304672100000001
9 3.874204890000001

Expected Result

0 1.0
1 0.9
2 0.81
3 0.7290000000000001
4 0.6561000000000001
5 0.5904900000000001
6 0.5314410000000001
7 0.4782969000000001
8 0.4304672100000001
9 0.3874204890000001
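The contract behind the fix can be shown with a small, torch-free sketch. The names below (broken_step, chainable_step) and the dict-based param group are illustrative only, not the actual PyTorch implementation: a chainable scheduler must derive the next learning rate from the optimizer's current lr, not from base_lrs, otherwise a previously-stepped scheduler's epoch-0 factor is silently discarded.

```python
# Torch-free sketch of the chainability contract this PR fixes.
# broken_step / chainable_step are hypothetical names for illustration.

def broken_step(group, base_lr, gamma, epoch):
    """Buggy pattern: at epoch 0, reset to base_lr, wiping any factor
    a previously-stepped scheduler already applied to group["lr"]."""
    if epoch == 0:
        group["lr"] = base_lr
    else:
        group["lr"] *= gamma

def chainable_step(group, gamma):
    """Fixed pattern: always scale the optimizer's *current* lr."""
    group["lr"] *= gamma

# A warm-up scheduler steps first and scales the lr by 0.1.
group = {"lr": 1.0}
group["lr"] *= 0.1
broken_step(group, base_lr=1.0, gamma=0.9, epoch=0)
lr_broken = group["lr"]          # back to 1.0 -- warm-up factor lost

group = {"lr": 1.0}
group["lr"] *= 0.1
chainable_step(group, gamma=0.9)
lr_chainable = group["lr"]       # ~0.09 -- warm-up factor preserved

print(lr_broken, lr_chainable)
```

This is why the buggy run above jumps to 5.9049 at epoch 5: the warm-up's 0.1 factor was never actually in effect, so removing it at the end of warm-up multiplies the already-decayed lr by 10.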

Partially resolves pytorch/vision#4281

@facebook-github-bot
Copy link
Contributor

facebook-github-bot commented Aug 18, 2021


💊 CI failures summary and remediations

As of commit ce80523 (more details on the Dr. CI page):


  • 1/1 failures possibly* introduced in this PR
    • 1/1 non-scanned failure(s)

ci.pytorch.org: 1 failed


This comment was automatically generated by Dr. CI.

@codecov
Copy link

codecov bot commented Aug 18, 2021

Codecov Report

Merging #63457 (ce80523) into master (4a390a5) will decrease coverage by 0.04%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master   #63457      +/-   ##
==========================================
- Coverage   75.56%   75.51%   -0.05%     
==========================================
  Files        2118     2118              
  Lines      212263   212291      +28     
==========================================
- Hits       160399   160316      -83     
- Misses      51864    51975     +111     

@iramazanli iramazanli requested review from datumbox and fmassa August 19, 2021 04:48
Contributor

@datumbox datumbox left a comment


LGTM, thanks @iramazanli for fixing this so quickly.

I tested your patch on latest nightly with a slightly modified loop:

for epoch in range(10):
    print(epoch, scheduler2.get_lr())
    optimizer.step()
    scheduler1.step()
    scheduler2.step()

And I get the expected result:

0 [0.1]
1 [0.08100000000000002]
2 [0.07290000000000002]
3 [0.06561000000000002]
4 [0.05904900000000002]
5 [0.5314410000000002]
6 [0.47829690000000014]
7 [0.43046721000000016]
8 [0.38742048900000015]
9 [0.34867844010000015]

Which is the combined effect of both schedulers.
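As a quick arithmetic sanity check of that table (my own reading of the numbers, not part of the PR): during warm-up (epochs 1–4) each entry carries the constant warm-up factor 0.1 on top of the exponential decay, and after warm-up (epochs 5–9) the entries are a pure exponential decay.

```python
# Sanity-check the table above (assumed reading, not from the PR itself):
# epochs 1-4: lr = 0.1 * 0.9 ** (epoch + 1)   (warm-up factor still applied)
# epochs 5-9: lr = 0.9 ** (epoch + 1)         (warm-up factor cleanly removed)
warmup_factor, gamma = 0.1, 0.9

during_warmup = [warmup_factor * gamma ** (e + 1) for e in range(1, 5)]
after_warmup = [gamma ** (e + 1) for e in range(5, 10)]

print(during_warmup)  # matches epochs 1-4 of the table
print(after_warmup)   # matches epochs 5-9 of the table
```

The clean factor-of-10 hand-off between epoch 4 and epoch 5 is exactly the combined-effect behavior the patch restores.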

@facebook-github-bot
Contributor

@iramazanli has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Member

@fmassa fmassa left a comment


Great, thanks for fixing this!

@iramazanli
Contributor Author

> LGTM, thanks @iramazanli for fixing this so quickly. [...] I tested your patch on latest nightly with a slightly modified loop [...] And I get the expected result [...] Which is the combined effect of both schedulers.

That's amazing! Let's merge this PR then :)

@facebook-github-bot
Contributor

@iramazanli merged this pull request in e7c4988.


Projects

None yet

Development

Successfully merging this pull request may close these issues.

Update reference scripts to use the "Batteries Included" utils

4 participants