To fix the chainability at epoch zero for some schedulers #63457
iramazanli wants to merge 1 commit into pytorch:master
Conversation
CI status as of commit ce80523: 1 failure on ci.pytorch.org (reported by Dr. CI).
Codecov Report
@@ Coverage Diff @@
## master #63457 +/- ##
==========================================
- Coverage 75.56% 75.51% -0.05%
==========================================
Files 2118 2118
Lines 212263 212291 +28
==========================================
- Hits 160399 160316 -83
- Misses 51864 51975 +111
datumbox
left a comment
LGTM, thanks @iramazanli for fixing this so quickly.
I tested your patch on latest nightly with a slightly modified loop:
for epoch in range(10):
    print(epoch, scheduler2.get_lr())
    optimizer.step()
    scheduler1.step()
    scheduler2.step()

And I get the expected result:
0 [0.1]
1 [0.08100000000000002]
2 [0.07290000000000002]
3 [0.06561000000000002]
4 [0.05904900000000002]
5 [0.5314410000000002]
6 [0.47829690000000014]
7 [0.43046721000000016]
8 [0.38742048900000015]
9 [0.34867844010000015]
Which is the combined effect of both schedulers.
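For reference, here is a self-contained version of that check. The optimizer and scheduler setup below are assumptions, since the comment only quotes the loop: a ConstantLR warm-up chained with ExponentialLR(gamma=0.9) produces the same shape of schedule. It reads the learning rate with the recommended get_last_lr(), so the decayed values are shifted by one factor of gamma relative to the get_lr() output quoted above.

```python
# Hypothetical setup: ConstantLR warm-up chained with ExponentialLR.
# The actual schedulers used in the quoted comment are not shown there.
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import ConstantLR, ExponentialLR

model = torch.nn.Linear(2, 1)
optimizer = SGD(model.parameters(), lr=1.0)

# scheduler1 scales the lr by 0.1 for the first 5 epochs (warm-up style);
# scheduler2 decays whatever lr is currently set by a factor of 0.9 per epoch.
scheduler1 = ConstantLR(optimizer, factor=0.1, total_iters=5)
scheduler2 = ExponentialLR(optimizer, gamma=0.9)

lrs = []
for epoch in range(10):
    lrs.append(scheduler2.get_last_lr()[0])  # lr in effect this epoch
    optimizer.step()
    scheduler1.step()
    scheduler2.step()

# With the fix, epoch 0 starts at 0.1 (warm-up applied), decays by 0.9 each
# epoch, then jumps back up at epoch 5 when the warm-up factor is lifted.
print(lrs)
```

The key point is the first value: before this patch, the chained ExponentialLR ignored scheduler1's warm-up factor at epoch 0 and reported the base learning rate instead.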
@iramazanli has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
fmassa
left a comment
Great, thanks for fixing this!
That's amazing! Let's merge this PR then :)
@iramazanli merged this pull request in e7c4988.
As discussed in #60836 (comment), we have observed an obstacle when chaining some types of learning rate schedulers: at epoch 0, the effect of scheduler1 is completely ignored. This would not be an issue if scheduler1 were ineffective at epoch 0, as is the case for many schedulers; however, for schedulers such as warm-up schedulers, whose multiplicative factor at epoch 0 is smaller than 1, this can lead to undesired behavior.
The following code snippets illustrate the problem:
Reproducing the bug
Current Result
Expected Result
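The snippets above were collapsed in the original page and are not reproduced here. As a hedged illustration of the issue (the scheduler choices are mine, not necessarily those of the original repro), chaining a warm-up scheduler with a second scheduler should preserve the warm-up factor at epoch 0; before this fix, the second scheduler's initialization discarded it:

```python
# Illustrative example only: LinearLR and StepLR stand in for the
# (collapsed) schedulers of the original reproduction snippet.
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import LinearLR, StepLR

model = torch.nn.Linear(2, 1)
opt = SGD(model.parameters(), lr=0.1)

# scheduler1 is a warm-up: it scales the base lr by 0.5 at epoch 0.
scheduler1 = LinearLR(opt, start_factor=0.5, total_iters=4)
# scheduler2 is chained on top of scheduler1.
scheduler2 = StepLR(opt, step_size=3, gamma=0.1)

# Expected result: 0.05 at epoch 0 (warm-up factor applied).
# Current (buggy) result before this fix: 0.1, i.e. scheduler1 is ignored.
lr_at_epoch_0 = scheduler2.get_last_lr()[0]
print(lr_at_epoch_0)
```

On builds that include this fix, the printed value is 0.05, the base learning rate scaled by scheduler1's warm-up factor.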
Partially resolves pytorch/vision#4281