Added option to update parameters using state_dict in AveragedModel (#65495)#65755
Conversation
🔗 Helpful links
💊 CI failures summary and remediationsAs of commit 109d701 (more details on the Dr. CI page):
🕵️ 15 new failures recognized by patternsThe following CI failures do not appear to be due to upstream breakages:
|
| Job | Step | Action |
|---|---|---|
| Build | 🔁 rerun | |
| Build | 🔁 rerun | |
| Build | 🔁 rerun | |
| Download Docker image | 🔁 rerun | |
| pytorch android gradle custom build single architecture (for PR) | 🔁 rerun | |
| Build | 🔁 rerun | |
| Set Up CI Environment After attach_workspace | 🔁 rerun | |
| pytorch android gradle custom build single architecture (for PR) | 🔁 rerun | |
| Download Docker image | 🔁 rerun | |
| Build | 🔁 rerun | |
| Unknown | 🔁 rerun | |
| Download Docker image | 🔁 rerun | |
| Build | 🔁 rerun | |
| Build | 🔁 rerun | |
| Unknown | 🔁 rerun | |
| Download Docker image | 🔁 rerun |
❄️ 1 failure tentatively classified as flaky
but reruns have not yet been triggered to confirm:
linux-xenial-py3.6-gcc5.4 / build-docs (cpp) (1/1)
Step: "Unknown" (full log | diagnosis details | 🔁 rerun) ❄️
2021-10-05T18:04:07.6845477Z E: Failed to fetch...: /etc/ssl/certs/ca-certificates.crt CRLfile: none
2021-10-05T18:04:07.6411056Z
2021-10-05T18:04:07.6411376Z Reading package lists... 99%
2021-10-05T18:04:07.6411619Z
2021-10-05T18:04:07.6606815Z Reading package lists... 99%
2021-10-05T18:04:07.6607176Z
2021-10-05T18:04:07.6607484Z Reading package lists... Done
2021-10-05T18:04:07.6607736Z
2021-10-05T18:04:07.6842167Z W: The repository 'https://deb.nodesource.com/node_12.x xenial Release' does not have a Release file.
2021-10-05T18:04:07.6843271Z N: Data from such a repository can't be authenticated and is therefore potentially dangerous to use.
2021-10-05T18:04:07.6844174Z N: See apt-secure(8) manpage for repository creation and user configuration details.
2021-10-05T18:04:07.6845477Z E: Failed to fetch https://deb.nodesource.com/node_12.x/dists/xenial/main/source/Sources server certificate verification failed. CAfile: /etc/ssl/certs/ca-certificates.crt CRLfile: none
2021-10-05T18:04:07.6846576Z E: Some index files failed to download. They have been ignored, or old ones used instead.
2021-10-05T18:04:07.6924225Z
2021-10-05T18:04:07.7363626Z Reading package lists... 0%
2021-10-05T18:04:07.7363982Z
2021-10-05T18:04:07.7470337Z Reading package lists... 0%
2021-10-05T18:04:07.7470595Z
2021-10-05T18:04:07.7941201Z Reading package lists... 1%
2021-10-05T18:04:07.7941490Z
2021-10-05T18:04:07.7941789Z Reading package lists... 8%
2021-10-05T18:04:07.7942088Z
This comment was automatically generated by Dr. CI (expand for details).
Follow this link to opt-out of these comments for your Pull Requests.Please report bugs/suggestions to the (internal) Dr. CI Users group.
zhouzhuojie
left a comment
There was a problem hiding this comment.
lgtm, waiting for all the signals and will merge after that
|
Looks like there's a change we need to fix in the release branch, @prabhat00155 can you try rebasing after this commit? #65787 cc @malfet |
16a4dff to
572cd71
Compare
|
@zhouzhuojie I see a couple of unrelated failures. Is there something preventing the merge? |
|
How can we unblock this? This needs to be followed up by cherrypicking PR #65921 which contains minor doc corrections and refactorings. |
|
@zhouzhuojie Sorry for the multiple pings, just being conscious about the deadline for merging this in the release branch. Could you let us know if there is anything we can do on our side? |
…ytorch#65495) Summary: While implementing [EMA](pytorch/vision#4381 extends AveragedModel) in torchvision, update_parameters() from AveragedModel could not be used as it did not handle state_dict(), so a custom update_parameters() needed to be defined in [EMA class](pytorch/vision#4406). This PR aims to handle this scenario removing the need for this custom update_parameters() implementation. Discussion: pytorch/vision#4406 (review) Pull Request resolved: pytorch#65495 Reviewed By: datumbox Differential Revision: D31176742 Pulled By: prabhat00155 fbshipit-source-id: 326d14876018f21cf602bab5eaba344678dbabe2 (cherry picked from commit 2ea724b)
572cd71 to
109d701
Compare
CI Flow Status⚛️ CI FlowRuleset - Version:
You can add a comment to the PR and tag @pytorchbot with the following commands: # ciflow rerun, "ciflow/default" will always be added automatically
@pytorchbot ciflow rerun
# ciflow rerun with additional labels "-l <ciflow/label_name>", which is equivalent to adding these labels manually and trigger the rerun
@pytorchbot ciflow rerun -l ciflow/scheduled -l ciflow/slowFor more information, please take a look at the CI Flow Wiki. |
Summary: Discussion: pytorch#65495 (comment) Pull Request resolved: pytorch#65921 Reviewed By: albanD Differential Revision: D31310105 Pulled By: prabhat00155 fbshipit-source-id: 417691832a7c793744830c11e0ce53e3972d21a3 (cherry picked from commit c7748fc)
Codecov Report
@@ Coverage Diff @@
## release/1.10 #65755 +/- ##
=================================================
+ Coverage 20.20% 66.32% +46.11%
=================================================
Files 23 738 +715
Lines 5232 94259 +89027
=================================================
+ Hits 1057 62515 +61458
- Misses 4175 31744 +27569 |
|
Thanks a lot for your help merging this. :) |
|
Sorry to jump in here, but I don't understand why this satisfies the cherry-pick criteria. First off, @malfet we should clarify how the criteria should be specified. In #65438, it says:
The criteria for this one is given as:
which isn't informative. We should update the description to something like:
In this particular case, from my quick reading it seems like:
|
|
@gchanan Thanks for the feedback. Let me provide a summary of the what/why so that we can decide easier next steps:
Let us know how you would like us to proceed. Would you rather rollback the PR on the release branch, leave as is or bring a quick PR that addresses your concerns around the |
|
@gchanan good point, will update template for 1.10 and use this verbiage going forward. |
|
my preference would be that we improve the |
|
@gchanan Sounds good. Just to confirm, your preference is to:
If that's the case that's OK with me. @malfet If the above is confirmed, who is responsible for reverting the PR? You or us? |
Summary:
While implementing EMA(which extends AveragedModel) in torchvision, update_parameters() from AveragedModel could not be used as it did not handle state_dict(), so a custom update_parameters() needed to be defined in EMA class. This PR aims to handle this scenario removing the need for this custom update_parameters() implementation.
Discussion: pytorch/vision#4406 (review)
Pull Request resolved: #65495
Reviewed By: datumbox
Differential Revision: D31176742
Pulled By: prabhat00155
fbshipit-source-id: 326d14876018f21cf602bab5eaba344678dbabe2
(cherry picked from commit 2ea724b)
Fixes #{issue number}