
add stats that can only be collected at runtime #51386

Closed
zhaojuanmao wants to merge 14 commits into gh/zhaojuanmao/59/base from gh/zhaojuanmao/59/head

Conversation

@zhaojuanmao
Contributor

@zhaojuanmao zhaojuanmao commented Jan 29, 2021

Stack from ghstack:

Add stats such as rebuilt-bucket stats, unused-parameter stats, and performance stats to the DDP logging data.

  1. GPU time stats are not collected for single-process multiple-device setups or multi-device modules in this diff, as that would require events to be created and recorded on multiple devices.
  2. Use the at::cuda::event API for safer calls.
  3. Events may not be created in the autograd hook if the hook is not triggered by the user's code, e.g., when the user runs in non-sync mode for some iterations. So we check whether events were created before synchronizing, and skip invalid results.
  4. Users may not set the device upfront, so we explicitly set the proper device before creating events in our prepare_forward() and prepare_backward() calls.
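The guard described in points 3 and 4 above can be sketched as follows. This is an illustrative pure-Python sketch with hypothetical names (`Timer`, `elapsed_ms`), not the actual reducer code, which lives in C++ in torch/lib/c10d/reducer.cpp and uses CUDA events rather than plain floats:

```python
# Illustrative sketch (hypothetical names): timing events may never be created
# if the autograd hook did not fire, so check for missing events before
# synchronizing and skip invalid measurements instead of reporting them.

class Timer:
    def __init__(self):
        self.forward_start = None   # "event" recorded in prepare_forward()
        self.backward_end = None    # "event" recorded in the autograd hook

    def elapsed_ms(self):
        # Point 3: the hook may not have run (e.g. non-sync mode), so the
        # end event can be missing; return None rather than a bogus value.
        if self.forward_start is None or self.backward_end is None:
            return None
        delta = self.backward_end - self.forward_start
        # Skip invalid results rather than logging them.
        return delta * 1000.0 if delta > 0 else None


t = Timer()
assert t.elapsed_ms() is None          # hook never fired: no stat is logged
t.forward_start, t.backward_end = 1.0, 1.5
assert t.elapsed_ms() == 500.0         # valid measurement, in milliseconds
```

In the real implementation the same check takes the form of verifying that the CUDA events were created and recorded before calling their synchronize/elapsed-time APIs.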

Differential Revision: D26158645

@facebook-github-bot
Contributor

facebook-github-bot commented Jan 29, 2021

💊 CI failures summary and remediations

As of commit f429313 (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚


This comment was automatically generated by Dr. CI (expand for details). Follow this link to opt out of these comments for your Pull Requests.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

@facebook-github-bot facebook-github-bot added cla signed oncall: distributed Add this issue/PR to distributed oncall triage queue labels Jan 29, 2021
zhaojuanmao added a commit that referenced this pull request Jan 29, 2021
add stats such as rebuilt bucket stats, unused parameter stats and performance stats to ddp logging data

Differential Revision: [D26158645](https://our.internmc.facebook.com/intern/diff/D26158645/)

ghstack-source-id: 120695335
Pull Request resolved: #51386
Comment thread torch/lib/c10d/reducer.cpp Outdated
Comment thread torch/lib/c10d/reducer.cpp Outdated
zhaojuanmao added a commit that referenced this pull request Feb 2, 2021
Pull Request resolved: #51386

add stats such as rebuilt bucket stats, unused parameter stats and performance stats to ddp logging data
ghstack-source-id: 120818571

Differential Revision: [D26158645](https://our.internmc.facebook.com/intern/diff/D26158645/)
@zhaojuanmao
Contributor Author

Right now all the timers work for CPU only; GPU timers will be added.
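A CPU-only timer of the kind mentioned here can be as simple as bracketing each phase with `time.perf_counter()`. The names below (`stats`, `timed`) are illustrative, not the real DDP logging fields:

```python
import time

# Minimal CPU timing sketch (hypothetical names). The real DDP code records
# these durations into its logging data structure; here we collect a dict.
stats = {}

def timed(name, fn, *args):
    start = time.perf_counter()
    result = fn(*args)
    stats[name] = (time.perf_counter() - start) * 1000.0  # milliseconds
    return result

out = timed("forward_compute_time", sum, range(1000))
assert out == 499500
assert stats["forward_compute_time"] >= 0.0
```

This wall-clock approach is why GPU timers need separate handling: CUDA kernels run asynchronously, so host-side timestamps bracket the launch, not the actual device work.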

zhaojuanmao added a commit that referenced this pull request Feb 4, 2021
Pull Request resolved: #51386

add stats such as rebuilt bucket stats, unused parameter stats and performance stats to ddp logging data
ghstack-source-id: 121022194

Differential Revision: [D26158645](https://our.internmc.facebook.com/intern/diff/D26158645/)
zhaojuanmao added a commit that referenced this pull request Feb 5, 2021
Pull Request resolved: #51386

add stats such as rebuilt bucket stats, unused parameter stats and performance stats to ddp logging data
ghstack-source-id: 121100800

Differential Revision: [D26158645](https://our.internmc.facebook.com/intern/diff/D26158645/)
Comment thread torch/lib/c10d/logger.cpp Outdated
Comment thread torch/lib/c10d/logger.cpp Outdated
Comment thread torch/lib/c10d/logger.cpp
Comment thread torch/lib/c10d/logger.cpp Outdated
Comment thread torch/lib/c10d/logger.cpp Outdated
Comment thread torch/lib/c10d/logger.cpp
Comment thread torch/lib/c10d/logger.cpp
Comment thread torch/lib/c10d/logger.cpp
Comment thread torch/lib/c10d/reducer.cpp Outdated
@zhaojuanmao
Contributor Author

Although the current diff did not cause a perf regression in our benchmarks, for better control over performance in case some application needs it in the future, I will add sampling for event recording in this diff.
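Sampling event recording every Nth iteration bounds the timing overhead. A minimal sketch of such a control (hypothetical names, not the actual C++ implementation) might look like:

```python
# Sketch of sampling control for event recording (hypothetical names).
# Record timing events only on iterations where should_sample() is true,
# so the overhead is paid on a fraction of iterations.

class SamplingController:
    def __init__(self, sample_rate):
        self.sample_rate = sample_rate  # record 1 out of every `sample_rate` iterations
        self.iteration = 0

    def should_sample(self):
        sample = (self.iteration % self.sample_rate == 0)
        self.iteration += 1
        return sample


ctrl = SamplingController(sample_rate=10)
sampled = [i for i in range(30) if ctrl.should_sample()]
assert sampled == [0, 10, 20]  # events recorded on 1 in 10 iterations
```

A rate of 1 degenerates to recording every iteration, preserving the pre-sampling behavior.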

@zhaojuanmao
Contributor Author

@ilia-cher and @ngimel, would you please help review whether the CUDA time measurement parts are written properly (any pitfalls I did not realize)? Thanks a lot!

Contributor

@rohan-varma rohan-varma left a comment


Looks good from my side! Thank you for working through all of the different timing issues!

@zhaojuanmao
Contributor Author

added sampling control for event recording

Comment thread torch/nn/parallel/distributed.py
Comment thread torch/lib/c10d/logger.cpp Outdated
Comment thread torch/lib/c10d/reducer.cpp Outdated
Comment thread torch/lib/c10d/logger.hpp
zhaojuanmao added a commit that referenced this pull request Feb 17, 2021
Pull Request resolved: #51386

add stats such as rebuilt bucket stats, unused parameter stats and performance stats to ddp logging data

1. gpu time stats are not collected for single process multiple devices in this diff, as that requires events are created and recorded on multiple devices
2. use at::cuda::event API for safer calls
3. events may not be created in autograd hook if hook is not triggered in user's codes, e.g., users runs in non-sync mode in some iterations. So we checked events are created or not before synchronizing, also skipped invalid results.
4. users may not set device upfront, so explicitly set proper device before creating events in our prepare_forward() and prepare_backward() calls

ghstack-source-id: 121829631

Differential Revision: [D26158645](https://our.internmc.facebook.com/intern/diff/D26158645/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D26158645/)!
Comment thread torch/lib/c10d/logger.cpp
Comment thread torch/lib/c10d/logger.cpp Outdated
Comment thread torch/lib/c10d/logger.cpp
Comment thread torch/lib/c10d/logger.cpp Outdated
Comment thread torch/lib/c10d/logger.cpp Outdated
zhaojuanmao added a commit that referenced this pull request Feb 18, 2021
Pull Request resolved: #51386

add stats such as rebuilt bucket stats, unused parameter stats and performance stats to ddp logging data

1. gpu time stats are not collected for single process multiple devices in this diff, as that requires events are created and recorded on multiple devices
2. use at::cuda::event API for safer calls
3. events may not be created in autograd hook if hook is not triggered in user's codes, e.g., users runs in non-sync mode in some iterations. So we checked events are created or not before synchronizing, also skipped invalid results.
4. users may not set device upfront, so explicitly set proper device before creating events in our prepare_forward() and prepare_backward() calls

ghstack-source-id: 121933566

Differential Revision: [D26158645](https://our.internmc.facebook.com/intern/diff/D26158645/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D26158645/)!
@facebook-github-bot
Contributor

This pull request has been merged in c75fa39.

@facebook-github-bot facebook-github-bot deleted the gh/zhaojuanmao/59/head branch February 22, 2021 15:17
aocsa pushed a commit to Quansight/pytorch that referenced this pull request Mar 15, 2021
Summary:
Pull Request resolved: pytorch#51386

add stats such as rebuilt bucket stats, unused parameter stats and performance stats to ddp logging data

1. gpu time stats are not collected for single process multiple devices in this diff, as that requires events are created and recorded on multiple devices
2. use at::cuda::event API for safer calls
3. events may not be created in autograd hook if hook is not triggered in user's codes, e.g., users runs in non-sync mode in some iterations. So we checked events are created or not before synchronizing, also skipped invalid results.
4. users may not set device upfront, so explicitly set proper device before creating events in our prepare_forward() and prepare_backward() calls

ghstack-source-id: 121933566

Test Plan: unit tests

Reviewed By: SciPioneer

Differential Revision: D26158645

fbshipit-source-id: ce5f15187802eba76accb980449be68902c10178
xsacha pushed a commit to xsacha/pytorch that referenced this pull request Mar 31, 2021
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026

Labels

cla signed Merged oncall: distributed Add this issue/PR to distributed oncall triage queue


6 participants