add stats that can only be collected at runtime #51386
zhaojuanmao wants to merge 14 commits into gh/zhaojuanmao/59/base from
Add stats such as rebuilt bucket stats, unused parameter stats, and performance stats to the DDP logging data.

Differential Revision: [D26158645](https://our.internmc.facebook.com/intern/diff/D26158645/)
Right now all the timers work for the CPU only; I will add GPU timers.
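As a rough illustration of what a CPU-only timer measures, here is a minimal sketch. The `PhaseTimer` class and the phase names are hypothetical and are not taken from this PR. A host clock like this only sees when work was launched from the CPU; CUDA kernels run asynchronously, which is why separate GPU timers are needed.

```python
import time
from typing import Dict, Optional


class PhaseTimer:
    """Hypothetical helper: stores host-side timestamps for named phases."""

    def __init__(self) -> None:
        self.timestamps_ns: Dict[str, int] = {}

    def record(self, phase: str) -> None:
        # perf_counter_ns is a monotonic host clock; it knows nothing
        # about work that is still queued on a GPU stream.
        self.timestamps_ns[phase] = time.perf_counter_ns()

    def elapsed_ns(self, start_phase: str, end_phase: str) -> Optional[int]:
        # Return None when either phase was never recorded.
        start = self.timestamps_ns.get(start_phase)
        end = self.timestamps_ns.get(end_phase)
        if start is None or end is None:
            return None
        return end - start


timer = PhaseTimer()
timer.record("forward_start")
timer.record("backward_start")
forward_ns = timer.elapsed_ns("forward_start", "backward_start")
```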
Although the current diff did not cause a perf regression in our benchmarks, I will add sampling for event recording in this diff, so that the overhead can be controlled in case some application needs that in the future.
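One way to picture sampling for event recording is to record timing events only on every N-th iteration. The sketch below assumes that scheme; the function name `should_record` and the `sample_rate` parameter are made up for illustration and do not reflect the actual sampling policy in this diff.

```python
def should_record(iteration: int, sample_rate: int = 100) -> bool:
    """Return True if timing events should be recorded on this iteration.

    Hypothetical policy: record on iterations 0, sample_rate, 2 * sample_rate, ...
    """
    if sample_rate < 1:
        raise ValueError("sample_rate must be at least 1")
    return iteration % sample_rate == 0


# With a sample rate of 4, only a quarter of the iterations pay the
# cost of recording events.
sampled = [i for i in range(10) if should_record(i, sample_rate=4)]
```

A larger `sample_rate` lowers the recording overhead at the price of fewer data points.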
@ilia-cher and @ngimel, would you please help review whether the CUDA time measurement parts are written properly (any pitfalls I did not notice)? Thanks a lot!
rohan-varma left a comment
Looks good from my side! Thank you for working through all of the different timing issues!
Add stats such as rebuilt bucket stats, unused parameter stats, and performance stats to the DDP logging data.

1. GPU time stats are not collected for single-process multiple-device mode or for multi-device modules in this diff, because that requires events to be created and recorded on multiple devices.
2. Use the at::cuda::event API for safer calls.
3. Events may not be created in the autograd hook if the hook is not triggered in the user's code, e.g., when the user runs in non-sync mode for some iterations. So we check whether the events were created before synchronizing, and we skip invalid results.
4. Users may not set the device up front, so we explicitly set the proper device before creating events in our prepare_forward() and prepare_backward() calls.

Differential Revision: [D26158645](https://our.internmc.facebook.com/intern/diff/D26158645/)
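The guard described in point 3 can be sketched in plain Python as follows. This is only an illustration under stated assumptions: `FakeEvent` is a stand-in that uses a host clock, not a real CUDA event, and `safe_elapsed_ns` is a hypothetical name. The actual change lives in the C++ reducer and is not reproduced here.

```python
import time
from typing import Optional


class FakeEvent:
    """Stand-in for a device timing event (hypothetical, host clock only)."""

    def __init__(self) -> None:
        self.timestamp_ns: Optional[int] = None

    def record(self) -> None:
        self.timestamp_ns = time.perf_counter_ns()


def safe_elapsed_ns(
    start: Optional[FakeEvent], end: Optional[FakeEvent]
) -> Optional[int]:
    """Return the elapsed time, or None if it cannot be measured."""
    # The event object may never have been created, e.g. because the
    # autograd hook did not fire on this iteration.
    if start is None or end is None:
        return None
    # The event may exist but never have been recorded.
    if start.timestamp_ns is None or end.timestamp_ns is None:
        return None
    delta = end.timestamp_ns - start.timestamp_ns
    # Skip invalid results instead of logging them.
    return delta if delta >= 0 else None


start_event = FakeEvent()
start_event.record()
end_event = FakeEvent()
end_event.record()

measured = safe_elapsed_ns(start_event, end_event)
skipped_missing = safe_elapsed_ns(start_event, None)
skipped_unrecorded = safe_elapsed_ns(start_event, FakeEvent())
```

Returning `None` rather than a sentinel number keeps skipped iterations from being mistaken for real measurements when the stats are aggregated.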
Added sampling control for event recording.
This pull request has been merged in c75fa39.
Summary:
Pull Request resolved: pytorch#51386

Add stats such as rebuilt bucket stats, unused parameter stats, and performance stats to the DDP logging data.

1. GPU time stats are not collected for single-process multiple-device mode in this diff, because that requires events to be created and recorded on multiple devices.
2. Use the at::cuda::event API for safer calls.
3. Events may not be created in the autograd hook if the hook is not triggered in the user's code, e.g., when the user runs in non-sync mode for some iterations. So we check whether the events were created before synchronizing, and we skip invalid results.
4. Users may not set the device up front, so we explicitly set the proper device before creating events in our prepare_forward() and prepare_backward() calls.

ghstack-source-id: 121933566

Test Plan: unit tests

Reviewed By: SciPioneer

Differential Revision: D26158645

fbshipit-source-id: ce5f15187802eba76accb980449be68902c10178