Add documentation page for pipeline parallelism. #50791
pritamdamania87 wants to merge 5 commits into gh/pritamdamania87/200/base from
Add a dedicated pipeline parallelism doc page explaining the APIs and the overall value of the module. Differential Revision: [D25967981](https://our.internmc.facebook.com/intern/diff/D25967981/) [ghstack-poisoned]
Add a dedicated pipeline parallelism doc page explaining the APIs and the overall value of the module. Differential Revision: [D25967981](https://our.internmc.facebook.com/intern/diff/D25967981/) ghstack-source-id: 120018222 Pull Request resolved: #50791
Pull Request resolved: #50791 Add a dedicated pipeline parallelism doc page explaining the APIs and the overall value of the module. ghstack-source-id: 120057129 Differential Revision: [D25967981](https://our.internmc.facebook.com/intern/diff/D25967981/)
rohan-varma left a comment
LGTM. Are we planning to add additional documentation/tutorials that go into more details on how to write an application with Pipe and combining it with DDP?
(vertical axis). The horizontal axis represents training this model through
time demonstrating that the GPUs are utilized much more efficiently.
However, there still exists a bubble (as demonstrated in the figure) where
certain GPUs are not utilized.
Would it be useful to give an approximation of the increase in utilization a user can expect when using Pipe? I guess this varies a lot for different workloads but maybe we can take an example workload?
I feel the two attached figures give the user a good idea of how GPUs are utilized more efficiently. It's probably better to illustrate the speedup of this technique in a benchmark/tutorial instead of our docs, where we are mostly documenting the feature and its APIs.
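For a rough sense of the numbers discussed in this thread, the bubble can be estimated with a back-of-the-envelope model. The sketch below is not taken from the PR or from the Pipe implementation. It assumes a GPipe-style schedule in which every stage takes the same time per micro-batch and communication is free, so one pass takes `num_microbatches + num_stages - 1` time slots. The function names are hypothetical, and real workloads will deviate from this idealized figure.

```python
def bubble_fraction(num_stages: int, num_microbatches: int) -> float:
    """Idle fraction of an idealized GPipe-style schedule for one pass.

    Assumption (not from the PR): all stages cost the same per micro-batch
    and transfers between GPUs are free.
    """
    if num_stages < 1 or num_microbatches < 1:
        raise ValueError("num_stages and num_microbatches must be >= 1")
    # Each stage is busy for num_microbatches slots out of this many.
    total_slots = num_microbatches + num_stages - 1
    return (num_stages - 1) / total_slots


def utilization(num_stages: int, num_microbatches: int) -> float:
    """Busy fraction per GPU under the same idealized assumptions."""
    return 1.0 - bubble_fraction(num_stages, num_microbatches)


if __name__ == "__main__":
    # Hypothetical setup: 4 GPUs, varying the number of micro-batches.
    for chunks in (1, 4, 8, 32):
        print(f"chunks={chunks:2d} utilization={utilization(4, chunks):.2f}")
```

With a single micro-batch the model reduces to plain model parallelism, where only one of the GPUs is busy at a time. Increasing the number of micro-batches shrinks the bubble but never removes it, which matches the caveat in the doc excerpt above.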
pritamdamania87 left a comment
> Are we planning to add additional documentation/tutorials that go into more details on how to write an application with Pipe and combining it with DDP?
Yes, I'm planning to write a tutorial for this. The idea I had was to use the Transformer example from here: https://pytorch.org/tutorials/beginner/transformer_tutorial.html, increase the model size (layers, hidden units etc.) such that it doesn't fit on a single GPU and show how the same model can be trained using pipeline parallelism.
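Until such a tutorial exists, the scheduling idea behind pipeline parallelism can be illustrated without any GPU code. The following is a minimal pure-Python sketch, not the Pipe API: it assumes a GPipe-style forward schedule in which micro-batch `i` runs on stage `j` at clock tick `i + j`, and the function name `clock_cycles` is chosen here for illustration.

```python
from typing import Iterator, List, Tuple


def clock_cycles(num_microbatches: int, num_stages: int) -> Iterator[List[Tuple[int, int]]]:
    """Yield, for each clock tick, the (micro-batch, stage) pairs that run concurrently.

    Assumption: micro-batch i can run on stage j only after it has finished
    stage j - 1, and each stage handles one micro-batch per tick.
    """
    m, n = num_microbatches, num_stages
    for tick in range(m + n - 1):
        # Stage j works on micro-batch tick - j, if that micro-batch exists.
        first_stage = max(0, tick - m + 1)
        last_stage = min(tick + 1, n)
        yield [(tick - stage, stage) for stage in range(first_stage, last_stage)]


if __name__ == "__main__":
    # Hypothetical setup: 3 micro-batches flowing through 2 stages (GPUs).
    for tick, tasks in enumerate(clock_cycles(3, 2)):
        print(tick, tasks)
```

Ticks where fewer than `num_stages` pairs are scheduled correspond to the bubble: some stages have no micro-batch to work on while the pipeline fills or drains.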
Pull Request resolved: #50791 Add a dedicated pipeline parallelism doc page explaining the APIs and the overall value of the module. ghstack-source-id: 120173804 Differential Revision: [D25967981](https://our.internmc.facebook.com/intern/diff/D25967981/)
Pull Request resolved: #50791 Add a dedicated pipeline parallelism doc page explaining the APIs and the overall value of the module. ghstack-source-id: 120214106 Differential Revision: [D25967981](https://our.internmc.facebook.com/intern/diff/D25967981/)
Pull Request resolved: #50791 Add a dedicated pipeline parallelism doc page explaining the APIs and the overall value of the module. ghstack-source-id: 120257168 Differential Revision: [D25967981](https://our.internmc.facebook.com/intern/diff/D25967981/)
mrshenli left a comment
Are we going to have tutorials and examples on Portal?
What is Portal? Was planning to write tutorials and examples as we usually do, is there some new process around this?
This pull request has been merged in 68c2185.
This one, I guess this is a main feature in pipe?
Summary:
Pull Request resolved: pytorch#50791

Add a dedicated pipeline parallelism doc page explaining the APIs and the overall value of the module.
ghstack-source-id: 120257168

Test Plan:
1) View locally
2) waitforbuildbot

Reviewed By: rohan-varma

Differential Revision: D25967981

fbshipit-source-id: b607b788703173a5fa4e3526471140506171632b
Stack from ghstack:
Add a dedicated pipeline parallelism doc page explaining the APIs and
the overall value of the module.
Differential Revision: D25967981