
Add documentation page for pipeline parallelism. #50791

Closed
pritamdamania87 wants to merge 5 commits into gh/pritamdamania87/200/base from gh/pritamdamania87/200/head

Conversation

@pritamdamania87 (Contributor) commented Jan 20, 2021

Stack from ghstack:

Add a dedicated pipeline parallelism doc page explaining the APIs and
the overall value of the module.

Differential Revision: [D25967981](https://our.internmc.facebook.com/intern/diff/D25967981/)
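For readers new to the module, here is a minimal sketch of the API the new page documents, assuming two visible GPUs; the layer sizes, `chunks` value, and RPC port below are illustrative, not taken from the doc page itself:

```python
# Minimal sketch of torch.distributed.pipeline.sync.Pipe (assumes 2 GPUs).
# Layer sizes, chunks, and the RPC port are illustrative values.
import os
import torch
import torch.nn as nn
from torch.distributed import rpc
from torch.distributed.pipeline.sync import Pipe

# Pipe uses the RPC framework internally, so it must be initialized
# even for a single-process run.
os.environ["MASTER_ADDR"] = "localhost"
os.environ["MASTER_PORT"] = "29500"
rpc.init_rpc("worker", rank=0, world_size=1)

# Each child of the top-level nn.Sequential becomes one pipeline stage,
# placed on its own device.
stage0 = nn.Sequential(nn.Linear(16, 16), nn.ReLU()).cuda(0)
stage1 = nn.Sequential(nn.Linear(16, 4)).cuda(1)

# chunks=8 splits every mini-batch into 8 micro-batches so that both
# GPUs can work concurrently.
model = Pipe(nn.Sequential(stage0, stage1), chunks=8)

output_rref = model(torch.rand(64, 16).cuda(0))  # forward returns an RRef
output = output_rref.local_value()
```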
@facebook-github-bot added the `cla signed` and `oncall: distributed` labels Jan 20, 2021
pritamdamania87 pushed a commit that referenced this pull request Jan 20, 2021

pritamdamania87 pushed a commit that referenced this pull request Jan 20, 2021
@rohan-varma (Contributor) left a comment


LGTM. Are we planning to add additional documentation/tutorials that go into more detail on how to write an application with Pipe and combine it with DDP?

Comment thread on docs/source/pipeline.rst:
(vertical axis). The horizontal axis represents training this model through
time demonstrating that the GPUs are utilized much more efficiently.
However, there still exists a bubble (as demonstrated in the figure) where
certain GPUs are not utilized.
@rohan-varma (Contributor):
Would it be useful to give an approximation of the increase in utilization a user can expect when using Pipe? I guess this varies a lot across workloads, but maybe we could take an example workload?

@pritamdamania87 (Author):
I feel the two attached figures give the user a good idea of how GPUs are utilized more efficiently. It's probably better to illustrate the speedup of this technique in a benchmark/tutorial rather than in our docs, where we are mostly documenting the feature and its APIs.
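For rough intuition on the expected gain, a common back-of-the-envelope estimate from the GPipe paper (not from the doc page under review): with p pipeline stages and m micro-batches per mini-batch (the `chunks` argument), the fraction of time lost to the bubble is approximately

```latex
\text{bubble fraction} \approx \frac{p - 1}{m + p - 1}
```

For example, 4 stages with chunks=8 idle roughly 3/11 (about 27%) of the time, and raising `chunks` shrinks the bubble.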

@pritamdamania87 (Author) left a comment

Are we planning to add additional documentation/tutorials that go into more detail on how to write an application with Pipe and combine it with DDP?

Yes, I'm planning to write a tutorial for this. The idea I had was to take the Transformer example from https://pytorch.org/tutorials/beginner/transformer_tutorial.html, increase the model size (layers, hidden units, etc.) so that it doesn't fit on a single GPU, and show how the same model can be trained using pipeline parallelism.
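A rough sketch of that plan, assuming the same RPC/Pipe setup shown earlier; the layer counts and dimensions below are placeholders, not the tutorial's actual values:

```python
# Hypothetical sketch: an enlarged Transformer encoder split across two GPUs.
# d_model, nhead, and the layer counts are placeholders.
import torch
import torch.nn as nn
from torch.distributed.pipeline.sync import Pipe

def encoder_layers(n):
    return [nn.TransformerEncoderLayer(d_model=1024, nhead=16) for _ in range(n)]

# Put the first 12 encoder layers on cuda:0 and the remaining 12 on cuda:1,
# so a model too large for one GPU fits across two.
stage0 = nn.Sequential(*encoder_layers(12)).cuda(0)
stage1 = nn.Sequential(*encoder_layers(12)).cuda(1)

model = Pipe(nn.Sequential(stage0, stage1), chunks=8)

# Input shape is (seq_len, batch, d_model), the layer's default layout.
out = model(torch.rand(35, 20, 1024).cuda(0)).local_value()
```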


pritamdamania87 pushed a commit that referenced this pull request Jan 22, 2021

pritamdamania87 pushed a commit that referenced this pull request Jan 22, 2021

pritamdamania87 pushed a commit that referenced this pull request Jan 23, 2021
@mrshenli (Contributor) left a comment

Are we going to have tutorials and examples on Portal?

@pritamdamania87 (Author):

Are we going to have tutorials and examples on Portal?

What is Portal? I was planning to write tutorials and examples as we usually do; is there some new process around this?

@facebook-github-bot:

This pull request has been merged in 68c2185.

@mrshenli (Contributor):

What is Portal? I was planning to write tutorials and examples as we usually do; is there some new process around this?

This one; I guess this is a main feature in Pipe?
https://github.com/pytorch/pytorch/blob/master/torch/distributed/pipeline/sync/skip/portal.py
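For context, portal.py implements the cross-partition transport behind Pipe's skip-connection support. A minimal sketch of the user-facing API it backs, following the torchgpipe-derived names in torch.distributed.pipeline.sync.skip (the surrounding model is hypothetical):

```python
# Sketch of the skip-connection API whose cross-partition transport is
# handled by Portal objects. The model around it is hypothetical.
import torch.nn as nn
from torch.distributed.pipeline.sync.skip import pop, skippable, stash

@skippable(stash=["residual"])
class StashInput(nn.Module):
    def forward(self, x):
        yield stash("residual", x)  # save x for a later pipeline stage
        return x

@skippable(pop=["residual"])
class AddResidual(nn.Module):
    def forward(self, x):
        residual = yield pop("residual")  # retrieve the stashed tensor
        return x + residual

# Inside a Pipe'd nn.Sequential, StashInput and AddResidual can sit on
# different partitions; portals move the stashed tensor between devices.
blocks = nn.Sequential(StashInput(), nn.Linear(16, 16), AddResidual())
```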

@facebook-github-bot facebook-github-bot deleted the gh/pritamdamania87/200/head branch January 29, 2021 15:21
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026
Summary:
Pull Request resolved: pytorch#50791

Add a dedicated pipeline parallelism doc page explaining the APIs and
the overall value of the module.
ghstack-source-id: 120257168

Test Plan:
1) View locally
2) waitforbuildbot

Reviewed By: rohan-varma

Differential Revision: D25967981

fbshipit-source-id: b607b788703173a5fa4e3526471140506171632b

Labels: cla signed, Merged, oncall: distributed

5 participants