[FX] Added fuser tutorial #1356
Conversation

Deploy preview for pytorch-tutorials-preview ready! Built with commit 14a7913: https://deploy-preview-1356--pytorch-tutorials-preview.netlify.app
index.rst
Outdated
    .. Code Transformations with FX
    .. customcarditem::
       :header: Building a Convolution/Batch Norm fuser in FX
       :card_description: Build a simple FX interpreter to record the runtime of op, module, and function calls and report statistics
Are the card description and image correct? They look like they belong to other tutorials.
The description is wrong, but I'm not sure what to put for the image. @jamesr66a, is there any reason you chose this image for the FX performance profiling tutorial? https://github.com/pytorch/tutorials/pull/1319/files#diff-54a294a5d016e1a8e98bc95668ed84a99a9edd5c10394d9a2b1ee848006e98a7R223
    .. Code Transformations with FX
    .. customcarditem::
       :header: Building a Convolution/Batch Norm fuser in FX
I've seen this technique more commonly referred to as "folding" but both make sense (https://towardsdatascience.com/speed-up-inference-with-batch-normalization-folding-8a45a83a89d8, https://arxiv.org/abs/1611.09842 calls it "absorbing").
Might be nice to use different terminology in case we want to add a "first class" fusion tutorial later that e.g. directly calls into NNC.
I agree the terminology is confusing, but I think fusion is an acceptable (and more widely understood) term. If we add a fusion tutorial later I'd be glad to rename it to something to avoid name conflicts.
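For reference, whatever we call it, the transformation being discussed absorbs the BatchNorm affine parameters into the preceding conv's weight and bias. A minimal scalar sketch of that arithmetic (plain Python with illustrative names, not the tutorial's actual code, which operates on `nn.Conv2d`/`nn.BatchNorm2d` tensors per output channel):

```python
import math

def fold_bn_into_conv(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold BN parameters (gamma, beta) and running stats (mean, var)
    into a preceding conv's weight w and bias b, so that
    bn(conv(x)) == conv_folded(x) for every input x."""
    scale = gamma / math.sqrt(var + eps)
    w_folded = w * scale
    b_folded = (b - mean) * scale + beta
    return w_folded, b_folded

# With identity BN statistics, folding leaves the conv unchanged:
w2, b2 = fold_bn_into_conv(w=2.0, b=0.5, gamma=1.0, beta=0.0,
                           mean=0.0, var=1.0, eps=0.0)
# w2 == 2.0, b2 == 0.5
```

In the real module version, `scale` is a per-channel vector and `w_folded` multiplies it along the conv weight's output-channel dimension; the algebra is otherwise identical.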
index.rst
Outdated
    .. Code Transformations with FX
    .. customcarditem::
       :header: Building a Convolution/Batch Norm fuser in FX
       :card_description: Build a simple FX interpreter to record the runtime of op, module, and function calls and report statistics
    # accessing the computational graph. FX resolves this problem by symbolically
    # tracing the actual operations called, so that we can track the computations
    # through the `forward` call, nested within Sequential modules, or wrapped in
    # an user-defined module.
🤔 Shouldn't it be "an" before "user", since "user" starts with a vowel?
The rule is based on the sound, not the letter: "user" starts with a consonant /j/ sound, so it takes "a". (So this should actually be "a user-defined module".)
    fused_model = fuse(model)
    print(fused_model.code)
    inp = torch.randn(5, 1, 1, 1)
Should we run this on a more realistic input shape?
I just wrote all the conv/batch norm modules to operate on a [1, 1, 1] shape. We're not measuring the performance of this module here, so I don't think the input shape matters.
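To make the point concrete: what this snippet checks is numerical equivalence, which holds for every input by construction, so the shape is irrelevant. A self-contained scalar sketch (illustrative names, not the tutorial's `fuse()`; the real check compares `nn.Module` outputs):

```python
import math

def conv(x, w, b):
    # Scalar stand-in for a convolution: an affine map.
    return w * x + b

def bn(y, gamma, beta, mean, var, eps=1e-5):
    # Scalar stand-in for batch norm in eval mode.
    return gamma * (y - mean) / math.sqrt(var + eps) + beta

def fused_conv(x, w, b, gamma, beta, mean, var, eps=1e-5):
    # Conv with the BN folded into its weight and bias.
    scale = gamma / math.sqrt(var + eps)
    return conv(x, w * scale, (b - mean) * scale + beta)

# The fused and unfused paths agree for any input value:
for x in [-3.0, 0.0, 0.25, 10.0]:
    assert abs(bn(conv(x, 2.0, 0.5), 1.5, 0.1, 0.2, 0.9)
               - fused_conv(x, 2.0, 0.5, 1.5, 0.1, 0.2, 0.9)) < 1e-9
```

Since the equivalence is algebraic, passing on a tiny input is as convincing as passing on an ImageNet-sized one.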
* Update build.sh
* Update audio tutorial for release pytorch 1.8 / torchaudio 0.8 (#1379)
* [1.8 release] Switch to the new datasets in torchtext 0.9.0 release - text classification tutorial (#1352)
* [1.8 release] Switch to LM dataset in torchtext 0.9.0 release (#1349)
* [WIP][FX] CPU Performance Profiling with FX (#1319)
* [FX] Added fuser tutorial (#1356): added fuser tutorial, updated index.rst, fixed conclusion, responded to comments
* Update numeric_suite_tutorial.py
* Tutorial combining DDP with Pipeline Parallelism to Train Transformer models (#1347)
  Summary: places a Pipe on GPUs 0 and 1 and another Pipe on GPUs 2 and 3; both pipe replicas are replicated via DDP, with one process driving GPUs 0 and 1 and another driving GPUs 2 and 3. Polished docs, added thumbnail, addressed comments.
* More updates to numeric_suite
* Update numeric_suite_tutorial.py
* Update build.sh

Co-authored-by: moto <855818+mthrok@users.noreply.github.com>
Co-authored-by: Guanheng George Zhang <6156351+zhangguanheng66@users.noreply.github.com>
Co-authored-by: Guanheng Zhang <zhangguanheng@devfair0197.h2.fair>
Co-authored-by: James Reed <jamesreed@fb.com>
Co-authored-by: Horace He <horacehe2007@yahoo.com>
Co-authored-by: Pritam Damania <9958665+pritamdamania87@users.noreply.github.com>
Co-authored-by: pritam <pritam.damania@fb.com>
Co-authored-by: Nikita Shulga <nshulga@fb.com>
Co-authored-by: Brian Johnson <brianjo@fb.com>

Not sure how to test it in notebook format.
Also, perhaps I'd like to bake in the output somehow? It would be somewhat embarrassing if the fused version was slower due to noise :)
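One way to reduce the risk of a noisy timing flipping the result: repeat each measurement several times and report the minimum, which is much less noise-sensitive than a single run. A stdlib-only sketch (hypothetical helper, not something the tutorial currently does):

```python
import timeit

def robust_ms(fn, repeats=5, number=100):
    """Time fn by running it `number` times per trial, over `repeats`
    trials, and report the best trial in milliseconds per call.
    The minimum discards scheduler hiccups and cache-cold outliers."""
    times = timeit.repeat(fn, repeat=repeats, number=number)
    return min(times) / number * 1000.0

# Usage: compare the two variants with the same harness.
baseline_ms = robust_ms(lambda: sum(range(1000)))
```

Even with this, baking a fixed measured number into the rendered notebook is risky; printing both timings side by side at build time at least guarantees they came from the same machine.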