Add flops profiler tutorial #682
)
(transformer): ParallelTransformer(
  12.61 M, 32.43% Params, 103.62 GMACs, 100.00% MACs, 4.4 ms, 13.22% time, 4.7e+01 TFLOPS,
  (layers): ModuleList(
Why are the time percentage and TFLOPS 0 for ModuleList?
Fixed.
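For reference, the TFLOPS figure in the profile line quoted above appears consistent with the MAC count and latency on the same line if each multiply-accumulate is counted as two floating-point operations. That factor of two is an assumption here (the thread does not state it), and `tflops_from_macs` is a hypothetical helper, not part of the profiler. A minimal sketch of the arithmetic:

```python
def tflops_from_macs(gmacs: float, latency_ms: float) -> float:
    """Convert a MAC count and a latency into TFLOPS.

    Assumes 1 MAC = 2 FLOPs (one multiply plus one add); this
    convention is an assumption, not something stated in the thread.
    """
    flops = 2.0 * gmacs * 1e9      # GMACs -> FLOPs
    seconds = latency_ms * 1e-3    # ms -> s
    return flops / seconds / 1e12  # FLOPs per second -> TFLOPS


# Values copied from the ParallelTransformer line in the quoted profile.
transformer_tflops = tflops_from_macs(gmacs=103.62, latency_ms=4.4)
print(f"{transformer_tflops:.1f} TFLOPS")
```

With 103.62 GMACs and 4.4 ms this gives roughly 47.1 TFLOPS, in line with the printed `4.7e+01 TFLOPS`. The same formula returns 0 whenever the recorded MAC count is 0.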
model = models.alexnet()
batch_size = 256
macs, params, steps = get_model_profile(model, # the PyTorch model to be profiled
macs, params = get_model_profile(model=model, # model
It would be good to show how the model is called without the profiling so that input_res and input_constructors are clearer. Maybe have:
if profile:
    macs, params = get_model_profile(...)
else:
    output/loss = model(...)
macs, params, steps = get_model_profile(
batch_size = 5
seq_len = 128
macs, params = get_model_profile(
It would be good to show how the model is called without the profiling so that input_res and input_constructors are clearer. Maybe have:
if profile:
    macs, params = get_model_profile(...)
else:
    output/loss = model(...)
Made changes as suggested
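The branching the reviewer asked for can be illustrated without any framework. The sketch below is only an illustration of the pattern: `toy_forward` and `toy_profile` are hypothetical stand-ins for a real forward call and for `get_model_profile`, whose actual arguments are not reproduced here, and the "model" is just a list of bias-free dense layer shapes.

```python
from typing import Dict, List, Tuple

# (in_features, out_features) of a bias-free dense layer; a toy stand-in
# for a real model, used only to make the example self-contained.
Layer = Tuple[int, int]


def toy_forward(layers: List[Layer], batch_size: int) -> Tuple[int, int]:
    """Stand-in for a normal forward call: returns the output shape."""
    return (batch_size, layers[-1][1])


def toy_profile(layers: List[Layer], batch_size: int) -> Tuple[int, int]:
    """Hypothetical stand-in for a profiler call: returns (macs, params).

    A dense layer performs batch_size * in_features * out_features MACs
    and holds in_features * out_features weights.
    """
    macs = sum(batch_size * fan_in * fan_out for fan_in, fan_out in layers)
    params = sum(fan_in * fan_out for fan_in, fan_out in layers)
    return macs, params


def run(layers: List[Layer], batch_size: int, profile: bool = False) -> Dict:
    """Same inputs either way; only the call that consumes them differs."""
    if profile:
        macs, params = toy_profile(layers, batch_size)
        return {"macs": macs, "params": params}
    return {"output_shape": toy_forward(layers, batch_size)}


layers = [(128, 256), (256, 10)]
profiled = run(layers, batch_size=5, profile=True)
plain = run(layers, batch_size=5)
print(profiled, plain)
```

Putting both branches side by side makes it visible that the profiled path takes the same model and input description as the ordinary call, which is the point of the review comment.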
# Output:
# Number of multiply-adds: 21.74 GMACs
# Number of parameters: 109.48 M
Below is an example of this usage in a typical training workflow.
In this mode, is the profiler capturing only the forward pass, or forward, backward, and step? Can we make this more explicit?
The profiler only captures the forward pass. I clarified this throughout the README.
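Because only the forward pass is measured, a reader who wants a rough per-step training cost has to extrapolate. The sketch below uses two assumptions that do not come from this thread: one MAC is counted as two FLOPs, and the backward pass costs about twice the forward pass (a common rule of thumb, not a profiler measurement). `estimate_step_gflops` is a hypothetical helper.

```python
from typing import Tuple


def estimate_step_gflops(forward_gmacs: float,
                         backward_to_forward_ratio: float = 2.0) -> Tuple[float, float]:
    """Extrapolate forward-only MACs to a forward + backward estimate.

    Assumptions (not from the profiler): 1 MAC = 2 FLOPs, and the backward
    pass costs `backward_to_forward_ratio` times the forward pass. The
    optimizer step is ignored.
    """
    forward_gflops = 2.0 * forward_gmacs
    backward_gflops = backward_to_forward_ratio * forward_gflops
    return forward_gflops, forward_gflops + backward_gflops


# 21.74 GMACs is the forward-pass figure quoted in the example output above.
forward_gflops, step_gflops = estimate_step_gflops(21.74)
print(f"forward: {forward_gflops:.2f} GFLOPs, forward + backward: {step_gflops:.2f} GFLOPs")
```

The result is an estimate only; actual backward cost depends on the model and on features such as activation checkpointing.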
* Dist testing backend fixes, etc. (deepspeedai#708)
* set_batch_fn and remove old sanity check (deepspeedai#712)
* properly set engine.local_rank if it's set to -1
* Add executable permission to `ds_elastic` and `ds_report` in `bin`. (deepspeedai#711)
* Add executable permission to `ds_elastic` and `ds_report` in `bin`.
* Automatic `ds_elastic` formatting
  Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
* local rank of -1 means not set (deepspeedai#720)
* bump to 0.3.11
* [launcher] look ma, no more zombies (deepspeedai#714)
  Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
* Improve starred expressions (deepspeedai#696)
* Improve starred expressions
  `deepspeed/profiling/flops_profiler/profiler.py` uses starred expressions that are no longer valid with [PEP 617][1]. The new Python parser is in 3.9, and this change allows DeepSpeed to run with the newest Python version. I have not checked all locations that has this issue. However, this change allows me to run simple examples.
  [1]: https://www.python.org/dev/peps/pep-0617/
* Match style for "Improve starred expressions", although readability suffers
  The style guide might need to be updated for this new use case of expressions. Python [Issue 40631][1] includes more discussion on the change.
  [1]: https://bugs.python.org/issue40631
  Co-authored-by: Cheng Li <pistasable@gmail.com>
* Fixed typo in Readme. (deepspeedai#737)
* 1bit_adam dependencies (deepspeedai#742)
* Clickable screenshots (deepspeedai#746)
* Fix docstring
* Make screenshots clickable for easier viewing
* Add flops profiler tutorial (deepspeedai#682)
* work on flops profiler tutorial
* update flops profiler tutorial
* add flops profiler tutorial and fix names
* work on flops profiler tutorial
* update flops profiler tutorial
* add flops profiler tutorial and fix names
* fix tailing ws
* fix names
* remove multistep profiling and update docs
* fix cases where functionals and submodules coexist in a parent module, update readme
* fix typo
* always invoke post hook function
* fix module flops sum and update tests
* update tutorial
* Only initialize distributed if required (deepspeedai#734)
  Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
  Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
  Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
  Co-authored-by: Jon Eyolfson <eyolfson@gmail.com>
  Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
  Co-authored-by: Cheng Li <pistasable@gmail.com>
  Co-authored-by: TheDudeFromCI <thedudefromci@gmail.com>
  Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
  Co-authored-by: Sean Naren <sean@grid.ai>
* work on flops profiler tutorial
* update flops profiler tutorial
* add flops profiler tutorial and fix names
* work on flops profiler tutorial
* update flops profiler tutorial
* add flops profiler tutorial and fix names
* fix tailing ws
* fix names
* remove multistep profiling and update docs
* fix cases where functionals and submodules coexist in a parent module, update readme
* fix typo
* always invoke post hook function
* fix module flops sum and update tests
* update tutorial
In the inference module, can I add the performance analysis tutorial of llama2? |
Added the flops profiler tutorial, configuration, and feature to the website. Also fixed names of some flops profiler function parameters.