
Conversation


@cli99 cli99 commented Jan 20, 2021

Added the flops profiler tutorial, configuration, and feature to the website. Also fixed names of some flops profiler function parameters.

(transformer): ParallelTransformer(
  12.61 M, 32.43% Params, 103.62 GMACs, 100.00% MACs, 4.4 ms, 13.22% time, 4.7e+01 TFLOPS,
  (layers): ModuleList(
A contributor commented:

Why are the time percent and TFLOPS 0 for ModuleList?

@cli99 (author) replied:

Fixed.
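The TFLOPS column in the sample output above can be sanity-checked by hand. A quick calculation, assuming the profiler's usual convention of roughly 2 FLOPs per MAC:

```python
# Sanity check of the ParallelTransformer line in the output above:
# 103.62 GMACs executed in 4.4 ms, counting ~2 FLOPs per MAC.
macs = 103.62e9       # 103.62 GMACs
latency_s = 4.4e-3    # 4.4 ms
tflops = 2 * macs / latency_s / 1e12  # ≈ 47.1, matching the 4.7e+01 TFLOPS reported
```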

model = models.alexnet()
batch_size = 256
- macs, params, steps = get_model_profile(model, # the PyTorch model to be profiled
+ macs, params = get_model_profile(model=model, # model
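For intuition about the `macs` and `params` values the call returns, here is a back-of-envelope count for a single fully connected layer. This is an illustrative stand-in, not DeepSpeed code; the sizes happen to match AlexNet's first classifier layer (9216 in, 4096 out):

```python
# Hand count of MACs and parameters for one fully connected layer,
# illustrating the quantities get_model_profile reports.
# Illustrative stand-in only, not the profiler itself.
def linear_macs_params(in_features, out_features, batch_size, bias=True):
    params = in_features * out_features + (out_features if bias else 0)
    macs = batch_size * in_features * out_features  # one multiply-add per weight per sample
    return macs, params

# AlexNet's first classifier layer at batch size 256
macs, params = linear_macs_params(9216, 4096, batch_size=256)
```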
@samyam commented on Feb 3, 2021:

It would be good to show how the model is called without the profiling so that input_res and input_constructors are clearer. Maybe have:

if profile:
    macs, params = get_model_profile
else:
    output/loss = model(....)
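The suggested pattern, fleshed out as a minimal runnable sketch. The `profile` flag, `run_step`, and the `profile_model` stub are illustrative placeholders; in real code the stub would be DeepSpeed's `get_model_profile`:

```python
# Minimal sketch of the suggested pattern: show the plain forward call next
# to the profiled one, so how the inputs are constructed stays obvious.
def profile_model(model, inputs):
    # stand-in for deepspeed's get_model_profile; returns dummy counts here
    return 0, 0

def run_step(model, inputs, profile=False):
    if profile:
        macs, params = profile_model(model, inputs)
        return macs, params
    # the normal (unprofiled) call makes the expected inputs explicit
    return model(inputs)

out = run_step(lambda x: sum(x), [1, 2, 3])
```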

batch_size = 5
seq_len = 128
- macs, params, steps = get_model_profile(
+ macs, params = get_model_profile(
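For models like BERT whose forward takes keyword inputs rather than a single tensor, the tutorial passes an input constructor to `get_model_profile`. A dependency-free sketch of that contract (the real constructor returns tokenizer/tensor inputs such as `input_ids` and `attention_mask`; plain Python lists stand in here):

```python
# Sketch of the input_constructor contract: build the model's forward
# inputs from a (batch, seq) shape. Illustrative stand-in; the tutorial's
# version returns tensors produced by the tokenizer.
batch_size = 5
seq_len = 128

def bert_input_constructor(shape):
    batch, seq = shape
    return {"input_ids": [[0] * seq for _ in range(batch)]}

inputs = bert_input_constructor((batch_size, seq_len))
```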
A contributor commented:

It would be good to show how the model is called without the profiling so that input_res and input_constructors are clearer. Maybe have:

if profile:
    macs, params = get_model_profile
else:
    output/loss = model(....)

@cli99 (author) replied:

Made changes as suggested

# Output:
# Number of multiply-adds: 21.74 GMACs
# Number of parameters: 109.48 M
Below is an example of this usage in a typical training workflow.
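In the training workflow, the profiler is enabled through the DeepSpeed configuration rather than a direct call. A sketch of such a config; the key names follow the flops profiler configuration docs, but verify against the current documentation before relying on them:

```python
# Hedged sketch of enabling the flops profiler from the DeepSpeed config
# during training; the values shown are illustrative.
ds_config = {
    "train_batch_size": 256,
    "flops_profiler": {
        "enabled": True,
        "profile_step": 1,    # which training step to profile
        "module_depth": -1,   # -1 prints the full module tree
        "top_modules": 3,     # number of top modules in the aggregate view
        "detailed": True,
    },
}
```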
A contributor commented:

In this mode, is the profiler capturing only the forward, or forward backward and step? Can we make this more explicit?

@cli99 (author) replied:

The profiler only captures the forward pass. I clarified this in the README.
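Since only the forward pass is measured, estimating a full training step requires extrapolation. A common rule of thumb (an assumption, not something the profiler reports) puts the backward pass at roughly twice the forward cost:

```python
# Extrapolating from forward-only numbers. The 2x backward factor and the
# 2-FLOPs-per-MAC convention are rule-of-thumb assumptions, not profiler
# measurements.
forward_macs = 21.74e9                # the 21.74 GMACs from the example above
forward_flops = 2 * forward_macs      # ~2 FLOPs per MAC
train_step_flops = forward_flops * 3  # forward + ~2x backward
```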

@cli99 cli99 merged commit e2dfe0d into deepspeedai:master Feb 11, 2021
sdtblck added a commit to EleutherAI/DeeperSpeed that referenced this pull request Feb 11, 2021
* Dist testing backend fixes, etc. (deepspeedai#708)

* set_batch_fn and remove old sanity check (deepspeedai#712)

* properly set engine.local_rank if it's set to -1

* Add executable permission to `ds_elastic` and `ds_report` in `bin`. (deepspeedai#711)

* Add executable permission to `ds_elastic` and `ds_report` in `bin`.

* Automatic `ds_elastic` formatting

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

* local rank of -1 means not set (deepspeedai#720)

* bump to 0.3.11

* [launcher] look ma, no more zombies (deepspeedai#714)

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

* Improve starred expressions (deepspeedai#696)

* Improve starred expressions

`deepspeed/profiling/flops_profiler/profiler.py` uses starred expressions
that are no longer valid with [PEP 617][1]. The new Python parser is in 3.9,
and this change allows DeepSpeed to run with the newest Python version. I have
not checked all locations that have this issue. However, this change allows me
to run simple examples.

[1]: https://www.python.org/dev/peps/pep-0617/

* Match style for "Improve starred expressions", although readability suffers

The style guide might need to be updated for this new use case of expressions.
Python [Issue 40631][1] includes more discussion on the change.

[1]: https://bugs.python.org/issue40631

Co-authored-by: Cheng Li <pistasable@gmail.com>

* Fixed typo in Readme. (deepspeedai#737)

* 1bit_adam dependencies (deepspeedai#742)

* Clickable screenshots (deepspeedai#746)

* Fix docstring

* Make screenshots clickable for easier viewing

* Add flops profiler tutorial (deepspeedai#682)

* work on flops profiler tutorial

* update flops profiler tutorial

* add flops profiler tutorial and fix names

* work on flops profiler tutorial

* update flops profiler tutorial

* add flops profiler tutorial and fix names

* fix tailing ws

* fix names

* remove multistep profiling and update docs

* fix cases where functionals and submodules coexist in a parent module, update readme

* fix typo

* always invoke post hook function

* fix module flops sum and update tests

* update tutorial

* Only initialize distributed if required (deepspeedai#734)

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: Jon Eyolfson <eyolfson@gmail.com>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Cheng Li <pistasable@gmail.com>
Co-authored-by: TheDudeFromCI <thedudefromci@gmail.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Sean Naren <sean@grid.ai>
@cli99 cli99 deleted the cheng/flops-profiler-tutorial branch March 25, 2021 21:17
B06901052 pushed a commit to B06901052/DeepSpeed that referenced this pull request Apr 14, 2022
@Nuclear6 commented:

For the inference module, could a performance analysis tutorial for Llama 2 be added?
