Conversation

Contributor

@cli99 cli99 commented Nov 20, 2020

No description provided.

@cli99 cli99 marked this pull request as ready for review November 20, 2020 03:46
Collaborator

@jeffra jeffra left a comment

I see DeepSpeedExamples (DSE) has a lot of changes here. Is this because we want to point to an updated version of DSE that uses the profiler, or just to a different commit than the one the current master branch points to?

@@ -0,0 +1,43 @@
from functools import partial
Collaborator

I wonder if we should move these examples to DSE? Or maybe keep them in the DeepSpeed repo but not in the deepspeed package? I am not sure we want them installed with the library itself (e.g., hidden away under /usr/local/lib/python3.6/dist-packages/deepspeed), where they are hard to find and run directly.

Contributor Author

I moved the examples to the README and tests/unit/test_flops_profiler.py.

@@ -0,0 +1,43 @@
from functools import partial
Contributor

I love that we have these examples. We might want to move them to the DSE repo? And we could also include them in a page on readthedocs if you think that would be good.

Contributor Author

Removed the examples. I will add them to DSE and readthedocs.

@@ -0,0 +1,12 @@
#! /bin/bash
Contributor

Can we move this into our unit tests?

Contributor Author

Done

export WORLD_SIZE=$(($GPUS_PER_NODE*$NNODES))

# python test_model.py
python test_model.py --flops_profiler true --profile_start_step=5 --profile_end_step=8
Contributor

It would be great to have some unit tests with models that we know the expected flop counts for. Then we could run the flops profiler to ensure our counts stay consistent?
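A consistency test of this kind could compare the profiler's output against closed-form MAC counts for layers whose cost is known analytically. A minimal sketch in plain Python (the helper names below are hypothetical; a real test would run the profiler on a torch module and compare against these formulas):

```python
# Closed-form MAC counts for layers with known cost; a unit test can
# assert the profiler's reported counts match these exactly.

def linear_macs(batch, in_features, out_features):
    # One multiply-accumulate per (input element, output feature) pair.
    return batch * in_features * out_features

def conv2d_macs(batch, c_in, c_out, k_h, k_w, out_h, out_w):
    # Each output element costs c_in * k_h * k_w multiply-accumulates.
    return batch * c_out * out_h * out_w * c_in * k_h * k_w

def test_linear_macs():
    # A Linear(1024 -> 4096) layer on a batch of 8.
    assert linear_macs(8, 1024, 4096) == 8 * 1024 * 4096

def test_conv2d_macs():
    # Conv2d(3 -> 16, kernel 3x3) on a 32x32 input with padding=1
    # (so the output stays 32x32).
    assert conv2d_macs(1, 3, 16, 3, 3, 32, 32) == 16 * 32 * 32 * 27

test_linear_macs()
test_conv2d_macs()
```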

parser = argparse.ArgumentParser()
parser.add_argument("--local_rank", type=int, default=0)
parser.add_argument('--zero', type=int, default=0)
parser.add_argument('--flops_profiler', type=bool, default=False)
Contributor

Same comment, it would be awesome to move this into our unit tests (tests/unit/).

@@ -0,0 +1 @@
from .profiler import get_model_profile, add_profile_methods, print_model_profile, print_model_aggregated_profile, flops_to_string, duration_to_string, params_to_string
Contributor

With this many exports, we could also just from .profiler import * and then in profiler be sure to follow public/private naming conventions. Methods starting with _ won't be included.

Contributor Author

I put all the functions in a FlopsProfiler class, which will be the only thing to import.
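A rough sketch of what such a single-import API could look like (method names below are illustrative stand-ins, not the actual DeepSpeed interface):

```python
# Hypothetical shape of a single-class profiler API; bodies are stubbed.
class FlopsProfiler:
    def __init__(self, model=None):
        self.model = model
        self.total_macs = 0

    def start_profile(self):
        ...  # would wrap functionals and register module hooks

    def end_profile(self):
        ...  # would unwrap functionals and remove hooks

    def get_total_flops(self):
        # Report flops as MACs * 2 (one multiply + one add per MAC).
        return self.total_macs * 2

# Usage: everything goes through the one imported class.
prof = FlopsProfiler(model=None)
prof.start_profile()
prof.end_profile()
```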



# FC
F.linear = wrapFunc(F.linear, linear_flops_compute)
Contributor

Will the below wraps occur each time the module is imported? Can things get double-wrapped?

Contributor Author

I moved the functional wrapping into profiler.start() and added unwrapping in profiler.end().
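The wrap-on-start / unwrap-on-end pattern can be sketched as follows. This is a minimal illustration, using a dummy namespace in place of torch.nn.functional; saving the original and checking a marker attribute guards against double-wrapping on repeated starts.

```python
import types

# Dummy stand-in for torch.nn.functional; all names are illustrative.
F = types.SimpleNamespace(linear=lambda x, w: x * w)

_originals = {}

def wrap_func(func, counter, state):
    def wrapped(*args, **kwargs):
        state["macs"] += counter(*args, **kwargs)
        return func(*args, **kwargs)
    wrapped.__wrapped__ = func  # marker: already wrapped
    return wrapped

def start_profile(state):
    # Save the original so end_profile can restore it; the __wrapped__
    # check makes a second start() a no-op instead of a double wrap.
    if not hasattr(F.linear, "__wrapped__"):
        _originals["linear"] = F.linear
        F.linear = wrap_func(F.linear, lambda x, w: 1, state)

def end_profile():
    # Restore the unwrapped functional so profiling has no side effects
    # outside the start/end window.
    F.linear = _originals.pop("linear", F.linear)

state = {"macs": 0}
start_profile(state)
start_profile(state)   # second start is a no-op: no double counting
F.linear(3, 4)         # counted once
end_profile()
F.linear(3, 4)         # unwrapped: not counted
assert state["macs"] == 1
```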

F.linear = wrapFunc(F.linear, linear_flops_compute)

# convolutions
F.conv1d = wrapFunc(F.conv1d, conv_flops_compute)
Contributor

Do we also unwrap these functionals after profiling is complete? The polite thing might be to only wrap while profiling is active, if that's possible. I see we do that with modules in end_profile?

F.avg_pool2d = wrapFunc(F.avg_pool2d, pool_flops_compute)
F.avg_pool3d = wrapFunc(F.avg_pool3d, pool_flops_compute)
F.max_pool1d = wrapFunc(F.max_pool1d, pool_flops_compute)
F.max_pool2d = wrapFunc(F.max_pool2d, pool_flops_compute)
Contributor

It would be cool to have a data structure that maps a list of functionals to the compute wrapper, and then at wrap time the engine code just traverses that data structure to do this work.

Contributor Author

Monkey patching does not work if the target function is referenced as an item in a loop.
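For context on this pitfall: rebinding the loop variable only changes a local name and leaves the module attribute untouched, whereas assigning back through setattr from a name-to-wrapper mapping does replace it. A small illustration with a dummy namespace standing in for torch.nn.functional (all names illustrative):

```python
import types

# Dummy stand-in for torch.nn.functional.
F = types.SimpleNamespace(conv1d=lambda x: x, conv2d=lambda x: x)

def wrapped(func):
    def inner(x):
        return func(x) + 100   # stand-in for flop accounting
    return inner

# Pitfall: rebinding the loop variable does NOT patch the module.
for f in [F.conv1d, F.conv2d]:
    f = wrapped(f)             # only rebinds the local name `f`
assert F.conv1d(1) == 1        # module attribute unchanged

# Workaround: drive patching from a name -> wrapper mapping and assign
# back via setattr, which does replace the module attribute.
FUNC_MAP = {"conv1d": wrapped, "conv2d": wrapped}
for name, make_wrapper in FUNC_MAP.items():
    setattr(F, name, make_wrapper(getattr(F, name)))
assert F.conv1d(1) == 101      # patched
```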

return logits, probs


if __name__ == "__main__":
Contributor

This looks left over from development.

Contributor Author

Removed

from ..ops.adam import DeepSpeedCPUAdam
from ..ops.adam import FusedAdam

from deepspeed.profiling.flops_profiler.profiler import add_profile_methods, print_model_profile, print_model_aggregated_profile, flops_to_string, params_to_string
Contributor

Since we manage exports within each subpackage, we can just from deepspeed.profiling import * to expose the high-level API I think. That's a bit nicer than having a long list like this.

Since we do have this pattern of exporting a lot of methods, what about wrapping them up in a FlopsProfiler class?

Contributor Author

Done

@cli99 cli99 marked this pull request as draft November 23, 2020 21:17
Contributor

@samyam samyam left a comment

I want to just leave a high-level review here based on some of our earlier discussion. I think it would be very cool to do the profiling using profiler.start() and profiler.stop() instead of get_model_profile(...). This aligns more closely with general training workflows, and additionally it would not require the user to create an input constructor. Overall, it would be a more natural and seamless way to use the profiler. What do you think @cli99?

@ShadenSmith
Contributor

I want to just leave a high-level review here based on some of our earlier discussion. I think it would be very cool to do the profiling using profiler.start() and profiler.stop() instead of get_model_profile(...). This aligns more closely with general training workflows, and additionally it would not require the user to create an input constructor. Overall, it would be a more natural and seamless way to use the profiler. What do you think @cli99?

This would also go towards addressing the import side effect we chatted about @cli99. It might be a good opportunity to use contextlib ?
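The contextlib idea could look roughly like this. A hedged sketch (the profiler class and method names below are hypothetical stand-ins, not the real API): wrapping is only in effect inside the `with` block, so importing the profiler module has no side effects, and cleanup runs even on exceptions.

```python
from contextlib import contextmanager

# Hypothetical stand-in for the real profiler object.
class HypotheticalProfiler:
    def __init__(self):
        self.active = False
        self.macs = 0
    def start(self):
        self.active = True    # would wrap functionals here
    def stop(self):
        self.active = False   # would unwrap functionals here

@contextmanager
def profile(profiler):
    # Scope the wrapping to the `with` block; the finally clause
    # guarantees unwrapping even if the profiled code raises.
    profiler.start()
    try:
        yield profiler
    finally:
        profiler.stop()

prof = HypotheticalProfiler()
with profile(prof) as p:
    assert p.active           # functionals would be wrapped here
assert not prof.active        # restored on exit
```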

return p


def linear_flops_compute(input, weight, bias=None):
Contributor

Do we want to account for the bias flops when it is not None?

Contributor Author

What would be the flops for computing the bias? I thought it was zero.

Contributor

You can count the dimension of the bias (i.e., out_features), which is the number of additions. But this will probably be very small, so it's okay to leave it out.
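As a rough sense of scale (illustrative shapes, not from the PR): the bias contributes one addition per output element, versus in_features multiply-adds per output element for the matmul, so its relative cost is 1/in_features.

```python
# Relative cost of the bias term for a Linear layer (plain Python).
batch, in_features, out_features = 8, 1024, 4096

matmul_macs = batch * in_features * out_features   # dominant term
bias_adds = batch * out_features                   # one add per output

# The bias is smaller by exactly a factor of in_features (~0.1% here).
assert bias_adds / matmul_macs == 1 / in_features
```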


def linear_flops_compute(input, weight, bias=None):
out_features = weight.shape[0]
return torch.numel(input) * out_features
Contributor

Shouldn't we multiply this by 2 to account for multiply-add?

Contributor Author

The flops profiler outputs MACs, and the user can multiply that by 2 (or any other factor). The throughput of a module is reported in TFLOPS, computed as MACs * 2 / time.
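A worked example of the MACs * 2 / time conversion (the numbers are illustrative, not from the PR):

```python
# Convert a MAC count plus a measured duration into TFLOPS.
macs = 33_554_432        # MACs for one forward pass (illustrative)
elapsed_s = 1e-4         # measured duration in seconds (illustrative)

# Each MAC is one multiply plus one add, hence the factor of 2.
tflops = macs * 2 / elapsed_s / 1e12
assert abs(tflops - 0.67108864) < 1e-9
```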

Contributor

I see, but I was thinking that this is not always a MAC. I am not sure if we want to take point-wise operations like bias-add or normalize into account in the profiler.

@cli99 cli99 marked this pull request as ready for review November 30, 2020 21:42
@cli99 cli99 changed the title from "Cheng/flops profiler" to "Add the flops profiler which measures the time, number of estimated flops and parameters of each module in a model" Nov 30, 2020
for step, batch in enumerate(data_loader):
# start profiling at training step "profile_step"
if step == profile_start_step:
model.start_profile()
Contributor

I'm wondering if it's more transparent not to turn the model itself into a profiler, and to keep the model intact? Something like

model_profiler = prof.add_profile_method(model)

model_profiler.start()
y=model(x)
model_profiler.stop()

Also for getting the info we can then do

model_profiler.print_all()
flops = model_profiler.get_flops()

I actually don't know which one is better. Maybe we should have a discussion on this?

Contributor Author

The README shown here is out of date; the latest code does this. Happy to chat more about it.

@jeffra
Collaborator

jeffra commented Jan 13, 2021

Closing this one after rebasing/squashing, moved to #664

@jeffra jeffra closed this Jan 13, 2021
@cli99 cli99 deleted the cheng/flops_profiler branch March 25, 2021 21:20

Labels: enhancement (New feature or request)

5 participants