parallel norm operation for ATen on CPU #10535

Closed
xhzhao wants to merge 9 commits into pytorch:master from xhzhao:norm_opt

Conversation

@xhzhao (Contributor) commented Aug 15, 2018

Optimize the norm operation for the ATen CPU path.
norm is a very heavy operation in RNN-related workloads; see the OpenNMT-py example. Our profiling shows that norm takes about 8% of OpenNMT-py training time, which is not acceptable.

Currently, the code path from the TH module runs sequentially on CPU (see link).
Norm performance data before (OOB) and after (OPT) our optimization:

| norm size | SKX6148 OOB | SKX6148 OPT | speedup |
| --- | --- | --- | --- |
| [35820, 500] | 19.9693 | 0.6204 | 32.19 |
| [24997, 500] | 13.8752 | 0.4370 | 31.75 |
| [2000, 1000] | 2.1803 | 0.0926 | 23.53 |
| [2000, 500] | 1.0949 | 0.0508 | 21.55 |
| [500, 1000] | 0.5483 | 0.0494 | 11.10 |
| [500, 500] | 0.2758 | 0.0494 | 5.58 |

| norm size | i7 OOB | i7 OPT | speedup |
| --- | --- | --- | --- |
| [35820, 500] | 18.0185 | 3.8186 | 4.72 |
| [24997, 500] | 12.5457 | 2.5238 | 4.97 |
| [2000, 1000] | 1.8258 | 0.4204 | 4.34 |
| [2000, 500] | 0.9252 | 0.2391 | 3.87 |
| [500, 1000] | 0.4696 | 0.1129 | 4.16 |
| [500, 500] | 0.2380 | 0.0650 | 3.66 |
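The idea behind the optimization can be sketched in plain C++: split the input into chunks, let each worker accumulate a partial sum of |x|^p, then combine the partials and take the p-th root once. This is only an illustrative sketch using `std::async`, not the PR's actual ATen implementation; `parallel_pnorm` and its parameters are hypothetical names.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <future>
#include <vector>

// Illustrative sketch (not the PR's code): p-norm as a parallel reduction.
// Each task accumulates sum(|x|^p) over its own chunk; partial sums are
// combined and the p-th root is taken once at the end.
double parallel_pnorm(const std::vector<double>& data, double p, int nthreads) {
  const std::size_t n = data.size();
  const std::size_t chunk = (n + nthreads - 1) / nthreads;  // nthreads > 0 assumed
  std::vector<std::future<double>> parts;
  for (int t = 0; t < nthreads; ++t) {
    std::size_t begin = t * chunk;
    std::size_t end = std::min(n, begin + chunk);
    if (begin >= end) break;
    parts.push_back(std::async(std::launch::async, [&data, p, begin, end] {
      double acc = 0.0;
      for (std::size_t i = begin; i < end; ++i)
        acc += std::pow(std::abs(data[i]), p);
      return acc;
    }));
  }
  double total = 0.0;
  for (auto& f : parts) total += f.get();  // combine partial sums
  return std::pow(total, 1.0 / p);
}
```

The key property is that the per-chunk partials combine by plain addition before the final root, so the reduction parallelizes exactly like sum.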

@fmassa (Member) commented Aug 15, 2018

cc @colesbury, as I believe the TensorIterator reduction work can also be used for norm computation

@xhzhao (Contributor, Author) commented Aug 16, 2018

@fmassa Thanks for your response. Basically, I followed the sum implementation here, and this method is very easy to implement. I also took a look at an example of TensorIterator, but it seems that only 4 operations are supported (add/sub/mul/div).

@fmassa (Member) commented Aug 16, 2018

@xhzhao TensorIterator with reduction is not yet merged into master, but it will allow more generic reductions on contiguous and non-contiguous tensors, over possibly multiple dimensions.
For a glimpse of what it will look like, have a look at https://github.com/colesbury/pytorch/tree/tensor_iterator_sum

I mentioned the TensorIterator to Sam so that we keep the norm operation in mind once it is merged.

@xhzhao (Contributor, Author) commented Aug 22, 2018

Maybe we can review this PR first and update the norm operation after the TensorIterator PR is merged. What do you say?

@weiyangfb added the ready for review label Aug 22, 2018
}

static scalar_t norm_calc(const scalar_t* data, int64_t n, int64_t stride, float pval) {
  scalar_t result = 0.0;
  // accumulate |x|^p over n strided elements
  for (int64_t i = 0; i < n; i++)
    result += std::pow(std::abs(data[i * stride]), scalar_t(pval));
  return result;
}
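For context, a per-slice kernel with this shape parallelizes naturally over rows, which matches the [rows, cols] sizes in the benchmark tables above. The following is an illustrative sketch only (plain `std::thread` stands in for ATen's internal parallelization; `rowwise_norms` is a hypothetical name, and `norm_calc` is restated in a simplified double-precision form so the sketch is self-contained):

```cpp
#include <algorithm>
#include <cmath>
#include <thread>
#include <vector>

// Simplified per-slice reduction mirroring the norm_calc signature above:
// accumulates |x|^p over n strided elements.
static double norm_calc(const double* data, long n, long stride, float pval) {
  double result = 0.0;
  for (long i = 0; i < n; ++i)
    result += std::pow(std::abs(data[i * stride]), static_cast<double>(pval));
  return result;
}

// Computes the p-norm of each row of a row-major [rows, cols] buffer,
// splitting the rows across hardware threads (sketch, not the PR's code).
std::vector<double> rowwise_norms(const std::vector<double>& buf,
                                  long rows, long cols, float p) {
  std::vector<double> out(rows);
  unsigned nthreads = std::max(1u, std::thread::hardware_concurrency());
  long chunk = (rows + nthreads - 1) / nthreads;
  std::vector<std::thread> workers;
  for (unsigned t = 0; t < nthreads; ++t) {
    long begin = t * chunk, end = std::min<long>(rows, begin + chunk);
    if (begin >= end) break;
    workers.emplace_back([&buf, &out, cols, p, begin, end] {
      for (long r = begin; r < end; ++r)  // contiguous rows: stride 1
        out[r] = std::pow(norm_calc(buf.data() + r * cols, cols, 1, p), 1.0 / p);
    });
  }
  for (auto& w : workers) w.join();
  return out;
}
```

Because each row's reduction is independent, no synchronization beyond the final join is needed; each worker writes to a disjoint range of `out`.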


@fmassa (Member) commented Sep 14, 2018

Merged in #11565
