Speed up tensor.get_device(), is_cuda(), is_sparse() by avoiding dispatches #12841

Closed
zou3519 wants to merge 10 commits into pytorch:master from zou3519:get_device

Conversation

@zou3519
Contributor

@zou3519 zou3519 commented Oct 18, 2018

`tensor.get_device()` went through two dispatches: once to the native function `get_device()`, and another when `get_device` calls `_th_get_device()`. This PR avoids the dispatch by directly implementing the `get_device` function as a method on `Tensor`.

Future Work:

  • Investigate caching Device on TensorImpl. This would probably bring tensor.get_device() down to ~2 ns, but I'm not sure it's worth it.

before:

```
------------------------------------------------------------------------
Benchmark                                 Time           CPU Iterations
------------------------------------------------------------------------
BM_TensorTypeId                           0 ns          0 ns 1000000000
BM_TensorType                             8 ns          8 ns   89407911
BM_TensorIsCuda                          24 ns         24 ns   29313017
BM_TensorIsSparse                        27 ns         27 ns   26083160
BM_TensorTypeIsCuda                      11 ns         11 ns   65128120
BM_TensorNumel                           11 ns         11 ns   68314492
BM_TensorGetDevice                       71 ns         71 ns    9633125
BM_DeviceGuardCtor                      173 ns        173 ns    4067173
BM_DeviceGuard                          232 ns        232 ns    3009690
```

after:

```
------------------------------------------------------------------------
Benchmark                                 Time           CPU Iterations
------------------------------------------------------------------------
BM_TensorTypeId                           0 ns          0 ns 1000000000
BM_TensorType                            10 ns         10 ns   69803872
BM_TensorIsCuda                           2 ns          2 ns  321626683
BM_TensorIsSparse                         6 ns          6 ns  177045382
BM_TensorNumel                           12 ns         12 ns   58770533
BM_TensorGetDevice                        4 ns          4 ns  128113396
BM_DeviceGuardCtor                       52 ns         52 ns   14997278
BM_DeviceGuard                          158 ns        158 ns    5767248
```

@zou3519 zou3519 requested review from ezyang and gchanan October 18, 2018 22:06
Comment thread aten/src/ATen/core/TensorMethods.h (outdated; comments marked off-topic)

Comment thread aten/src/ATen/core/TensorMethods.h (outdated; comments marked off-topic)

Contributor

@ezyang ezyang left a comment


OK, but I do have some comments. Please read.

@zou3519
Contributor Author

zou3519 commented Oct 19, 2018

Er looks like what I did here (bypassing the dispatch) is not okay at all because it skips python bindings. Let me see if I can fix it.

@ezyang
Contributor

ezyang commented Oct 19, 2018

Hmm... I suspect all of the pre-canned functions on Tensor have hand written bindings. Not great.

@zou3519 zou3519 force-pushed the get_device branch 2 times, most recently from 4ce85f1 to 0f7ced1 on October 22, 2018 15:46
Contributor

@facebook-github-bot facebook-github-bot left a comment


zou3519 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@zou3519 zou3519 changed the title from "Speed up tensor.get_device() by avoiding dispatches" to "Speed up tensor.get_device(), is_cuda(), is_sparse() by avoiding dispatches" on Oct 22, 2018
@zou3519 zou3519 force-pushed the get_device branch 2 times, most recently from c6a3290 to 92b08eb on October 22, 2018 18:19

Comment thread aten/src/ATen/core/TensorMethods.h (outdated; comments marked off-topic)

@ezyang
Contributor

ezyang commented Oct 22, 2018

@zou3519 Have you given any thought on how to prevent people from virtualizing this method again? :)

@zou3519
Contributor Author

zou3519 commented Oct 22, 2018

@ezyang Not sure. I imagine setting up a benchmark test for these functions would do the trick (virtualizing them vs. not virtualizing them is about a 10x difference), or just adding a comment to the implementation.

@ezyang
Contributor

ezyang commented Oct 23, 2018

I guess a comment will do for now.

@zou3519
Contributor Author

zou3519 commented Oct 23, 2018

https://stackoverflow.com/questions/22911112/how-to-detect-if-a-method-is-virtual suggests a way to detect if a method is virtual on gcc >= 7 and most if not all versions of clang. I'll try to see if I can do a static assert for this.

@zou3519 zou3519 mentioned this pull request Oct 24, 2018
@zou3519 zou3519 force-pushed the get_device branch 2 times, most recently from a9982ab to 47f5c2c on October 24, 2018 19:48
@zou3519
Contributor Author

zou3519 commented Oct 24, 2018

Nvm, that StackOverflow post explains how to detect if a class has virtual methods, not whether a specific method is virtual. I'll stick with the comment then -- I've moved the implementations to TensorImpl as requested.


@zou3519
Contributor Author

zou3519 commented Oct 25, 2018

Test failures look unrelated (they are all build timeouts).

When you get the chance, could you take another look at the changes, @ezyang?

Comment thread aten/src/ATen/core/TensorImpl.h (outdated; comment marked off-topic)

Comment thread aten/src/ATen/core/TensorImpl.h (outdated; comments marked off-topic)

Comment thread aten/src/ATen/core/TensorImpl.h (outdated; comment marked off-topic)

Contributor

@ezyang ezyang left a comment


This looks way better. Some minor comments.


(From a commit message:) Copied & pasted the bindings from the old generated/python_variable_methods & generated/python_torch_functions.

zdevito pushed a commit to zdevito/ATen that referenced this pull request Oct 26, 2018
…atches (#12841)

Summary: (same as the PR description above)
Pull Request resolved: pytorch/pytorch#12841

Differential Revision: D10489353

Pulled By: zou3519

fbshipit-source-id: a596bc77352f21d5d35433c6de02c2f65aab5f9e
facebook-github-bot pushed a commit that referenced this pull request Nov 7, 2018
Summary:
Followup to #12841

Changed these to not require type dispatch:
- tensor.type().is_cuda() -> tensor.is_cuda()
- tensor.type().is_sparse() -> tensor.is_sparse()
- isVariable(tensor.type()) -> tensor.is_variable()

This probably does not affect performance very much in most cases, but it is nice to have.
Pull Request resolved: #13590

Reviewed By: ezyang

Differential Revision: D12929301

Pulled By: zou3519

fbshipit-source-id: 8ac5c6200c579dd7a44fb4ee58fc9bb170feb1d7
@ezyang ezyang added the merged label Jun 25, 2019
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026