[PyTorch] Put functor at the end of wrap_kernel_functor_unboxed params by swolchok · Pull Request #51164 · pytorch/pytorch

swolchok · 2021-01-27T00:39:38Z

Stack from ghstack:

[PyTorch] Pass TensorOptions by value #51165 [PyTorch] Pass TensorOptions by value
[PyTorch] Put functor at the end of wrap_kernel_functor_unboxed params #51164 [PyTorch] Put functor at the end of wrap_kernel_functor_unboxed params
[PyTorch] Refactor Dispatcher to inline less code in fast path #51163 [PyTorch] Refactor Dispatcher to inline less code in fast path
[PyTorch] Remove unnecessary dispatcher.h include in torch/library.h #51162 [PyTorch] Remove unnecessary dispatcher.h include in torch/library.h

In the (hopefully) common case where the unboxed functor is
really a c10::CompileTimeFunctionPointer, this enables a jump
straight to said functor.

Note that I had no end of trouble getting this to have decent
performance without the previous diff, and even with it, all the
manual C10_ALWAYS_INLINE I added were necessary.

Differential Revision: D26069629

In the (hopefully) common case where the unboxed functor is really a `c10::CompileTimeFunctionPointer`, this enables a jump straight to said functor. Note that I had no end of trouble getting this to have decent performance without the previous diff, and even with it, all the manual C10_ALWAYS_INLINE I added were necessary. Differential Revision: [D26069629](https://our.internmc.facebook.com/intern/diff/D26069629/) [ghstack-poisoned]

facebook-github-bot · 2021-01-27T00:39:52Z

💊 CI failures summary and remediations

As of commit a5bb6dd (more details on the Dr. CI page):

7/10 failures possibly* introduced in this PR
- 1/7 non-CircleCI failure(s)
3/10 broken upstream at merge base b678fdf on Jan 26 from 1:29pm to 6:17pm

🕵️ 5 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

pytorch_macos_10_13_py3_test (1/5)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

Jan 27 01:54:15 AssertionError: 'aten::t' != 'aten::addmm'

Jan 27 01:54:15 AssertionError: True is not false
Jan 27 01:54:15 
Jan 27 01:54:15 ======================================================================
Jan 27 01:54:15 FAIL [0.006s]: test_profiler_shapes (__main__.TestAutograd)
Jan 27 01:54:15 ----------------------------------------------------------------------
Jan 27 01:54:15 Traceback (most recent call last):
Jan 27 01:54:15   File "test_autograd.py", line 3233, in test_profiler_shapes
Jan 27 01:54:15     self.assertEqual(event.name, name_expected)
Jan 27 01:54:15   File "/Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/testing/_internal/common_utils.py", line 1169, in assertEqual
Jan 27 01:54:15     super().assertEqual(x, y, msg=self._get_assert_msg(msg, debug_msg=debug_msg))
Jan 27 01:54:15 AssertionError: 'aten::t' != 'aten::addmm'
Jan 27 01:54:15 - aten::t
Jan 27 01:54:15 + aten::addmm
Jan 27 01:54:15  : Attempted to compare [string] types: Expected: 'aten::t'; Actual: 'aten::addmm'.
Jan 27 01:54:15 
Jan 27 01:54:15 ----------------------------------------------------------------------
Jan 27 01:54:15 Ran 1520 tests in 628.955s
Jan 27 01:54:15 
Jan 27 01:54:15 FAILED (failures=4, skipped=17, expected failures=1)
Jan 27 01:54:15 
Jan 27 01:54:15 Generating XML reports...

pytorch_windows_vs2019_py36_cuda10.1_test2 (2/5)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

AssertionError: 'aten::t' != 'aten::addmm'

AssertionError: True is not false

======================================================================
FAIL [0.008s]: test_profiler_shapes (__main__.TestAutograd)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_autograd.py", line 3233, in test_profiler_shapes
    self.assertEqual(event.name, name_expected)
  File "C:\Users\circleci\project\build\win_tmp\build\torch\testing\_internal\common_utils.py", line 1169, in assertEqual
    super().assertEqual(x, y, msg=self._get_assert_msg(msg, debug_msg=debug_msg))
AssertionError: 'aten::t' != 'aten::addmm'
- aten::t
+ aten::addmm
 : Attempted to compare [string] types: Expected: 'aten::t'; Actual: 'aten::addmm'.

----------------------------------------------------------------------
Ran 2794 tests in 2068.898s

FAILED (failures=4, skipped=23, expected failures=1)

Generating XML reports...

pytorch_linux_xenial_py3_clang5_asan_test2 (3/5)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

Jan 27 02:54:13 AssertionError: 'aten::t' != 'aten::addmm'

Jan 27 02:54:13 AssertionError: True is not false
Jan 27 02:54:13 
Jan 27 02:54:13 ======================================================================
Jan 27 02:54:13 FAIL [0.011s]: test_profiler_shapes (__main__.TestAutograd)
Jan 27 02:54:13 ----------------------------------------------------------------------
Jan 27 02:54:13 Traceback (most recent call last):
Jan 27 02:54:13   File "test_autograd.py", line 3233, in test_profiler_shapes
Jan 27 02:54:13     self.assertEqual(event.name, name_expected)
Jan 27 02:54:13   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 1169, in assertEqual
Jan 27 02:54:13     super().assertEqual(x, y, msg=self._get_assert_msg(msg, debug_msg=debug_msg))
Jan 27 02:54:13 AssertionError: 'aten::t' != 'aten::addmm'
Jan 27 02:54:13 - aten::t
Jan 27 02:54:13 + aten::addmm
Jan 27 02:54:13  : Attempted to compare [string] types: Expected: 'aten::t'; Actual: 'aten::addmm'.
Jan 27 02:54:13 
Jan 27 02:54:13 ----------------------------------------------------------------------
Jan 27 02:54:13 Ran 1520 tests in 2273.470s
Jan 27 02:54:13 
Jan 27 02:54:13 FAILED (failures=4, skipped=16, expected failures=1)
Jan 27 02:54:13 
Jan 27 02:54:13 Generating XML reports...

pytorch_linux_bionic_py3_8_gcc9_coverage_test2 (4/5)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

Jan 27 02:38:22 AssertionError: 'aten::t' != 'aten::addmm'

Jan 27 02:38:22 AssertionError: True is not false
Jan 27 02:38:22 
Jan 27 02:38:22 ======================================================================
Jan 27 02:38:22 FAIL [0.013s]: test_profiler_shapes (__main__.TestAutograd)
Jan 27 02:38:22 ----------------------------------------------------------------------
Jan 27 02:38:22 Traceback (most recent call last):
Jan 27 02:38:22   File "test_autograd.py", line 3233, in test_profiler_shapes
Jan 27 02:38:22     self.assertEqual(event.name, name_expected)
Jan 27 02:38:22   File "/opt/conda/lib/python3.8/site-packages/torch/testing/_internal/common_utils.py", line 1169, in assertEqual
Jan 27 02:38:22     super().assertEqual(x, y, msg=self._get_assert_msg(msg, debug_msg=debug_msg))
Jan 27 02:38:22 AssertionError: 'aten::t' != 'aten::addmm'
Jan 27 02:38:22 - aten::t
Jan 27 02:38:22 + aten::addmm
Jan 27 02:38:22  : Attempted to compare [string] types: Expected: 'aten::t'; Actual: 'aten::addmm'.
Jan 27 02:38:22 
Jan 27 02:38:23 ----------------------------------------------------------------------
Jan 27 02:38:23 Ran 1520 tests in 1044.614s
Jan 27 02:38:23 
Jan 27 02:38:23 FAILED (failures=4, skipped=16, expected failures=1)
Jan 27 02:38:23 
Jan 27 02:38:23 Generating XML reports...

pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_test2 (5/5)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

Jan 27 03:49:13 AssertionError: 'aten::t' != 'aten::addmm'

Jan 27 03:49:13 AssertionError: True is not false
Jan 27 03:49:13 
Jan 27 03:49:13 ======================================================================
Jan 27 03:49:13 FAIL [0.004s]: test_profiler_shapes (__main__.TestAutograd)
Jan 27 03:49:13 ----------------------------------------------------------------------
Jan 27 03:49:13 Traceback (most recent call last):
Jan 27 03:49:13   File "test_autograd.py", line 3233, in test_profiler_shapes
Jan 27 03:49:13     self.assertEqual(event.name, name_expected)
Jan 27 03:49:13   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 1169, in assertEqual
Jan 27 03:49:13     super().assertEqual(x, y, msg=self._get_assert_msg(msg, debug_msg=debug_msg))
Jan 27 03:49:13 AssertionError: 'aten::t' != 'aten::addmm'
Jan 27 03:49:13 - aten::t
Jan 27 03:49:13 + aten::addmm
Jan 27 03:49:13  : Attempted to compare [string] types: Expected: 'aten::t'; Actual: 'aten::addmm'.
Jan 27 03:49:13 
Jan 27 03:49:14 ----------------------------------------------------------------------
Jan 27 03:49:14 Ran 2794 tests in 3599.913s
Jan 27 03:49:14 
Jan 27 03:49:14 FAILED (failures=4, skipped=19, expected failures=1)
Jan 27 03:49:14 
Jan 27 03:49:14 Generating XML reports...

1 failure not recognized by patterns:

Job	Step	Action
^{pytorch_linux_bionic_py3_8_gcc9_coverage_test1}	^{Run tests}	🔁 rerun

🚧 3 fixed upstream failures:

These were probably caused by upstream breakages that were already fixed.

Please rebase on the viable/strict branch (expand for instructions)

If your commit is older than viable/strict, run these commands:

git fetch https://github.com/pytorch/pytorch viable/strict
git rebase FETCH_HEAD

Check out the recency history of this "viable master" tracking branch.

pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_test1 on Jan 26 from 1:29pm to 6:17pm (6f3aa58 - 5748410)
- 🔁 rerun
pytorch_linux_xenial_py3_6_gcc5_4_test on Jan 26 from 1:51pm to 5:48pm (f08464f - 5748410)
- 🔁 rerun
pytorch_linux_bionic_py3_6_clang9_test on Jan 26 from 1:29pm to 5:39pm (6f3aa58 - 5748410)
- 🔁 rerun

This comment was automatically generated by Dr. CI (expand for details).

Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

ezyang · 2021-01-27T14:26:07Z

                compute_dk = f"DispatchKey _dk = {dispatch_key};"
            return f"""\
 // aten::{f.func}
+C10_ALWAYS_INLINE


Are you sure this does anything? This is getting generated to a cpp file so it's hard for me to imagine that you're actually ever going to get to inline it.

Surprisingly, yes! It can get inlined into wrap_kernel_functor_unboxed_::call; the call chain is Library::impl -> CppFunction ctor -> KernelFunction::makeFromUnboxedFunction -> KernelFunction::makeFromUnboxedFunctor -> wrap_kernel_functor_unboxed. (My working theory is that the fragility of inlining decisions for dispatcher code and the high compile times for generated aten files like Functions.cpp is the huge amount of inlineable stuff going on between this path and the call out to Dispatcher::call{,WithDispatchKey}, repeated for every at:: function.)

bdhirsh · 2021-01-27T15:26:41Z

      "Parameter types mismatch");

-    static ReturnType call(OperatorKernel* functor, ParameterTypes... args) {
+    static ReturnType call(ParameterTypes... args, OperatorKernel* functor) {


More for my understanding than anything else- how does this help out the compiler?

Function arguments go in fixed places defined by the System V x86_64 ABI, at least on Linux & Mac. You can see Low Level System Interface > Function Calling Interface for the gory details (in which case you'll also want to refer to the Itanium C++ ABI to see how C++ things work), but roughly speaking and ignoring a bunch of subtleties and exceptions, the first 8 pointers' worth of arguments go in a fixed set of 8 registers. (I'll call them r0-r7 for simplicity, but those are definitely not their actual names on x86_64.) Supposing that functor takes at most 7 arguments, then with functor at the front, we have r0 = functor and r1-r7 = the first 7 args to functor. We then need to shift r1-r7 into r0-r6 before jumping to functor. With functor at the back, we have r0-r6 = the first 7 args to functor and r7 = functor, and we can jump straight to functor without moving more things around.

(If functor takes 8 arguments or more, then we still have the args to it in r0-r7 and the remaining arguments on the stack. Arguments that go on the stack are pushed from right to left and thus popped from left to right, so I haven't checked the assembly for this case but I think that we could just leave functor in place, load it from the stack and call it, and then pop it off the stack when functor returns.)

Splitting this out from #51164 (D26069629) to allow it to land separately; I'm sure this is a good idea but I'm less sure about #51164. Differential Revision: [D26112627](https://our.internmc.facebook.com/intern/diff/D26112627/) [ghstack-poisoned]

swolchok · 2021-01-27T23:14:08Z

After splitting out the C10_ALWAYS_INLINE part and re-measuring, I am seeing a small regression from this change. For the sake of being able to move forward, I am shelving it for now. I have split out #51245 containing the additions of C10_ALWAYS_INLINE.

Splitting this out from #51164 (D26069629) to allow it to land separately; I'm sure this is a good idea but I'm less sure about #51164. Differential Revision: [D26112627](https://our.internmc.facebook.com/intern/diff/D26112627/) [ghstack-poisoned]

Summary: Pull Request resolved: #51245 Splitting this out from #51164 (D26069629) to allow it to land separately; I'm sure this is a good idea but I'm less sure about #51164. ghstack-source-id: 120697499 Test Plan: double-check effect on empty benchmark with perf stat; didn't move Reviweers: ezyang, messmer Reviewed By: ezyang Differential Revision: D26112627 fbshipit-source-id: 50d4418d351527bcedd5ccdc49106bc642699870

…#51245) Summary: Pull Request resolved: pytorch#51245 Splitting this out from pytorch#51164 (D26069629) to allow it to land separately; I'm sure this is a good idea but I'm less sure about pytorch#51164. ghstack-source-id: 120697499 Test Plan: double-check effect on empty benchmark with perf stat; didn't move Reviweers: ezyang, messmer Reviewed By: ezyang Differential Revision: D26112627 fbshipit-source-id: 50d4418d351527bcedd5ccdc49106bc642699870

facebook-github-bot added the cla signed label Jan 27, 2021

This was referenced Jan 27, 2021

[PyTorch] Remove unnecessary dispatcher.h include in torch/library.h #51162

Closed

[PyTorch] Refactor Dispatcher to inline less code in fast path #51163

Closed

[PyTorch] Pass TensorOptions by value #51165

Closed

ezyang reviewed Jan 27, 2021

View reviewed changes

ezyang approved these changes Jan 27, 2021

View reviewed changes

bdhirsh reviewed Jan 27, 2021

View reviewed changes

bdhirsh mentioned this pull request Jan 27, 2021

inlining tensor.device() #50848

Closed

swolchok mentioned this pull request Jan 27, 2021

[PyTorch] Add C10_ALWAYS_INLINE to critical dispatcher paths #51245

Closed

swolchok closed this Jan 27, 2021

facebook-github-bot deleted the gh/swolchok/92/head branch February 27, 2021 15:16

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[PyTorch] Put functor at the end of wrap_kernel_functor_unboxed params#51164

[PyTorch] Put functor at the end of wrap_kernel_functor_unboxed params#51164
swolchok wants to merge 1 commit intogh/swolchok/92/basefrom
gh/swolchok/92/head

swolchok commented Jan 27, 2021 •

edited

Loading

Uh oh!

facebook-github-bot commented Jan 27, 2021 •

edited

Loading

Uh oh!

ezyang Jan 27, 2021

Uh oh!

swolchok Jan 27, 2021

Uh oh!

bdhirsh Jan 27, 2021

Uh oh!

swolchok Jan 27, 2021

Uh oh!

swolchok commented Jan 27, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

swolchok commented Jan 27, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

facebook-github-bot commented Jan 27, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

💊 CI failures summary and remediations

🕵️ 5 new failures recognized by patterns

pytorch_macos_10_13_py3_test (1/5)

pytorch_windows_vs2019_py36_cuda10.1_test2 (2/5)

pytorch_linux_xenial_py3_clang5_asan_test2 (3/5)

pytorch_linux_bionic_py3_8_gcc9_coverage_test2 (4/5)

pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_test2 (5/5)

1 failure not recognized by patterns:

🚧 3 fixed upstream failures:

Uh oh!

ezyang Jan 27, 2021

Choose a reason for hiding this comment

Uh oh!

swolchok Jan 27, 2021

Choose a reason for hiding this comment

Uh oh!

bdhirsh Jan 27, 2021

Choose a reason for hiding this comment

Uh oh!

swolchok Jan 27, 2021

Choose a reason for hiding this comment

Uh oh!

swolchok commented Jan 27, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

swolchok commented Jan 27, 2021 •

edited

Loading

facebook-github-bot commented Jan 27, 2021 •

edited

Loading