Skip to content

[PyTorch] Put functor at the end of wrap_kernel_functor_unboxed params#51164

Closed
swolchok wants to merge 1 commit intogh/swolchok/92/basefrom
gh/swolchok/92/head
Closed

[PyTorch] Put functor at the end of wrap_kernel_functor_unboxed params#51164
swolchok wants to merge 1 commit intogh/swolchok/92/basefrom
gh/swolchok/92/head

Conversation

@swolchok
Copy link
Copy Markdown
Contributor

@swolchok swolchok commented Jan 27, 2021

Stack from ghstack:

In the (hopefully) common case where the unboxed functor is
really a c10::CompileTimeFunctionPointer, this enables a jump
straight to said functor.

Note that I had no end of trouble getting this to have decent
performance without the previous diff, and even with it, all the
manual C10_ALWAYS_INLINE I added were necessary.

Differential Revision: D26069629

In the (hopefully) common case where the unboxed functor is
really a `c10::CompileTimeFunctionPointer`, this enables a jump
straight to said functor.

Note that I had no end of trouble getting this to have decent
performance without the previous diff, and even with it, all the
manual C10_ALWAYS_INLINE I added were necessary.

Differential Revision: [D26069629](https://our.internmc.facebook.com/intern/diff/D26069629/)

[ghstack-poisoned]
@facebook-github-bot
Copy link
Copy Markdown
Contributor

facebook-github-bot commented Jan 27, 2021

💊 CI failures summary and remediations

As of commit a5bb6dd (more details on the Dr. CI page):


  • 7/10 failures possibly* introduced in this PR
    • 1/7 non-CircleCI failure(s)
  • 3/10 broken upstream at merge base b678fdf on Jan 26 from 1:29pm to 6:17pm

🕵️ 5 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_macos_10_13_py3_test (1/5)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

Jan 27 01:54:15 AssertionError: 'aten::t' != 'aten::addmm'
Jan 27 01:54:15 AssertionError: True is not false
Jan 27 01:54:15 
Jan 27 01:54:15 ======================================================================
Jan 27 01:54:15 FAIL [0.006s]: test_profiler_shapes (__main__.TestAutograd)
Jan 27 01:54:15 ----------------------------------------------------------------------
Jan 27 01:54:15 Traceback (most recent call last):
Jan 27 01:54:15   File "test_autograd.py", line 3233, in test_profiler_shapes
Jan 27 01:54:15     self.assertEqual(event.name, name_expected)
Jan 27 01:54:15   File "/Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/testing/_internal/common_utils.py", line 1169, in assertEqual
Jan 27 01:54:15     super().assertEqual(x, y, msg=self._get_assert_msg(msg, debug_msg=debug_msg))
Jan 27 01:54:15 AssertionError: 'aten::t' != 'aten::addmm'
Jan 27 01:54:15 - aten::t
Jan 27 01:54:15 + aten::addmm
Jan 27 01:54:15  : Attempted to compare [string] types: Expected: 'aten::t'; Actual: 'aten::addmm'.
Jan 27 01:54:15 
Jan 27 01:54:15 ----------------------------------------------------------------------
Jan 27 01:54:15 Ran 1520 tests in 628.955s
Jan 27 01:54:15 
Jan 27 01:54:15 FAILED (failures=4, skipped=17, expected failures=1)
Jan 27 01:54:15 
Jan 27 01:54:15 Generating XML reports...

See CircleCI build pytorch_windows_vs2019_py36_cuda10.1_test2 (2/5)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

AssertionError: 'aten::t' != 'aten::addmm'
AssertionError: True is not false

======================================================================
FAIL [0.008s]: test_profiler_shapes (__main__.TestAutograd)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_autograd.py", line 3233, in test_profiler_shapes
    self.assertEqual(event.name, name_expected)
  File "C:\Users\circleci\project\build\win_tmp\build\torch\testing\_internal\common_utils.py", line 1169, in assertEqual
    super().assertEqual(x, y, msg=self._get_assert_msg(msg, debug_msg=debug_msg))
AssertionError: 'aten::t' != 'aten::addmm'
- aten::t
+ aten::addmm
 : Attempted to compare [string] types: Expected: 'aten::t'; Actual: 'aten::addmm'.

----------------------------------------------------------------------
Ran 2794 tests in 2068.898s

FAILED (failures=4, skipped=23, expected failures=1)

Generating XML reports...

See CircleCI build pytorch_linux_xenial_py3_clang5_asan_test2 (3/5)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

Jan 27 02:54:13 AssertionError: 'aten::t' != 'aten::addmm'
Jan 27 02:54:13 AssertionError: True is not false
Jan 27 02:54:13 
Jan 27 02:54:13 ======================================================================
Jan 27 02:54:13 FAIL [0.011s]: test_profiler_shapes (__main__.TestAutograd)
Jan 27 02:54:13 ----------------------------------------------------------------------
Jan 27 02:54:13 Traceback (most recent call last):
Jan 27 02:54:13   File "test_autograd.py", line 3233, in test_profiler_shapes
Jan 27 02:54:13     self.assertEqual(event.name, name_expected)
Jan 27 02:54:13   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 1169, in assertEqual
Jan 27 02:54:13     super().assertEqual(x, y, msg=self._get_assert_msg(msg, debug_msg=debug_msg))
Jan 27 02:54:13 AssertionError: 'aten::t' != 'aten::addmm'
Jan 27 02:54:13 - aten::t
Jan 27 02:54:13 + aten::addmm
Jan 27 02:54:13  : Attempted to compare [string] types: Expected: 'aten::t'; Actual: 'aten::addmm'.
Jan 27 02:54:13 
Jan 27 02:54:13 ----------------------------------------------------------------------
Jan 27 02:54:13 Ran 1520 tests in 2273.470s
Jan 27 02:54:13 
Jan 27 02:54:13 FAILED (failures=4, skipped=16, expected failures=1)
Jan 27 02:54:13 
Jan 27 02:54:13 Generating XML reports...

See CircleCI build pytorch_linux_bionic_py3_8_gcc9_coverage_test2 (4/5)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

Jan 27 02:38:22 AssertionError: 'aten::t' != 'aten::addmm'
Jan 27 02:38:22 AssertionError: True is not false
Jan 27 02:38:22 
Jan 27 02:38:22 ======================================================================
Jan 27 02:38:22 FAIL [0.013s]: test_profiler_shapes (__main__.TestAutograd)
Jan 27 02:38:22 ----------------------------------------------------------------------
Jan 27 02:38:22 Traceback (most recent call last):
Jan 27 02:38:22   File "test_autograd.py", line 3233, in test_profiler_shapes
Jan 27 02:38:22     self.assertEqual(event.name, name_expected)
Jan 27 02:38:22   File "/opt/conda/lib/python3.8/site-packages/torch/testing/_internal/common_utils.py", line 1169, in assertEqual
Jan 27 02:38:22     super().assertEqual(x, y, msg=self._get_assert_msg(msg, debug_msg=debug_msg))
Jan 27 02:38:22 AssertionError: 'aten::t' != 'aten::addmm'
Jan 27 02:38:22 - aten::t
Jan 27 02:38:22 + aten::addmm
Jan 27 02:38:22  : Attempted to compare [string] types: Expected: 'aten::t'; Actual: 'aten::addmm'.
Jan 27 02:38:22 
Jan 27 02:38:23 ----------------------------------------------------------------------
Jan 27 02:38:23 Ran 1520 tests in 1044.614s
Jan 27 02:38:23 
Jan 27 02:38:23 FAILED (failures=4, skipped=16, expected failures=1)
Jan 27 02:38:23 
Jan 27 02:38:23 Generating XML reports...

See CircleCI build pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_test2 (5/5)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

Jan 27 03:49:13 AssertionError: 'aten::t' != 'aten::addmm'
Jan 27 03:49:13 AssertionError: True is not false
Jan 27 03:49:13 
Jan 27 03:49:13 ======================================================================
Jan 27 03:49:13 FAIL [0.004s]: test_profiler_shapes (__main__.TestAutograd)
Jan 27 03:49:13 ----------------------------------------------------------------------
Jan 27 03:49:13 Traceback (most recent call last):
Jan 27 03:49:13   File "test_autograd.py", line 3233, in test_profiler_shapes
Jan 27 03:49:13     self.assertEqual(event.name, name_expected)
Jan 27 03:49:13   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 1169, in assertEqual
Jan 27 03:49:13     super().assertEqual(x, y, msg=self._get_assert_msg(msg, debug_msg=debug_msg))
Jan 27 03:49:13 AssertionError: 'aten::t' != 'aten::addmm'
Jan 27 03:49:13 - aten::t
Jan 27 03:49:13 + aten::addmm
Jan 27 03:49:13  : Attempted to compare [string] types: Expected: 'aten::t'; Actual: 'aten::addmm'.
Jan 27 03:49:13 
Jan 27 03:49:14 ----------------------------------------------------------------------
Jan 27 03:49:14 Ran 2794 tests in 3599.913s
Jan 27 03:49:14 
Jan 27 03:49:14 FAILED (failures=4, skipped=19, expected failures=1)
Jan 27 03:49:14 
Jan 27 03:49:14 Generating XML reports...

1 failure not recognized by patterns:

Job Step Action
CircleCI pytorch_linux_bionic_py3_8_gcc9_coverage_test1 Run tests 🔁 rerun

🚧 3 fixed upstream failures:

These were probably caused by upstream breakages that were already fixed.

Please rebase on the viable/strict branch (expand for instructions)

If your commit is older than viable/strict, run these commands:

git fetch https://github.com/pytorch/pytorch viable/strict
git rebase FETCH_HEAD

Check out the recency history of this "viable master" tracking branch.


This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

Comment thread tools/codegen/gen.py
compute_dk = f"DispatchKey _dk = {dispatch_key};"
return f"""\
// aten::{f.func}
C10_ALWAYS_INLINE
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure this does anything? This is getting generated to a cpp file so it's hard for me to imagine that you're actually ever going to get to inline it.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Surprisingly, yes! It can get inlined into wrap_kernel_functor_unboxed_::call; the call chain is Library::impl -> CppFunction ctor -> KernelFunction::makeFromUnboxedFunction -> KernelFunction::makeFromUnboxedFunctor -> wrap_kernel_functor_unboxed. (My working theory is that the fragility of inlining decisions for dispatcher code and the high compile times for generated aten files like Functions.cpp is the huge amount of inlineable stuff going on between this path and the call out to Dispatcher::call{,WithDispatchKey}, repeated for every at:: function.)

"Parameter types mismatch");

static ReturnType call(OperatorKernel* functor, ParameterTypes... args) {
static ReturnType call(ParameterTypes... args, OperatorKernel* functor) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More for my understanding than anything else- how does this help out the compiler?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Function arguments go in fixed places defined by the System V x86_64 ABI, at least on Linux & Mac. You can see Low Level System Interface > Function Calling Interface for the gory details (in which case you'll also want to refer to the Itanium C++ ABI to see how C++ things work), but roughly speaking and ignoring a bunch of subtleties and exceptions, the first 8 pointers' worth of arguments go in a fixed set of 8 registers. (I'll call them r0-r7 for simplicity, but those are definitely not their actual names on x86_64.) Supposing that functor takes at most 7 arguments, then with functor at the front, we have r0 = functor and r1-r7 = the first 7 args to functor. We then need to shift r1-r7 into r0-r6 before jumping to functor. With functor at the back, we have r0-r6 = the first 7 args to functor and r7 = functor, and we can jump straight to functor without moving more things around.

(If functor takes 8 arguments or more, then we still have the args to it in r0-r7 and the remaining arguments on the stack. Arguments that go on the stack are pushed from right to left and thus popped from left to right, so I haven't checked the assembly for this case but I think that we could just leave functor in place, load it from the stack and call it, and then pop it off the stack when functor returns.)

swolchok added a commit that referenced this pull request Jan 27, 2021
Splitting this out from #51164 (D26069629) to allow it to
land separately; I'm sure this is a good idea but I'm less sure about
#51164.

Differential Revision: [D26112627](https://our.internmc.facebook.com/intern/diff/D26112627/)

[ghstack-poisoned]
@swolchok
Copy link
Copy Markdown
Contributor Author

After splitting out the C10_ALWAYS_INLINE part and re-measuring, I am seeing a small regression from this change. For the sake of being able to move forward, I am shelving it for now. I have split out #51245 containing the additions of C10_ALWAYS_INLINE.

@swolchok swolchok closed this Jan 27, 2021
swolchok added a commit that referenced this pull request Jan 29, 2021
Splitting this out from #51164 (D26069629) to allow it to
land separately; I'm sure this is a good idea but I'm less sure about
#51164.

Differential Revision: [D26112627](https://our.internmc.facebook.com/intern/diff/D26112627/)

[ghstack-poisoned]
swolchok added a commit that referenced this pull request Jan 29, 2021
Splitting this out from #51164 (D26069629) to allow it to
land separately; I'm sure this is a good idea but I'm less sure about
#51164.

Differential Revision: [D26112627](https://our.internmc.facebook.com/intern/diff/D26112627/)

[ghstack-poisoned]
facebook-github-bot pushed a commit that referenced this pull request Feb 1, 2021
Summary:
Pull Request resolved: #51245

Splitting this out from #51164 (D26069629) to allow it to
land separately; I'm sure this is a good idea but I'm less sure about
#51164.
ghstack-source-id: 120697499

Test Plan:
double-check effect on empty benchmark with perf stat;
didn't move

Reviweers: ezyang, messmer

Reviewed By: ezyang

Differential Revision: D26112627

fbshipit-source-id: 50d4418d351527bcedd5ccdc49106bc642699870
@facebook-github-bot facebook-github-bot deleted the gh/swolchok/92/head branch February 27, 2021 15:16
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026
…#51245)

Summary:
Pull Request resolved: pytorch#51245

Splitting this out from pytorch#51164 (D26069629) to allow it to
land separately; I'm sure this is a good idea but I'm less sure about
pytorch#51164.
ghstack-source-id: 120697499

Test Plan:
double-check effect on empty benchmark with perf stat;
didn't move

Reviweers: ezyang, messmer

Reviewed By: ezyang

Differential Revision: D26112627

fbshipit-source-id: 50d4418d351527bcedd5ccdc49106bc642699870
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants