
Make add.Scalar manual_cpp_binding#49203

Closed
ezyang wants to merge 12 commits into gh/ezyang/891/base from gh/ezyang/891/head

Conversation

@ezyang
Contributor

@ezyang ezyang commented Dec 10, 2020

Stack from ghstack:

Manually do add.Scalar binding, skipping dispatch when you call into it from C++. This doesn't help if you directly get to the kernel via JIT interpreter though.

Signed-off-by: Edward Z. Yang ezyang@fb.com

Differential Revision: D25594995

The comment on native_functions.yaml claims this is just to
solve a C++ API problem.  If this is true, we don't actually
need this in schema, and all I need to do is just manually
replicate the API in the C++ bindings.  Seeing if this indeed
works.

If this does work, it solves a performance problem on add.Scalar
where I introduced an extra redispatch.  If it doesn't work I'll
try something else.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

[ghstack-poisoned]
@facebook-github-bot
Contributor

facebook-github-bot commented Dec 10, 2020

💊 CI failures summary and remediations

As of commit 4ce76d0 (more details on the Dr. CI page):


  • 1/1 failures possibly* introduced in this PR
    • 1/1 non-CircleCI failure(s)

This comment was automatically generated by Dr. CI.

ezyang added a commit that referenced this pull request Dec 10, 2020
ghstack-source-id: 2f3710a
Pull Request resolved: #49203
@ezyang
Contributor Author

ezyang commented Dec 10, 2020

Looks like it doesn't work:

======================================================================
ERROR: test_async_grad_guard_no_grad (jit.test_async.TestAsync)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/data/users/ezyang/pytorch-tmp/test/jit/test_async.py", line 352, in test_async_grad_guard_no_grad
    @torch.jit.script
  File "/data/users/ezyang/pytorch-tmp/torch/jit/_script.py", line 940, in script
    qualified_name, ast, _rcb, get_default_args(obj)
RuntimeError: 
Return value was annotated as having type Tensor but is actually of type float:
  File "<string>", line 5
  return b * a
def add(a : float, b : Tensor) -> Tensor:
  return b + a
  ~~~~~~~~~~~~ <--- HERE
def ne(a : float, b : Tensor) -> Tensor:
  return b != a
'add' is being compiled since it was called from 'foo'
  File "/data/users/ezyang/pytorch-tmp/test/jit/test_async.py", line 354
        @torch.jit.script
        def foo(x):
            y = x * 2
                    ~ <--- HERE
            return y.requires_grad


----------------------------------------------------------------------
Ran 2 tests in 0.071s

@ezyang ezyang requested a review from albanD as a code owner December 10, 2020 23:56
@ezyang ezyang changed the title Delete add.Scalar from schema Make add.Scalar manual_cpp_binding Dec 11, 2020
@ezyang ezyang requested review from robieta and swolchok December 11, 2020 00:06
@ezyang
Contributor Author

ezyang commented Dec 11, 2020

Updated with a different approach. One downside here is that if you hit the native:: kernel you're still going to end up doing two dispatches. Might need to do some tricky codegen stuff to get around that problem.

ezyang added a commit that referenced this pull request Dec 11, 2020
ghstack-source-id: 5fc5408
Pull Request resolved: #49203
ezyang added a commit that referenced this pull request Dec 14, 2020
ghstack-source-id: af46773
Pull Request resolved: #49203
@ezyang ezyang requested a review from bdhirsh December 15, 2020 14:46
Contributor

@robieta robieta left a comment


This looks reasonable. What is the plan as more kernels become structured? (manually do this, teach the codegen, etc.)

Also, can you benchmark the change? Skipping a dispatch should be a strict improvement, but we should get in the habit of confirming.

@bdhirsh
Collaborator

bdhirsh commented Dec 15, 2020

One downside here is that if you hit the native:: kernel you're still going to end up doing two dispatches. Might need to do some tricky codegen stuff to get around that problem.

Is the two dispatches here referring to the fact that you have one dispatch for at::scalar_tensor (to convert the scalar) and another for at::add(Tensor, Tensor)? Or are you referring to something else (since that doesn't seem like something you can solve in codegen)?

@ezyang
Contributor Author

ezyang commented Dec 15, 2020

Is the two dispatches here referring to the fact that you have one dispatch for at::scalar_tensor (to convert the scalar) and another for at::add(Tensor, Tensor)? Or are you referring to something else (since that doesn't seem like something you can solve in codegen)?

Nope, it's the extra dispatch from at::add(Tensor, Tensor). (In the old version of this code, predating the parent commit of this PR, this was an at::native::add call, so there was no dispatch cost.)

@ezyang
Contributor Author

ezyang commented Dec 15, 2020

What is the plan as more kernels become structured? (manually do this, teach the codegen, etc.)

No plan yet. This diff is pretty short and so in the mid term I might just do all of these kernels this way.

@ezyang
Contributor Author

ezyang commented Dec 16, 2020

Benchmark:

Control:

```python
import torch
from torch.utils.benchmark import Timer

c = Timer(
    stmt='a + a',
    setup='a = torch.rand(3, 4)'
).collect_callgrind(number=100, collect_baseline=False)
print(c.counts(denoise=True))
```

Before: 849190
After: 848999

(This experiment is expected to show no diff.)

Python scalar add (stmt='a + 2'):

Before: 1986043
After: 1979842

(This experiment is also expected to show no diff, as scalar wrapping is supposed to happen in the Python bindings, bypassing the relevant C++.)

C++ scalar add (expected to exercise the change):

```
    stmt='a + 2;',
    setup='auto a = torch::randn({3, 4});',
    language='cpp'
```

Before: 1726143
After: 1689743

Example from benchmark suite:

```
    stmt='a += 1',
    setup='a = torch.rand(3, 4)'
```

Before: 1751566
After: 1751262
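For readers skimming the numbers above, the per-experiment deltas work out as follows (a quick arithmetic sketch over the reported counts; only the C++ scalar-add case shows a meaningful drop):

```python
# Instruction counts reported above, as (before, after) per experiment.
results = {
    "tensor + tensor": (849190, 848999),
    "Python scalar add": (1986043, 1979842),
    "C++ scalar add": (1726143, 1689743),
    "a += 1": (1751566, 1751262),
}

for name, (before, after) in results.items():
    delta = before - after
    pct = 100.0 * delta / before
    print(f"{name}: -{delta} instructions ({pct:.2f}% fewer)")
```

The C++ scalar add drops by 36,400 instructions, about 2.1%, while every other experiment moves by well under 0.5%, consistent with the expectations stated above.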

@ezyang
Contributor Author

ezyang commented Dec 16, 2020

@robieta This diff should be landed, but it didn't actually improve a += 1, which suggests something else is going on in the regression in #48718. (This is consistent with sum also regressing, where there wasn't an obvious code impact either.)

ezyang added a commit that referenced this pull request Dec 16, 2020
ghstack-source-id: 2665c12
Pull Request resolved: #49203
ezyang added a commit that referenced this pull request Jan 6, 2021
ghstack-source-id: 4dafba6
Pull Request resolved: #49203
ezyang added a commit that referenced this pull request Jan 6, 2021
ghstack-source-id: fb705e4
Pull Request resolved: #49203
ezyang added a commit that referenced this pull request Jan 7, 2021
ghstack-source-id: 3c13d75
Pull Request resolved: #49203
ezyang added a commit that referenced this pull request Jan 19, 2021
ghstack-source-id: 3bcda2f
Pull Request resolved: #49203
@ezyang
Contributor Author

ezyang commented Jan 22, 2021

On further reflection, since this isn't fixing a concrete performance problem (which it isn't), I'd rather not copy-paste wrapped_scalar_tensor everywhere, as it would impede the further improvements cited in #49758.

If we do #50953 that would also solve these problems, with less code duplication. So I think this is not worth it for now.

@ezyang ezyang closed this Feb 9, 2021
@facebook-github-bot facebook-github-bot deleted the gh/ezyang/891/head branch March 12, 2021 15:16