
Make add.Scalar manual_cpp_binding#49203

Closed
ezyang wants to merge 12 commits into gh/ezyang/891/base from gh/ezyang/891/head

Conversation

@ezyang
Contributor

@ezyang ezyang commented Dec 10, 2020

Stack from ghstack:

Manually do add.Scalar binding, skipping dispatch when you call into it from C++. This doesn't help if you directly get to the kernel via JIT interpreter though.

Signed-off-by: Edward Z. Yang ezyang@fb.com

Differential Revision: D25594995

The comment on native_functions.yaml claims this is just to
solve a C++ API problem.  If this is true, we don't actually
need this in schema, and all I need to do is just manually
replicate the API in the C++ bindings.  Seeing if this indeed
works.

If this does work, it solves a performance problem on add.Scalar
where I introduced an extra redispatch.  If it doesn't work I'll
try something else.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

[ghstack-poisoned]
@facebook-github-bot
Contributor

facebook-github-bot commented Dec 10, 2020

💊 CI failures summary and remediations

As of commit 4ce76d0 (more details on the Dr. CI page):


  • 1/1 failures possibly* introduced in this PR
    • 1/1 non-CircleCI failure(s)

This comment was automatically generated by Dr. CI.

ezyang added a commit that referenced this pull request Dec 10, 2020
ghstack-source-id: 2f3710a
Pull Request resolved: #49203
@ezyang
Contributor Author

ezyang commented Dec 10, 2020

Looks like it doesn't work:

======================================================================
ERROR: test_async_grad_guard_no_grad (jit.test_async.TestAsync)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/data/users/ezyang/pytorch-tmp/test/jit/test_async.py", line 352, in test_async_grad_guard_no_grad
    @torch.jit.script
  File "/data/users/ezyang/pytorch-tmp/torch/jit/_script.py", line 940, in script
    qualified_name, ast, _rcb, get_default_args(obj)
RuntimeError: 
Return value was annotated as having type Tensor but is actually of type float:
  File "<string>", line 5
  return b * a
def add(a : float, b : Tensor) -> Tensor:
  return b + a
  ~~~~~~~~~~~~ <--- HERE
def ne(a : float, b : Tensor) -> Tensor:
  return b != a
'add' is being compiled since it was called from 'foo'
  File "/data/users/ezyang/pytorch-tmp/test/jit/test_async.py", line 354
        @torch.jit.script
        def foo(x):
            y = x * 2
                    ~ <--- HERE
            return y.requires_grad


----------------------------------------------------------------------
Ran 2 tests in 0.071s

@ezyang ezyang requested a review from albanD as a code owner December 10, 2020 23:56
@ezyang ezyang changed the title Delete add.Scalar from schema Make add.Scalar manual_cpp_binding Dec 11, 2020
@ezyang ezyang requested review from robieta and swolchok December 11, 2020 00:06
@ezyang
Contributor Author

ezyang commented Dec 11, 2020

Updated with a different approach. One downside here is that if you hit the native:: kernel you're still going to end up doing two dispatches. Might need to do some tricky codegen stuff to get around that problem.

ezyang added a commit that referenced this pull request Dec 11, 2020
ghstack-source-id: 5fc5408
Pull Request resolved: #49203
ezyang added a commit that referenced this pull request Dec 14, 2020
ghstack-source-id: af46773
Pull Request resolved: #49203
@ezyang ezyang requested a review from bdhirsh December 15, 2020 14:46
Contributor

@robieta robieta left a comment


This looks reasonable. What is the plan as more kernels become structured? (manually do this, teach the codegen, etc.)

Also, can you benchmark the change? Skipping a dispatch should be a strict improvement, but we should get in the habit of confirming.

@bdhirsh
Collaborator

bdhirsh commented Dec 15, 2020

One downside here is that if you hit the native:: kernel you're still going to end up doing two dispatches. Might need to do some tricky codegen stuff to get around that problem.

Is the two dispatches here referring to the fact that you have one dispatch for at::scalar_tensor (to convert the scalar) and another for at::add(Tensor, Tensor)? Or are you referring to something else (since that doesn't seem like something you can solve in codegen)?

@ezyang
Contributor Author

ezyang commented Dec 15, 2020

Is the two dispatches here referring to the fact that you have one dispatch for at::scalar_tensor (to convert the scalar) and another for at::add(Tensor, Tensor)? Or are you referring to something else (since that doesn't seem like something you can solve in codegen)?

Nope, it's the extra dispatch from at::add(Tensor, Tensor). (In the old version of this code, predating the parent commit of this PR, this was an at::native::add call, so there was no dispatch cost.)

@ezyang
Contributor Author

ezyang commented Dec 15, 2020

What is the plan as more kernels become structured? (manually do this, teach the codegen, etc.)

No plan yet. This diff is pretty short and so in the mid term I might just do all of these kernels this way.

@ezyang
Contributor Author

ezyang commented Dec 16, 2020

Benchmark:

Control:

```python
import torch
from torch.utils.benchmark import Timer

c = Timer(
    stmt='a + a',
    setup='a = torch.rand(3, 4)'
).collect_callgrind(number=100, collect_baseline=False)
print(c.counts(denoise=True))
```

Before: 849190
After: 848999

(This experiment is expected to show no diff.)

Python scalar add (stmt='a + 2'):

Before: 1986043
After: 1979842

(This experiment is also expected to show no diff, as scalar wrapping is supposed to happen in the Python bindings, bypassing the relevant C++.)

C++ scalar add (expected to exercise the change):

```
    stmt='a + 2;',
    setup='auto a = torch::randn({3, 4});',
    language='cpp'
```

Before: 1726143
After: 1689743

Example from benchmark suite:

```
    stmt='a += 1',
    setup='a = torch.rand(3, 4)'
```

Before: 1751566
After: 1751262
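For readers skimming the numbers above, the per-experiment deltas work out as follows (a quick arithmetic sketch over the reported counts; only the C++ scalar-add case shows a meaningful drop):

```python
# Instruction counts reported above, as (before, after) per experiment.
results = {
    "tensor + tensor": (849190, 848999),
    "Python scalar add": (1986043, 1979842),
    "C++ scalar add": (1726143, 1689743),
    "a += 1": (1751566, 1751262),
}

for name, (before, after) in results.items():
    delta = before - after
    pct = 100.0 * delta / before
    print(f"{name}: -{delta} instructions ({pct:.2f}% fewer)")
```

The C++ scalar add drops by 36,400 instructions, about 2.1%, while every other experiment moves by well under 0.5%, consistent with the expectations stated above.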

@ezyang
Contributor Author

ezyang commented Dec 16, 2020

@robieta This diff should be landed, but it didn't actually improve a += 1, which suggests something else is going on in the regression in #48718. (This is consistent with sum also regressing, where there wasn't an obvious code impact either.)

ezyang added a commit that referenced this pull request Dec 16, 2020
ghstack-source-id: 2665c12
Pull Request resolved: #49203
ezyang added a commit that referenced this pull request Jan 6, 2021
ghstack-source-id: 4dafba6
Pull Request resolved: #49203
ezyang added a commit that referenced this pull request Jan 6, 2021
ghstack-source-id: fb705e4
Pull Request resolved: #49203
ezyang added a commit that referenced this pull request Jan 7, 2021
ghstack-source-id: 3c13d75
Pull Request resolved: #49203
ezyang added a commit that referenced this pull request Jan 19, 2021
ghstack-source-id: 3bcda2f
Pull Request resolved: #49203
@ezyang
Contributor Author

ezyang commented Jan 22, 2021

On further reflection, since this isn't fixing a concrete performance problem (which it isn't), I'd rather not copy-paste wrapped_scalar_tensor everywhere, as it would impede the further improvements cited in #49758.

If we do #50953 that would also solve these problems, with less code duplication. So I think this is not worth it for now.

@ezyang ezyang closed this Feb 9, 2021
@facebook-github-bot facebook-github-bot deleted the gh/ezyang/891/head branch March 12, 2021 15:16