
Port equal from THC to ATen (CUDA) #36483

Closed
Baranowski wants to merge 1 commit into pytorch:master from Quansight:wbaranowski-equal-24557

Conversation

@Baranowski
Contributor

@Baranowski Baranowski commented Apr 13, 2020

Fixes #24557

ASV benchmark:

import torch

sizes = [
    (10**6,),
    (1000, 1000),
    (10, 10),
    (1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
]

class EqualTrue:
    params = range(len(sizes))

    def setup(self, n):
        dims = sizes[n]
        self.a = torch.rand(dims, device='cuda')
        self.b = self.a.clone()

    def time_equal(self, n):
        torch.equal(self.a, self.b)

class EqualFalse:
    params = range(len(sizes))

    def setup(self, n):
        dims = sizes[n]
        self.a = torch.rand(dims, device='cuda')
        self.b = torch.rand(dims, device='cuda')

    def time_equal(self, n):
        torch.equal(self.a, self.b)

Old results:

[ 75.00%] ··· equal.EqualFalse.time_equal
[ 75.00%] ··· ======== ============
               param1
              -------- ------------
                 0       67.7±7μs
                 1       74.0±2μs
                 2      24.4±0.1μs
                 3      135±0.2μs
              ======== ============

[100.00%] ··· equal.EqualTrue.time_equal
[100.00%] ··· ======== ============
               param1
              -------- ------------
                 0      59.8±0.2μs
                 1      59.9±0.3μs
                 2      25.0±0.5μs
                 3      136±0.2μs
              ======== ============

New results:

[ 75.00%] ··· equal.EqualFalse.time_equal
[ 75.00%] ··· ======== ============
               param1              
              -------- ------------
                 0      44.4±0.2μs 
                 1      44.5±0.4μs 
                 2      31.3±0.3μs 
                 3      96.6±0.5μs 
              ======== ============

[100.00%] ··· equal.EqualTrue.time_equal
[100.00%] ··· ======== ============
               param1              
              -------- ------------
                 0      44.2±0.2μs 
                 1      44.6±0.2μs 
                 2      30.8±0.3μs 
                 3      97.3±0.2μs 
              ======== ============

@dr-ci

dr-ci Bot commented Apr 13, 2020

💊 CI failures summary and remediations

As of commit b80a23b (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚


This comment was automatically generated by Dr. CI.

@Baranowski
Contributor Author

I don't see anything for equal in the benchmarks/ folder. Any suggestions on what usage patterns I should benchmark for?

@mruberry added the triaged label (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module) on Apr 18, 2020
@Baranowski
Contributor Author

@VitalyFedyunin should I benchmark this change?

@VitalyFedyunin
Contributor

Hi! Yes, a simple timeit benchmark with different tensor shapes (sizes) will suffice.
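A minimal sketch of the kind of timeit benchmark suggested here. The real target is `torch.equal` on CUDA tensors of various shapes; a pure-Python stand-in comparison is timed below so the pattern runs without a GPU. To benchmark the actual change, swap in `torch.rand(dims, device='cuda')` and `torch.equal`.

```python
import timeit

# Shapes to benchmark, mirroring the ASV script in the PR description.
sizes = [(10**6,), (1000, 1000), (10, 10)]

def num_elements(dims):
    # Total element count for a given shape tuple.
    n = 1
    for d in dims:
        n *= d
    return n

def compare(a, b):
    # Stand-in for torch.equal(a, b); replace with the real call
    # when running on a machine with CUDA available.
    return a == b

for dims in sizes:
    a = list(range(num_elements(dims)))
    b = list(a)
    per_call = timeit.timeit(lambda: compare(a, b), number=10) / 10
    print(f"{dims}: {per_call:.6f}s per call")
```

For the CUDA case, remember that kernel launches are asynchronous, so a realistic timing loop should synchronize (e.g. `torch.cuda.synchronize()`) before reading the clock.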

VitalyFedyunin
VitalyFedyunin previously approved these changes Apr 21, 2020
@Baranowski
Contributor Author

I wasn't expecting this in such a simple change, but I'm seeing a perf regression. I will need to dig into this.

@VitalyFedyunin VitalyFedyunin dismissed their stale review April 22, 2020 17:43

Please ping me when you are ready for next review.

@Baranowski Baranowski force-pushed the wbaranowski-equal-24557 branch from 7973bd6 to df748f0 Compare April 22, 2020 18:48
Contributor Author

@Baranowski Baranowski left a comment


@VitalyFedyunin this is ready for a review.

I have put the benchmark results and the benchmark itself in the PR description. Turns out that wrapping the value in a scalar_tensor and then extracting it from there was taking too long.

Contributor Author


I'm not sure that calling this function from ATen (as opposed to TH) is ok.

Contributor

@VitalyFedyunin VitalyFedyunin left a comment


Code looks good, but you need to rebase.

@Baranowski Baranowski force-pushed the wbaranowski-equal-24557 branch from df748f0 to 18fd4ea Compare April 30, 2020 07:09
@Baranowski
Contributor Author

@VitalyFedyunin this is rebased and should be ready to be merged.

Contributor

@facebook-github-bot facebook-github-bot left a comment


@VitalyFedyunin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Contributor


It is safer if you call: at::native::eq(self, src).min()

Contributor Author


That leads to a serious performance regression.

Contributor


Sorry for going in circles, but can you please check with at::native::eq(self, src).all()? https://pytorch.org/docs/master/tensors.html#torch.BoolTensor.all
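The suggestion here amounts to implementing equal as an elementwise eq followed by an all() reduction. A minimal pure-Python sketch of those semantics (not the actual ATen code; flat lists stand in for tensors):

```python
def tensors_equal(a_shape, a, b_shape, b):
    """Sketch of torch.equal semantics: the shapes must match exactly,
    and then every corresponding element must compare equal,
    mirroring eq(self, src).all()."""
    if a_shape != b_shape:
        return False
    # eq(...) -> elementwise comparison; .all() -> single bool reduction
    return all(x == y for x, y in zip(a, b))

print(tensors_equal((2, 2), [1, 2, 3, 4], (2, 2), [1, 2, 3, 4]))  # True
print(tensors_equal((2, 2), [1, 2, 3, 4], (4,),  [1, 2, 3, 4]))  # False
```

On CUDA the advantage of the all() reduction is that it produces a single boolean result in one fused reduction kernel, rather than materializing an intermediate scalar tensor and extracting it on the host.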

Contributor Author


I'm shocked. Not only is there no regression, it seems to be significantly faster now. (Updated the benchmarks in the PR message.)

@Baranowski Baranowski force-pushed the wbaranowski-equal-24557 branch from 81237c6 to a86fbf4 Compare May 13, 2020 19:26
Comment thread: aten/src/ATen/native/cuda/ReduceLogicKernel.cu (outdated)
Contributor

@facebook-github-bot facebook-github-bot left a comment


@VitalyFedyunin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Contributor

@facebook-github-bot facebook-github-bot left a comment


@VitalyFedyunin is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Contributor

@facebook-github-bot facebook-github-bot left a comment


@VitalyFedyunin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@VitalyFedyunin
Contributor

Sorry, this one got lost in notifications. Can you please rebase so I can land it?

@Baranowski Baranowski force-pushed the wbaranowski-equal-24557 branch from e414a86 to f8d1304 Compare June 23, 2020 05:55
Contributor

@facebook-github-bot facebook-github-bot left a comment


@VitalyFedyunin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@Baranowski Baranowski force-pushed the wbaranowski-equal-24557 branch from f8d1304 to b80a23b Compare July 2, 2020 06:48
@Baranowski
Contributor Author

@VitalyFedyunin I have rebased again. This is ready for merge.

@Baranowski
Contributor Author

Just a heads up: I won't have time to work on this beyond two weeks from now. I think this should be ready to merge.

Contributor

@facebook-github-bot facebook-github-bot left a comment


@VitalyFedyunin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@VitalyFedyunin merged this pull request in a780244.

csarofeen pushed a commit to csarofeen/pytorch that referenced this pull request Jul 7, 2020
Summary:
Fixes pytorch#24557

ASV benchmark:

```
import torch

sizes = [
    (10**6,),
    (1000, 1000),
    (10, 10),
    (1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
]

class EqualTrue:
    params = range(len(sizes))

    def setup(self, n):
        dims = sizes[n]
        self.a = torch.rand(dims, device='cuda')
        self.b = self.a.clone()

    def time_equal(self, n):
        torch.equal(self.a, self.b)

class EqualFalse:
    params = range(len(sizes))

    def setup(self, n):
        dims = sizes[n]
        self.a = torch.rand(dims, device='cuda')
        self.b = torch.rand(dims, device='cuda')

    def time_equal(self, n):
        torch.equal(self.a, self.b)
```

Old results:
```
[ 75.00%] ··· equal.EqualFalse.time_equal
[ 75.00%] ··· ======== ============
               param1
              -------- ------------
                 0       67.7±7μs
                 1       74.0±2μs
                 2      24.4±0.1μs
                 3      135±0.2μs
              ======== ============

[100.00%] ··· equal.EqualTrue.time_equal
[100.00%] ··· ======== ============
               param1
              -------- ------------
                 0      59.8±0.2μs
                 1      59.9±0.3μs
                 2      25.0±0.5μs
                 3      136±0.2μs
              ======== ============
```

New results:
```
[ 75.00%] ··· equal.EqualFalse.time_equal
[ 75.00%] ··· ======== ============
               param1
              -------- ------------
                 0      44.4±0.2μs
                 1      44.5±0.4μs
                 2      31.3±0.3μs
                 3      96.6±0.5μs
              ======== ============

[100.00%] ··· equal.EqualTrue.time_equal
[100.00%] ··· ======== ============
               param1
              -------- ------------
                 0      44.2±0.2μs
                 1      44.6±0.2μs
                 2      30.8±0.3μs
                 3      97.3±0.2μs
              ======== ============
```

Pull Request resolved: pytorch#36483

Differential Revision: D21451829

Pulled By: VitalyFedyunin

fbshipit-source-id: 033e8060192c54f139310aeafe8ba784bab94ded
csarofeen added a commit to csarofeen/pytorch that referenced this pull request Aug 16, 2020
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026

Labels

Merged · open source · triaged (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module)


Development

Successfully merging this pull request may close these issues.

Migrate equal from the TH to Aten (CUDA)

5 participants