Port `equal` from THC to ATen (CUDA) #36483
Dr. CI: as of commit b80a23b, all CI checks pass — no failures.
I don't see anything for
@VitalyFedyunin should I benchmark this change?
Hi! Yes, a simple timeit benchmark with different tensor shapes (sizes) will suffice.
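The requested timeit measurement can be sketched with a small harness like the one below. The `bench` helper and its usage are illustrative only, not part of the PR; note that CUDA calls are asynchronous, so a real measurement should synchronize the device first.

```python
import timeit

def bench(fn, number=1000):
    """Return the average seconds per call of `fn` over `number` runs."""
    return timeit.timeit(fn, number=number) / number

# Hypothetical usage against torch.equal (requires PyTorch and a CUDA device):
#   import torch
#   a = torch.rand(1000, 1000, device='cuda')
#   b = a.clone()
#   torch.cuda.synchronize()          # drain pending kernels before timing
#   print(bench(lambda: torch.equal(a, b)))
```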
I wasn't expecting this in such a simple change, but I'm seeing a perf regression. I will need to dig into this.
Please ping me when you are ready for the next review.
Force-pushed from 7973bd6 to df748f0.
Baranowski left a comment:
@VitalyFedyunin this is ready for a review.
I have put the benchmark results and the benchmark itself in the PR description. Turns out that wrapping the value in a scalar_tensor and then extracting it from there was taking too long.
I'm not sure that calling this function from ATen (as opposed to TH) is ok.
VitalyFedyunin left a comment:
Code looks good, but you need to rebase.
Force-pushed from df748f0 to 18fd4ea.
@VitalyFedyunin this is rebased and should be ready to be merged.
facebook-github-bot left a comment:
@VitalyFedyunin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
It is safer if you call: at::native::eq(self, src).min()
That leads to a serious performance regression.
Sorry for going in circles, but can you please check with at::native::eq(self, src).all()? See https://pytorch.org/docs/master/tensors.html#torch.BoolTensor.all
I'm shocked. Not only is there no regression, it seems to be significantly faster now. (Updated the benchmarks in the PR message.)
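The `.min()` and `.all()` variants discussed above agree on boolean comparison results: the minimum of a set of 0/1 flags is 1 exactly when every flag is 1. The difference is purely which reduction kernel runs. The pure-Python sketch below models that equivalence; the function names and list-based "tensors" are illustrative stand-ins, not the ATen implementation.

```python
def equal_via_min(a, b):
    """Model of eq(self, src).min(): reduce element-wise equality by min."""
    if len(a) != len(b):
        return False
    # min over boolean flags is True only if every flag is True.
    return bool(min((x == y for x, y in zip(a, b)), default=True))

def equal_via_all(a, b):
    """Model of eq(self, src).all(): the same predicate via all()."""
    if len(a) != len(b):
        return False
    # Logically identical to the min-based reduction; in the benchmarks
    # above, ATen's .all() kernel turned out to be the faster of the two.
    return all(x == y for x, y in zip(a, b))
```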
Force-pushed from 81237c6 to a86fbf4.
facebook-github-bot left a comment:
@VitalyFedyunin is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Sorry, this one got lost in notifications. Can you please rebase so I can land it?
Force-pushed from e414a86 to f8d1304.
Force-pushed from f8d1304 to b80a23b.
@VitalyFedyunin I have rebased again. This is ready for merge.
Just a heads up: I won't have time to work on this starting two weeks from now. I think this should be ready to merge.
@VitalyFedyunin merged this pull request in a780244.
Summary: Fixes pytorch#24557

ASV benchmark:

```python
import torch

sizes = [
    (10**6,),
    (1000, 1000),
    (10, 10),
    (1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
]

class EqualTrue:
    params = range(len(sizes))

    def setup(self, n):
        dims = sizes[n]
        self.a = torch.rand(dims, device='cuda')
        self.b = self.a.clone()

    def time_equal(self, n):
        torch.equal(self.a, self.b)

class EqualFalse:
    params = range(len(sizes))

    def setup(self, n):
        dims = sizes[n]
        self.a = torch.rand(dims, device='cuda')
        self.b = torch.rand(dims, device='cuda')

    def time_equal(self, n):
        torch.equal(self.a, self.b)
```

Old results:

```
[ 75.00%] ··· equal.EqualFalse.time_equal
[ 75.00%] ··· ======== ============
               param1
              -------- ------------
                 0       67.7±7μs
                 1       74.0±2μs
                 2      24.4±0.1μs
                 3      135±0.2μs
              ======== ============
[100.00%] ··· equal.EqualTrue.time_equal
[100.00%] ··· ======== ============
               param1
              -------- ------------
                 0      59.8±0.2μs
                 1      59.9±0.3μs
                 2      25.0±0.5μs
                 3      136±0.2μs
              ======== ============
```

New results:

```
[ 75.00%] ··· equal.EqualFalse.time_equal
[ 75.00%] ··· ======== ============
               param1
              -------- ------------
                 0      44.4±0.2μs
                 1      44.5±0.4μs
                 2      31.3±0.3μs
                 3      96.6±0.5μs
              ======== ============
[100.00%] ··· equal.EqualTrue.time_equal
[100.00%] ··· ======== ============
               param1
              -------- ------------
                 0      44.2±0.2μs
                 1      44.6±0.2μs
                 2      30.8±0.3μs
                 3      97.3±0.2μs
              ======== ============
```

Pull Request resolved: pytorch#36483
Differential Revision: D21451829
Pulled By: VitalyFedyunin
fbshipit-source-id: 033e8060192c54f139310aeafe8ba784bab94ded
This reverts commit ca91846.