sum and roll on cuda for complex dtypes #37959

Closed
anjali411 wants to merge 3 commits into gh/anjali411/15/base from gh/anjali411/15/head

Conversation

@anjali411 (Contributor) commented May 6, 2020

Stack from ghstack:

Resolves #37925

[ghstack-poisoned]
anjali411 added a commit that referenced this pull request May 6, 2020
ghstack-source-id: 6f3754f
Pull Request resolved: #37959
@dr-ci bot commented May 6, 2020

💊 CI failures summary and remediations

As of commit 97868ce (more details on the Dr. CI page):



🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_windows_vs2019_py36_cuda10.1_test2 (1/1)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

AssertionError: Not within tolerance rtol=0 atol=1e-05 at input[0, 2] (0.0 vs. -2.25) and 9 other locations (40.00%)
  File "C:\Users\circleci\project\build\win_tmp\build\torch\testing\_internal\common_utils.py", line 974, in assertEqual 
    assertTensorsEqual(x, y) 
  File "C:\Users\circleci\project\build\win_tmp\build\torch\testing\_internal\common_utils.py", line 934, in assertTensorsEqual 
    atol=atol, rtol=rtol, message=message) 
  File "C:\Users\circleci\project\build\win_tmp\build\torch\testing\_internal\common_utils.py", line 974, in assertEqual 
    assertTensorsEqual(x, y) 
  File "C:\Users\circleci\project\build\win_tmp\build\torch\testing\_internal\common_utils.py", line 936, in assertTensorsEqual 
    torch.testing.assert_allclose(a, b, atol=atol, rtol=rtol, equal_nan=True, msg=message) 
  File "C:\Users\circleci\project\build\win_tmp\build\torch\testing\__init__.py", line 60, in assert_allclose 
    raise AssertionError(msg) 
AssertionError: Not within tolerance rtol=0 atol=1e-05 at input[0, 2] (0.0 vs. -2.25) and 9 other locations (40.00%) 
 
---------------------------------------------------------------------- 
Ran 5195 tests in 473.005s 
 
FAILED (failures=5, skipped=207) 
 
Generating XML reports... 
Generated XML report: test-reports\python-unittest\TEST-TestDevicePrecisionCUDA-20200507173325.xml 
Generated XML report: test-reports\python-unittest\TEST-TestTensorDeviceOpsCPU-20200507173325.xml 
Generated XML report: test-reports\python-unittest\TEST-TestTensorDeviceOpsCUDA-20200507173325.xml 

❄️ 1 failure tentatively classified as flaky, but reruns have not yet been triggered to confirm:

See CircleCI build pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_test (1/1)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun) ❄️

May 07 17:25:09 ConnectionResetError: [Errno 104] Connection reset by peer
May 07 17:25:09   File "/opt/conda/lib/python3.6/multiprocessing/connection.py", line 455, in accept 
May 07 17:25:09     deliver_challenge(c, self._authkey) 
May 07 17:25:09   File "/opt/conda/lib/python3.6/multiprocessing/connection.py", line 722, in deliver_challenge 
May 07 17:25:09     response = connection.recv_bytes(256)        # reject large message 
May 07 17:25:09   File "/opt/conda/lib/python3.6/multiprocessing/connection.py", line 216, in recv_bytes 
May 07 17:25:09     buf = self._recv_bytes(maxlength) 
May 07 17:25:09   File "/opt/conda/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes 
May 07 17:25:09     buf = self._recv(4) 
May 07 17:25:09   File "/opt/conda/lib/python3.6/multiprocessing/connection.py", line 379, in _recv 
May 07 17:25:09     chunk = read(handle, remaining) 
May 07 17:25:09 ConnectionResetError: [Errno 104] Connection reset by peer 
May 07 17:25:09 /opt/conda/lib/python3.6/multiprocessing/semaphore_tracker.py:143: UserWarning: semaphore_tracker: There appear to be 14 leaked semaphores to clean up at shutdown 
May 07 17:25:09   len(cache)) 
May 07 17:25:11 Process ErrorTrackingProcess-122: 
May 07 17:25:11 Traceback (most recent call last): 
May 07 17:25:11   File "/opt/conda/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap 
May 07 17:25:11     self.run() 
May 07 17:25:11   File "/var/lib/jenkins/workspace/test/test_dataloader.py", line 362, in run 
May 07 17:25:11     super(ErrorTrackingProcess, self).run() 
May 07 17:25:11   File "/opt/conda/lib/python3.6/multiprocessing/process.py", line 93, in run 
May 07 17:25:11     self._target(*self._args, **self._kwargs) 

This comment was automatically generated by Dr. CI. Follow this link to opt out of these comments for your Pull Requests.

Please report bugs/suggestions on the GitHub issue tracker.

See how this bot performed.

This comment has been revised 7 times.

anjali411 added a commit that referenced this pull request May 7, 2020
ghstack-source-id: 2c7f9ea
Pull Request resolved: #37959
anjali411 requested review from ezyang and zasdfgbnm on May 7, 2020, 16:06
-  AT_DISPATCH_ALL_TYPES_AND2(at::ScalarType::Half, at::ScalarType::Bool, in_tensor.scalar_type(), "roll_cuda", [&] {
+  AT_DISPATCH_ALL_TYPES_AND_C10_COMPLEX_AND3(at::ScalarType::Half, at::ScalarType::Bool, at::ScalarType::BFloat16,
+      in_tensor.scalar_type(), "roll_cuda", [&] {
+    using value_t = typename ztype<scalar_t>::value_t;

Collaborator:

Why do we need a value_t here? CPU's ztype<scalar_t>::value_t is a no-op for c10::complex.

Contributor Author:

Yeah, I forgot to remove it after replacing AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND3 with AT_DISPATCH_ALL_TYPES_AND_C10_COMPLEX_AND3.
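
For context, here is a minimal sketch of the dispatch pattern these AT_DISPATCH_* macros implement (the ScalarType enum, TypeTag helper, and function names below are illustrative stand-ins, not ATen's actual definitions): the macro switches on the runtime dtype and instantiates the lambda body once per supported C++ type, with scalar_t bound to the matching type, which is how adding complex entries to the macro gets the same kernel body compiled for the complex dtypes.

```cpp
#include <complex>
#include <stdexcept>
#include <string>

// Illustrative stand-ins, not ATen's real definitions.
enum class ScalarType { Float, Double, ComplexFloat, ComplexDouble, Bool };

template <typename T> struct TypeTag { using type = T; };

// The dispatch pattern in miniature: switch on the runtime dtype and call
// the lambda with a tag carrying the matching C++ type.
template <typename F>
void dispatch_all_types_and_complex(ScalarType t, const char* name, F&& f) {
  switch (t) {
    case ScalarType::Float:         f(TypeTag<float>{}); break;
    case ScalarType::Double:        f(TypeTag<double>{}); break;
    case ScalarType::ComplexFloat:  f(TypeTag<std::complex<float>>{}); break;
    case ScalarType::ComplexDouble: f(TypeTag<std::complex<double>>{}); break;
    case ScalarType::Bool:          f(TypeTag<bool>{}); break;
    default: throw std::runtime_error(std::string(name) + ": unsupported dtype");
  }
}

// Usage mirrors the diff above: one lambda body, compiled per dispatched type.
void sum_dispatch_example(ScalarType dtype) {
  dispatch_all_types_and_complex(dtype, "sum_cuda", [&](auto tag) {
    using scalar_t = typename decltype(tag)::type;
    (void)sizeof(scalar_t);  // stand-in for a sum_kernel_impl<scalar_t>(iter) call
  });
}
```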

-  AT_DISPATCH_ALL_TYPES_AND(ScalarType::Bool, iter.dtype(), "sum_cuda", [&]() {
-    sum_kernel_impl<scalar_t>(iter);
+  AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND(ScalarType::Bool, iter.dtype(), "sum_cuda", [&]() {
+    using value_t = typename ztype<scalar_t>::value_t;

Collaborator:

I guess using AT_DISPATCH_ALL_TYPES_AND_C10_COMPLEX_AND and removing the ztype will just work.

Contributor Author:

Hmm, there was an issue with __shfl_up_sync for c10::complex. I'll look into it more.

Contributor Author:

use ::thrust_t
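
For reference, a minimal sketch of the workaround pattern behind this exchange (the shfl_up_complex helper below is hypothetical, not this PR's code): CUDA's __shfl_up_sync only provides overloads for scalar types, so a complex value has to be shuffled across warp lanes component-wise and then reassembled.

```cuda
#include <c10/util/complex.h>

// Hypothetical helper: __shfl_up_sync has no overload for c10::complex<T>,
// so shuffle the real and imaginary parts as separate scalar lanes.
template <typename T>
__device__ c10::complex<T> shfl_up_complex(unsigned mask,
                                           c10::complex<T> value,
                                           unsigned int delta) {
  T re = __shfl_up_sync(mask, value.real(), delta);
  T im = __shfl_up_sync(mask, value.imag(), delta);
  return c10::complex<T>(re, im);
}
```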

   auto total_dims = in_tensor.dim();

-  AT_DISPATCH_ALL_TYPES_AND2(at::ScalarType::Half, at::ScalarType::Bool, in_tensor.scalar_type(), "roll_cuda", [&] {
+  AT_DISPATCH_ALL_TYPES_AND_C10_COMPLEX_AND3(at::ScalarType::Half, at::ScalarType::Bool, at::ScalarType::BFloat16,

Collaborator:

This will conflict with #37977; whichever lands first, the other will need to change.

@zasdfgbnm (Collaborator):

Test failure looks real.

ezyang changed the title from "sum and roll on cuda" to "ComplexFloat sum and roll on cuda" on May 8, 2020
anjali411 changed the title from "ComplexFloat sum and roll on cuda" to "sum and roll on cuda for complex dtypes" on May 12, 2020
@anjali411 (Contributor Author):

roll and sum are now supported on CUDA for complex tensors (this was added in a different PR).
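
For anyone landing here later, a minimal usage sketch via the libtorch C++ API (assuming a recent CUDA-enabled build; illustrative only):

```cpp
#include <torch/torch.h>
#include <iostream>

int main() {
  // A complex-valued tensor on the GPU.
  auto t = torch::randn({2, 3},
      torch::dtype(torch::kComplexFloat).device(torch::kCUDA));

  auto s = t.sum();                                   // complex sum reduction on CUDA
  auto r = torch::roll(t, /*shifts=*/1, /*dims=*/0);  // roll along dim 0 on CUDA

  std::cout << s << "\n" << r << "\n";
}
```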

anjali411 closed this on Jan 15, 2021
facebook-github-bot deleted the gh/anjali411/15/head branch on February 15, 2021, 15:18