
Add 64-bit indexing support to THC index reductions #33405

Closed

peterbell10 wants to merge 2 commits into pytorch:master from peterbell10:thc-index-reduce

Conversation

@peterbell10
Collaborator

Fixes #32863, (together with #33310 for the TensorIterator reductions)

This adds 64-bit indexed kernels for THC_reduceDimIndex and uses THCTensor_canUse32BitIndexMath to switch between the two at runtime.

I have a test for this locally but haven't included it here because max is much slower than argmax, to the point where the test takes several minutes to call max on just one 2**32 element tensor. That seems excessive, even for a slow test, but I can push it if preferred.
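For readers unfamiliar with the failure mode, here is a small sketch (hypothetical names, not the PR's actual C++) of why the runtime switch is needed: once a tensor's linear indices exceed the signed 32-bit range, truncated index math wraps negative, so a 64-bit instantiation of the kernel has to be selected instead.

```python
# Illustrative sketch only. reduce_dim_index and its return strings are
# stand-ins for the real THC_reduceDimIndex template instantiations.
INT32_MAX = 2**31 - 1

def to_int32(i: int) -> int:
    """Simulate truncating a linear index to a signed 32-bit value."""
    i &= 0xFFFFFFFF
    return i - 2**32 if i >= 2**31 else i

# The last index of a 2**32-element tensor wraps negative under 32-bit math:
assert to_int32(2**32 - 1) == -1

def reduce_dim_index(numel: int) -> str:
    # Mirrors the role of THCTensor_canUse32BitIndexMath: pick the cheaper
    # 32-bit indexed kernel when every linear index fits, else the 64-bit one.
    if numel <= INT32_MAX:
        return "THC_reduceDimIndex<uint32_t>"
    return "THC_reduceDimIndex<uint64_t>"
```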

@dr-ci

dr-ci bot commented Feb 19, 2020

💊 CircleCI build failures summary and remediations

As of commit e1b6094:

None of the build failures appear to be your fault.

  • 1/1 broken upstream at merge base 8b6a898 since Feb 18

Please rebase on the viable/strict branch:

    Since your merge base is older than viable/strict, run these commands:

    git fetch origin viable/strict
    git rebase viable/strict
    


Detailed failure analysis

One may explore the probable reasons each build failed interactively on the Dr. CI website.

🚧 1 upstream failure recognized by patterns:

These builds matched patterns, but were probably caused by upstream breakages:



@peterbell10
Collaborator Author

Rebased on #33376 to fix conflicts. Unfortunately, that's ahead of upstream/viable/strict so the tests will fail for the time being.

@ezyang
Contributor

ezyang commented Feb 19, 2020

The test won't trigger on CircleCI, but it would be good to add the ">4G CUDA tensor" test that would actually trigger this.

@ezyang (Contributor) left a comment:

Great, waiting on the test

@ezyang
Contributor

ezyang commented Feb 20, 2020

This seems to irrecoverably fail the XLA build:

Feb 19 22:14:08 ======================================================================
Feb 19 22:14:08 ERROR [4.475s]: test_minmax_large_axis_xla (__main__.TestTorchDeviceTypeXLA)
Feb 19 22:14:08 ----------------------------------------------------------------------
Feb 19 22:14:08 Traceback (most recent call last):
Feb 19 22:14:08   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 181, in instantiated_test
Feb 19 22:14:08     return test(self, device_arg)
Feb 19 22:14:08   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 382, in wrapper
Feb 19 22:14:08     fn(*args, **kwargs)
Feb 19 22:14:08   File "/var/lib/jenkins/workspace/xla/test/../../test/test_torch.py", line 8526, in test_minmax_large_axis
Feb 19 22:14:08     self.assertEqual(idx, x.shape[0] - 1)
Feb 19 22:14:08   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 336, in assertEqual
Feb 19 22:14:08     return DeviceTypeTestBase.assertEqual(self, x, y, prec, message, allow_inf)
Feb 19 22:14:08   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 784, in assertEqual
Feb 19 22:14:08     self.assertEqual(x.item(), y, prec=prec, message=message,
Feb 19 22:14:08 RuntimeError: Resource exhausted: From /job:localservice/replica:0/task:0:
Feb 19 22:14:08 Failed to allocate request for 33.00GiB (35433480192B) on device ordinal 0
Feb 19 22:14:08 	 [[{{node XRTExecute}}]]
Feb 19 22:14:08 Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
Feb 19 22:14:08 

We should follow previous practice for large allocation and use the @unittest.skipIf(not TEST_LARGE_TENSOR, "not enough memory") decorator to turn this off when large memory tests are not requested.

@peterbell10
Collaborator Author

@ezyang I can only see TEST_LARGE_TENSOR defined for test_cuda.py. Also, I can't see how a global variable would work here since the available memory depends on the device.

Would it make sense to report the apparent memory leak to XLA and just add a test skip to XLA for the time being?

@ezyang
Contributor

ezyang commented Feb 20, 2020

Well, you can still do something similar to what the test does in the end: something like torch.cuda.get_device_properties(0).total_memory >= 12e9. It makes sense that one size fits all, since the majority of big-tensor bugs come down to 32-bit indexing. And yes, I'd make this test run only on CUDA.

@peterbell10
Collaborator Author

Okay, moved the test to test_cuda.py and am now guarding it with TEST_LARGE_TENSOR.
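The resulting guard might look roughly like the sketch below. This is an illustration, not the merged code: `_total_gpu_memory` and its 8 GiB figure are stand-ins so the sketch runs without a GPU; in the real test_cuda.py the value would come from torch.cuda.get_device_properties(0).total_memory, as suggested above.

```python
import unittest

# Hypothetical stand-in for torch.cuda.get_device_properties(0).total_memory,
# hard-coded so this sketch runs without a GPU.
def _total_gpu_memory() -> int:
    return 8 * 1024**3  # pretend the device has 8 GiB

# Mirrors the TEST_LARGE_TENSOR guard discussed above: one memory threshold,
# on the reasoning that most big-tensor bugs come down to 32-bit indexing.
TEST_LARGE_TENSOR = _total_gpu_memory() >= 12e9

class TestLargeReductions(unittest.TestCase):
    @unittest.skipIf(not TEST_LARGE_TENSOR, "not enough memory")
    def test_minmax_large_axis(self):
        # Would allocate a > 4 GiB CUDA tensor and check max/argmax here.
        pass
```

With the hard-coded 8 GiB device, the guard evaluates false and the test is skipped rather than failing with an out-of-memory error.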

@facebook-github-bot (Contributor) left a comment:

@ezyang is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@ezyang merged this pull request in c882425.

ttumiel pushed a commit to ttumiel/pytorch that referenced this pull request Mar 4, 2020
Summary:
Fixes pytorch#32863, (together with pytorch#33310 for the `TensorIterator` reductions)

This adds 64-bit indexed kernels for `THC_reduceDimIndex` and uses `THCTensor_canUse32BitIndexMath` to switch between the two at runtime.

I have a test for this locally but haven't included it here because `max` is much slower than `argmax`, to the point where the test takes several minutes to call max on just one `2**32` element tensor. That seems excessive, even for a slow test, but I can push it if preferred.
Pull Request resolved: pytorch#33405

Differential Revision: D20010769

Pulled By: ezyang

fbshipit-source-id: a8a86f662598d5fade4d90448436418422c699a3


Successfully merging this pull request may close these issues.

Incorrect result of argmax on CUDA with large non-contig tensors and on non-contig dimensions

4 participants