Port THCS to ATen. #8689

Merged
ezyang merged 34 commits into pytorch:master from ezyang:pr/thcs-to-aten
Jun 24, 2018

Conversation

@ezyang ezyang commented Jun 20, 2018

General structure of the sparse implementation:

  • SparseCUDATensor.{cpp, cu} and SparseCUDATensorMath.cu contain
    the same functions as their CPU analogues
  • SparseCUDAApplyUtils.cuh contains what used to be in
    THCSTensor.cu
  • SparseCUDABlas.cu contains what used to be THCSparse.cu

Unrelated improvements:

  • Forward declared CUDA types in Context.h are now moved
    exclusively to CUDAHooks
  • New getCurrentCUDASparseHandle in Context
  • Support for printing CUSPARSE_STATUS_ZERO_PIVOT error message
    directly

Some unusual pieces:

  • get_device got the LegacyBridge makeover, as it needs special
    logic on sparse tensors (defer to the inner tensors).
  • I needed to turn off device_guard codegen for many functions in
    sparse; I noticed this because get_device became a native function,
    which resulted in an infinite recursion. This was done by adding
    device_guard: False to the native definitions. An alternative
    strategy might be to make the heuristic for deciding when to insert
    a device guard smarter.
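The device_guard: False mechanism described above is a per-function flag on the native definitions; a minimal sketch of such an entry (the surrounding schema syntax is illustrative and may differ from the exact native_functions.yaml format of this era):

```yaml
# Hypothetical illustration: disable device-guard codegen for a native
# function whose dispatch must not re-enter get_device.
- func: native_get_device(Tensor self) -> int64_t
  device_guard: False
```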

Scaffolding removal:

  • LegacyBridge now special-cases only on sparse versus dense;
    no more CUDA test (hooray!)
  • Native bindings get CUDA/SparseCUDA dispatch entries.

CPU sparse refactoring:

  • New SparseUtils.h header, with all of the utility functions that
    used to live in SparseTensor.cpp
  • new_with_tensor_sparse now correctly handles both CPU and CUDA
  • transpose functions in sparse/ turned out to be dead, so I killed them

Bugs I noticed while working on this:

  • I used accessor<...>() on a CUDA tensor, because I thought it does
    the CUDA-CPU sync. It does not.

TODO:

  • For sparse only methods, we can now remove the TH binding
    entirely

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

@ezyang ezyang requested a review from gchanan June 20, 2018 15:25

ezyang commented Jun 20, 2018

@pytorchbot retest this please

} else {
return th_add(self, other, alpha);
}
// See Note [CPU sparse is globally native] and Note [Multiple dispatch to sparse]

} else {
return th_add_(self, other, alpha);
}
// See Note [CPU sparse is globally native] and Note [Multiple dispatch to sparse]

} else {
return th_addmm_out(result, self, mat1, mat2, beta, alpha);
}
// See Note [CPU sparse is globally native] and Note [Multiple dispatch to sparse]

} else {
return th_addmm(self, mat1, mat2, beta, alpha);
}
// See Note [CPU sparse is globally native] and Note [Multiple dispatch to sparse]

device_guard: False


- func: native_get_device(Tensor self) -> int64_t


// TODO: Expose this for real in ATen, some day?
// NB: Doesn't preserve data.
inline Tensor _new_values_with_size_of(const Tensor& values, int64_t nnz) {

", mask is on device ", mask.get_device(), ", out is on device ", r.get_device());
resize_as_sparse_(r, mask);
if (mask._nnz() == 0) {
r.zero_();

_get_sparse_impl(r)->set_coalesced(mask.is_coalesced());
_get_sparse_impl(r)->set_nnz(mask._nnz());

LongTensor indices = at::zeros({mask._nnz()}, mask_indices.type());

if(n == 1)
*ldb = k;
}
THError("Internal error! This API is deprecated. Shout if you need it.");

TensorInfo<indexT, IndexType> indices,
TensorInfo<Real, IndexType> values,
const IndexType nnz) {
IndexType indskip = indices.strides[0];


@ezyang ezyang force-pushed the pr/thcs-to-aten branch 2 times, most recently from 2d1d33b to d9cc44a Compare June 21, 2018 13:24
@ezyang ezyang force-pushed the pr/thcs-to-aten branch 2 times, most recently from 319903b to 5d61466 Compare June 22, 2018 13:09
ezyang added 18 commits June 22, 2018 08:49
ezyang added 10 commits June 22, 2018 08:49
@ezyang ezyang force-pushed the pr/thcs-to-aten branch from d326be0 to c13b092 Compare June 22, 2018 15:49

@gchanan gchanan left a comment

hooray! Some minor comments / nits, but this looks ready to go once those are addressed.

// NB: Doesn't preserve data.
inline Tensor _new_values_with_size_of(const Tensor& values, int64_t nnz) {
if (values.numel() == 0) { // values tensor uninitialized
// TODO: This logic looks bogus; if we have an uninitialized

// TODO: This error message seems awfully opaque
AT_CHECK(sparse_._sparseDims() == 2, "matrices expected, got ", sparse_._sparseDims(), "D tensor");
AT_CHECK(sparse_._denseDims() == 0, "scalar values expected, got ", sparse_._denseDims(), "D values");
AT_CHECK(dense.dim() == 2, "matrices expected, got ", dense.dim(), "D tensor");

"Argument #2: matrices expected, got ", sparse_._sparseDims(), "D tensor");
AT_CHECK(sparse_._denseDims() == 0,
"Argument #2: scalar values expected, got ", sparse_._denseDims(), "D values");
AT_CHECK(dense.dim() == 2,

SparseTensor& s_mul_out_sparse_cuda(SparseTensor& r_, const SparseTensor& t_, const SparseTensor& src_) {
#ifndef __HIP_PLATFORM_HCC__
AT_CHECK(_check_device({r_, t_, src_}));
AT_CHECK(t_.sizes().equals(src_.sizes()), "mul operands have incompatible sizes");

ezyang added 5 commits June 22, 2018 12:56
…elds in env when they are dead.

@ezyang ezyang merged commit 3598356 into pytorch:master Jun 24, 2018
facebook-github-bot pushed a commit that referenced this pull request Aug 24, 2018
Summary:
**Summary**: This PR is a follow-up to mruberry's #9318. It aims to achieve the following:
- Specializing std common math functions for `at::Half` type.
- Create `CUDANumerics.cuh` to contain necessary parts from `THCNumerics.cuh`.
- Update `THCNumerics.cuh` with new usage and comments to  demonstrate the best practice for developers and hence, making way for its deprecation.
- Remove legacy/redundant code path.
- Remove unused CUDA HALF macros (see separate PR #10147)

**Comments**: `CUDANumerics.cuh` contains mathematical functions that are either not in the std namespace or are specialized for compilation with CUDA NVCC or CUDA NVRTC. This header is derived from the legacy `THCNumerics.cuh`. Here is some of the rationale for which functions were kept and which were removed:
- All arithmetic can now be done in ATen using a binary CUDA kernel or CUDA tensor pointwise apply (check #8919 and `CUDAApplyUtils`). `at::Half` comparisons rely on implicit conversion to float.
- Functions that are C/C++ standard compliant have been specialized for user-defined types; for instance, the std namespace has been opened up for `at::Half`, with math function definitions for `at::Half`. Check `Half-inl.h`.
- Some standard-compliant functions are specialized here for performance reasons. For instance, `powi` is used for `pow` calculation on integral types. Moreover, `abs`, `isinf`, and `isnan` are specialized to save one API call versus going through std. This is subject to change, depending on whether we really care about saving one API call.
- Numeric limits such as `max`/`min` are removed since they just call the standard definitions. Moreover, numeric limits for `at::Half` are present in `Half-inl.h`. I understood that HIP has some issue with `std::numeric_limits`; this is the related GitHub issue I found: ROCm/hip#374. AlexVlx mentions that the issue can be avoided by invoking `std::numeric_limits` in `__device__` code. Since we are launching lambdas with device contexts, I don't see why `std::numeric_limits` wouldn't compile on HIP if used within a kernel, unless I am unaware of the real reason why `max`/`min` were in THCNumerics in the first place. (I haven't ever tried a build with HIP.)

Here are some reference PRs that were handy in refactoring TH into ATen:
- #6786
- #5475
- #9401
- #8689
- #8919
Pull Request resolved: #10301

Differential Revision: D9204758

Pulled By: soumith

fbshipit-source-id: 09f489c1656458c02367b6cd31c3eeeca5acdc8a
zdevito pushed a commit to zdevito/ATen that referenced this pull request Aug 25, 2018
PenghuiCheng pushed a commit to PenghuiCheng/pytorch that referenced this pull request Sep 11, 2018