Add division overload with rounding_mode selection (#50280) #51706
Closed
mruberry wants to merge 1 commit into pytorch:master from
Conversation
Summary: Pull Request resolved: pytorch#50280

As mentioned in pytorch gh-43874, this adds a `rounding_mode={'true', 'trunc', 'floor'}` argument so `torch.div` can be used as a replacement for `floor_divide` during the transitional period.

I've included dedicated kernels for truncated and floor division which aren't strictly necessary for float, but do perform significantly better (~2x) than doing true division followed by a separate rounding kernel.

Note: I introduce new overloads for `aten::div` instead of just adding a default `rounding_mode` because various JIT passes rely on the exact operator schema.

Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D26123271
Pulled By: mruberry
fbshipit-source-id: 66a9db1b631001c2ae9e2c4f8dc91edea1de364a
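For illustration, here is a minimal sketch of how the overload described above could be exercised once it lands; the accepted strings are taken from the summary, and the tensor values are arbitrary:

```python
import torch

a = torch.tensor([ 7., -7.,  7., -7.])
b = torch.tensor([ 2.,  2., -2., -2.])

# True division (the existing default behavior).
print(torch.div(a, b))                         # tensor([ 3.5000, -3.5000, -3.5000,  3.5000])

# Truncated division: the quotient is rounded toward zero.
print(torch.div(a, b, rounding_mode='trunc'))  # tensor([ 3., -3., -3.,  3.])

# Floor division: the quotient is rounded toward negative infinity,
# matching the floor_divide behavior this overload is meant to replace.
print(torch.div(a, b, rounding_mode='floor'))  # tensor([ 3., -4., -4.,  3.])
```

The last two calls differ only when the operands have opposite signs, which is exactly where `floor_divide` and truncating division disagree.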
Contributor
💊 CI failures summary and remediations. As of commit d4c43de (more details on the Dr. CI page):

🕵️ 2 new failures recognized by patterns. The following CI failures do not appear to be due to upstream breakages:
Contributor
This pull request was exported from Phabricator. Differential Revision: D26123271
Collaborator
Author
Testing only.
Contributor
facebook-github-bot pushed a commit that referenced this pull request on Apr 7, 2021
… / 8 for CUDA (#51834)

Summary: It seems that the std::copysign code introduced in #51706 is too much for gcc 7.5 / 8 when compiled on arm64 (e.g. on Jetson with the latest JetPack) and causes it to produce an internal compiler error with a segfault during compilation. This avoids the compiler bug by not using std::copysign.

A very kind person sent a Jetson Xavier NX 🎁 thank you ❤️.

After #51900 fixed this for CPU-only arm64 (e.g. Raspberry), this fixes it for CUDA-using arm64 (e.g. Jetson). CUDA device lambdas must also be present as host functions for technical reasons, but they are never used, so we just assert in the CPU variant instead of actually doing the operation.

Pull Request resolved: #51834
Reviewed By: mrshenli
Differential Revision: D27622277
Pulled By: malfet
fbshipit-source-id: a1dc4c3a67f925019782e24b796919e17339749f
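As a side note on why the floor-division kernel needs sign-aware handling at all (the job the `std::copysign` call was doing), here is a hedged, pure-Python sketch of floor division expressed as truncation plus a correction when the operand signs differ; it mirrors the semantics of `rounding_mode='floor'`, not the actual CPU/CUDA kernel code:

```python
import math

def floor_div_via_trunc(a: float, b: float) -> float:
    """Floor division written as truncation plus a sign-dependent fix-up.

    Semantic sketch only: the real kernels operate on whole tensors and use
    copysign-style logic on this code path rather than Python branches.
    """
    q = math.trunc(a / b)
    # If the division is inexact and the operands have opposite signs,
    # truncation rounded toward zero, which is one step above the floor.
    if a != q * b and (a < 0) != (b < 0):
        q -= 1
    return q

assert floor_div_via_trunc(-7, 2) == -4   # plain truncation would give -3
assert floor_div_via_trunc(7, -2) == -4
assert floor_div_via_trunc(7, 2) == 3
```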
malfet pushed a commit to malfet/pytorch that referenced this pull request on Apr 8, 2021
laurentdupin pushed commits to laurentdupin/pytorch that referenced this pull request on Apr 24, 2026