Skip to content

Vectorize copysign on CPU#51792

Closed
peterbell10 wants to merge 28 commits intogh/peterbell10/49/basefrom
gh/peterbell10/49/head
Closed

Vectorize copysign on CPU#51792
peterbell10 wants to merge 28 commits intogh/peterbell10/49/basefrom
gh/peterbell10/49/head

Conversation

@peterbell10
Copy link
Copy Markdown
Collaborator

@peterbell10 peterbell10 commented Feb 5, 2021

Stack from ghstack:

Differential Revision: D27769007

peterbell10 added a commit to peterbell10/pytorch that referenced this pull request Feb 5, 2021
ghstack-source-id: 4089993
Pull Request resolved: pytorch#51792
@facebook-github-bot
Copy link
Copy Markdown
Contributor

facebook-github-bot commented Feb 5, 2021

💊 CI failures summary and remediations

As of commit cc2a5eb (more details on the Dr. CI page):



🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_macos_10_13_py3_test (1/1)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

Apr 13 17:52:49 [E request_callback_no_python.cpp:656] Received error while processing request type 256: The following operation failed in the TorchScript interpreter.
Apr 13 17:52:49 
Apr 13 17:52:49 [E request_callback_no_python.cpp:656] Received error while processing request type 256: The following operation failed in the TorchScript interpreter.
Apr 13 17:52:49 Traceback of TorchScript (most recent call last):
Apr 13 17:52:49   File "/Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/testing/_internal/distributed/rpc/rpc_test.py", line 338, in raise_func_script
Apr 13 17:52:49 @torch.jit.script
Apr 13 17:52:49 def raise_func_script(expected_err: str) -> torch.Tensor:
Apr 13 17:52:49     raise ValueError(expected_err)
Apr 13 17:52:49     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
Apr 13 17:52:49 RuntimeError: Expected error
Apr 13 17:52:49 
Apr 13 17:52:49 [E request_callback_no_python.cpp:656] Received error while processing request type 256: The following operation failed in the TorchScript interpreter.
Apr 13 17:52:49 Traceback of TorchScript (most recent call last):
Apr 13 17:52:49   File "/Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/testing/_internal/distributed/rpc/rpc_test.py", line 338, in raise_func_script
Apr 13 17:52:49 @torch.jit.script
Apr 13 17:52:49 def raise_func_script(expected_err: str) -> torch.Tensor:
Apr 13 17:52:49     raise ValueError(expected_err)
Apr 13 17:52:49     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
Apr 13 17:52:49 RuntimeError: Expected error
Apr 13 17:52:49 
Apr 13 17:52:50 ok (2.708s)
Apr 13 17:52:52   test_wait_all_multiple_call (__main__.TensorPipeRpcTestWithSpawn) ... [W tensorpipe_agent.cpp:110] Failed to look up the IP address for the hostname (EAI_NONAME: unknown node or service (this error originated at tensorpipe/transport/uv/utility.cc:97)), defaulting to 127.0.0.1

❄️ 1 failure tentatively classified as flaky

but reruns have not yet been triggered to confirm:

See CircleCI build pytorch_linux_bionic_py3_8_gcc9_coverage_test2 (1/1)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun) ❄️

Apr 13 16:37:09 RuntimeError: Process 0 terminated or timed out after 100.07356643676758 seconds
Apr 13 16:37:09 ======================================================================
Apr 13 16:37:09 ERROR [100.127s]: test_py_tensors_multi_async_call (__main__.TensorPipeRpcTestWithSpawn)
Apr 13 16:37:09 ----------------------------------------------------------------------
Apr 13 16:37:09 Traceback (most recent call last):
Apr 13 16:37:09   File "/opt/conda/lib/python3.8/site-packages/torch/testing/_internal/common_distributed.py", line 322, in wrapper
Apr 13 16:37:09     self._join_processes(fn)
Apr 13 16:37:09   File "/opt/conda/lib/python3.8/site-packages/torch/testing/_internal/common_distributed.py", line 515, in _join_processes
Apr 13 16:37:09     self._check_return_codes(elapsed_time)
Apr 13 16:37:09   File "/opt/conda/lib/python3.8/site-packages/torch/testing/_internal/common_distributed.py", line 563, in _check_return_codes
Apr 13 16:37:09     raise RuntimeError('Process {} terminated or timed out after {} seconds'.format(i, elapsed_time))
Apr 13 16:37:09 RuntimeError: Process 0 terminated or timed out after 100.07356643676758 seconds
Apr 13 16:37:09 
Apr 13 16:37:09 ----------------------------------------------------------------------
Apr 13 16:37:09 Ran 356 tests in 1269.955s
Apr 13 16:37:09 
Apr 13 16:37:09 FAILED (errors=1, skipped=6)
Apr 13 16:37:09 
Apr 13 16:37:09 Generating XML reports...
Apr 13 16:37:09 Generated XML report: test-reports/python-unittest/distributed.rpc.test_tensorpipe_agent/TEST-TensorPipeDdpComparisonTestWithSpawn-20210413161559.xml
Apr 13 16:37:09 Generated XML report: test-reports/python-unittest/distributed.rpc.test_tensorpipe_agent/TEST-TensorPipeDdpUnderDistAutogradTestWithSpawn-20210413161559.xml
Apr 13 16:37:09 Generated XML report: test-reports/python-unittest/distributed.rpc.test_tensorpipe_agent/TEST-TensorPipeDistAutogradTestWithSpawn-20210413161559.xml

This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

peterbell10 added a commit to peterbell10/pytorch that referenced this pull request Feb 8, 2021
ghstack-source-id: 76b0813
Pull Request resolved: pytorch#51792
peterbell10 added a commit to peterbell10/pytorch that referenced this pull request Feb 9, 2021
ghstack-source-id: bef766e
Pull Request resolved: pytorch#51792
Comment thread aten/src/ATen/native/cpu/BinaryOpsKernel.cpp Outdated
@peterbell10 peterbell10 requested a review from mruberry April 6, 2021 17:47
Comment thread aten/src/ATen/cpu/vec256/vec256_base.h Outdated
store(tmp);
exp.store(tmp_exp);
for (int64_t i = 0; i < size(); i++) {
for (int i = 0; i < size(); i++) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

type i like size():

auto i = decltype(size()){0}

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't like decltype(size()) so added a using size_type to Vec256 that I can use instead.

Comment thread aten/src/ATen/native/quantized/cpu/kernels/QuantizedOpKernels.cpp
Comment thread aten/src/ATen/native/quantized/fake_quant_per_tensor_affine.cpp Outdated
Comment thread c10/util/Half-math.h Outdated
namespace std {

/// Used by vec256<c10::Half>::map
inline c10::Half acos(c10::Half a) { return std::acos(float(a));}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what the history of half math is. @ngimel, do you know?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I don't. I once wondered how math overloads work, but didn't get a full picture, I wonder why it's necessary, shouldn't there be an implicit Half->float conversion (that's defined)? I thought that's how std math functions work in cuda kernels.

Copy link
Copy Markdown
Collaborator Author

@peterbell10 peterbell10 Apr 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assumed BFloat-math.h was an accepted pattern, but if that's not true then I've just removed the Half-math.h file and focused on copysign.

@peterbell10 peterbell10 requested a review from mruberry April 8, 2021 23:27
Copy link
Copy Markdown
Contributor

@malfet malfet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please rebase against ToT to pick up fix for #55613
Also, do not introduce any new std::min,std::max overrides for types that have default constructor, otherwise it can lead to unexpected sideeffects.

Comment thread aten/src/ATen/native/cpu/BinaryOpsKernel.cpp Outdated
std::is_floating_point<T>::value ||
std::is_same<T, at::Half>::value> {
std::is_same<T, at::Half>::value ||
std::is_same<T, at::BFloat16>::value> {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can it be put into separate PR (as it have nothing to do with vectorization goal)

});
cpu_kernel_vec(iter,
[](scalar_t a, scalar_t b) -> scalar_t {
return std::copysign(a, b);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, we can not use std::copysign otherwise it will cause an ICE

Comment thread aten/src/ATen/native/quantized/cpu/kernels/QuantizedOpKernels.cpp
Comment thread aten/src/ATen/native/quantized/fake_quant_per_tensor_affine.cpp Outdated
Comment thread c10/util/Half-math.h Outdated
@peterbell10
Copy link
Copy Markdown
Collaborator Author

PTAL @malfet, @mruberry

Copy link
Copy Markdown
Collaborator

@mruberry mruberry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks OK to me but not an expert on copysign or c10/util. @malfet and @ezyang, would you two like to take a look?

@ezyang ezyang self-requested a review April 14, 2021 15:40
@ezyang
Copy link
Copy Markdown
Contributor

ezyang commented Apr 14, 2021

Actually, going to wait on @malfet, as I don't quite understand the ICE concerns.

@malfet
Copy link
Copy Markdown
Contributor

malfet commented Apr 14, 2021

Looks good to me, thank you for the updates

@mruberry
Copy link
Copy Markdown
Collaborator

Thanks for taking a look, @ezyang and @malfet!

@mruberry mruberry self-requested a review April 14, 2021 17:38
@facebook-github-bot
Copy link
Copy Markdown
Contributor

@mruberry merged this pull request in 8c74e1b.

@facebook-github-bot facebook-github-bot deleted the gh/peterbell10/49/head branch April 21, 2021 14:15
krshrimali pushed a commit to krshrimali/pytorch that referenced this pull request May 19, 2021
Summary: Pull Request resolved: pytorch#51792

Test Plan: Imported from OSS

Reviewed By: agolynski

Differential Revision: D27769007

Pulled By: mruberry

fbshipit-source-id: 65fceb9f59ed6afee4452278992340da104ed5fe
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026
Summary: Pull Request resolved: pytorch#51792

Test Plan: Imported from OSS

Reviewed By: agolynski

Differential Revision: D27769007

Pulled By: mruberry

fbshipit-source-id: 65fceb9f59ed6afee4452278992340da104ed5fe
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants