Unify checks for normal #69629

Closed
nkaretnikov wants to merge 1 commit into gh/nkaretnikov/2/base from gh/nkaretnikov/2/head

Conversation

@nkaretnikov nkaretnikov (Collaborator) commented Dec 8, 2021

Stack from ghstack:

  • Split up checks into multiple macros
  • Use them in both meta and impl functions for now, since either might be called
  • Remove what appear to be legacy checks that are no longer relevant
  • Update the docs to not mention deprecated behavior

Backward compatibility issues:

  • Shapes must match exactly between mean and std (no automatic resizing):

Old:

>>> import torch
>>>
>>> mean = torch.rand(3, 4, 5, device='cpu')
>>> std  = torch.rand(3, 4, 1, device='cpu')
>>>
>>> torch.normal(mean=mean, std=std).size()
torch.Size([3, 4, 5])

New:

>>> import torch
>>>
>>> mean = torch.rand(3, 4, 5, device='cpu')
>>> std  = torch.rand(3, 4, 1, device='cpu')
>>>
>>> torch.normal(mean=mean, std=std)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: inconsistent tensor, output size ([3, 4, 5]) is not the same as
input size ([3, 4, 1])

  • Tensors with the same number of elements are treated as different if their
    shapes don't match exactly:

Old:

>>> import torch
>>>
>>> out  = torch.rand(2, 3, device='cpu')
>>> mean = torch.rand(2, 3, device='cpu')
>>> std  = torch.rand(1, 6, device='cpu')
>>>
>>> torch.normal(out=out, mean=mean, std=std)
<stdin>:1: UserWarning: std and mean have the same number of elements, but are
not broadcastable. This was previously a supported mode of operation, but is
now deprecated and the support will be removed in version 1.6 release. Note
that the current implementation reshapes std to the shape of mean, which may be
incur data copies. Please ensure that std and mean are broadcastable to avoid
these issues. (Triggered internally at
aten/src/ATen/native/DistributionTemplates.h:189.)
tensor([[ 0.3345,  0.6314,  1.0163],
        [-0.0397,  0.8128,  1.1436]])

New:

>>> import torch
>>>
>>> out  = torch.rand(2, 3, device='cpu')
>>> mean = torch.rand(2, 3, device='cpu')
>>> std  = torch.rand(1, 6, device='cpu')
>>>
>>> torch.normal(out=out, mean=mean, std=std)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: The size of tensor a (3) must match the size of tensor b (6) at
non-singleton dimension 1

  • An empty tensor is returned instead of an error when mean is a float and std
    is an empty tensor:

Old:

>>> import torch
>>>
>>> mean = 4.
>>> std  = torch.rand(0, device='cpu')
>>>
>>> torch.normal(mean=mean, std=std)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: min(): Expected reduction dim to be specified for input.numel()
== 0. Specify the reduction dim with the 'dim' argument.

New:

>>> import torch
>>>
>>> mean = 4.
>>> std  = torch.rand(0, device='cpu')
>>>
>>> torch.normal(mean=mean, std=std)
tensor([])
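The behavior change above can be modeled in a few lines of plain Python. This is an illustrative sketch of what the PR's CHECK_NORMAL_SIZES macro enforces, not the actual ATen code (the helper name below is made up):

```python
# Sketch of the exact-shape policy enforced by the PR's CHECK_NORMAL_SIZES
# macro (illustration only; the real check is a TORCH_CHECK in C++).
def check_normal_sizes(output_size, input_size):
    # New behavior: shapes must match exactly; broadcastable is not enough.
    if tuple(output_size) != tuple(input_size):
        raise RuntimeError(
            f"inconsistent tensor, output size ({list(output_size)}) is not "
            f"the same as input size ({list(input_size)})"
        )

check_normal_sizes((3, 4, 5), (3, 4, 5))  # matching shapes pass silently

try:
    check_normal_sizes((3, 4, 5), (3, 4, 1))  # broadcastable, but now rejected
except RuntimeError as e:
    print(e)
```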

pytorch-probot bot commented Dec 8, 2021

CI Flow Status

⚛️ CI Flow

Ruleset - Version: v1
Ruleset - File: https://github.com/pytorch/pytorch/blob/fc4c45317fed47b1f6d4851d175661c6bfc92cbe/.github/generated-ciflow-ruleset.json
PR ciflow labels: ciflow/default

Workflows Labels (bold enabled) Status
Triggered Workflows
linux-bionic-cuda11.5-py3.6-gcc7 ciflow/all, ciflow/cuda, ciflow/default, ciflow/linux ✅ triggered
linux-bionic-py3.6-clang9 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/noarch, ciflow/xla ✅ triggered
linux-docs ciflow/all, ciflow/cpu, ciflow/default, ciflow/docs, ciflow/linux ✅ triggered
linux-vulkan-bionic-py3.6-clang9 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/vulkan ✅ triggered
linux-xenial-cuda11.3-py3.6-gcc7 ciflow/all, ciflow/cuda, ciflow/default, ciflow/linux ✅ triggered
linux-xenial-py3-clang5-mobile-build ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile ✅ triggered
linux-xenial-py3-clang5-mobile-custom-build-static ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile ✅ triggered
linux-xenial-py3.6-clang7-asan ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/sanitizers ✅ triggered
linux-xenial-py3.6-clang7-onnx ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/onnx ✅ triggered
linux-xenial-py3.6-gcc5.4 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux ✅ triggered
linux-xenial-py3.6-gcc7 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux ✅ triggered
linux-xenial-py3.6-gcc7-bazel-test ciflow/all, ciflow/bazel, ciflow/cpu, ciflow/default, ciflow/linux ✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single ciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux ✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit ciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux ✅ triggered
win-vs2019-cpu-py3 ciflow/all, ciflow/cpu, ciflow/default, ciflow/win ✅ triggered
win-vs2019-cuda11.3-py3 ciflow/all, ciflow/cuda, ciflow/default, ciflow/win ✅ triggered
Skipped Workflows
caffe2-linux-xenial-py3.6-gcc5.4 ciflow/all, ciflow/cpu, ciflow/linux 🚫 skipped
docker-builds ciflow/all 🚫 skipped
ios-12-5-1-arm64 ciflow/all, ciflow/ios, ciflow/macos 🚫 skipped
ios-12-5-1-arm64-coreml ciflow/all, ciflow/ios, ciflow/macos 🚫 skipped
ios-12-5-1-arm64-custom-ops ciflow/all, ciflow/ios, ciflow/macos 🚫 skipped
ios-12-5-1-arm64-full-jit ciflow/all, ciflow/ios, ciflow/macos 🚫 skipped
ios-12-5-1-arm64-metal ciflow/all, ciflow/ios, ciflow/macos 🚫 skipped
ios-12-5-1-x86-64 ciflow/all, ciflow/ios, ciflow/macos 🚫 skipped
ios-12-5-1-x86-64-coreml ciflow/all, ciflow/ios, ciflow/macos 🚫 skipped
ios-12-5-1-x86-64-full-jit ciflow/all, ciflow/ios, ciflow/macos 🚫 skipped
libtorch-linux-bionic-cuda11.5-py3.6-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux 🚫 skipped
libtorch-linux-xenial-cuda10.2-py3.6-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux 🚫 skipped
libtorch-linux-xenial-cuda11.3-py3.6-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux 🚫 skipped
linux-bionic-cuda10.2-py3.9-gcc7 ciflow/all, ciflow/cuda, ciflow/linux, ciflow/slow 🚫 skipped
linux-docs-push ciflow/all, ciflow/cpu, ciflow/linux, ciflow/scheduled 🚫 skipped
macos-10-15-py3-arm64 ciflow/all, ciflow/macos 🚫 skipped
macos-10-15-py3-lite-interpreter-x86-64 ciflow/all, ciflow/macos 🚫 skipped
macos-11-py3-x86-64 ciflow/all, ciflow/macos 🚫 skipped
parallelnative-linux-xenial-py3.6-gcc5.4 ciflow/all, ciflow/cpu, ciflow/linux 🚫 skipped
periodic-libtorch-linux-xenial-cuda11.1-py3.6-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled, ciflow/slow, ciflow/slow-gradcheck 🚫 skipped
periodic-linux-xenial-cuda11.1-py3.6-gcc7-debug ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-win-vs2019-cuda11.1-py3 ciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win 🚫 skipped

You can add a comment to the PR and tag @pytorchbot with the following commands:
# ciflow rerun, "ciflow/default" will always be added automatically
@pytorchbot ciflow rerun

# ciflow rerun with additional labels "-l <ciflow/label_name>", which is equivalent to adding these labels manually and trigger the rerun
@pytorchbot ciflow rerun -l ciflow/scheduled -l ciflow/slow

For more information, please take a look at the CI Flow Wiki.

@facebook-github-bot facebook-github-bot (Contributor) commented Dec 8, 2021

🔗 Helpful links

💊 CI failures summary and remediations

As of commit fc4c453 (more details on the Dr. CI page):


  • 6/6 failures introduced in this PR

🕵️ 6 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See GitHub Actions build Lint / quick-checks (1/6)

Step: "Ensure correct trailing newlines" (full log | diagnosis details | 🔁 rerun)

2021-12-08T20:19:09.4501518Z python: can't open..._launches.py': [Errno 2] No such file or directory
2021-12-08T20:19:09.4141204Z ##[group]Run set -eux
2021-12-08T20:19:09.4141656Z set -eux
2021-12-08T20:19:09.4142362Z python torch/testing/_check_kernel_launches.py |& tee "${GITHUB_WORKSPACE}"/cuda_kernel_launch_checks.txt
2021-12-08T20:19:09.4177411Z shell: /bin/bash -e {0}
2021-12-08T20:19:09.4177782Z env:
2021-12-08T20:19:09.4178293Z   pythonLocation: /opt/hostedtoolcache/Python/3.10.0/x64
2021-12-08T20:19:09.4179004Z   LD_LIBRARY_PATH: /opt/hostedtoolcache/Python/3.10.0/x64/lib
2021-12-08T20:19:09.4179508Z ##[endgroup]
2021-12-08T20:19:09.4320119Z + tee /home/runner/work/pytorch/pytorch/cuda_kernel_launch_checks.txt
2021-12-08T20:19:09.4331073Z + python torch/testing/_check_kernel_launches.py
2021-12-08T20:19:09.4501518Z python: can't open file '/home/runner/work/pytorch/pytorch/torch/testing/_check_kernel_launches.py': [Errno 2] No such file or directory
2021-12-08T20:19:09.4562186Z ##[group]Run (! git --no-pager grep -I -no $'#include <cub/' --  ./aten  ':(exclude)aten/src/ATen/cuda/cub*.cuh' || (echo "The above files have direct cub include; please include ATen/cuda/cub.cuh instead and wrap your cub calls in at::native namespace if necessary"; false))
2021-12-08T20:19:09.4563815Z (! git --no-pager grep -I -no $'#include <cub/' --  ./aten  ':(exclude)aten/src/ATen/cuda/cub*.cuh' || (echo "The above files have direct cub include; please include ATen/cuda/cub.cuh instead and wrap your cub calls in at::native namespace if necessary"; false))
2021-12-08T20:19:09.4594895Z shell: /bin/bash -e {0}
2021-12-08T20:19:09.4595205Z env:
2021-12-08T20:19:09.4595666Z   pythonLocation: /opt/hostedtoolcache/Python/3.10.0/x64
2021-12-08T20:19:09.4596286Z   LD_LIBRARY_PATH: /opt/hostedtoolcache/Python/3.10.0/x64/lib
2021-12-08T20:19:09.4596745Z ##[endgroup]
2021-12-08T20:19:09.4956073Z ##[group]Run (! git --no-pager grep -I -no $'cudaStreamSynchronize' --  ./aten ./c10 ':(exclude)aten/src/ATen/test' ':(exclude)c10/cuda/CUDAFunctions.h' || (echo "The above files call raw cuda APIs directly; please use at::cuda wrappers instead"; false))
2021-12-08T20:19:09.4958050Z (! git --no-pager grep -I -no $'cudaStreamSynchronize' --  ./aten ./c10 ':(exclude)aten/src/ATen/test' ':(exclude)c10/cuda/CUDAFunctions.h' || (echo "The above files call raw cuda APIs directly; please use at::cuda wrappers instead"; false))
2021-12-08T20:19:09.4993478Z shell: /bin/bash -e {0}

See GitHub Actions build linux-bionic-py3.6-clang9 / test (default, 1, 2, linux.2xlarge) (2/6)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2021-12-08T21:27:16.9973045Z AssertionError: can only test a child process
2021-12-08T21:27:16.9774110Z     assert self._parent_pid == os.getpid(), 'can only test a child process'
2021-12-08T21:27:16.9774661Z AssertionError: can only test a child process
2021-12-08T21:27:16.9963161Z Exception ignored in: <bound method _MultiProcessingDataLoaderIter.__del__ of <torch.utils.data.dataloader._MultiProcessingDataLoaderIter object at 0x7f53bb4f8438>>
2021-12-08T21:27:16.9964654Z Traceback (most recent call last):
2021-12-08T21:27:16.9965665Z   File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1328, in __del__
2021-12-08T21:27:16.9967096Z     self._shutdown_workers()
2021-12-08T21:27:16.9968003Z   File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1320, in _shutdown_workers
2021-12-08T21:27:16.9970802Z     if w.is_alive():
2021-12-08T21:27:16.9971600Z   File "/opt/conda/lib/python3.6/multiprocessing/process.py", line 134, in is_alive
2021-12-08T21:27:16.9972512Z     assert self._parent_pid == os.getpid(), 'can only test a child process'
2021-12-08T21:27:16.9973045Z AssertionError: can only test a child process
2021-12-08T21:27:17.6309779Z ok (0.667s)
2021-12-08T21:27:17.6330849Z   test_iterabledataset_len (__main__.TestDataLoaderPersistentWorkers) ... ok (0.002s)
2021-12-08T21:27:19.2124414Z   test_large_sampler_indices (__main__.TestDataLoaderPersistentWorkers) ... ok (1.579s)
2021-12-08T21:27:19.2210519Z   test_len (__main__.TestDataLoaderPersistentWorkers) ... ok (0.009s)
2021-12-08T21:27:20.7416553Z   test_multiple_dataloaders (__main__.TestDataLoaderPersistentWorkers) ... ok (1.520s)
2021-12-08T21:27:24.0321338Z   test_multiprocessing_contexts (__main__.TestDataLoaderPersistentWorkers) ... ok (3.290s)
2021-12-08T21:27:24.8488572Z   test_no_segfault (__main__.TestDataLoaderPersistentWorkers) ... ok (0.817s)
2021-12-08T21:27:24.8522303Z   test_numpy (__main__.TestDataLoaderPersistentWorkers) ... ok (0.003s)
2021-12-08T21:27:24.8550810Z   test_numpy_gen_state (__main__.TestDataLoaderPersistentWorkers) ... ok (0.003s)
2021-12-08T21:27:24.8576533Z   test_numpy_scalars (__main__.TestDataLoaderPersistentWorkers) ... ok (0.003s)

See GitHub Actions build linux-xenial-py3.6-gcc7 / test (default, 2, 2, linux.2xlarge) (3/6)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2021-12-08T21:26:32.4776606Z FAIL [0.006s]: tes...d_error_cpu (__main__.TestRandomTensorCreationCPU)
2021-12-08T21:26:32.4771158Z     raise rte
2021-12-08T21:26:32.4771912Z   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 376, in instantiated_test
2021-12-08T21:26:32.4772589Z     result = test(self, **param_kwargs)
2021-12-08T21:26:32.4773049Z   File "test_tensor_creation_ops.py", line 3327, in test_normal
2021-12-08T21:26:32.4773614Z     helper(self, device, dtype, lambda x: x, lambda t: t, lambda mean: mean)
2021-12-08T21:26:32.4774158Z   File "test_tensor_creation_ops.py", line 3267, in helper
2021-12-08T21:26:32.4774714Z     out = torch.normal(mean=torch.empty((0, 2)), std=torch.empty((0, 1)))
2021-12-08T21:26:32.4775352Z RuntimeError: inconsistent tensor, output size ([0, 2]) is not the same as input size ([0, 1])
2021-12-08T21:26:32.4775748Z 
2021-12-08T21:26:32.4776032Z ======================================================================
2021-12-08T21:26:32.4776606Z FAIL [0.006s]: test_normal_std_error_cpu (__main__.TestRandomTensorCreationCPU)
2021-12-08T21:26:32.4777378Z ----------------------------------------------------------------------
2021-12-08T21:26:32.4778012Z RuntimeError: normal expects std >= 0.0, but found std -1
2021-12-08T21:26:32.4778327Z 
2021-12-08T21:26:32.4778764Z During handling of the above exception, another exception occurred:
2021-12-08T21:26:32.4779137Z 
2021-12-08T21:26:32.4779452Z Traceback (most recent call last):
2021-12-08T21:26:32.4780280Z   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 376, in instantiated_test
2021-12-08T21:26:32.4780948Z     result = test(self, **param_kwargs)
2021-12-08T21:26:32.4781434Z   File "test_tensor_creation_ops.py", line 3336, in test_normal_std_error
2021-12-08T21:26:32.4782003Z     torch.normal(input, -1, (10,))

See GitHub Actions build linux-bionic-py3.6-clang9 / test (default, 2, 2, linux.2xlarge) (4/6)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2021-12-08T21:35:21.6757155Z FAIL [0.006s]: tes...d_error_cpu (__main__.TestRandomTensorCreationCPU)
2021-12-08T21:35:21.6751824Z     raise rte
2021-12-08T21:35:21.6752563Z   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 376, in instantiated_test
2021-12-08T21:35:21.6753197Z     result = test(self, **param_kwargs)
2021-12-08T21:35:21.6753665Z   File "test_tensor_creation_ops.py", line 3327, in test_normal
2021-12-08T21:35:21.6754202Z     helper(self, device, dtype, lambda x: x, lambda t: t, lambda mean: mean)
2021-12-08T21:35:21.6754745Z   File "test_tensor_creation_ops.py", line 3267, in helper
2021-12-08T21:35:21.6755314Z     out = torch.normal(mean=torch.empty((0, 2)), std=torch.empty((0, 1)))
2021-12-08T21:35:21.6755941Z RuntimeError: inconsistent tensor, output size ([0, 2]) is not the same as input size ([0, 1])
2021-12-08T21:35:21.6756330Z 
2021-12-08T21:35:21.6756604Z ======================================================================
2021-12-08T21:35:21.6757155Z FAIL [0.006s]: test_normal_std_error_cpu (__main__.TestRandomTensorCreationCPU)
2021-12-08T21:35:21.6757922Z ----------------------------------------------------------------------
2021-12-08T21:35:21.6758524Z RuntimeError: normal expects std >= 0.0, but found std -1
2021-12-08T21:35:21.6758825Z 
2021-12-08T21:35:21.6759262Z During handling of the above exception, another exception occurred:
2021-12-08T21:35:21.6759622Z 
2021-12-08T21:35:21.6759924Z Traceback (most recent call last):
2021-12-08T21:35:21.6760729Z   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 376, in instantiated_test
2021-12-08T21:35:21.6761366Z     result = test(self, **param_kwargs)
2021-12-08T21:35:21.6761857Z   File "test_tensor_creation_ops.py", line 3336, in test_normal_std_error
2021-12-08T21:35:21.6762408Z     torch.normal(input, -1, (10,))

See GitHub Actions build linux-xenial-py3.6-gcc5.4 / test (default, 2, 2, linux.2xlarge) (5/6)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2021-12-08T21:31:03.8449844Z FAIL [0.006s]: tes...d_error_cpu (__main__.TestRandomTensorCreationCPU)
2021-12-08T21:31:03.8444276Z     raise rte
2021-12-08T21:31:03.8445064Z   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 376, in instantiated_test
2021-12-08T21:31:03.8445750Z     result = test(self, **param_kwargs)
2021-12-08T21:31:03.8446213Z   File "test_tensor_creation_ops.py", line 3327, in test_normal
2021-12-08T21:31:03.8446785Z     helper(self, device, dtype, lambda x: x, lambda t: t, lambda mean: mean)
2021-12-08T21:31:03.8447333Z   File "test_tensor_creation_ops.py", line 3267, in helper
2021-12-08T21:31:03.8447892Z     out = torch.normal(mean=torch.empty((0, 2)), std=torch.empty((0, 1)))
2021-12-08T21:31:03.8448554Z RuntimeError: inconsistent tensor, output size ([0, 2]) is not the same as input size ([0, 1])
2021-12-08T21:31:03.8448943Z 
2021-12-08T21:31:03.8449250Z ======================================================================
2021-12-08T21:31:03.8449844Z FAIL [0.006s]: test_normal_std_error_cpu (__main__.TestRandomTensorCreationCPU)
2021-12-08T21:31:03.8450637Z ----------------------------------------------------------------------
2021-12-08T21:31:03.8451271Z RuntimeError: normal expects std >= 0.0, but found std -1
2021-12-08T21:31:03.8451588Z 
2021-12-08T21:31:03.8452028Z During handling of the above exception, another exception occurred:
2021-12-08T21:31:03.8452409Z 
2021-12-08T21:31:03.8452732Z Traceback (most recent call last):
2021-12-08T21:31:03.8453645Z   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 376, in instantiated_test
2021-12-08T21:31:03.8454335Z     result = test(self, **param_kwargs)
2021-12-08T21:31:03.8454819Z   File "test_tensor_creation_ops.py", line 3336, in test_normal_std_error
2021-12-08T21:31:03.8455409Z     torch.normal(input, -1, (10,))

See GitHub Actions build linux-xenial-py3.6-gcc5.4 / test (default, 1, 2, linux.2xlarge) (6/6)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2021-12-08T21:29:13.8689991Z AssertionError: can only test a child process
2021-12-08T21:29:13.8589385Z     assert self._parent_pid == os.getpid(), 'can only test a child process'
2021-12-08T21:29:13.8589939Z AssertionError: can only test a child process
2021-12-08T21:29:13.8672776Z Exception ignored in: <bound method _MultiProcessingDataLoaderIter.__del__ of <torch.utils.data.dataloader._MultiProcessingDataLoaderIter object at 0x7fa09abd3240>>
2021-12-08T21:29:13.8674467Z Traceback (most recent call last):
2021-12-08T21:29:13.8675775Z   File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1328, in __del__
2021-12-08T21:29:13.8679485Z     self._shutdown_workers()
2021-12-08T21:29:13.8680554Z   File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1320, in _shutdown_workers
2021-12-08T21:29:13.8687418Z     if w.is_alive():
2021-12-08T21:29:13.8688134Z   File "/opt/conda/lib/python3.6/multiprocessing/process.py", line 134, in is_alive
2021-12-08T21:29:13.8689232Z     assert self._parent_pid == os.getpid(), 'can only test a child process'
2021-12-08T21:29:13.8689991Z AssertionError: can only test a child process
2021-12-08T21:29:14.5319331Z ok (0.688s)
2021-12-08T21:29:14.5342017Z   test_iterabledataset_len (__main__.TestDataLoaderPersistentWorkers) ... ok (0.002s)
2021-12-08T21:29:16.2339625Z   test_large_sampler_indices (__main__.TestDataLoaderPersistentWorkers) ... ok (1.699s)
2021-12-08T21:29:16.2415554Z   test_len (__main__.TestDataLoaderPersistentWorkers) ... ok (0.008s)
2021-12-08T21:29:17.7925191Z   test_multiple_dataloaders (__main__.TestDataLoaderPersistentWorkers) ... ok (1.551s)
2021-12-08T21:29:21.1720910Z   test_multiprocessing_contexts (__main__.TestDataLoaderPersistentWorkers) ... ok (3.379s)
2021-12-08T21:29:22.0147572Z   test_no_segfault (__main__.TestDataLoaderPersistentWorkers) ... ok (0.843s)
2021-12-08T21:29:22.0181509Z   test_numpy (__main__.TestDataLoaderPersistentWorkers) ... ok (0.003s)
2021-12-08T21:29:22.0210632Z   test_numpy_gen_state (__main__.TestDataLoaderPersistentWorkers) ... ok (0.003s)
2021-12-08T21:29:22.0237011Z   test_numpy_scalars (__main__.TestDataLoaderPersistentWorkers) ... ok (0.003s)

This comment was automatically generated by Dr. CI.

@lezcano lezcano (Collaborator) left a comment:

I really like the macro approach: it makes everything very readable, and it's clear that the macros simply encapsulate TORCH_CHECKs!

I left a few minor comments, but overall this PR is looking great, thanks!

It's also great to see how much cleaner the code looks with structured kernels.

Comment on lines +7299 to +7301
The shapes of :attr:`mean`, :attr:`std`, and :attr:`out` (if present) must
match exactly. Unless the size of :attr:`out` is zero, then it will be resized
based on the supplied arguments, which must still match.
Collaborator:

This is probably not correct. Could it instead be the following?

Suggested change
The shapes of :attr:`mean`, :attr:`std`, and :attr:`out` (if present) must
match exactly. Unless the size of :attr:`out` is zero, then it will be resized
based on the supplied arguments, which must still match.
The shapes of :attr:`mean` and :attr:`std` must be broadcastable.

given that you're using infer_size in the previous PR.

I would not mention the out= argument here, as the behaviour of that argument is the same all across the PyTorch library.

@lezcano lezcano (Collaborator) commented Dec 9, 2021:

I see in the example in the OP that the following now breaks:

torch.normal(torch.rand(3, 4, 5), torch.rand(3, 4, 1))

This does not seem right, as their shapes are broadcastable. We should support this behaviour.

In fact, I believe the only change needed to support this behaviour is to remove the CHECK_NORMAL_SIZES check. Note that the fact that mean and std are broadcastable is checked within at::infer_size in the meta function, so we don't need to add any additional checks for that.
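The broadcastability check mentioned here can be sketched in plain Python. This is a conceptual model in the spirit of at::infer_size, not the actual implementation: shapes are right-aligned, and a dimension of size 1 expands to match the other shape.

```python
# Conceptual model of broadcast shape inference (in the spirit of
# at::infer_size; not the actual ATen implementation).
def infer_size(a, b):
    result = []
    for i in range(max(len(a), len(b))):
        # Walk dimensions from the trailing end; missing dims act as size 1.
        da = a[-1 - i] if i < len(a) else 1
        db = b[-1 - i] if i < len(b) else 1
        if da != db and da != 1 and db != 1:
            raise RuntimeError(
                f"The size of tensor a ({da}) must match the size of "
                f"tensor b ({db}) at non-singleton dimension "
                f"{max(len(a), len(b)) - 1 - i}"
            )
        result.append(max(da, db))
    return tuple(reversed(result))

print(infer_size((3, 4, 5), (3, 4, 1)))  # (3, 4, 5): broadcastable
# infer_size((2, 3), (1, 6)) would raise: same numel, but not broadcastable.
```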

Collaborator (Author):

I'll look into it and follow up later; I wasn't sure what behavior is desirable here.

Comment on lines -7302 to -7303
.. note:: When the shapes do not match, the shape of :attr:`mean`
is used as the shape for the returned output tensor
Collaborator:

Nice to have this deprecated, even if it was 4 versions later than we said we would deprecate it (cc @mruberry)

Comment on lines +160 to +165
#define CHECK_NORMAL_SIZES(output, input) \
TORCH_CHECK( \
output.sizes().equals(input.sizes()), \
"inconsistent tensor, output size (", output.sizes(), \
") is not the same as input size (", input.sizes(), ")");

Collaborator:

It looks like this check is not necessary now, as it should be performed by the structured kernels machinery.

Collaborator (Author):

The reason I have these is that everything calls into template implementations now, and those template versions can still be called directly (e.g., see the RNG tests I mentioned in the other PR: aten/src/ATen/test/cpu_rng_test.cpp). I can remove the checks if that's considered okay.

@lezcano lezcano (Collaborator) commented Dec 9, 2021:

We should modify that code to call into the at::normal functions. We should only avoid calling the at:: API when the performance is crucial, and even then, one needs to be careful about that.

Collaborator (Author):

We should only avoid calling the at:: API when the performance is crucial

Could you elaborate on the perf bit?

Collaborator:

PyTorch has a custom dispatcher mechanism. When you call a function in the at:: namespace, this mechanism is activated and it dispatches to the correct backend, e.g. CPU, CUDA, XLA, but also to forward- or backward-mode AD. This has a certain overhead, so sometimes you'll find code that calls the corresponding at::native:: function directly. Now, most of the time the correct thing to do is to call the at:: function, but if you are implementing your own backward AD (i.e. it's not a composite kernel), then you should be fine calling at::native::. See a counterexample, where I called an at::native:: function within a forward AD implementation and shouldn't have: #63568 (comment)
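The dispatch overhead described here can be illustrated with a toy registry in Python. This is a conceptual sketch only (the names KERNELS, register, and dispatch are made up); PyTorch's real dispatcher additionally handles autograd keys, fallbacks, and boxing.

```python
# Toy model of backend dispatch: calling through dispatch() models the at::
# entry point; calling a kernel function directly models at::native::.
KERNELS = {}

def register(op, backend):
    # Decorator that records a kernel for an (operator, backend) pair.
    def deco(fn):
        KERNELS[(op, backend)] = fn
        return fn
    return deco

def dispatch(op, backend, *args):
    # Route the call to the kernel registered for this backend.
    return KERNELS[(op, backend)](*args)

@register("normal", "CPU")
def normal_cpu(mean, std):
    # Stand-in for a CPU kernel; a real one would draw random samples.
    return f"normal(mean={mean}, std={std}) on CPU"

# Routing through dispatch() adds a lookup (the "overhead"); calling
# normal_cpu() directly skips it, along with anything else the dispatcher
# would layer on (autograd, checks from the meta function, etc.).
assert dispatch("normal", "CPU", 0.0, 1.0) == normal_cpu(0.0, 1.0)
```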

Comment on lines +214 to +216
CHECK_NORMAL_SIZES(output, mean);
CHECK_NORMAL_SIZES(output, std);
CHECK_NORMAL_TENSOR_STD(std);
Collaborator:

(pretty much) no checks should happen in the _impl side of things. They already happen in the meta part, so this should probably go.

Collaborator (Author):

see above

Comment on lines +195 to +196
CHECK_NORMAL_SIZES(output, mean);
CHECK_NORMAL_STD(std);
Collaborator:

These should go as well.

Collaborator (Author):

see above

Comment on lines +204 to +205
CHECK_NORMAL_SIZES(output, std);
CHECK_NORMAL_TENSOR_STD(std);
Collaborator:

ditto

Collaborator (Author):

see above

template<template<typename> class normal_kernel, typename RNG>
Tensor& normal_impl_(Tensor& self, double mean, double std, c10::optional<Generator> gen) {
TORCH_CHECK(std >= 0.0, "normal_ expects std >= 0.0, but found std=", std);
CHECK_NORMAL_STD(std);
Collaborator:

As mentioned in the previous PR, this check should probably go if we go down the road of having this operation implemented as a structured kernel.

Collaborator (Author):

noted, but see above

@nkaretnikov (Collaborator, Author):

To avoid confusion, I will open a new stack to address issues related to BC and broadcasting.

@facebook-github-bot facebook-github-bot deleted the gh/nkaretnikov/2/head branch January 15, 2022 15:16

Labels

cla signed, module: structured kernels (Related to new structured kernels functionality), open source

Projects

None yet

Development


4 participants