
Add meta variants to float variants of normal #69632

Closed
nkaretnikov wants to merge 1 commit into gh/nkaretnikov/5/base from gh/nkaretnikov/5/head

Conversation

@nkaretnikov
Collaborator

nkaretnikov commented Dec 8, 2021

Stack from ghstack:

This mimics what normal_ does. I'm not sure whether TensorOptions needs to
explicitly set the device type to meta for these.

Without this change, both of these variants already work with meta, but
maybe explicit definitions are somehow better (e.g., they result in less
computation):

>>> torch.normal(mean=4., std=1., size=(1, 2, 3), device='meta')
tensor(..., device='meta', size=(1, 2, 3))
>>> torch.normal(mean=4., std=1., size=(4, 5), out=torch.rand(1, device='meta'))
tensor(..., device='meta', size=(4, 5))
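
For context, a standalone meta kernel for the functional float variant would look roughly like the sketch below. This is a minimal sketch, not the code in this PR; the function name normal_float_float_meta and the exact option plumbing are assumptions.

// Hypothetical sketch of a meta kernel for normal.float_float.
// A meta kernel only produces output metadata (shape, dtype, device);
// it draws no random numbers and allocates no data.
#include <ATen/ATen.h>
using namespace at;

Tensor normal_float_float_meta(
    double mean, double std, IntArrayRef size,
    c10::optional<Generator> gen,
    c10::optional<ScalarType> dtype,
    c10::optional<Layout> layout,
    c10::optional<Device> device,
    c10::optional<bool> pin_memory) {
  auto options = TensorOptions()
                     .dtype(dtype)
                     .layout(layout)
                     .device(device)
                     .pinned_memory(pin_memory);
  // With device='meta', at::empty creates a tensor that carries
  // sizes/strides/dtype but no storage.
  return at::empty(size, options);
}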

[ghstack-poisoned]
@pytorch-probot

pytorch-probot bot commented Dec 8, 2021

CI Flow Status

⚛️ CI Flow

Ruleset - Version: v1
Ruleset - File: https://github.com/pytorch/pytorch/blob/1977986c34265c314343988f29d71f86da3bceda/.github/generated-ciflow-ruleset.json
PR ciflow labels: ciflow/default

Workflow | Labels (bold = enabled) | Status
Triggered Workflows
linux-bionic-cuda11.5-py3.6-gcc7 ciflow/all, ciflow/cuda, ciflow/default, ciflow/linux ✅ triggered
linux-bionic-py3.6-clang9 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/noarch, ciflow/xla ✅ triggered
linux-docs ciflow/all, ciflow/cpu, ciflow/default, ciflow/docs, ciflow/linux ✅ triggered
linux-vulkan-bionic-py3.6-clang9 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/vulkan ✅ triggered
linux-xenial-cuda11.3-py3.6-gcc7 ciflow/all, ciflow/cuda, ciflow/default, ciflow/linux ✅ triggered
linux-xenial-py3-clang5-mobile-build ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile ✅ triggered
linux-xenial-py3-clang5-mobile-custom-build-static ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile ✅ triggered
linux-xenial-py3.6-clang7-asan ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/sanitizers ✅ triggered
linux-xenial-py3.6-clang7-onnx ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/onnx ✅ triggered
linux-xenial-py3.6-gcc5.4 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux ✅ triggered
linux-xenial-py3.6-gcc7 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux ✅ triggered
linux-xenial-py3.6-gcc7-bazel-test ciflow/all, ciflow/bazel, ciflow/cpu, ciflow/default, ciflow/linux ✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single ciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux ✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit ciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux ✅ triggered
win-vs2019-cpu-py3 ciflow/all, ciflow/cpu, ciflow/default, ciflow/win ✅ triggered
win-vs2019-cuda11.3-py3 ciflow/all, ciflow/cuda, ciflow/default, ciflow/win ✅ triggered
Skipped Workflows
caffe2-linux-xenial-py3.6-gcc5.4 ciflow/all, ciflow/cpu, ciflow/linux 🚫 skipped
docker-builds ciflow/all 🚫 skipped
ios-12-5-1-arm64 ciflow/all, ciflow/ios, ciflow/macos 🚫 skipped
ios-12-5-1-arm64-coreml ciflow/all, ciflow/ios, ciflow/macos 🚫 skipped
ios-12-5-1-arm64-custom-ops ciflow/all, ciflow/ios, ciflow/macos 🚫 skipped
ios-12-5-1-arm64-full-jit ciflow/all, ciflow/ios, ciflow/macos 🚫 skipped
ios-12-5-1-arm64-metal ciflow/all, ciflow/ios, ciflow/macos 🚫 skipped
ios-12-5-1-x86-64 ciflow/all, ciflow/ios, ciflow/macos 🚫 skipped
ios-12-5-1-x86-64-coreml ciflow/all, ciflow/ios, ciflow/macos 🚫 skipped
ios-12-5-1-x86-64-full-jit ciflow/all, ciflow/ios, ciflow/macos 🚫 skipped
libtorch-linux-bionic-cuda11.5-py3.6-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux 🚫 skipped
libtorch-linux-xenial-cuda10.2-py3.6-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux 🚫 skipped
libtorch-linux-xenial-cuda11.3-py3.6-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux 🚫 skipped
linux-bionic-cuda10.2-py3.9-gcc7 ciflow/all, ciflow/cuda, ciflow/linux, ciflow/slow 🚫 skipped
linux-docs-push ciflow/all, ciflow/cpu, ciflow/linux, ciflow/scheduled 🚫 skipped
macos-10-15-py3-arm64 ciflow/all, ciflow/macos 🚫 skipped
macos-10-15-py3-lite-interpreter-x86-64 ciflow/all, ciflow/macos 🚫 skipped
macos-11-py3-x86-64 ciflow/all, ciflow/macos 🚫 skipped
parallelnative-linux-xenial-py3.6-gcc5.4 ciflow/all, ciflow/cpu, ciflow/linux 🚫 skipped
periodic-libtorch-linux-xenial-cuda11.1-py3.6-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled, ciflow/slow, ciflow/slow-gradcheck 🚫 skipped
periodic-linux-xenial-cuda11.1-py3.6-gcc7-debug ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-win-vs2019-cuda11.1-py3 ciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win 🚫 skipped

You can add a comment to the PR and tag @pytorchbot with the following commands:
# ciflow rerun, "ciflow/default" will always be added automatically
@pytorchbot ciflow rerun

# ciflow rerun with additional labels "-l <ciflow/label_name>", which is equivalent to adding these labels manually and trigger the rerun
@pytorchbot ciflow rerun -l ciflow/scheduled -l ciflow/slow

For more information, please take a look at the CI Flow Wiki.

nkaretnikov added a commit that referenced this pull request Dec 8, 2021
ghstack-source-id: 3c27c38
Pull Request resolved: #69632
@facebook-github-bot
Contributor

facebook-github-bot commented Dec 8, 2021

🔗 Helpful links

💊 CI failures summary and remediations

As of commit 1977986 (more details on the Dr. CI page):


  • 14/14 failures introduced in this PR

🕵️ 14 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See GitHub Actions build linux-xenial-cuda11.3-py3.6-gcc7 / test (default, 1, 2, linux.4xlarge.nvidia.gpu) (1/14)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2021-12-08T22:44:03.4544777Z FAIL [0.005s]: tes...error_cuda (__main__.TestRandomTensorCreationCUDA)
2021-12-08T22:44:03.4537300Z     raise rte
2021-12-08T22:44:03.4538291Z   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 376, in instantiated_test
2021-12-08T22:44:03.4539183Z     result = test(self, **param_kwargs)
2021-12-08T22:44:03.4539817Z   File "test_tensor_creation_ops.py", line 3327, in test_normal
2021-12-08T22:44:03.4540552Z     helper(self, device, dtype, lambda x: x, lambda t: t, lambda mean: mean)
2021-12-08T22:44:03.4541335Z   File "test_tensor_creation_ops.py", line 3267, in helper
2021-12-08T22:44:03.4542077Z     out = torch.normal(mean=torch.empty((0, 2)), std=torch.empty((0, 1)))
2021-12-08T22:44:03.4543093Z RuntimeError: inconsistent tensor, output size ([0, 2]) is not the same as input size ([0, 1])
2021-12-08T22:44:03.4543634Z 
2021-12-08T22:44:03.4544011Z ======================================================================
2021-12-08T22:44:03.4544777Z FAIL [0.005s]: test_normal_std_error_cuda (__main__.TestRandomTensorCreationCUDA)
2021-12-08T22:44:03.4545865Z ----------------------------------------------------------------------
2021-12-08T22:44:03.4546689Z RuntimeError: normal expects std >= 0.0, but found std -1
2021-12-08T22:44:03.4547119Z 
2021-12-08T22:44:03.4547722Z During handling of the above exception, another exception occurred:
2021-12-08T22:44:03.4548224Z 
2021-12-08T22:44:03.4548640Z Traceback (most recent call last):
2021-12-08T22:44:03.4549662Z   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 1470, in wrapper
2021-12-08T22:44:03.4550429Z     method(*args, **kwargs)
2021-12-08T22:44:03.4551475Z   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 376, in instantiated_test
2021-12-08T22:44:03.4552379Z     result = test(self, **param_kwargs)

See GitHub Actions build Lint / quick-checks (2/14)

Step: "Ensure correct trailing newlines" (full log | diagnosis details | 🔁 rerun)

2021-12-08T20:22:20.3758387Z python: can't open..._launches.py': [Errno 2] No such file or directory
2021-12-08T20:22:20.3197748Z ##[group]Run set -eux
2021-12-08T20:22:20.3198198Z set -eux
2021-12-08T20:22:20.3198897Z python torch/testing/_check_kernel_launches.py |& tee "${GITHUB_WORKSPACE}"/cuda_kernel_launch_checks.txt
2021-12-08T20:22:20.3234031Z shell: /bin/bash -e {0}
2021-12-08T20:22:20.3234394Z env:
2021-12-08T20:22:20.3234927Z   pythonLocation: /opt/hostedtoolcache/Python/3.10.0/x64
2021-12-08T20:22:20.3235660Z   LD_LIBRARY_PATH: /opt/hostedtoolcache/Python/3.10.0/x64/lib
2021-12-08T20:22:20.3236183Z ##[endgroup]
2021-12-08T20:22:20.3755921Z + python torch/testing/_check_kernel_launches.py
2021-12-08T20:22:20.3756621Z + tee /home/runner/work/pytorch/pytorch/cuda_kernel_launch_checks.txt
2021-12-08T20:22:20.3758387Z python: can't open file '/home/runner/work/pytorch/pytorch/torch/testing/_check_kernel_launches.py': [Errno 2] No such file or directory
2021-12-08T20:22:20.3792640Z ##[group]Run (! git --no-pager grep -I -no $'#include <cub/' --  ./aten  ':(exclude)aten/src/ATen/cuda/cub*.cuh' || (echo "The above files have direct cub include; please include ATen/cuda/cub.cuh instead and wrap your cub calls in at::native namespace if necessary"; false))
2021-12-08T20:22:20.3794554Z (! git --no-pager grep -I -no $'#include <cub/' --  ./aten  ':(exclude)aten/src/ATen/cuda/cub*.cuh' || (echo "The above files have direct cub include; please include ATen/cuda/cub.cuh instead and wrap your cub calls in at::native namespace if necessary"; false))
2021-12-08T20:22:20.3830120Z shell: /bin/bash -e {0}
2021-12-08T20:22:20.3830521Z env:
2021-12-08T20:22:20.3831097Z   pythonLocation: /opt/hostedtoolcache/Python/3.10.0/x64
2021-12-08T20:22:20.3831913Z   LD_LIBRARY_PATH: /opt/hostedtoolcache/Python/3.10.0/x64/lib
2021-12-08T20:22:20.3832445Z ##[endgroup]
2021-12-08T20:22:20.4168172Z ##[group]Run (! git --no-pager grep -I -no $'cudaStreamSynchronize' --  ./aten ./c10 ':(exclude)aten/src/ATen/test' ':(exclude)c10/cuda/CUDAFunctions.h' || (echo "The above files call raw cuda APIs directly; please use at::cuda wrappers instead"; false))
2021-12-08T20:22:20.4170120Z (! git --no-pager grep -I -no $'cudaStreamSynchronize' --  ./aten ./c10 ':(exclude)aten/src/ATen/test' ':(exclude)c10/cuda/CUDAFunctions.h' || (echo "The above files call raw cuda APIs directly; please use at::cuda wrappers instead"; false))
2021-12-08T20:22:20.4203592Z shell: /bin/bash -e {0}

See GitHub Actions build linux-bionic-cuda11.5-py3.6-gcc7 / test (default, 2, 2, linux.4xlarge.nvidia.gpu) (3/14)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2021-12-08T22:21:06.1571572Z AssertionError: can only test a child process
2021-12-08T22:21:06.1369455Z     assert self._parent_pid == os.getpid(), 'can only test a child process'
2021-12-08T22:21:06.1370584Z AssertionError: can only test a child process
2021-12-08T22:21:06.1560569Z Exception ignored in: <bound method _MultiProcessingDataLoaderIter.__del__ of <torch.utils.data.dataloader._MultiProcessingDataLoaderIter object at 0x7f5b223f5400>>
2021-12-08T22:21:06.1562196Z Traceback (most recent call last):
2021-12-08T22:21:06.1563724Z   File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1328, in __del__
2021-12-08T22:21:06.1564499Z     self._shutdown_workers()
2021-12-08T22:21:06.1565565Z   File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1320, in _shutdown_workers
2021-12-08T22:21:06.1567744Z     if w.is_alive():
2021-12-08T22:21:06.1568436Z   File "/opt/conda/lib/python3.6/multiprocessing/process.py", line 134, in is_alive
2021-12-08T22:21:06.1570779Z     assert self._parent_pid == os.getpid(), 'can only test a child process'
2021-12-08T22:21:06.1571572Z AssertionError: can only test a child process
2021-12-08T22:21:07.4950722Z ok (1.420s)
2021-12-08T22:21:07.4979057Z   test_iterabledataset_len (__main__.TestDataLoaderPersistentWorkers) ... ok (0.003s)
2021-12-08T22:21:09.9095392Z   test_large_sampler_indices (__main__.TestDataLoaderPersistentWorkers) ... ok (2.411s)
2021-12-08T22:21:09.9204890Z   test_len (__main__.TestDataLoaderPersistentWorkers) ... ok (0.011s)
2021-12-08T22:21:12.3939878Z   test_multiple_dataloaders (__main__.TestDataLoaderPersistentWorkers) ... ok (2.473s)
2021-12-08T22:21:15.8778560Z   test_multiprocessing_contexts (__main__.TestDataLoaderPersistentWorkers) ... [W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
2021-12-08T22:21:15.8791153Z [W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
2021-12-08T22:21:15.8823752Z [W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
2021-12-08T22:21:19.2844515Z [W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
2021-12-08T22:21:19.2846081Z [W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]

See GitHub Actions build linux-bionic-py3.6-clang9 / test (noarch, 1, 1, linux.2xlarge) (4/14)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2021-12-08T22:13:42.5590870Z FAIL [0.005s]: tes...error_meta (__main__.TestRandomTensorCreationMETA)
2021-12-08T22:13:42.5581661Z During handling of the above exception, another exception occurred:
2021-12-08T22:13:42.5582286Z 
2021-12-08T22:13:42.5582794Z Traceback (most recent call last):
2021-12-08T22:13:42.5584219Z   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 376, in instantiated_test
2021-12-08T22:13:42.5585351Z     result = test(self, **param_kwargs)
2021-12-08T22:13:42.5586206Z   File "test_tensor_creation_ops.py", line 3336, in test_normal_std_error
2021-12-08T22:13:42.5587176Z     torch.normal(input, -1, (10,))
2021-12-08T22:13:42.5588445Z AssertionError: "normal_ expects std >= 0.0" does not match "normal expects std >= 0.0, but found std -1"
2021-12-08T22:13:42.5589436Z 
2021-12-08T22:13:42.5589860Z ======================================================================
2021-12-08T22:13:42.5590870Z FAIL [0.005s]: test_normal_std_error_meta (__main__.TestRandomTensorCreationMETA)
2021-12-08T22:13:42.5592385Z ----------------------------------------------------------------------
2021-12-08T22:13:42.5593575Z RuntimeError: normal expects std >= 0.0, but found std -1
2021-12-08T22:13:42.5594176Z 
2021-12-08T22:13:42.5595036Z During handling of the above exception, another exception occurred:
2021-12-08T22:13:42.5595744Z 
2021-12-08T22:13:42.5596348Z Traceback (most recent call last):
2021-12-08T22:13:42.5597828Z   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 1470, in wrapper
2021-12-08T22:13:42.5598946Z     method(*args, **kwargs)
2021-12-08T22:13:42.5600471Z   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 376, in instantiated_test
2021-12-08T22:13:42.5601762Z     result = test(self, **param_kwargs)

See GitHub Actions build linux-xenial-py3.6-gcc5.4 / test (default, 2, 2, linux.2xlarge) (5/14)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2021-12-08T21:17:06.0310573Z AssertionError: can only test a child process
2021-12-08T21:17:06.0255289Z     assert self._parent_pid == os.getpid(), 'can only test a child process'
2021-12-08T21:17:06.0256048Z AssertionError: can only test a child process
2021-12-08T21:17:06.0278856Z Exception ignored in: <bound method _MultiProcessingDataLoaderIter.__del__ of <torch.utils.data.dataloader._MultiProcessingDataLoaderIter object at 0x7fd18b0cb470>>
2021-12-08T21:17:06.0280601Z Traceback (most recent call last):
2021-12-08T21:17:06.0282078Z   File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1328, in __del__
2021-12-08T21:17:06.0283147Z     self._shutdown_workers()
2021-12-08T21:17:06.0303437Z   File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1320, in _shutdown_workers
2021-12-08T21:17:06.0306714Z     if w.is_alive():
2021-12-08T21:17:06.0308179Z   File "/opt/conda/lib/python3.6/multiprocessing/process.py", line 134, in is_alive
2021-12-08T21:17:06.0309693Z     assert self._parent_pid == os.getpid(), 'can only test a child process'
2021-12-08T21:17:06.0310573Z AssertionError: can only test a child process
2021-12-08T21:17:06.7592547Z ok (0.749s)
2021-12-08T21:17:06.7618191Z   test_iterabledataset_len (__main__.TestDataLoaderPersistentWorkers) ... ok (0.002s)
2021-12-08T21:17:08.4524701Z   test_large_sampler_indices (__main__.TestDataLoaderPersistentWorkers) ... ok (1.690s)
2021-12-08T21:17:08.4597581Z   test_len (__main__.TestDataLoaderPersistentWorkers) ... ok (0.007s)
2021-12-08T21:17:10.0807640Z   test_multiple_dataloaders (__main__.TestDataLoaderPersistentWorkers) ... ok (1.621s)
2021-12-08T21:17:13.6023301Z   test_multiprocessing_contexts (__main__.TestDataLoaderPersistentWorkers) ... ok (3.521s)
2021-12-08T21:17:14.4874203Z   test_no_segfault (__main__.TestDataLoaderPersistentWorkers) ... ok (0.885s)
2021-12-08T21:17:14.4910376Z   test_numpy (__main__.TestDataLoaderPersistentWorkers) ... ok (0.004s)
2021-12-08T21:17:14.4940217Z   test_numpy_gen_state (__main__.TestDataLoaderPersistentWorkers) ... ok (0.003s)
2021-12-08T21:17:14.4965324Z   test_numpy_scalars (__main__.TestDataLoaderPersistentWorkers) ... ok (0.002s)

See GitHub Actions build linux-xenial-py3.6-gcc7 / test (default, 2, 2, linux.2xlarge) (6/14)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2021-12-08T21:34:38.5623405Z FAIL [0.004s]: tes...d_error_cpu (__main__.TestRandomTensorCreationCPU)
2021-12-08T21:34:38.5618010Z     raise rte
2021-12-08T21:34:38.5618747Z   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 376, in instantiated_test
2021-12-08T21:34:38.5619415Z     result = test(self, **param_kwargs)
2021-12-08T21:34:38.5619887Z   File "test_tensor_creation_ops.py", line 3327, in test_normal
2021-12-08T21:34:38.5620430Z     helper(self, device, dtype, lambda x: x, lambda t: t, lambda mean: mean)
2021-12-08T21:34:38.5620983Z   File "test_tensor_creation_ops.py", line 3267, in helper
2021-12-08T21:34:38.5621518Z     out = torch.normal(mean=torch.empty((0, 2)), std=torch.empty((0, 1)))
2021-12-08T21:34:38.5622168Z RuntimeError: inconsistent tensor, output size ([0, 2]) is not the same as input size ([0, 1])
2021-12-08T21:34:38.5622564Z 
2021-12-08T21:34:38.5622833Z ======================================================================
2021-12-08T21:34:38.5623405Z FAIL [0.004s]: test_normal_std_error_cpu (__main__.TestRandomTensorCreationCPU)
2021-12-08T21:34:38.5624214Z ----------------------------------------------------------------------
2021-12-08T21:34:38.5624823Z RuntimeError: normal expects std >= 0.0, but found std -1
2021-12-08T21:34:38.5625132Z 
2021-12-08T21:34:38.5625579Z During handling of the above exception, another exception occurred:
2021-12-08T21:34:38.5625949Z 
2021-12-08T21:34:38.5626260Z Traceback (most recent call last):
2021-12-08T21:34:38.5627082Z   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 376, in instantiated_test
2021-12-08T21:34:38.5627737Z     result = test(self, **param_kwargs)
2021-12-08T21:34:38.5628229Z   File "test_tensor_creation_ops.py", line 3336, in test_normal_std_error
2021-12-08T21:34:38.5628779Z     torch.normal(input, -1, (10,))

See GitHub Actions build linux-xenial-py3.6-clang7-asan / test (default, 2, 2, linux.2xlarge) (7/14)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2021-12-08T23:13:33.2260195Z FAIL [0.007s]: tes...d_error_cpu (__main__.TestRandomTensorCreationCPU)
2021-12-08T23:13:33.2254988Z     raise rte
2021-12-08T23:13:33.2255714Z   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 376, in instantiated_test
2021-12-08T23:13:33.2256355Z     result = test(self, **param_kwargs)
2021-12-08T23:13:33.2256799Z   File "test_tensor_creation_ops.py", line 3327, in test_normal
2021-12-08T23:13:33.2257339Z     helper(self, device, dtype, lambda x: x, lambda t: t, lambda mean: mean)
2021-12-08T23:13:33.2257859Z   File "test_tensor_creation_ops.py", line 3267, in helper
2021-12-08T23:13:33.2258387Z     out = torch.normal(mean=torch.empty((0, 2)), std=torch.empty((0, 1)))
2021-12-08T23:13:33.2259006Z RuntimeError: inconsistent tensor, output size ([0, 2]) is not the same as input size ([0, 1])
2021-12-08T23:13:33.2259384Z 
2021-12-08T23:13:33.2259643Z ======================================================================
2021-12-08T23:13:33.2260195Z FAIL [0.007s]: test_normal_std_error_cpu (__main__.TestRandomTensorCreationCPU)
2021-12-08T23:13:33.2260937Z ----------------------------------------------------------------------
2021-12-08T23:13:33.2261541Z RuntimeError: normal expects std >= 0.0, but found std -1
2021-12-08T23:13:33.2261877Z 
2021-12-08T23:13:33.2262298Z During handling of the above exception, another exception occurred:
2021-12-08T23:13:33.2262659Z 
2021-12-08T23:13:33.2262977Z Traceback (most recent call last):
2021-12-08T23:13:33.2263889Z   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 376, in instantiated_test
2021-12-08T23:13:33.2264540Z     result = test(self, **param_kwargs)
2021-12-08T23:13:33.2265006Z   File "test_tensor_creation_ops.py", line 3336, in test_normal_std_error
2021-12-08T23:13:33.2265561Z     torch.normal(input, -1, (10,))

See GitHub Actions build linux-bionic-py3.6-clang9 / test (default, 1, 2, linux.2xlarge) (8/14)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2021-12-08T21:40:16.6135177Z AssertionError: can only test a child process
2021-12-08T21:40:16.6022323Z     assert self._parent_pid == os.getpid(), 'can only test a child process'
2021-12-08T21:40:16.6023095Z AssertionError: can only test a child process
2021-12-08T21:40:16.6126082Z Exception ignored in: <bound method _MultiProcessingDataLoaderIter.__del__ of <torch.utils.data.dataloader._MultiProcessingDataLoaderIter object at 0x7fadb92776d8>>
2021-12-08T21:40:16.6127825Z Traceback (most recent call last):
2021-12-08T21:40:16.6129241Z   File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1328, in __del__
2021-12-08T21:40:16.6130125Z     self._shutdown_workers()
2021-12-08T21:40:16.6131265Z   File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1320, in _shutdown_workers
2021-12-08T21:40:16.6132148Z     if w.is_alive():
2021-12-08T21:40:16.6132891Z   File "/opt/conda/lib/python3.6/multiprocessing/process.py", line 134, in is_alive
2021-12-08T21:40:16.6134360Z     assert self._parent_pid == os.getpid(), 'can only test a child process'
2021-12-08T21:40:16.6135177Z AssertionError: can only test a child process
2021-12-08T21:40:17.2677089Z ok (0.680s)
2021-12-08T21:40:17.2700189Z   test_iterabledataset_len (__main__.TestDataLoaderPersistentWorkers) ... ok (0.002s)
2021-12-08T21:40:18.9118430Z   test_large_sampler_indices (__main__.TestDataLoaderPersistentWorkers) ... ok (1.641s)
2021-12-08T21:40:18.9189313Z   test_len (__main__.TestDataLoaderPersistentWorkers) ... ok (0.007s)
2021-12-08T21:40:20.4478576Z   test_multiple_dataloaders (__main__.TestDataLoaderPersistentWorkers) ... ok (1.529s)
2021-12-08T21:40:23.7620637Z   test_multiprocessing_contexts (__main__.TestDataLoaderPersistentWorkers) ... ok (3.314s)
2021-12-08T21:40:24.6380976Z   test_no_segfault (__main__.TestDataLoaderPersistentWorkers) ... ok (0.876s)
2021-12-08T21:40:24.6414108Z   test_numpy (__main__.TestDataLoaderPersistentWorkers) ... ok (0.003s)
2021-12-08T21:40:24.6443315Z   test_numpy_gen_state (__main__.TestDataLoaderPersistentWorkers) ... ok (0.003s)
2021-12-08T21:40:24.6468813Z   test_numpy_scalars (__main__.TestDataLoaderPersistentWorkers) ... ok (0.002s)

See GitHub Actions build linux-bionic-py3.6-clang9 / test (default, 2, 2, linux.2xlarge) (9/14)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2021-12-08T21:43:16.4125494Z FAIL [0.005s]: tes...d_error_cpu (__main__.TestRandomTensorCreationCPU)
2021-12-08T21:43:16.4119788Z     raise rte
2021-12-08T21:43:16.4120588Z   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 376, in instantiated_test
2021-12-08T21:43:16.4121295Z     result = test(self, **param_kwargs)
2021-12-08T21:43:16.4121779Z   File "test_tensor_creation_ops.py", line 3327, in test_normal
2021-12-08T21:43:16.4122378Z     helper(self, device, dtype, lambda x: x, lambda t: t, lambda mean: mean)
2021-12-08T21:43:16.4122946Z   File "test_tensor_creation_ops.py", line 3267, in helper
2021-12-08T21:43:16.4123525Z     out = torch.normal(mean=torch.empty((0, 2)), std=torch.empty((0, 1)))
2021-12-08T21:43:16.4124194Z RuntimeError: inconsistent tensor, output size ([0, 2]) is not the same as input size ([0, 1])
2021-12-08T21:43:16.4124612Z 
2021-12-08T21:43:16.4124903Z ======================================================================
2021-12-08T21:43:16.4125494Z FAIL [0.005s]: test_normal_std_error_cpu (__main__.TestRandomTensorCreationCPU)
2021-12-08T21:43:16.4126317Z ----------------------------------------------------------------------
2021-12-08T21:43:16.4126970Z RuntimeError: normal expects std >= 0.0, but found std -1
2021-12-08T21:43:16.4127295Z 
2021-12-08T21:43:16.4127753Z During handling of the above exception, another exception occurred:
2021-12-08T21:43:16.4128142Z 
2021-12-08T21:43:16.4128464Z Traceback (most recent call last):
2021-12-08T21:43:16.4129335Z   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 376, in instantiated_test
2021-12-08T21:43:16.4130039Z     result = test(self, **param_kwargs)
2021-12-08T21:43:16.4130544Z   File "test_tensor_creation_ops.py", line 3336, in test_normal_std_error
2021-12-08T21:43:16.4131137Z     torch.normal(input, -1, (10,))

See GitHub Actions build linux-xenial-py3.6-gcc5.4 / test (default, 1, 2, linux.2xlarge) (10/14)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2021-12-08T21:37:09.3023609Z FAIL [0.004s]: tes...d_error_cpu (__main__.TestRandomTensorCreationCPU)
2021-12-08T21:37:09.3017752Z     raise rte
2021-12-08T21:37:09.3018528Z   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 376, in instantiated_test
2021-12-08T21:37:09.3019195Z     result = test(self, **param_kwargs)
2021-12-08T21:37:09.3019673Z   File "test_tensor_creation_ops.py", line 3327, in test_normal
2021-12-08T21:37:09.3020473Z     helper(self, device, dtype, lambda x: x, lambda t: t, lambda mean: mean)
2021-12-08T21:37:09.3021055Z   File "test_tensor_creation_ops.py", line 3267, in helper
2021-12-08T21:37:09.3021651Z     out = torch.normal(mean=torch.empty((0, 2)), std=torch.empty((0, 1)))
2021-12-08T21:37:09.3022308Z RuntimeError: inconsistent tensor, output size ([0, 2]) is not the same as input size ([0, 1])
2021-12-08T21:37:09.3022720Z 
2021-12-08T21:37:09.3023001Z ======================================================================
2021-12-08T21:37:09.3023609Z FAIL [0.004s]: test_normal_std_error_cpu (__main__.TestRandomTensorCreationCPU)
2021-12-08T21:37:09.3024430Z ----------------------------------------------------------------------
2021-12-08T21:37:09.3025064Z RuntimeError: normal expects std >= 0.0, but found std -1
2021-12-08T21:37:09.3025389Z 
2021-12-08T21:37:09.3025845Z During handling of the above exception, another exception occurred:
2021-12-08T21:37:09.3026221Z 
2021-12-08T21:37:09.3026540Z Traceback (most recent call last):
2021-12-08T21:37:09.3027387Z   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 376, in instantiated_test
2021-12-08T21:37:09.3028061Z     result = test(self, **param_kwargs)
2021-12-08T21:37:09.3028561Z   File "test_tensor_creation_ops.py", line 3336, in test_normal_std_error
2021-12-08T21:37:09.3029140Z     torch.normal(input, -1, (10,))

See GitHub Actions build linux-xenial-cuda11.3-py3.6-gcc7 / test (default, 2, 2, linux.4xlarge.nvidia.gpu) (11/14)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2021-12-08T22:18:49.2267906Z AssertionError: can only test a child process
2021-12-08T22:18:49.2121433Z     assert self._parent_pid == os.getpid(), 'can only test a child process'
2021-12-08T22:18:49.2122679Z AssertionError: can only test a child process
2021-12-08T22:18:49.2257802Z Exception ignored in: <bound method _MultiProcessingDataLoaderIter.__del__ of <torch.utils.data.dataloader._MultiProcessingDataLoaderIter object at 0x7f4d95d1b208>>
2021-12-08T22:18:49.2259729Z Traceback (most recent call last):
2021-12-08T22:18:49.2261535Z   File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1328, in __del__
2021-12-08T22:18:49.2262404Z     self._shutdown_workers()
2021-12-08T22:18:49.2263847Z   File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1320, in _shutdown_workers
2021-12-08T22:18:49.2265389Z     if w.is_alive():
2021-12-08T22:18:49.2266072Z   File "/opt/conda/lib/python3.6/multiprocessing/process.py", line 134, in is_alive
2021-12-08T22:18:49.2267157Z     assert self._parent_pid == os.getpid(), 'can only test a child process'
2021-12-08T22:18:49.2267906Z AssertionError: can only test a child process
2021-12-08T22:18:50.3831900Z ok (1.219s)
2021-12-08T22:18:50.3862526Z   test_iterabledataset_len (__main__.TestDataLoaderPersistentWorkers) ... ok (0.003s)
2021-12-08T22:18:52.7342239Z   test_large_sampler_indices (__main__.TestDataLoaderPersistentWorkers) ... ok (2.348s)
2021-12-08T22:18:52.7451204Z   test_len (__main__.TestDataLoaderPersistentWorkers) ... ok (0.011s)
2021-12-08T22:18:55.1381338Z   test_multiple_dataloaders (__main__.TestDataLoaderPersistentWorkers) ... ok (2.393s)
2021-12-08T22:18:58.6037751Z   test_multiprocessing_contexts (__main__.TestDataLoaderPersistentWorkers) ... [W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
2021-12-08T22:18:58.6063803Z [W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
2021-12-08T22:18:58.6065229Z [W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
2021-12-08T22:19:02.0540121Z [W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
2021-12-08T22:19:02.0547963Z [W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]

See GitHub Actions build win-vs2019-cuda11.3-py3 / test (force_on_cpu, 1, 1, windows.4xlarge) (12/14)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2021-12-08T23:23:34.2187594Z FAIL [0.016s]: tes...d_error_cpu (__main__.TestRandomTensorCreationCPU)
2021-12-08T23:23:34.2181570Z     raise rte
2021-12-08T23:23:34.2182301Z   File "C:\actions-runner\_work\pytorch\pytorch\build\win_tmp\build\torch\testing\_internal\common_device_type.py", line 376, in instantiated_test
2021-12-08T23:23:34.2182972Z     result = test(self, **param_kwargs)
2021-12-08T23:23:34.2183630Z   File "test_tensor_creation_ops.py", line 3327, in test_normal
2021-12-08T23:23:34.2184180Z     helper(self, device, dtype, lambda x: x, lambda t: t, lambda mean: mean)
2021-12-08T23:23:34.2185223Z   File "test_tensor_creation_ops.py", line 3267, in helper
2021-12-08T23:23:34.2185811Z     out = torch.normal(mean=torch.empty((0, 2)), std=torch.empty((0, 1)))
2021-12-08T23:23:34.2186431Z RuntimeError: inconsistent tensor, output size ([0, 2]) is not the same as input size ([0, 1])
2021-12-08T23:23:34.2186805Z 
2021-12-08T23:23:34.2187053Z ======================================================================
2021-12-08T23:23:34.2187594Z FAIL [0.016s]: test_normal_std_error_cpu (__main__.TestRandomTensorCreationCPU)
2021-12-08T23:23:34.2188215Z ----------------------------------------------------------------------
2021-12-08T23:23:34.2188700Z RuntimeError: normal expects std >= 0.0, but found std -1
2021-12-08T23:23:34.2188994Z 
2021-12-08T23:23:34.2189406Z During handling of the above exception, another exception occurred:
2021-12-08T23:23:34.2189760Z 
2021-12-08T23:23:34.2190057Z Traceback (most recent call last):
2021-12-08T23:23:34.2190849Z   File "C:\actions-runner\_work\pytorch\pytorch\build\win_tmp\build\torch\testing\_internal\common_device_type.py", line 376, in instantiated_test
2021-12-08T23:23:34.2191544Z     result = test(self, **param_kwargs)
2021-12-08T23:23:34.2192094Z   File "test_tensor_creation_ops.py", line 3336, in test_normal_std_error
2021-12-08T23:23:34.2192539Z     torch.normal(input, -1, (10,))

See GitHub Actions build win-vs2019-cpu-py3 / test (default, 2, 2, windows.4xlarge) (13/14)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2021-12-08T22:48:22.0392516Z RuntimeError: test_torch failed!
2021-12-08T22:48:21.7878159Z Generated XML report: test-reports\dist-gloo\test_torch\TEST-TestTorchDeviceTypeCPU-20211208224809.xml
2021-12-08T22:48:21.7879449Z Generated XML report: test-reports\dist-gloo\test_torch\TEST-TestVitalSignsCudaCPU-20211208224809.xml
2021-12-08T22:48:22.0160915Z [TORCH_VITAL] CUDA.used		 False
2021-12-08T22:48:22.0161494Z [TORCH_VITAL] Dataloader.basic_unit_test		 TEST_VALUE_STRING
2021-12-08T22:48:22.0162006Z [TORCH_VITAL] Dataloader.enabled		 True
2021-12-08T22:48:22.0390051Z Traceback (most recent call last):
2021-12-08T22:48:22.0390917Z   File "run_test.py", line 1058, in <module>
2021-12-08T22:48:22.0391253Z     main()
2021-12-08T22:48:22.0391678Z   File "run_test.py", line 1036, in main
2021-12-08T22:48:22.0392115Z     raise RuntimeError(err_message)
2021-12-08T22:48:22.0392516Z RuntimeError: test_torch failed!
2021-12-08T22:48:22.2952884Z 
2021-12-08T22:48:22.2953598Z (base) C:\actions-runner\_work\pytorch\pytorch\test>popd
2021-12-08T22:48:22.2957837Z 
2021-12-08T22:48:22.2958379Z (base) C:\actions-runner\_work\pytorch\pytorch>if ERRORLEVEL 1 exit /b 1 
2021-12-08T22:48:22.2993397Z + cleanup
2021-12-08T22:48:22.2993816Z + retcode=1
2021-12-08T22:48:22.2994075Z + set +x
2021-12-08T22:48:22.3162059Z ##[error]Process completed with exit code 1.
2021-12-08T22:48:22.3508358Z ##[group]Run # -ir => recursive include all files in pattern
2021-12-08T22:48:22.3509058Z # -ir => recursive include all files in pattern

See GitHub Actions build linux-bionic-py3.6-clang9 / test (xla, 1, 1, linux.2xlarge) (14/14)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2021-12-08T22:10:01.9513242Z �[0;31m[ FAILED ] �[mAtenXlaTensorTest.TestKlDivBackward
2021-12-08T22:10:01.9502825Z [ RUN      ] XlaUtilCacheTest.BasicTest
2021-12-08T22:10:01.9504034Z [       OK ] XlaUtilCacheTest.BasicTest (0 ms)
2021-12-08T22:10:01.9505200Z [----------] 1 test from XlaUtilCacheTest (0 ms total)
2021-12-08T22:10:01.9505730Z 
2021-12-08T22:10:01.9506678Z [----------] Global test environment tear-down
2021-12-08T22:10:01.9507649Z [==========] 618 tests from 8 test suites ran. (296724 ms total)
2021-12-08T22:10:01.9508656Z [  PASSED  ] 616 tests.
2021-12-08T22:10:01.9509381Z [  SKIPPED ] 1 test, listed below:
2021-12-08T22:10:01.9510733Z [  SKIPPED ] AtenXlaTensorTest.TestGroupNormBackward
2021-12-08T22:10:01.9512036Z [  FAILED  ] 1 test, listed below:
2021-12-08T22:10:01.9513242Z [  FAILED  ] AtenXlaTensorTest.TestKlDivBackward
2021-12-08T22:10:01.9514054Z 
2021-12-08T22:10:01.9514464Z  1 FAILED TEST
2021-12-08T22:10:02.1555338Z + cleanup
2021-12-08T22:10:02.1555670Z + retcode=1
2021-12-08T22:10:02.1556050Z + set +x
2021-12-08T22:10:02.1595244Z ##[error]Process completed with exit code 1.
2021-12-08T22:10:02.1676846Z ##[group]Run # Ensure the working directory gets chowned back to the current user
2021-12-08T22:10:02.1677591Z # Ensure the working directory gets chowned back to the current user
2021-12-08T22:10:02.1678211Z docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
2021-12-08T22:10:02.4782703Z shell: /usr/bin/bash -e {0}

This comment was automatically generated by Dr. CI.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

nkaretnikov added the "module: structured kernels" label Dec 8, 2021
@nkaretnikov
Collaborator Author

Hey @pbelevich, I put you in reviewers because this touches/undoes some of the work you did related to RNG: #35167 (see the previous commits in this stack).

@nkaretnikov
Collaborator Author

nkaretnikov commented Dec 8, 2021

This is my first structured kernel port, so if something doesn't make sense, it's probably because I don't know what I'm doing.

Some notes related to this PR: #69386

In particular, things I'm not sure about:

  • The topmost commit (meta definitions for the float variants) seems redundant now, but maybe it's a good thing since it still splits the code into meta/impl, similar to the inplace normal_ variant (see the sketch after this comment).
  • There shouldn't be any checks/resizes in out variants (according to the docs), because meta functions are supposed to do that, but I erred on the side of caution.

Warning: you shouldn't rely on my tests/judgement, since it's likely that I messed something up. But I did my best to uncover any potential issues related to BC/shapes. I made no effort to test RNG-related things, since I didn't really touch them.
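
To illustrate the meta/impl split mentioned above: a minimal sketch of an inplace meta variant along the lines of normal_meta_, assuming this signature (the actual ATen definition may differ):

// Sketch of an inplace meta variant: validate arguments, touch no data.
// A meta tensor has no storage, so there is nothing to fill in.
Tensor& normal_meta_(Tensor& self, double mean, double std,
                     c10::optional<Generator> gen) {
  TORCH_CHECK(std >= 0.0, "normal_ expects std >= 0.0, but found std ", std);
  return self;
}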

Collaborator

@lezcano left a comment

Any reason why we shouldn't port these to structured kernels?

@nkaretnikov
Collaborator Author

Any reason why we shouldn't port these to structured kernels?

@lezcano Well, I'm not sure you can do better than what I just did here (might be wrong, though). The signatures of these are completely different (they differ by more than just the out param): one of them needs to construct a tensor and hence requires a bunch of memory-allocation-related params, and there's currently no way to express this via structured/structured_delegate.

@lezcano
Collaborator

lezcano commented Dec 9, 2021

I see how these are different from the others. Even then, you are given the options and the shape you need to pass to set_output, so you should be able to write these as their own structured kernel, right?
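
For readers unfamiliar with the machinery: in a structured kernel, a meta function does the checking and declares the output via set_output, and a separate impl function does the computation. A rough sketch of the pattern with a placeholder op name (the exact macro signatures may differ between versions):

// Structured-kernel pattern, placeholder op name "my_op".
// The meta function validates inputs and declares the output shape;
// codegen routes it to both the meta device and the real backends.
TORCH_META_FUNC(my_op)(const Tensor& self) {
  TORCH_CHECK(self.dim() > 0, "my_op: expected a non-scalar tensor");
  set_output(self.sizes(), self.options());
}

// The impl function fills the already-allocated output; it is never
// reached on the meta device.
TORCH_IMPL_FUNC(my_op_out)(const Tensor& self, const Tensor& result) {
  // actual computation writes into result
}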

@nkaretnikov
Collaborator Author

@lezcano

I see how these are different from the others. Even then, you are given the options and the shape you need to pass to set_output, so you should be able to write these as their own structured kernel, right?

I'm not sure what you mean exactly. Specifically, I don't understand this part:

so you should be able to write these as their own structured kernel, right?

The problem I'm describing is this:

- func: normal.float_float(float mean, float std, int[] size, *, Generator? generator=None, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor

- func: normal.float_float_out(float mean, float std, int[] size, *, Generator? generator=None, Tensor(a!) out) -> Tensor(a!)

To use structured, the signatures need to be:

f1(foo, bar, baz, *, qux, out) -> ...  // out variant -- defined as structured
f2(foo, bar, baz, *, qux) -> ...       // functional variant -- defined as structured_delegate

That is, they must differ only by the out param.

So the only way to make this work is either to introduce a bunch of redundant optional memory-allocation parameters into the out variant, which would be bad since allocation is already determined by the passed tensor,

OR

to add more functions to the API; but that would blow up the number of operators for no good reason, no? They would offer no new functionality and would just be there to satisfy the current parser/codegen machinery.

Let me know if I misunderstood your point.

@nkaretnikov
Collaborator Author

I'd also like to point out that these already work with meta as is. Could it be that this change is not even necessary? Maybe it would be nice to have for later (as it offers more control), but it also leads to code bloat.

Collaborator

@ysiraichi left a comment

Hey @nkaretnikov.
Good job! I think we can avoid duplicating normal_out. Check out my comment, and let me know if it's unclear.

return result; // similar to normal_meta_
}

Tensor& normal_out(
Collaborator

It seems to me that both normal_out and normal_out_meta do the same thing. You could dispatch CPU, CUDA, and META to the same function.
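
In other words, a single out-variant kernel could serve all three dispatch keys, along these lines (a hedged sketch, not the code in this PR; the resize/normal_ plumbing is an assumption):

// Sketch: one normal_out shared by CPU, CUDA, and Meta. Both resize_
// and normal_ dispatch on the output tensor's device, so the meta
// case falls out for free (metadata only, no data is written).
Tensor& normal_out(double mean, double std, IntArrayRef size,
                   c10::optional<Generator> gen, Tensor& output) {
  output.resize_(size);
  return output.normal_(mean, std, gen);
}

With that, the native_functions.yaml dispatch entry could list CPU, CUDA, and Meta against the same function.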

@nkaretnikov
Collaborator Author

To avoid confusion, I will open a new stack to address issues related to BC and broadcasting.

@facebook-github-bot facebook-github-bot deleted the gh/nkaretnikov/5/head branch January 15, 2022 15:16

Labels

cla signed, module: structured kernels, open source

5 participants