
Port tensor variants of normal to structured kernel#69628

Closed
nkaretnikov wants to merge 1 commit into gh/nkaretnikov/1/base from gh/nkaretnikov/1/head

Conversation

nkaretnikov (Collaborator) commented Dec 8, 2021

Stack from ghstack:

  • Refactor tensor variants to use structured in native_functions.yaml
  • Other variants don't fit this model well, so not doing those for now
  • Remove some normal templates and RNG tests that relied on them.

See #69386.

[ghstack-poisoned]
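For context, porting an operator to a structured kernel in native_functions.yaml typically means marking the out= variant as structured and making the functional variant delegate to it. The sketch below is illustrative only; the exact signatures and dispatch entries are assumptions, not this PR's actual diff:

```yaml
# Illustrative sketch of a structured-kernel port (not the exact entries from
# this PR): the out= variant owns the kernel, and the functional variant
# delegates to it via structured_delegate.
- func: normal.Tensor_Tensor_out(Tensor mean, Tensor std, *, Generator? generator=None, Tensor(a!) out) -> Tensor(a!)
  structured: True
  dispatch:
    CPU, CUDA: normal_out

- func: normal.Tensor_Tensor(Tensor mean, Tensor std, *, Generator? generator=None) -> Tensor
  structured_delegate: normal.Tensor_Tensor_out
```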
facebook-github-bot (Contributor) commented Dec 8, 2021

🔗 Helpful links

💊 CI failures summary and remediations

As of commit bac7d8c (more details on the Dr. CI page):


  • 11/11 failures introduced in this PR

🕵️ 11 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See GitHub Actions build linux-bionic-cuda11.5-py3.6-gcc7 / test (default, 2, 2, linux.4xlarge.nvidia.gpu) (1/11)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2021-12-08T22:02:33.5057326Z AssertionError: can only test a child process
2021-12-08T22:02:33.4873254Z     assert self._parent_pid == os.getpid(), 'can only test a child process'
2021-12-08T22:02:33.4874008Z AssertionError: can only test a child process
2021-12-08T22:02:33.5046535Z Exception ignored in: <bound method _MultiProcessingDataLoaderIter.__del__ of <torch.utils.data.dataloader._MultiProcessingDataLoaderIter object at 0x7fdb884d5748>>
2021-12-08T22:02:33.5048108Z Traceback (most recent call last):
2021-12-08T22:02:33.5049726Z   File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1328, in __del__
2021-12-08T22:02:33.5050538Z     self._shutdown_workers()
2021-12-08T22:02:33.5051562Z   File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1320, in _shutdown_workers
2021-12-08T22:02:33.5054300Z     if w.is_alive():
2021-12-08T22:02:33.5055445Z   File "/opt/conda/lib/python3.6/multiprocessing/process.py", line 134, in is_alive
2021-12-08T22:02:33.5056577Z     assert self._parent_pid == os.getpid(), 'can only test a child process'
2021-12-08T22:02:33.5057326Z AssertionError: can only test a child process
2021-12-08T22:02:34.7206572Z ok (1.291s)
2021-12-08T22:02:34.7238032Z   test_iterabledataset_len (__main__.TestDataLoaderPersistentWorkers) ... ok (0.003s)
2021-12-08T22:02:37.1364761Z   test_large_sampler_indices (__main__.TestDataLoaderPersistentWorkers) ... ok (2.412s)
2021-12-08T22:02:37.1477078Z   test_len (__main__.TestDataLoaderPersistentWorkers) ... ok (0.011s)
2021-12-08T22:02:39.6026962Z   test_multiple_dataloaders (__main__.TestDataLoaderPersistentWorkers) ... ok (2.455s)
2021-12-08T22:02:43.1441185Z   test_multiprocessing_contexts (__main__.TestDataLoaderPersistentWorkers) ... [W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
2021-12-08T22:02:43.1446472Z [W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
2021-12-08T22:02:43.1462958Z [W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
2021-12-08T22:02:46.6106250Z [W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
2021-12-08T22:02:46.6122839Z [W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]

See GitHub Actions build Lint / quick-checks (2/11)

Step: "Ensure correct trailing newlines" (full log | diagnosis details | 🔁 rerun)

2021-12-08T20:19:28.2137665Z python: can't open..._launches.py': [Errno 2] No such file or directory
2021-12-08T20:19:28.1817395Z ##[group]Run set -eux
2021-12-08T20:19:28.1817874Z set -eux
2021-12-08T20:19:28.1818566Z python torch/testing/_check_kernel_launches.py |& tee "${GITHUB_WORKSPACE}"/cuda_kernel_launch_checks.txt
2021-12-08T20:19:28.1854538Z shell: /bin/bash -e {0}
2021-12-08T20:19:28.1854958Z env:
2021-12-08T20:19:28.1855534Z   pythonLocation: /opt/hostedtoolcache/Python/3.10.0/x64
2021-12-08T20:19:28.1856326Z   LD_LIBRARY_PATH: /opt/hostedtoolcache/Python/3.10.0/x64/lib
2021-12-08T20:19:28.1856982Z ##[endgroup]
2021-12-08T20:19:28.1934162Z + python torch/testing/_check_kernel_launches.py
2021-12-08T20:19:28.1937333Z + tee /home/runner/work/pytorch/pytorch/cuda_kernel_launch_checks.txt
2021-12-08T20:19:28.2137665Z python: can't open file '/home/runner/work/pytorch/pytorch/torch/testing/_check_kernel_launches.py': [Errno 2] No such file or directory
2021-12-08T20:19:28.2213692Z ##[group]Run (! git --no-pager grep -I -no $'#include <cub/' --  ./aten  ':(exclude)aten/src/ATen/cuda/cub*.cuh' || (echo "The above files have direct cub include; please include ATen/cuda/cub.cuh instead and wrap your cub calls in at::native namespace if necessary"; false))
2021-12-08T20:19:28.2215882Z (! git --no-pager grep -I -no $'#include <cub/' --  ./aten  ':(exclude)aten/src/ATen/cuda/cub*.cuh' || (echo "The above files have direct cub include; please include ATen/cuda/cub.cuh instead and wrap your cub calls in at::native namespace if necessary"; false))
2021-12-08T20:19:28.2248986Z shell: /bin/bash -e {0}
2021-12-08T20:19:28.2249328Z env:
2021-12-08T20:19:28.2249838Z   pythonLocation: /opt/hostedtoolcache/Python/3.10.0/x64
2021-12-08T20:19:28.2250514Z   LD_LIBRARY_PATH: /opt/hostedtoolcache/Python/3.10.0/x64/lib
2021-12-08T20:19:28.2251026Z ##[endgroup]
2021-12-08T20:19:28.2624171Z ##[group]Run (! git --no-pager grep -I -no $'cudaStreamSynchronize' --  ./aten ./c10 ':(exclude)aten/src/ATen/test' ':(exclude)c10/cuda/CUDAFunctions.h' || (echo "The above files call raw cuda APIs directly; please use at::cuda wrappers instead"; false))
2021-12-08T20:19:28.2626385Z (! git --no-pager grep -I -no $'cudaStreamSynchronize' --  ./aten ./c10 ':(exclude)aten/src/ATen/test' ':(exclude)c10/cuda/CUDAFunctions.h' || (echo "The above files call raw cuda APIs directly; please use at::cuda wrappers instead"; false))
2021-12-08T20:19:28.2736231Z shell: /bin/bash -e {0}

See GitHub Actions build linux-xenial-py3.6-gcc5.4 / test (default, 1, 2, linux.2xlarge) (3/11)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2021-12-08T21:27:36.1189165Z AssertionError: can only test a child process
2021-12-08T21:27:36.1075024Z     assert self._parent_pid == os.getpid(), 'can only test a child process'
2021-12-08T21:27:36.1075768Z AssertionError: can only test a child process
2021-12-08T21:27:36.1175987Z Exception ignored in: <bound method _MultiProcessingDataLoaderIter.__del__ of <torch.utils.data.dataloader._MultiProcessingDataLoaderIter object at 0x7fd8670c7dd8>>
2021-12-08T21:27:36.1177641Z Traceback (most recent call last):
2021-12-08T21:27:36.1179242Z   File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1328, in __del__
2021-12-08T21:27:36.1180100Z     self._shutdown_workers()
2021-12-08T21:27:36.1181222Z   File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1320, in _shutdown_workers
2021-12-08T21:27:36.1186316Z     if w.is_alive():
2021-12-08T21:27:36.1187102Z   File "/opt/conda/lib/python3.6/multiprocessing/process.py", line 134, in is_alive
2021-12-08T21:27:36.1188376Z     assert self._parent_pid == os.getpid(), 'can only test a child process'
2021-12-08T21:27:36.1189165Z AssertionError: can only test a child process
2021-12-08T21:27:36.7346173Z ok (0.643s)
2021-12-08T21:27:36.7370707Z   test_iterabledataset_len (__main__.TestDataLoaderPersistentWorkers) ... ok (0.002s)
2021-12-08T21:27:38.3704828Z   test_large_sampler_indices (__main__.TestDataLoaderPersistentWorkers) ... ok (1.633s)
2021-12-08T21:27:38.3777219Z   test_len (__main__.TestDataLoaderPersistentWorkers) ... ok (0.007s)
2021-12-08T21:27:39.8858934Z   test_multiple_dataloaders (__main__.TestDataLoaderPersistentWorkers) ... ok (1.508s)
2021-12-08T21:27:43.1209404Z   test_multiprocessing_contexts (__main__.TestDataLoaderPersistentWorkers) ... ok (3.235s)
2021-12-08T21:27:43.9488941Z   test_no_segfault (__main__.TestDataLoaderPersistentWorkers) ... ok (0.828s)
2021-12-08T21:27:43.9521739Z   test_numpy (__main__.TestDataLoaderPersistentWorkers) ... ok (0.003s)
2021-12-08T21:27:43.9551152Z   test_numpy_gen_state (__main__.TestDataLoaderPersistentWorkers) ... ok (0.003s)
2021-12-08T21:27:43.9577299Z   test_numpy_scalars (__main__.TestDataLoaderPersistentWorkers) ... ok (0.003s)

See GitHub Actions build linux-xenial-cuda11.3-py3.6-gcc7 / test (default, 1, 2, linux.4xlarge.nvidia.gpu) (4/11)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2021-12-08T22:24:26.7201959Z RuntimeError: test_torch failed!
2021-12-08T22:24:26.1126025Z Generated XML report: test-reports/python-unittest/test_torch/TEST-TestTorchDeviceTypeCUDA-20211208222250.xml
2021-12-08T22:24:26.1128921Z Generated XML report: test-reports/python-unittest/test_torch/TEST-TestVitalSignsCudaCUDA-20211208222250.xml
2021-12-08T22:24:26.5671290Z [TORCH_VITAL] Dataloader.enabled		 True
2021-12-08T22:24:26.5672218Z [TORCH_VITAL] Dataloader.basic_unit_test		 TEST_VALUE_STRING
2021-12-08T22:24:26.5672919Z [TORCH_VITAL] CUDA.used		 true
2021-12-08T22:24:26.7195670Z Traceback (most recent call last):
2021-12-08T22:24:26.7196472Z   File "test/run_test.py", line 1058, in <module>
2021-12-08T22:24:26.7197978Z     main()
2021-12-08T22:24:26.7198443Z   File "test/run_test.py", line 1036, in main
2021-12-08T22:24:26.7201335Z     raise RuntimeError(err_message)
2021-12-08T22:24:26.7201959Z RuntimeError: test_torch failed!
2021-12-08T22:24:27.2576497Z + cleanup
2021-12-08T22:24:27.2577215Z + retcode=1
2021-12-08T22:24:27.2577680Z + set +x
2021-12-08T22:24:27.2629941Z ##[error]Process completed with exit code 1.
2021-12-08T22:24:27.2691947Z ##[group]Run # Ensure the working directory gets chowned back to the current user
2021-12-08T22:24:27.2692996Z # Ensure the working directory gets chowned back to the current user
2021-12-08T22:24:27.2693810Z docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
2021-12-08T22:24:27.2707867Z shell: /usr/bin/bash -e {0}
2021-12-08T22:24:27.2708283Z env:
2021-12-08T22:24:27.2720828Z   BUILD_ENVIRONMENT: linux-xenial-cuda11.3-py3.6-gcc7

See GitHub Actions build linux-bionic-py3.6-clang9 / test (default, 1, 2, linux.2xlarge) (5/11)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2021-12-08T21:26:00.2301642Z AssertionError: can only test a child process
2021-12-08T21:26:00.2087971Z     assert self._parent_pid == os.getpid(), 'can only test a child process'
2021-12-08T21:26:00.2088690Z AssertionError: can only test a child process
2021-12-08T21:26:00.2290042Z Exception ignored in: <bound method _MultiProcessingDataLoaderIter.__del__ of <torch.utils.data.dataloader._MultiProcessingDataLoaderIter object at 0x7feb8dd35748>>
2021-12-08T21:26:00.2291305Z Traceback (most recent call last):
2021-12-08T21:26:00.2292402Z   File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1328, in __del__
2021-12-08T21:26:00.2294464Z     self._shutdown_workers()
2021-12-08T21:26:00.2295595Z   File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1320, in _shutdown_workers
2021-12-08T21:26:00.2299025Z     if w.is_alive():
2021-12-08T21:26:00.2299944Z   File "/opt/conda/lib/python3.6/multiprocessing/process.py", line 134, in is_alive
2021-12-08T21:26:00.2300981Z     assert self._parent_pid == os.getpid(), 'can only test a child process'
2021-12-08T21:26:00.2301642Z AssertionError: can only test a child process
2021-12-08T21:26:00.9129864Z ok (0.720s)
2021-12-08T21:26:00.9154148Z   test_iterabledataset_len (__main__.TestDataLoaderPersistentWorkers) ... ok (0.002s)
2021-12-08T21:26:02.6069849Z   test_large_sampler_indices (__main__.TestDataLoaderPersistentWorkers) ... ok (1.691s)
2021-12-08T21:26:02.6139185Z   test_len (__main__.TestDataLoaderPersistentWorkers) ... ok (0.007s)
2021-12-08T21:26:04.1993444Z   test_multiple_dataloaders (__main__.TestDataLoaderPersistentWorkers) ... ok (1.585s)
2021-12-08T21:26:07.6223279Z   test_multiprocessing_contexts (__main__.TestDataLoaderPersistentWorkers) ... ok (3.423s)
2021-12-08T21:26:08.5084352Z   test_no_segfault (__main__.TestDataLoaderPersistentWorkers) ... ok (0.886s)
2021-12-08T21:26:08.5118246Z   test_numpy (__main__.TestDataLoaderPersistentWorkers) ... ok (0.003s)
2021-12-08T21:26:08.5146656Z   test_numpy_gen_state (__main__.TestDataLoaderPersistentWorkers) ... ok (0.003s)
2021-12-08T21:26:08.5171459Z   test_numpy_scalars (__main__.TestDataLoaderPersistentWorkers) ... ok (0.002s)

See GitHub Actions build linux-bionic-py3.6-clang9 / test (xla, 1, 1, linux.2xlarge) (6/11)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2021-12-08T22:03:50.4567740Z [  FAILED  ] AtenXlaTensorTest.TestKlDivBackward
2021-12-08T22:03:50.4561321Z [ RUN      ] XlaUtilCacheTest.BasicTest
2021-12-08T22:03:50.4562135Z [       OK ] XlaUtilCacheTest.BasicTest (0 ms)
2021-12-08T22:03:50.4562865Z [----------] 1 test from XlaUtilCacheTest (0 ms total)
2021-12-08T22:03:50.4563205Z 
2021-12-08T22:03:50.4563696Z [----------] Global test environment tear-down
2021-12-08T22:03:50.4564316Z [==========] 618 tests from 8 test suites ran. (310935 ms total)
2021-12-08T22:03:50.4564829Z [  PASSED  ] 616 tests.
2021-12-08T22:03:50.4565322Z [  SKIPPED ] 1 test, listed below:
2021-12-08T22:03:50.4566160Z [  SKIPPED ] AtenXlaTensorTest.TestGroupNormBackward
2021-12-08T22:03:50.4566983Z [  FAILED  ] 1 test, listed below:
2021-12-08T22:03:50.4567740Z [  FAILED  ] AtenXlaTensorTest.TestKlDivBackward
2021-12-08T22:03:50.4568233Z 
2021-12-08T22:03:50.4568506Z  1 FAILED TEST
2021-12-08T22:03:50.6109824Z + cleanup
2021-12-08T22:03:50.6110259Z + retcode=1
2021-12-08T22:03:50.6110606Z + set +x
2021-12-08T22:03:50.6151025Z ##[error]Process completed with exit code 1.
2021-12-08T22:03:50.6207580Z ##[group]Run # Ensure the working directory gets chowned back to the current user
2021-12-08T22:03:50.6208309Z # Ensure the working directory gets chowned back to the current user
2021-12-08T22:03:50.6208927Z docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
2021-12-08T22:03:50.6233902Z shell: /usr/bin/bash -e {0}

See GitHub Actions build win-vs2019-cuda11.3-py3 / test (force_on_cpu, 1, 1, windows.4xlarge) (7/11)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2021-12-08T23:17:06.2770886Z RuntimeError: test_torch failed!
2021-12-08T23:17:06.0661606Z Generated XML report: test-reports\dist-gloo\test_torch\TEST-TestTorchDeviceTypeCPU-20211208231654.xml
2021-12-08T23:17:06.0662891Z Generated XML report: test-reports\dist-gloo\test_torch\TEST-TestVitalSignsCudaCPU-20211208231654.xml
2021-12-08T23:17:06.2514937Z [TORCH_VITAL] CUDA.used		 False
2021-12-08T23:17:06.2515495Z [TORCH_VITAL] Dataloader.basic_unit_test		 TEST_VALUE_STRING
2021-12-08T23:17:06.2515993Z [TORCH_VITAL] Dataloader.enabled		 True
2021-12-08T23:17:06.2768657Z Traceback (most recent call last):
2021-12-08T23:17:06.2769366Z   File "run_test.py", line 1058, in <module>
2021-12-08T23:17:06.2769689Z     main()
2021-12-08T23:17:06.2770074Z   File "run_test.py", line 1036, in main
2021-12-08T23:17:06.2770478Z     raise RuntimeError(err_message)
2021-12-08T23:17:06.2770886Z RuntimeError: test_torch failed!
2021-12-08T23:17:06.5114392Z 
2021-12-08T23:17:06.5114970Z (base) C:\actions-runner\_work\pytorch\pytorch\test>popd
2021-12-08T23:17:06.5119200Z 
2021-12-08T23:17:06.5119722Z (base) C:\actions-runner\_work\pytorch\pytorch>if ERRORLEVEL 1 exit /b 1 
2021-12-08T23:17:06.5154225Z + cleanup
2021-12-08T23:17:06.5154838Z + retcode=1
2021-12-08T23:17:06.5155154Z + set +x
2021-12-08T23:17:06.5298588Z ##[error]Process completed with exit code 1.
2021-12-08T23:17:06.5643173Z ##[group]Run # -ir => recursive include all files in pattern
2021-12-08T23:17:06.5643842Z # -ir => recursive include all files in pattern

See GitHub Actions build linux-xenial-py3.6-gcc7 / test (default, 2, 2, linux.2xlarge) (8/11)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2021-12-08T21:19:22.9638275Z AssertionError: can only test a child process
2021-12-08T21:19:22.9522159Z     assert self._parent_pid == os.getpid(), 'can only test a child process'
2021-12-08T21:19:22.9524094Z AssertionError: can only test a child process
2021-12-08T21:19:22.9628201Z Exception ignored in: <bound method _MultiProcessingDataLoaderIter.__del__ of <torch.utils.data.dataloader._MultiProcessingDataLoaderIter object at 0x7fe0c183e588>>
2021-12-08T21:19:22.9629348Z Traceback (most recent call last):
2021-12-08T21:19:22.9631091Z   File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1328, in __del__
2021-12-08T21:19:22.9631697Z     self._shutdown_workers()
2021-12-08T21:19:22.9632479Z   File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1320, in _shutdown_workers
2021-12-08T21:19:22.9634217Z     if w.is_alive():
2021-12-08T21:19:22.9635555Z   File "/opt/conda/lib/python3.6/multiprocessing/process.py", line 134, in is_alive
2021-12-08T21:19:22.9637407Z     assert self._parent_pid == os.getpid(), 'can only test a child process'
2021-12-08T21:19:22.9638275Z AssertionError: can only test a child process
2021-12-08T21:19:23.5869978Z ok (0.652s)
2021-12-08T21:19:23.5893665Z   test_iterabledataset_len (__main__.TestDataLoaderPersistentWorkers) ... ok (0.002s)
2021-12-08T21:19:25.2281624Z   test_large_sampler_indices (__main__.TestDataLoaderPersistentWorkers) ... ok (1.638s)
2021-12-08T21:19:25.2352331Z   test_len (__main__.TestDataLoaderPersistentWorkers) ... ok (0.007s)
2021-12-08T21:19:26.7488680Z   test_multiple_dataloaders (__main__.TestDataLoaderPersistentWorkers) ... ok (1.513s)
2021-12-08T21:19:30.0143697Z   test_multiprocessing_contexts (__main__.TestDataLoaderPersistentWorkers) ... ok (3.265s)
2021-12-08T21:19:30.8288048Z   test_no_segfault (__main__.TestDataLoaderPersistentWorkers) ... ok (0.814s)
2021-12-08T21:19:30.8321052Z   test_numpy (__main__.TestDataLoaderPersistentWorkers) ... ok (0.003s)
2021-12-08T21:19:30.8349243Z   test_numpy_gen_state (__main__.TestDataLoaderPersistentWorkers) ... ok (0.003s)
2021-12-08T21:19:30.8374259Z   test_numpy_scalars (__main__.TestDataLoaderPersistentWorkers) ... ok (0.002s)

See GitHub Actions build win-vs2019-cpu-py3 / test (default, 2, 2, windows.4xlarge) (9/11)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2021-12-08T23:03:32.0491086Z RuntimeError: test_torch failed!
2021-12-08T23:03:31.8556620Z Generated XML report: test-reports\dist-gloo\test_torch\TEST-TestTorchDeviceTypeCPU-20211208230319.xml
2021-12-08T23:03:31.8557867Z Generated XML report: test-reports\dist-gloo\test_torch\TEST-TestVitalSignsCudaCPU-20211208230319.xml
2021-12-08T23:03:32.0263584Z [TORCH_VITAL] CUDA.used		 False
2021-12-08T23:03:32.0264076Z [TORCH_VITAL] Dataloader.basic_unit_test		 TEST_VALUE_STRING
2021-12-08T23:03:32.0264572Z [TORCH_VITAL] Dataloader.enabled		 True
2021-12-08T23:03:32.0488729Z Traceback (most recent call last):
2021-12-08T23:03:32.0489487Z   File "run_test.py", line 1058, in <module>
2021-12-08T23:03:32.0489821Z     main()
2021-12-08T23:03:32.0490267Z   File "run_test.py", line 1036, in main
2021-12-08T23:03:32.0490667Z     raise RuntimeError(err_message)
2021-12-08T23:03:32.0491086Z RuntimeError: test_torch failed!
2021-12-08T23:03:32.2672285Z 
2021-12-08T23:03:32.2672965Z (base) C:\actions-runner\_work\pytorch\pytorch\test>popd
2021-12-08T23:03:32.2677046Z 
2021-12-08T23:03:32.2677529Z (base) C:\actions-runner\_work\pytorch\pytorch>if ERRORLEVEL 1 exit /b 1 
2021-12-08T23:03:32.2712544Z + cleanup
2021-12-08T23:03:32.2712882Z + retcode=1
2021-12-08T23:03:32.2713152Z + set +x
2021-12-08T23:03:32.2927985Z ##[error]Process completed with exit code 1.
2021-12-08T23:03:32.3231706Z ##[group]Run # -ir => recursive include all files in pattern
2021-12-08T23:03:32.3232357Z # -ir => recursive include all files in pattern

See GitHub Actions build linux-bionic-py3.6-clang9 / test (noarch, 1, 1, linux.2xlarge) (10/11)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2021-12-08T22:00:18.5612454Z FAIL [0.006s]: tes...error_meta (__main__.TestRandomTensorCreationMETA)
2021-12-08T22:00:18.5195452Z   test_vstack_row_stack_meta_int16 (__main__.TestTensorCreationMETA) ... skip (0.006s)
2021-12-08T22:00:18.5257056Z   test_vstack_row_stack_meta_int32 (__main__.TestTensorCreationMETA) ... skip (0.006s)
2021-12-08T22:00:18.5319663Z   test_vstack_row_stack_meta_int64 (__main__.TestTensorCreationMETA) ... skip (0.006s)
2021-12-08T22:00:18.5381668Z   test_vstack_row_stack_meta_int8 (__main__.TestTensorCreationMETA) ... skip (0.006s)
2021-12-08T22:00:18.5443553Z   test_vstack_row_stack_meta_uint8 (__main__.TestTensorCreationMETA) ... skip (0.006s)
2021-12-08T22:00:18.5466288Z   test_zeros_dtype_out_match_meta (__main__.TestTensorCreationMETA) ... ok (0.002s)
2021-12-08T22:00:18.5524223Z   test_zeros_meta (__main__.TestTensorCreationMETA) ... skip (0.006s)
2021-12-08T22:00:18.5609829Z   test_zeros_out_meta (__main__.TestTensorCreationMETA) ... skip (0.008s)
2021-12-08T22:00:18.5610803Z 
2021-12-08T22:00:18.5611509Z ======================================================================
2021-12-08T22:00:18.5612454Z FAIL [0.006s]: test_normal_std_error_meta (__main__.TestRandomTensorCreationMETA)
2021-12-08T22:00:18.5614096Z ----------------------------------------------------------------------
2021-12-08T22:00:18.5614915Z Traceback (most recent call last):
2021-12-08T22:00:18.5616243Z   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 1470, in wrapper
2021-12-08T22:00:18.5617153Z     method(*args, **kwargs)
2021-12-08T22:00:18.5618479Z   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 376, in instantiated_test
2021-12-08T22:00:18.5619590Z     result = test(self, **param_kwargs)
2021-12-08T22:00:18.5620401Z   File "test_tensor_creation_ops.py", line 3339, in test_normal_std_error
2021-12-08T22:00:18.5621148Z     torch.normal(input, std)
2021-12-08T22:00:18.5622634Z   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 1356, in __exit__
2021-12-08T22:00:18.5623687Z     return super().__exit__(exc_type, exc_value, tb)

See GitHub Actions build linux-xenial-py3.6-clang7-asan / test (default, 2, 2, linux.2xlarge) (11/11)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2021-12-08T23:16:09.3430606Z RuntimeError: test_torch failed!
2021-12-08T23:16:08.9326929Z Generated XML report: test-reports/python-unittest/test_torch/TEST-TestTorchDeviceTypeCPU-20211208231518.xml
2021-12-08T23:16:08.9329739Z Generated XML report: test-reports/python-unittest/test_torch/TEST-TestVitalSignsCudaCPU-20211208231518.xml
2021-12-08T23:16:09.2591638Z [TORCH_VITAL] Dataloader.enabled		 True
2021-12-08T23:16:09.2592229Z [TORCH_VITAL] Dataloader.basic_unit_test		 TEST_VALUE_STRING
2021-12-08T23:16:09.2592765Z [TORCH_VITAL] CUDA.used		 False
2021-12-08T23:16:09.3421028Z Traceback (most recent call last):
2021-12-08T23:16:09.3421771Z   File "test/run_test.py", line 1058, in <module>
2021-12-08T23:16:09.3425929Z     main()
2021-12-08T23:16:09.3426289Z   File "test/run_test.py", line 1036, in main
2021-12-08T23:16:09.3430138Z     raise RuntimeError(err_message)
2021-12-08T23:16:09.3430606Z RuntimeError: test_torch failed!
2021-12-08T23:16:09.7228856Z + cleanup
2021-12-08T23:16:09.7229265Z + retcode=1
2021-12-08T23:16:09.7229673Z + set +x
2021-12-08T23:16:09.7264599Z ##[error]Process completed with exit code 1.
2021-12-08T23:16:09.7307672Z ##[group]Run # Ensure the working directory gets chowned back to the current user
2021-12-08T23:16:09.7308406Z # Ensure the working directory gets chowned back to the current user
2021-12-08T23:16:09.7308994Z docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
2021-12-08T23:16:09.7337069Z shell: /usr/bin/bash -e {0}
2021-12-08T23:16:09.7337387Z env:
2021-12-08T23:16:09.7337882Z   BUILD_ENVIRONMENT: linux-xenial-py3.6-clang7-asan

This comment was automatically generated by Dr. CI (expand for details).

Please report bugs/suggestions to the (internal) Dr. CI Users group.


pytorch-probot bot commented Dec 8, 2021

CI Flow Status

⚛️ CI Flow

Ruleset - Version: v1
Ruleset - File: https://github.com/pytorch/pytorch/blob/bac7d8c197ce6a2c9e4d561c6598507ffb08564f/.github/generated-ciflow-ruleset.json
PR ciflow labels: ciflow/default

Workflow | Labels (bold = enabled) | Status
Triggered Workflows
linux-bionic-cuda11.5-py3.6-gcc7 ciflow/all, ciflow/cuda, ciflow/default, ciflow/linux ✅ triggered
linux-bionic-py3.6-clang9 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/noarch, ciflow/xla ✅ triggered
linux-docs ciflow/all, ciflow/cpu, ciflow/default, ciflow/docs, ciflow/linux ✅ triggered
linux-vulkan-bionic-py3.6-clang9 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/vulkan ✅ triggered
linux-xenial-cuda11.3-py3.6-gcc7 ciflow/all, ciflow/cuda, ciflow/default, ciflow/linux ✅ triggered
linux-xenial-py3-clang5-mobile-build ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile ✅ triggered
linux-xenial-py3-clang5-mobile-custom-build-static ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile ✅ triggered
linux-xenial-py3.6-clang7-asan ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/sanitizers ✅ triggered
linux-xenial-py3.6-clang7-onnx ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/onnx ✅ triggered
linux-xenial-py3.6-gcc5.4 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux ✅ triggered
linux-xenial-py3.6-gcc7 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux ✅ triggered
linux-xenial-py3.6-gcc7-bazel-test ciflow/all, ciflow/bazel, ciflow/cpu, ciflow/default, ciflow/linux ✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single ciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux ✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit ciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux ✅ triggered
win-vs2019-cpu-py3 ciflow/all, ciflow/cpu, ciflow/default, ciflow/win ✅ triggered
win-vs2019-cuda11.3-py3 ciflow/all, ciflow/cuda, ciflow/default, ciflow/win ✅ triggered
Skipped Workflows
caffe2-linux-xenial-py3.6-gcc5.4 ciflow/all, ciflow/cpu, ciflow/linux 🚫 skipped
docker-builds ciflow/all 🚫 skipped
ios-12-5-1-arm64 ciflow/all, ciflow/ios, ciflow/macos 🚫 skipped
ios-12-5-1-arm64-coreml ciflow/all, ciflow/ios, ciflow/macos 🚫 skipped
ios-12-5-1-arm64-custom-ops ciflow/all, ciflow/ios, ciflow/macos 🚫 skipped
ios-12-5-1-arm64-full-jit ciflow/all, ciflow/ios, ciflow/macos 🚫 skipped
ios-12-5-1-arm64-metal ciflow/all, ciflow/ios, ciflow/macos 🚫 skipped
ios-12-5-1-x86-64 ciflow/all, ciflow/ios, ciflow/macos 🚫 skipped
ios-12-5-1-x86-64-coreml ciflow/all, ciflow/ios, ciflow/macos 🚫 skipped
ios-12-5-1-x86-64-full-jit ciflow/all, ciflow/ios, ciflow/macos 🚫 skipped
libtorch-linux-bionic-cuda11.5-py3.6-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux 🚫 skipped
libtorch-linux-xenial-cuda10.2-py3.6-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux 🚫 skipped
libtorch-linux-xenial-cuda11.3-py3.6-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux 🚫 skipped
linux-bionic-cuda10.2-py3.9-gcc7 ciflow/all, ciflow/cuda, ciflow/linux, ciflow/slow 🚫 skipped
linux-docs-push ciflow/all, ciflow/cpu, ciflow/linux, ciflow/scheduled 🚫 skipped
macos-10-15-py3-arm64 ciflow/all, ciflow/macos 🚫 skipped
macos-10-15-py3-lite-interpreter-x86-64 ciflow/all, ciflow/macos 🚫 skipped
macos-11-py3-x86-64 ciflow/all, ciflow/macos 🚫 skipped
parallelnative-linux-xenial-py3.6-gcc5.4 ciflow/all, ciflow/cpu, ciflow/linux 🚫 skipped
periodic-libtorch-linux-xenial-cuda11.1-py3.6-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled, ciflow/slow, ciflow/slow-gradcheck 🚫 skipped
periodic-linux-xenial-cuda11.1-py3.6-gcc7-debug ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-win-vs2019-cuda11.1-py3 ciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win 🚫 skipped

You can add a comment to the PR and tag @pytorchbot with the following commands:
# ciflow rerun, "ciflow/default" will always be added automatically
@pytorchbot ciflow rerun

# ciflow rerun with additional labels "-l <ciflow/label_name>", which is equivalent to adding these labels manually and triggering the rerun
@pytorchbot ciflow rerun -l ciflow/scheduled -l ciflow/slow

For more information, please take a look at the CI Flow Wiki.

lezcano (Collaborator) left a comment

Just left two small comments, but overall this looks good to me.

Comment on lines +601 to +605
TORCH_META_FUNC2(normal, Tensor_Tensor) (
Tensor const& mean,
Tensor const& std,
c10::optional<Generator> gen
) {
lezcano (Collaborator):

Don't we need some checks here to make sure that mean and std have compatible dtypes? Or does this operation work with arbitrary dtypes?

lezcano (Collaborator) commented Dec 9, 2021:

Oh, I see that the checks are currently done within the normal_out_impl function.
Those checks should be moved to the TORCH_META_FUNCs, which is what you do in the next PR. For this PR to stand on its own, perhaps we could merge the next PR into this one and submit both of them as one?

nkaretnikov (Collaborator, Author):

The reason the checks are in the templates is that they are used by the RNG tests, so I'm not sure it's a good idea to remove them if they can be accessed other than via the structured API. Or is it the caller's responsibility to ensure they are calling the right thing (see aten/src/ATen/test/cpu_rng_test.cpp)?

Comment on lines 127 to -144
m.impl("normal.Tensor_float_out", normal_Tensor_float_out);
m.impl("normal.float_Tensor_out", normal_float_Tensor_out);
m.impl("normal.Tensor_Tensor_out", normal_Tensor_Tensor_out);
m.impl("normal.Tensor_float", normal_Tensor_float);
m.impl("normal.float_Tensor", normal_float_Tensor);
m.impl("normal.Tensor_Tensor", normal_Tensor_Tensor);
lezcano (Collaborator):

How come just half of them were deleted? In fact, is this necessary given that now these operations are implemented as structured kernels? cc @ysiraichi @peterbell10

nkaretnikov (Collaborator, Author):

it's because these previously used the template version directly (to pass a custom rng for testing and demo this functionality). not sure what to do here since these are gone now

Collaborator

I think the distribution templates are part of the public generator API, so they can't be removed. For example, the cryptographic PRNG uses them:
https://github.com/pytorch/csprng/blob/5a6d9458c142190d5d713744687434c73c06ad01/torchcsprng/csrc/kernels_body.inc#L257

Collaborator

What do you think we should do here, @mruberry? Should we do the checks twice in these functions?
See #69628 (comment) for context.

Collaborator

I think, ideally, we'd want those checks in the META function. One way out of this is to factor out the implementation (code after the type checks) into a new function. Then, we would have something like this:

  • normal_impl_out: dtype checks and calls normal_impl_impl_out (not a very good name)
  • normal_impl_impl_out: executes the rest of the implementation

Then, the IMPL function can just call normal_impl_impl_out directly, which would bypass dtype checks (these can be factored into a function of its own, and called in META, too).

Not sure whether the extra indirection is worth it, though.
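A minimal sketch of that split, with Python standing in for the C++ (the function names come from the comment above and are placeholders, not real ATen symbols; the arithmetic is a stand-in for the real sampling code):

```python
def check_dtypes(mean, std):
    # Factored-out validation: callable from both META and the template path.
    if mean["dtype"] != std["dtype"]:
        raise TypeError("mean and std must have the same dtype")

def normal_impl_impl_out(out, mean, std):
    # "The rest of the implementation": fills `out`, does no validation.
    out["data"] = [m + s for m, s in zip(mean["data"], std["data"])]
    return out

def normal_impl_out(out, mean, std):
    # Template entry point: check, then run. The structured IMPL function
    # would call normal_impl_impl_out directly, since META already checked.
    check_dtypes(mean, std)
    return normal_impl_impl_out(out, mean, std)
```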

Collaborator

@bdhirsh will take a look soon -- I think he's the best person to help answer this question

Collaborator

@bdhirsh bdhirsh Dec 13, 2021

cc @pbelevich (who I think wrote the rng api).

It looks like the templates are public API to help write out-of-tree kernel extensions for distribution ops, so we can't easily get rid of them (without finding all external usages and making them structured too, which... would require external codegen and doesn't seem super beneficial to do). There's also more context described here.

If that's right, then I don't think that porting normal_* ops to structured will really help to clean up much code - we have to keep all of the functional/inplace/out= template variants around. @ysiraichi is also right, you'd need to make sure that all of the error checking logic currently in the template is run in the meta function (and also directly in the template, since out-of-tree kernel writers still need to rely on them).

Given all of that, it sounds to me like it would be easiest to just directly write meta kernels for all of the distribution ops.
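As a rough sketch of what a hand-written meta kernel for `normal.Tensor_Tensor` would do, modeled in Python (the name and the dict-based metadata are illustrative; broadcasting is omitted here for brevity):

```python
def meta_normal(mean, std):
    """A meta kernel validates inputs and produces only output metadata
    (shape, dtype); no random numbers are drawn and no data is touched."""
    if mean["dtype"] != std["dtype"]:
        raise TypeError("mean and std must share a dtype")
    if mean["shape"] != std["shape"]:
        raise ValueError("this sketch only handles equal shapes")
    return {"shape": mean["shape"], "dtype": mean["dtype"]}
```

The real kernel would additionally broadcast the two input shapes and resize the output accordingly.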

Collaborator Author

@bdhirsh

Given all of that, it sounds to me like it would be easiest to just directly write meta kernels for all of the distribution ops.

I'm going to do this and will create a new stack with the changes. This stack will stay open for now for reference.

I'll also fix the broadcasting issue that I introduced, which breaks BC.

@@ -7768,28 +7768,28 @@
Meta: normal_meta_
Collaborator

Why is this one not ported to structured kernels as well? I reckon that we should have all the combinations of functions here (in-place / out-place / _out) for all the types of inputs (Tensor / float) for (mean / std), right?

Collaborator Author

i'll look into it and follow up later. at first, it looked like it wasn't possible for some reason. maybe i just got confused

@mruberry mruberry requested a review from bdhirsh December 10, 2021 19:23
Tensor const& std,
c10::optional<Generator> gen
) {
auto shape = at::infer_size(mean.sizes(), std.sizes());
Collaborator

Just so we don't forget: probably we want to do something like what resize_output_for_normal does here, inside META. Since we still have the same problem as the dtype checks, we should probably wait for Brian.
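The `at::infer_size` call follows the usual right-aligned broadcasting rules; a small Python model of that behavior (not the real ATen implementation):

```python
from itertools import zip_longest

def infer_size(a, b):
    """Model of at::infer_size: broadcast two shapes, right-aligned.
    A dimension of 1 stretches to match the other shape's dimension."""
    out = []
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if x != y and 1 not in (x, y):
            raise RuntimeError(f"shapes {a} and {b} are not broadcastable")
        out.append(max(x, y))
    return tuple(reversed(out))

# The META function would then set the output to this shape, which is
# roughly what resize_output_for_normal does on the non-structured path.
```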

Comment on lines +622 to +623
at::native::templates::normal_out_impl<NormalStub, Generator>(
const_cast<Tensor&>(out), mean, std, gen);
Collaborator

I think it's a good idea to propagate the const to normal_out_impl functions (might be BC-breaking, not sure), instead of const_cast-ing.

Collaborator Author

@nkaretnikov nkaretnikov Dec 13, 2021

fwiw, this just mimics what some other operator does already, but yeah, i agree.

@nkaretnikov
Collaborator Author

to avoid confusion, will open a new stack to address issues related to BC and broadcasting

@facebook-github-bot facebook-github-bot deleted the gh/nkaretnikov/1/head branch January 15, 2022 15:16

Labels

cla signed · module: structured kernels · open source


8 participants