
Port tensor variants of normal to structured kernel#69628

Closed
nkaretnikov wants to merge 1 commit into gh/nkaretnikov/1/base from gh/nkaretnikov/1/head

Conversation

nkaretnikov (Collaborator) commented Dec 8, 2021

Stack from ghstack:

  • Refactor tensor variants to use structured in native_functions.yaml
  • Other variants don't fit this model well, so not doing those for now
  • Remove some normal templates and RNG tests that relied on them.

See #69386.

[ghstack-poisoned]
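For context, porting an operator to a structured kernel in native_functions.yaml typically means marking the out= variant as structured and making the functional variant delegate to it. The sketch below is illustrative only; the exact signatures and dispatch entries are assumptions, not this PR's actual diff:

```yaml
# Illustrative sketch of a structured-kernel port (not the exact entries from
# this PR): the out= variant owns the kernel, and the functional variant
# delegates to it via structured_delegate.
- func: normal.Tensor_Tensor_out(Tensor mean, Tensor std, *, Generator? generator=None, Tensor(a!) out) -> Tensor(a!)
  structured: True
  dispatch:
    CPU, CUDA: normal_out

- func: normal.Tensor_Tensor(Tensor mean, Tensor std, *, Generator? generator=None) -> Tensor
  structured_delegate: normal.Tensor_Tensor_out
```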
facebook-github-bot (Contributor) commented Dec 8, 2021

🔗 Helpful links

💊 CI failures summary and remediations

As of commit bac7d8c (more details on the Dr. CI page):


  • 11/11 failures introduced in this PR

🕵️ 11 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See GitHub Actions build linux-bionic-cuda11.5-py3.6-gcc7 / test (default, 2, 2, linux.4xlarge.nvidia.gpu) (1/11)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2021-12-08T22:02:33.5057326Z AssertionError: can only test a child process
2021-12-08T22:02:33.4873254Z     assert self._parent_pid == os.getpid(), 'can only test a child process'
2021-12-08T22:02:33.4874008Z AssertionError: can only test a child process
2021-12-08T22:02:33.5046535Z Exception ignored in: <bound method _MultiProcessingDataLoaderIter.__del__ of <torch.utils.data.dataloader._MultiProcessingDataLoaderIter object at 0x7fdb884d5748>>
2021-12-08T22:02:33.5048108Z Traceback (most recent call last):
2021-12-08T22:02:33.5049726Z   File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1328, in __del__
2021-12-08T22:02:33.5050538Z     self._shutdown_workers()
2021-12-08T22:02:33.5051562Z   File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1320, in _shutdown_workers
2021-12-08T22:02:33.5054300Z     if w.is_alive():
2021-12-08T22:02:33.5055445Z   File "/opt/conda/lib/python3.6/multiprocessing/process.py", line 134, in is_alive
2021-12-08T22:02:33.5056577Z     assert self._parent_pid == os.getpid(), 'can only test a child process'
2021-12-08T22:02:33.5057326Z AssertionError: can only test a child process
2021-12-08T22:02:34.7206572Z ok (1.291s)
2021-12-08T22:02:34.7238032Z   test_iterabledataset_len (__main__.TestDataLoaderPersistentWorkers) ... ok (0.003s)
2021-12-08T22:02:37.1364761Z   test_large_sampler_indices (__main__.TestDataLoaderPersistentWorkers) ... ok (2.412s)
2021-12-08T22:02:37.1477078Z   test_len (__main__.TestDataLoaderPersistentWorkers) ... ok (0.011s)
2021-12-08T22:02:39.6026962Z   test_multiple_dataloaders (__main__.TestDataLoaderPersistentWorkers) ... ok (2.455s)
2021-12-08T22:02:43.1441185Z   test_multiprocessing_contexts (__main__.TestDataLoaderPersistentWorkers) ... [W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
2021-12-08T22:02:43.1446472Z [W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
2021-12-08T22:02:43.1462958Z [W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
2021-12-08T22:02:46.6106250Z [W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
2021-12-08T22:02:46.6122839Z [W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]

See GitHub Actions build Lint / quick-checks (2/11)

Step: "Ensure correct trailing newlines" (full log | diagnosis details | 🔁 rerun)

2021-12-08T20:19:28.2137665Z python: can't open..._launches.py': [Errno 2] No such file or directory
2021-12-08T20:19:28.1817395Z ##[group]Run set -eux
2021-12-08T20:19:28.1817874Z set -eux
2021-12-08T20:19:28.1818566Z python torch/testing/_check_kernel_launches.py |& tee "${GITHUB_WORKSPACE}"/cuda_kernel_launch_checks.txt
2021-12-08T20:19:28.1854538Z shell: /bin/bash -e {0}
2021-12-08T20:19:28.1854958Z env:
2021-12-08T20:19:28.1855534Z   pythonLocation: /opt/hostedtoolcache/Python/3.10.0/x64
2021-12-08T20:19:28.1856326Z   LD_LIBRARY_PATH: /opt/hostedtoolcache/Python/3.10.0/x64/lib
2021-12-08T20:19:28.1856982Z ##[endgroup]
2021-12-08T20:19:28.1934162Z + python torch/testing/_check_kernel_launches.py
2021-12-08T20:19:28.1937333Z + tee /home/runner/work/pytorch/pytorch/cuda_kernel_launch_checks.txt
2021-12-08T20:19:28.2137665Z python: can't open file '/home/runner/work/pytorch/pytorch/torch/testing/_check_kernel_launches.py': [Errno 2] No such file or directory
2021-12-08T20:19:28.2213692Z ##[group]Run (! git --no-pager grep -I -no $'#include <cub/' --  ./aten  ':(exclude)aten/src/ATen/cuda/cub*.cuh' || (echo "The above files have direct cub include; please include ATen/cuda/cub.cuh instead and wrap your cub calls in at::native namespace if necessary"; false))
2021-12-08T20:19:28.2215882Z (! git --no-pager grep -I -no $'#include <cub/' --  ./aten  ':(exclude)aten/src/ATen/cuda/cub*.cuh' || (echo "The above files have direct cub include; please include ATen/cuda/cub.cuh instead and wrap your cub calls in at::native namespace if necessary"; false))
2021-12-08T20:19:28.2248986Z shell: /bin/bash -e {0}
2021-12-08T20:19:28.2249328Z env:
2021-12-08T20:19:28.2249838Z   pythonLocation: /opt/hostedtoolcache/Python/3.10.0/x64
2021-12-08T20:19:28.2250514Z   LD_LIBRARY_PATH: /opt/hostedtoolcache/Python/3.10.0/x64/lib
2021-12-08T20:19:28.2251026Z ##[endgroup]
2021-12-08T20:19:28.2624171Z ##[group]Run (! git --no-pager grep -I -no $'cudaStreamSynchronize' --  ./aten ./c10 ':(exclude)aten/src/ATen/test' ':(exclude)c10/cuda/CUDAFunctions.h' || (echo "The above files call raw cuda APIs directly; please use at::cuda wrappers instead"; false))
2021-12-08T20:19:28.2626385Z (! git --no-pager grep -I -no $'cudaStreamSynchronize' --  ./aten ./c10 ':(exclude)aten/src/ATen/test' ':(exclude)c10/cuda/CUDAFunctions.h' || (echo "The above files call raw cuda APIs directly; please use at::cuda wrappers instead"; false))
2021-12-08T20:19:28.2736231Z shell: /bin/bash -e {0}

See GitHub Actions build linux-xenial-py3.6-gcc5.4 / test (default, 1, 2, linux.2xlarge) (3/11)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2021-12-08T21:27:36.1189165Z AssertionError: can only test a child process
2021-12-08T21:27:36.1075024Z     assert self._parent_pid == os.getpid(), 'can only test a child process'
2021-12-08T21:27:36.1075768Z AssertionError: can only test a child process
2021-12-08T21:27:36.1175987Z Exception ignored in: <bound method _MultiProcessingDataLoaderIter.__del__ of <torch.utils.data.dataloader._MultiProcessingDataLoaderIter object at 0x7fd8670c7dd8>>
2021-12-08T21:27:36.1177641Z Traceback (most recent call last):
2021-12-08T21:27:36.1179242Z   File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1328, in __del__
2021-12-08T21:27:36.1180100Z     self._shutdown_workers()
2021-12-08T21:27:36.1181222Z   File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1320, in _shutdown_workers
2021-12-08T21:27:36.1186316Z     if w.is_alive():
2021-12-08T21:27:36.1187102Z   File "/opt/conda/lib/python3.6/multiprocessing/process.py", line 134, in is_alive
2021-12-08T21:27:36.1188376Z     assert self._parent_pid == os.getpid(), 'can only test a child process'
2021-12-08T21:27:36.1189165Z AssertionError: can only test a child process
2021-12-08T21:27:36.7346173Z ok (0.643s)
2021-12-08T21:27:36.7370707Z   test_iterabledataset_len (__main__.TestDataLoaderPersistentWorkers) ... ok (0.002s)
2021-12-08T21:27:38.3704828Z   test_large_sampler_indices (__main__.TestDataLoaderPersistentWorkers) ... ok (1.633s)
2021-12-08T21:27:38.3777219Z   test_len (__main__.TestDataLoaderPersistentWorkers) ... ok (0.007s)
2021-12-08T21:27:39.8858934Z   test_multiple_dataloaders (__main__.TestDataLoaderPersistentWorkers) ... ok (1.508s)
2021-12-08T21:27:43.1209404Z   test_multiprocessing_contexts (__main__.TestDataLoaderPersistentWorkers) ... ok (3.235s)
2021-12-08T21:27:43.9488941Z   test_no_segfault (__main__.TestDataLoaderPersistentWorkers) ... ok (0.828s)
2021-12-08T21:27:43.9521739Z   test_numpy (__main__.TestDataLoaderPersistentWorkers) ... ok (0.003s)
2021-12-08T21:27:43.9551152Z   test_numpy_gen_state (__main__.TestDataLoaderPersistentWorkers) ... ok (0.003s)
2021-12-08T21:27:43.9577299Z   test_numpy_scalars (__main__.TestDataLoaderPersistentWorkers) ... ok (0.003s)

See GitHub Actions build linux-xenial-cuda11.3-py3.6-gcc7 / test (default, 1, 2, linux.4xlarge.nvidia.gpu) (4/11)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2021-12-08T22:24:26.7201959Z RuntimeError: test_torch failed!
2021-12-08T22:24:26.1126025Z Generated XML report: test-reports/python-unittest/test_torch/TEST-TestTorchDeviceTypeCUDA-20211208222250.xml
2021-12-08T22:24:26.1128921Z Generated XML report: test-reports/python-unittest/test_torch/TEST-TestVitalSignsCudaCUDA-20211208222250.xml
2021-12-08T22:24:26.5671290Z [TORCH_VITAL] Dataloader.enabled		 True
2021-12-08T22:24:26.5672218Z [TORCH_VITAL] Dataloader.basic_unit_test		 TEST_VALUE_STRING
2021-12-08T22:24:26.5672919Z [TORCH_VITAL] CUDA.used		 true
2021-12-08T22:24:26.7195670Z Traceback (most recent call last):
2021-12-08T22:24:26.7196472Z   File "test/run_test.py", line 1058, in <module>
2021-12-08T22:24:26.7197978Z     main()
2021-12-08T22:24:26.7198443Z   File "test/run_test.py", line 1036, in main
2021-12-08T22:24:26.7201335Z     raise RuntimeError(err_message)
2021-12-08T22:24:26.7201959Z RuntimeError: test_torch failed!
2021-12-08T22:24:27.2576497Z + cleanup
2021-12-08T22:24:27.2577215Z + retcode=1
2021-12-08T22:24:27.2577680Z + set +x
2021-12-08T22:24:27.2629941Z ##[error]Process completed with exit code 1.
2021-12-08T22:24:27.2691947Z ##[group]Run # Ensure the working directory gets chowned back to the current user
2021-12-08T22:24:27.2692996Z # Ensure the working directory gets chowned back to the current user
2021-12-08T22:24:27.2693810Z docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
2021-12-08T22:24:27.2707867Z shell: /usr/bin/bash -e {0}
2021-12-08T22:24:27.2708283Z env:
2021-12-08T22:24:27.2720828Z   BUILD_ENVIRONMENT: linux-xenial-cuda11.3-py3.6-gcc7

See GitHub Actions build linux-bionic-py3.6-clang9 / test (default, 1, 2, linux.2xlarge) (5/11)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2021-12-08T21:26:00.2301642Z AssertionError: can only test a child process
2021-12-08T21:26:00.2087971Z     assert self._parent_pid == os.getpid(), 'can only test a child process'
2021-12-08T21:26:00.2088690Z AssertionError: can only test a child process
2021-12-08T21:26:00.2290042Z Exception ignored in: <bound method _MultiProcessingDataLoaderIter.__del__ of <torch.utils.data.dataloader._MultiProcessingDataLoaderIter object at 0x7feb8dd35748>>
2021-12-08T21:26:00.2291305Z Traceback (most recent call last):
2021-12-08T21:26:00.2292402Z   File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1328, in __del__
2021-12-08T21:26:00.2294464Z     self._shutdown_workers()
2021-12-08T21:26:00.2295595Z   File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1320, in _shutdown_workers
2021-12-08T21:26:00.2299025Z     if w.is_alive():
2021-12-08T21:26:00.2299944Z   File "/opt/conda/lib/python3.6/multiprocessing/process.py", line 134, in is_alive
2021-12-08T21:26:00.2300981Z     assert self._parent_pid == os.getpid(), 'can only test a child process'
2021-12-08T21:26:00.2301642Z AssertionError: can only test a child process
2021-12-08T21:26:00.9129864Z ok (0.720s)
2021-12-08T21:26:00.9154148Z   test_iterabledataset_len (__main__.TestDataLoaderPersistentWorkers) ... ok (0.002s)
2021-12-08T21:26:02.6069849Z   test_large_sampler_indices (__main__.TestDataLoaderPersistentWorkers) ... ok (1.691s)
2021-12-08T21:26:02.6139185Z   test_len (__main__.TestDataLoaderPersistentWorkers) ... ok (0.007s)
2021-12-08T21:26:04.1993444Z   test_multiple_dataloaders (__main__.TestDataLoaderPersistentWorkers) ... ok (1.585s)
2021-12-08T21:26:07.6223279Z   test_multiprocessing_contexts (__main__.TestDataLoaderPersistentWorkers) ... ok (3.423s)
2021-12-08T21:26:08.5084352Z   test_no_segfault (__main__.TestDataLoaderPersistentWorkers) ... ok (0.886s)
2021-12-08T21:26:08.5118246Z   test_numpy (__main__.TestDataLoaderPersistentWorkers) ... ok (0.003s)
2021-12-08T21:26:08.5146656Z   test_numpy_gen_state (__main__.TestDataLoaderPersistentWorkers) ... ok (0.003s)
2021-12-08T21:26:08.5171459Z   test_numpy_scalars (__main__.TestDataLoaderPersistentWorkers) ... ok (0.002s)

See GitHub Actions build linux-bionic-py3.6-clang9 / test (xla, 1, 1, linux.2xlarge) (6/11)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2021-12-08T22:03:50.4567740Z [  FAILED  ] AtenXlaTensorTest.TestKlDivBackward
2021-12-08T22:03:50.4561321Z [ RUN      ] XlaUtilCacheTest.BasicTest
2021-12-08T22:03:50.4562135Z [       OK ] XlaUtilCacheTest.BasicTest (0 ms)
2021-12-08T22:03:50.4562865Z [----------] 1 test from XlaUtilCacheTest (0 ms total)
2021-12-08T22:03:50.4563205Z 
2021-12-08T22:03:50.4563696Z [----------] Global test environment tear-down
2021-12-08T22:03:50.4564316Z [==========] 618 tests from 8 test suites ran. (310935 ms total)
2021-12-08T22:03:50.4564829Z [  PASSED  ] 616 tests.
2021-12-08T22:03:50.4565322Z [  SKIPPED ] 1 test, listed below:
2021-12-08T22:03:50.4566160Z [  SKIPPED ] AtenXlaTensorTest.TestGroupNormBackward
2021-12-08T22:03:50.4566983Z [  FAILED  ] 1 test, listed below:
2021-12-08T22:03:50.4567740Z [  FAILED  ] AtenXlaTensorTest.TestKlDivBackward
2021-12-08T22:03:50.4568233Z 
2021-12-08T22:03:50.4568506Z  1 FAILED TEST
2021-12-08T22:03:50.6109824Z + cleanup
2021-12-08T22:03:50.6110259Z + retcode=1
2021-12-08T22:03:50.6110606Z + set +x
2021-12-08T22:03:50.6151025Z ##[error]Process completed with exit code 1.
2021-12-08T22:03:50.6207580Z ##[group]Run # Ensure the working directory gets chowned back to the current user
2021-12-08T22:03:50.6208309Z # Ensure the working directory gets chowned back to the current user
2021-12-08T22:03:50.6208927Z docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
2021-12-08T22:03:50.6233902Z shell: /usr/bin/bash -e {0}

See GitHub Actions build win-vs2019-cuda11.3-py3 / test (force_on_cpu, 1, 1, windows.4xlarge) (7/11)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2021-12-08T23:17:06.2770886Z RuntimeError: test_torch failed!
2021-12-08T23:17:06.0661606Z Generated XML report: test-reports\dist-gloo\test_torch\TEST-TestTorchDeviceTypeCPU-20211208231654.xml
2021-12-08T23:17:06.0662891Z Generated XML report: test-reports\dist-gloo\test_torch\TEST-TestVitalSignsCudaCPU-20211208231654.xml
2021-12-08T23:17:06.2514937Z [TORCH_VITAL] CUDA.used		 False
2021-12-08T23:17:06.2515495Z [TORCH_VITAL] Dataloader.basic_unit_test		 TEST_VALUE_STRING
2021-12-08T23:17:06.2515993Z [TORCH_VITAL] Dataloader.enabled		 True
2021-12-08T23:17:06.2768657Z Traceback (most recent call last):
2021-12-08T23:17:06.2769366Z   File "run_test.py", line 1058, in <module>
2021-12-08T23:17:06.2769689Z     main()
2021-12-08T23:17:06.2770074Z   File "run_test.py", line 1036, in main
2021-12-08T23:17:06.2770478Z     raise RuntimeError(err_message)
2021-12-08T23:17:06.2770886Z RuntimeError: test_torch failed!
2021-12-08T23:17:06.5114392Z 
2021-12-08T23:17:06.5114970Z (base) C:\actions-runner\_work\pytorch\pytorch\test>popd
2021-12-08T23:17:06.5119200Z 
2021-12-08T23:17:06.5119722Z (base) C:\actions-runner\_work\pytorch\pytorch>if ERRORLEVEL 1 exit /b 1 
2021-12-08T23:17:06.5154225Z + cleanup
2021-12-08T23:17:06.5154838Z + retcode=1
2021-12-08T23:17:06.5155154Z + set +x
2021-12-08T23:17:06.5298588Z ##[error]Process completed with exit code 1.
2021-12-08T23:17:06.5643173Z ##[group]Run # -ir => recursive include all files in pattern
2021-12-08T23:17:06.5643842Z # -ir => recursive include all files in pattern

See GitHub Actions build linux-xenial-py3.6-gcc7 / test (default, 2, 2, linux.2xlarge) (8/11)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2021-12-08T21:19:22.9638275Z AssertionError: can only test a child process
2021-12-08T21:19:22.9522159Z     assert self._parent_pid == os.getpid(), 'can only test a child process'
2021-12-08T21:19:22.9524094Z AssertionError: can only test a child process
2021-12-08T21:19:22.9628201Z Exception ignored in: <bound method _MultiProcessingDataLoaderIter.__del__ of <torch.utils.data.dataloader._MultiProcessingDataLoaderIter object at 0x7fe0c183e588>>
2021-12-08T21:19:22.9629348Z Traceback (most recent call last):
2021-12-08T21:19:22.9631091Z   File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1328, in __del__
2021-12-08T21:19:22.9631697Z     self._shutdown_workers()
2021-12-08T21:19:22.9632479Z   File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1320, in _shutdown_workers
2021-12-08T21:19:22.9634217Z     if w.is_alive():
2021-12-08T21:19:22.9635555Z   File "/opt/conda/lib/python3.6/multiprocessing/process.py", line 134, in is_alive
2021-12-08T21:19:22.9637407Z     assert self._parent_pid == os.getpid(), 'can only test a child process'
2021-12-08T21:19:22.9638275Z AssertionError: can only test a child process
2021-12-08T21:19:23.5869978Z ok (0.652s)
2021-12-08T21:19:23.5893665Z   test_iterabledataset_len (__main__.TestDataLoaderPersistentWorkers) ... ok (0.002s)
2021-12-08T21:19:25.2281624Z   test_large_sampler_indices (__main__.TestDataLoaderPersistentWorkers) ... ok (1.638s)
2021-12-08T21:19:25.2352331Z   test_len (__main__.TestDataLoaderPersistentWorkers) ... ok (0.007s)
2021-12-08T21:19:26.7488680Z   test_multiple_dataloaders (__main__.TestDataLoaderPersistentWorkers) ... ok (1.513s)
2021-12-08T21:19:30.0143697Z   test_multiprocessing_contexts (__main__.TestDataLoaderPersistentWorkers) ... ok (3.265s)
2021-12-08T21:19:30.8288048Z   test_no_segfault (__main__.TestDataLoaderPersistentWorkers) ... ok (0.814s)
2021-12-08T21:19:30.8321052Z   test_numpy (__main__.TestDataLoaderPersistentWorkers) ... ok (0.003s)
2021-12-08T21:19:30.8349243Z   test_numpy_gen_state (__main__.TestDataLoaderPersistentWorkers) ... ok (0.003s)
2021-12-08T21:19:30.8374259Z   test_numpy_scalars (__main__.TestDataLoaderPersistentWorkers) ... ok (0.002s)

See GitHub Actions build win-vs2019-cpu-py3 / test (default, 2, 2, windows.4xlarge) (9/11)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2021-12-08T23:03:32.0491086Z RuntimeError: test_torch failed!
2021-12-08T23:03:31.8556620Z Generated XML report: test-reports\dist-gloo\test_torch\TEST-TestTorchDeviceTypeCPU-20211208230319.xml
2021-12-08T23:03:31.8557867Z Generated XML report: test-reports\dist-gloo\test_torch\TEST-TestVitalSignsCudaCPU-20211208230319.xml
2021-12-08T23:03:32.0263584Z [TORCH_VITAL] CUDA.used		 False
2021-12-08T23:03:32.0264076Z [TORCH_VITAL] Dataloader.basic_unit_test		 TEST_VALUE_STRING
2021-12-08T23:03:32.0264572Z [TORCH_VITAL] Dataloader.enabled		 True
2021-12-08T23:03:32.0488729Z Traceback (most recent call last):
2021-12-08T23:03:32.0489487Z   File "run_test.py", line 1058, in <module>
2021-12-08T23:03:32.0489821Z     main()
2021-12-08T23:03:32.0490267Z   File "run_test.py", line 1036, in main
2021-12-08T23:03:32.0490667Z     raise RuntimeError(err_message)
2021-12-08T23:03:32.0491086Z RuntimeError: test_torch failed!
2021-12-08T23:03:32.2672285Z 
2021-12-08T23:03:32.2672965Z (base) C:\actions-runner\_work\pytorch\pytorch\test>popd
2021-12-08T23:03:32.2677046Z 
2021-12-08T23:03:32.2677529Z (base) C:\actions-runner\_work\pytorch\pytorch>if ERRORLEVEL 1 exit /b 1 
2021-12-08T23:03:32.2712544Z + cleanup
2021-12-08T23:03:32.2712882Z + retcode=1
2021-12-08T23:03:32.2713152Z + set +x
2021-12-08T23:03:32.2927985Z ##[error]Process completed with exit code 1.
2021-12-08T23:03:32.3231706Z ##[group]Run # -ir => recursive include all files in pattern
2021-12-08T23:03:32.3232357Z # -ir => recursive include all files in pattern

See GitHub Actions build linux-bionic-py3.6-clang9 / test (noarch, 1, 1, linux.2xlarge) (10/11)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2021-12-08T22:00:18.5612454Z FAIL [0.006s]: tes...error_meta (__main__.TestRandomTensorCreationMETA)
2021-12-08T22:00:18.5195452Z   test_vstack_row_stack_meta_int16 (__main__.TestTensorCreationMETA) ... skip (0.006s)
2021-12-08T22:00:18.5257056Z   test_vstack_row_stack_meta_int32 (__main__.TestTensorCreationMETA) ... skip (0.006s)
2021-12-08T22:00:18.5319663Z   test_vstack_row_stack_meta_int64 (__main__.TestTensorCreationMETA) ... skip (0.006s)
2021-12-08T22:00:18.5381668Z   test_vstack_row_stack_meta_int8 (__main__.TestTensorCreationMETA) ... skip (0.006s)
2021-12-08T22:00:18.5443553Z   test_vstack_row_stack_meta_uint8 (__main__.TestTensorCreationMETA) ... skip (0.006s)
2021-12-08T22:00:18.5466288Z   test_zeros_dtype_out_match_meta (__main__.TestTensorCreationMETA) ... ok (0.002s)
2021-12-08T22:00:18.5524223Z   test_zeros_meta (__main__.TestTensorCreationMETA) ... skip (0.006s)
2021-12-08T22:00:18.5609829Z   test_zeros_out_meta (__main__.TestTensorCreationMETA) ... skip (0.008s)
2021-12-08T22:00:18.5610803Z 
2021-12-08T22:00:18.5611509Z ======================================================================
2021-12-08T22:00:18.5612454Z FAIL [0.006s]: test_normal_std_error_meta (__main__.TestRandomTensorCreationMETA)
2021-12-08T22:00:18.5614096Z ----------------------------------------------------------------------
2021-12-08T22:00:18.5614915Z Traceback (most recent call last):
2021-12-08T22:00:18.5616243Z   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 1470, in wrapper
2021-12-08T22:00:18.5617153Z     method(*args, **kwargs)
2021-12-08T22:00:18.5618479Z   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 376, in instantiated_test
2021-12-08T22:00:18.5619590Z     result = test(self, **param_kwargs)
2021-12-08T22:00:18.5620401Z   File "test_tensor_creation_ops.py", line 3339, in test_normal_std_error
2021-12-08T22:00:18.5621148Z     torch.normal(input, std)
2021-12-08T22:00:18.5622634Z   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 1356, in __exit__
2021-12-08T22:00:18.5623687Z     return super().__exit__(exc_type, exc_value, tb)

See GitHub Actions build linux-xenial-py3.6-clang7-asan / test (default, 2, 2, linux.2xlarge) (11/11)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2021-12-08T23:16:09.3430606Z RuntimeError: test_torch failed!
2021-12-08T23:16:08.9326929Z Generated XML report: test-reports/python-unittest/test_torch/TEST-TestTorchDeviceTypeCPU-20211208231518.xml
2021-12-08T23:16:08.9329739Z Generated XML report: test-reports/python-unittest/test_torch/TEST-TestVitalSignsCudaCPU-20211208231518.xml
2021-12-08T23:16:09.2591638Z [TORCH_VITAL] Dataloader.enabled		 True
2021-12-08T23:16:09.2592229Z [TORCH_VITAL] Dataloader.basic_unit_test		 TEST_VALUE_STRING
2021-12-08T23:16:09.2592765Z [TORCH_VITAL] CUDA.used		 False
2021-12-08T23:16:09.3421028Z Traceback (most recent call last):
2021-12-08T23:16:09.3421771Z   File "test/run_test.py", line 1058, in <module>
2021-12-08T23:16:09.3425929Z     main()
2021-12-08T23:16:09.3426289Z   File "test/run_test.py", line 1036, in main
2021-12-08T23:16:09.3430138Z     raise RuntimeError(err_message)
2021-12-08T23:16:09.3430606Z RuntimeError: test_torch failed!
2021-12-08T23:16:09.7228856Z + cleanup
2021-12-08T23:16:09.7229265Z + retcode=1
2021-12-08T23:16:09.7229673Z + set +x
2021-12-08T23:16:09.7264599Z ##[error]Process completed with exit code 1.
2021-12-08T23:16:09.7307672Z ##[group]Run # Ensure the working directory gets chowned back to the current user
2021-12-08T23:16:09.7308406Z # Ensure the working directory gets chowned back to the current user
2021-12-08T23:16:09.7308994Z docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
2021-12-08T23:16:09.7337069Z shell: /usr/bin/bash -e {0}
2021-12-08T23:16:09.7337387Z env:
2021-12-08T23:16:09.7337882Z   BUILD_ENVIRONMENT: linux-xenial-py3.6-clang7-asan

This comment was automatically generated by Dr. CI (expand for details).

Please report bugs/suggestions to the (internal) Dr. CI Users group.


pytorch-probot bot commented Dec 8, 2021

CI Flow Status

⚛️ CI Flow

Ruleset - Version: v1
Ruleset - File: https://github.com/pytorch/pytorch/blob/bac7d8c197ce6a2c9e4d561c6598507ffb08564f/.github/generated-ciflow-ruleset.json
PR ciflow labels: ciflow/default

Workflow | Labels (bold = enabled) | Status
Triggered Workflows
linux-bionic-cuda11.5-py3.6-gcc7 ciflow/all, ciflow/cuda, ciflow/default, ciflow/linux ✅ triggered
linux-bionic-py3.6-clang9 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/noarch, ciflow/xla ✅ triggered
linux-docs ciflow/all, ciflow/cpu, ciflow/default, ciflow/docs, ciflow/linux ✅ triggered
linux-vulkan-bionic-py3.6-clang9 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/vulkan ✅ triggered
linux-xenial-cuda11.3-py3.6-gcc7 ciflow/all, ciflow/cuda, ciflow/default, ciflow/linux ✅ triggered
linux-xenial-py3-clang5-mobile-build ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile ✅ triggered
linux-xenial-py3-clang5-mobile-custom-build-static ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile ✅ triggered
linux-xenial-py3.6-clang7-asan ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/sanitizers ✅ triggered
linux-xenial-py3.6-clang7-onnx ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/onnx ✅ triggered
linux-xenial-py3.6-gcc5.4 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux ✅ triggered
linux-xenial-py3.6-gcc7 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux ✅ triggered
linux-xenial-py3.6-gcc7-bazel-test ciflow/all, ciflow/bazel, ciflow/cpu, ciflow/default, ciflow/linux ✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single ciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux ✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit ciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux ✅ triggered
win-vs2019-cpu-py3 ciflow/all, ciflow/cpu, ciflow/default, ciflow/win ✅ triggered
win-vs2019-cuda11.3-py3 ciflow/all, ciflow/cuda, ciflow/default, ciflow/win ✅ triggered
Skipped Workflows
caffe2-linux-xenial-py3.6-gcc5.4 ciflow/all, ciflow/cpu, ciflow/linux 🚫 skipped
docker-builds ciflow/all 🚫 skipped
ios-12-5-1-arm64 ciflow/all, ciflow/ios, ciflow/macos 🚫 skipped
ios-12-5-1-arm64-coreml ciflow/all, ciflow/ios, ciflow/macos 🚫 skipped
ios-12-5-1-arm64-custom-ops ciflow/all, ciflow/ios, ciflow/macos 🚫 skipped
ios-12-5-1-arm64-full-jit ciflow/all, ciflow/ios, ciflow/macos 🚫 skipped
ios-12-5-1-arm64-metal ciflow/all, ciflow/ios, ciflow/macos 🚫 skipped
ios-12-5-1-x86-64 ciflow/all, ciflow/ios, ciflow/macos 🚫 skipped
ios-12-5-1-x86-64-coreml ciflow/all, ciflow/ios, ciflow/macos 🚫 skipped
ios-12-5-1-x86-64-full-jit ciflow/all, ciflow/ios, ciflow/macos 🚫 skipped
libtorch-linux-bionic-cuda11.5-py3.6-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux 🚫 skipped
libtorch-linux-xenial-cuda10.2-py3.6-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux 🚫 skipped
libtorch-linux-xenial-cuda11.3-py3.6-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux 🚫 skipped
linux-bionic-cuda10.2-py3.9-gcc7 ciflow/all, ciflow/cuda, ciflow/linux, ciflow/slow 🚫 skipped
linux-docs-push ciflow/all, ciflow/cpu, ciflow/linux, ciflow/scheduled 🚫 skipped
macos-10-15-py3-arm64 ciflow/all, ciflow/macos 🚫 skipped
macos-10-15-py3-lite-interpreter-x86-64 ciflow/all, ciflow/macos 🚫 skipped
macos-11-py3-x86-64 ciflow/all, ciflow/macos 🚫 skipped
parallelnative-linux-xenial-py3.6-gcc5.4 ciflow/all, ciflow/cpu, ciflow/linux 🚫 skipped
periodic-libtorch-linux-xenial-cuda11.1-py3.6-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled, ciflow/slow, ciflow/slow-gradcheck 🚫 skipped
periodic-linux-xenial-cuda11.1-py3.6-gcc7-debug ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-win-vs2019-cuda11.1-py3 ciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win 🚫 skipped

You can add a comment to the PR and tag @pytorchbot with the following commands:
# ciflow rerun, "ciflow/default" will always be added automatically
@pytorchbot ciflow rerun

# ciflow rerun with additional labels "-l <ciflow/label_name>", which is equivalent to adding these labels manually and triggering the rerun
@pytorchbot ciflow rerun -l ciflow/scheduled -l ciflow/slow

For more information, please take a look at the CI Flow Wiki.

lezcano (Collaborator) left a comment

Just left two small comments, but overall this looks good to me.

Comment on lines +601 to +605
TORCH_META_FUNC2(normal, Tensor_Tensor) (
Tensor const& mean,
Tensor const& std,
c10::optional<Generator> gen
) {
lezcano (Collaborator):

Don't we need some checks here to make sure that mean and std have compatible dtypes? Or does this operation work with arbitrary dtypes?

lezcano (Collaborator) commented Dec 9, 2021:

Oh, I see that the checks are currently done within the normal_out_impl function.
Those checks should be moved to the TORCH_META_FUNCs, which is what you do in the next PR. For this PR to stand on its own, perhaps we could merge the next PR into this one and submit both of them as one?

nkaretnikov (Collaborator, Author):

The reason the checks are in the templates is that they are used by the RNG tests, so I'm not sure it's a good idea to remove them if they can be accessed other than via the structured API. Or is it the caller's responsibility to ensure they are calling the right thing (see aten/src/ATen/test/cpu_rng_test.cpp)?

Comment on lines 127 to -144
m.impl("normal.Tensor_float_out", normal_Tensor_float_out);
m.impl("normal.float_Tensor_out", normal_float_Tensor_out);
m.impl("normal.Tensor_Tensor_out", normal_Tensor_Tensor_out);
m.impl("normal.Tensor_float", normal_Tensor_float);
m.impl("normal.float_Tensor", normal_float_Tensor);
m.impl("normal.Tensor_Tensor", normal_Tensor_Tensor);
lezcano (Collaborator):

How come just half of them were deleted? In fact, is this necessary given that now these operations are implemented as structured kernels? cc @ysiraichi @peterbell10

nkaretnikov (Collaborator, Author):

it's because these previously used the template version directly (to pass a custom rng for testing and demo this functionality). not sure what to do here since these are gone now

Collaborator

I think the distribution templates are part of the public generator API, so they can't be removed. For example, the cryptographic PRNG uses them:
https://github.com/pytorch/csprng/blob/5a6d9458c142190d5d713744687434c73c06ad01/torchcsprng/csrc/kernels_body.inc#L257

Collaborator

What do you think we should do here, @mruberry? Should we do the checks twice in these functions?
See #69628 (comment) for context.

Collaborator

I think, ideally, we'd want those checks in the META function. One way out of this is to factor out the implementation (code after the type checks) into a new function. Then, we would have something like this:

  • normal_impl_out: dtype checks and calls normal_impl_impl_out (not a very good name)
  • normal_impl_impl_out: executes the rest of the implementation

Then, the IMPL function can just call normal_impl_impl_out directly, which would bypass dtype checks (these can be factored into a function of its own, and called in META, too).

Not sure whether the extra indirection is worth it, though.
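A minimal sketch of that split, with Python standing in for the C++ (the function names come from the comment above and are placeholders, not real ATen symbols; the arithmetic is a stand-in for the real sampling code):

```python
def check_dtypes(mean, std):
    # Factored-out validation: callable from both META and the template path.
    if mean["dtype"] != std["dtype"]:
        raise TypeError("mean and std must have the same dtype")

def normal_impl_impl_out(out, mean, std):
    # "The rest of the implementation": fills `out`, does no validation.
    out["data"] = [m + s for m, s in zip(mean["data"], std["data"])]
    return out

def normal_impl_out(out, mean, std):
    # Template entry point: check, then run. The structured IMPL function
    # would call normal_impl_impl_out directly, since META already checked.
    check_dtypes(mean, std)
    return normal_impl_impl_out(out, mean, std)
```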

Collaborator

@bdhirsh will take a look soon -- I think he's the best person to help answer this question

Collaborator

@bdhirsh bdhirsh Dec 13, 2021

cc @pbelevich (who I think wrote the rng api).

It looks like the templates are public API to help write out-of-tree kernel extensions for distribution ops, so we can't easily get rid of them (without finding all external usages and making them structured too, which... would require external codegen and doesn't seem super beneficial to do). There's also more context described here.

If that's right, then I don't think that porting normal_* ops to structured will really help to clean up much code - we have to keep all of the functional/inplace/out= template variants around. @ysiraichi is also right, you'd need to make sure that all of the error checking logic currently in the template is run in the meta function (and also directly in the template, since out-of-tree kernel writers still need to rely on them).

Given all of that, it sounds to me like it would be easiest to just directly write meta kernels for all of the distribution ops.
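As a rough sketch of what a hand-written meta kernel for `normal.Tensor_Tensor` would do, modeled in Python (the name and the dict-based metadata are illustrative; broadcasting is omitted here for brevity):

```python
def meta_normal(mean, std):
    """A meta kernel validates inputs and produces only output metadata
    (shape, dtype); no random numbers are drawn and no data is touched."""
    if mean["dtype"] != std["dtype"]:
        raise TypeError("mean and std must share a dtype")
    if mean["shape"] != std["shape"]:
        raise ValueError("this sketch only handles equal shapes")
    return {"shape": mean["shape"], "dtype": mean["dtype"]}
```

The real kernel would additionally broadcast the two input shapes and resize the output accordingly.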

Collaborator Author

@bdhirsh

Given all of that, it sounds to me like it would be easiest to just directly write meta kernels for all of the distribution ops.

I'm going to do this and will create a new stack with the changes. This stack will stay open for now for reference.

I'll also fix the broadcasting issue that I introduced, which breaks BC.

@@ -7768,28 +7768,28 @@
Meta: normal_meta_
Collaborator

Why is this one not ported to structured kernels as well? I reckon that we should have all the combinations of functions here (in-place / out-place / _out) for all the types of inputs (Tensor / float) for (mean / std), right?

Collaborator Author

i'll look into it and follow up later. at first, it looked like it wasn't possible for some reason. maybe i just got confused

@mruberry mruberry requested a review from bdhirsh December 10, 2021 19:23
Tensor const& std,
c10::optional<Generator> gen
) {
auto shape = at::infer_size(mean.sizes(), std.sizes());
Collaborator

Just so we don't forget: probably we want to do something like what resize_output_for_normal does here, inside META. Since we still have the same problem as the dtype checks, we should probably wait for Brian.
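The `at::infer_size` call follows the usual right-aligned broadcasting rules; a small Python model of that behavior (not the real ATen implementation):

```python
from itertools import zip_longest

def infer_size(a, b):
    """Model of at::infer_size: broadcast two shapes, right-aligned.
    A dimension of 1 stretches to match the other shape's dimension."""
    out = []
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if x != y and 1 not in (x, y):
            raise RuntimeError(f"shapes {a} and {b} are not broadcastable")
        out.append(max(x, y))
    return tuple(reversed(out))

# The META function would then set the output to this shape, which is
# roughly what resize_output_for_normal does on the non-structured path.
```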

Comment on lines +622 to +623
at::native::templates::normal_out_impl<NormalStub, Generator>(
const_cast<Tensor&>(out), mean, std, gen);
Collaborator

I think it's a good idea to propagate the const to normal_out_impl functions (might be BC-breaking, not sure), instead of const_cast-ing.

Collaborator Author

@nkaretnikov nkaretnikov Dec 13, 2021

fwiw, this just mimics what some other operator does already, but yeah, i agree.

@nkaretnikov
Collaborator Author

to avoid confusion, will open a new stack to address issues related to BC and broadcasting

@facebook-github-bot facebook-github-bot deleted the gh/nkaretnikov/1/head branch January 15, 2022 15:16

Labels

cla signed · module: structured kernels · open source


8 participants