[primTorch] Implement batch, group, and instance norm references by rdspring1 · Pull Request #81191 · pytorch/pytorch

rdspring1 · 2022-07-11T00:31:56Z

Add References:

batch norm
group norm
instance norm
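
For orientation, the three operations differ mainly in which elements share a mean and variance. The following is a minimal plain-Python sketch, not the code from this PR. It assumes input shaped (N, C, L) as nested lists, training-mode statistics, biased variance, and no affine parameters or running statistics; every function name here is illustrative.

```python
from statistics import fmean
from typing import List

Tensor3 = List[List[List[float]]]  # nested lists shaped (N, C, L)


def _normalize(values: List[float], eps: float) -> List[float]:
    """Subtract the mean and divide by sqrt(biased variance + eps)."""
    mean = fmean(values)
    var = fmean([(v - mean) ** 2 for v in values])
    return [(v - mean) / (var + eps) ** 0.5 for v in values]


def _zeros_like(x: Tensor3) -> Tensor3:
    return [[[0.0] * len(row) for row in sample] for sample in x]


def batch_norm(x: Tensor3, eps: float = 1e-5) -> Tensor3:
    """One set of statistics per channel, shared across the batch."""
    n, c, l = len(x), len(x[0]), len(x[0][0])
    out = _zeros_like(x)
    for ch in range(c):
        flat = [x[i][ch][k] for i in range(n) for k in range(l)]
        normed = iter(_normalize(flat, eps))
        for i in range(n):
            for k in range(l):
                out[i][ch][k] = next(normed)
    return out


def group_norm(x: Tensor3, num_groups: int, eps: float = 1e-5) -> Tensor3:
    """One set of statistics per (sample, channel group)."""
    n, c, l = len(x), len(x[0]), len(x[0][0])
    assert c % num_groups == 0, "channels must divide evenly into groups"
    per_group = c // num_groups
    out = _zeros_like(x)
    for i in range(n):
        for g in range(num_groups):
            chans = range(g * per_group, (g + 1) * per_group)
            flat = [x[i][ch][k] for ch in chans for k in range(l)]
            normed = iter(_normalize(flat, eps))
            for ch in chans:
                for k in range(l):
                    out[i][ch][k] = next(normed)
    return out


def instance_norm(x: Tensor3, eps: float = 1e-5) -> Tensor3:
    """One set of statistics per (sample, channel): one group per channel."""
    return group_norm(x, num_groups=len(x[0]), eps=eps)
```

Under these assumptions, `instance_norm([[[1.0, 3.0]]])` yields values close to -1 and 1, since each (sample, channel) row is normalized on its own.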

Depends on:

cc @ezyang @mruberry @ngimel @lezcano @fdrocha @peterbell10

facebook-github-bot · 2022-07-11T00:32:02Z

🔗 Helpful links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/81191
✖️ Python docs build was skipped
✖️ C++ docs build was skipped
❓Need help or want to give feedback on the CI? Visit our office hours

❌ 5 New Failures

As of commit 393c6c4 (more details on the Dr. CI page):

5/5 failures introduced in this PR

🕵️ 5 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages

macos-12-py3-x86-64-test-1-2-default (1/5)

Step: "Test" (full log | diagnosis details)

RuntimeError: test_ops failed!

=========================== short test summary info ============================
FAILED test_ops.py::TestCommonCPU::test_dtypes__refs_nn_functional_batch_norm_cpu
!!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
= 1 failed, 17280 passed, 6250 skipped, 186 xfailed, 266 warnings in 1569.62s (0:26:09) =
Skip info is located in the xml test reports, please either go to s3 or the hud to download them
Traceback (most recent call last):
  File "test/run_test.py", line 1065, in <module>
    main()
  File "test/run_test.py", line 1043, in main
    raise RuntimeError(err_message)
RuntimeError: test_ops failed!


Exited with code exit status 1

pull / linux-bionic-py3.7-clang9 / test (default, 1, 2, linux.2xlarge) (2/5)

Step: "Test" (full log | diagnosis details)

2022-08-30T07:31:55.1363406Z RuntimeError: test_ops failed!

2022-08-30T07:31:52.8925708Z =========================== short test summary info ============================
2022-08-30T07:31:52.8926165Z FAILED test_ops.py::TestCommonCPU::test_dtypes__refs_nn_functional_batch_norm_cpu
2022-08-30T07:31:52.8927800Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2022-08-30T07:31:52.8994938Z = 1 failed, 17277 passed, 6251 skipped, 188 xfailed, 311 warnings in 1054.02s (0:17:34) =
2022-08-30T07:31:53.2056655Z Skip info is located in the xml test reports, please either go to s3 or the hud to download them
2022-08-30T07:31:55.1358909Z Traceback (most recent call last):
2022-08-30T07:31:55.1359208Z   File "test/run_test.py", line 1065, in <module>
2022-08-30T07:31:55.1360975Z     main()
2022-08-30T07:31:55.1361384Z   File "test/run_test.py", line 1043, in main
2022-08-30T07:31:55.1362986Z     raise RuntimeError(err_message)
2022-08-30T07:31:55.1363406Z RuntimeError: test_ops failed!
2022-08-30T07:31:55.4192843Z 
2022-08-30T07:31:55.4193154Z real	17m41.866s
2022-08-30T07:31:55.4193394Z user	113m2.804s
2022-08-30T07:31:55.4194966Z sys	11m17.971s
2022-08-30T07:31:55.4234634Z ##[error]Process completed with exit code 1.
2022-08-30T07:31:55.4274211Z Prepare all required actions
2022-08-30T07:31:55.4274514Z Getting action download info
2022-08-30T07:31:55.6992601Z ##[group]Run ./.github/actions/get-workflow-job-id
2022-08-30T07:31:55.6992814Z with:
2022-08-30T07:31:55.6993155Z   github-token: ***

pull / linux-bionic-py3.7-clang9 / test (dynamo, 1, 2, linux.2xlarge) (3/5)

Step: "Test" (full log | diagnosis details)

2022-08-30T07:32:36.7438807Z RuntimeError: test_ops failed!

2022-08-30T07:32:34.5655614Z =========================== short test summary info ============================
2022-08-30T07:32:34.5656027Z FAILED test_ops.py::TestCommonCPU::test_dtypes__refs_nn_functional_batch_norm_cpu
2022-08-30T07:32:34.5657831Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2022-08-30T07:32:34.5721088Z = 1 failed, 17277 passed, 6251 skipped, 188 xfailed, 307 warnings in 1079.09s (0:17:59) =
2022-08-30T07:32:34.8815341Z Skip info is located in the xml test reports, please either go to s3 or the hud to download them
2022-08-30T07:32:36.7434151Z Traceback (most recent call last):
2022-08-30T07:32:36.7434434Z   File "test/run_test.py", line 1065, in <module>
2022-08-30T07:32:36.7436228Z     main()
2022-08-30T07:32:36.7436501Z   File "test/run_test.py", line 1043, in main
2022-08-30T07:32:36.7438522Z     raise RuntimeError(err_message)
2022-08-30T07:32:36.7438807Z RuntimeError: test_ops failed!
2022-08-30T07:32:37.0377417Z 
2022-08-30T07:32:37.0377720Z real	18m7.547s
2022-08-30T07:32:37.0378003Z user	118m10.981s
2022-08-30T07:32:37.0380331Z sys	10m36.879s
2022-08-30T07:32:37.0418249Z ##[error]Process completed with exit code 1.
2022-08-30T07:32:37.0458071Z Prepare all required actions
2022-08-30T07:32:37.0458363Z Getting action download info
2022-08-30T07:32:37.2762245Z ##[group]Run ./.github/actions/get-workflow-job-id
2022-08-30T07:32:37.2762468Z with:
2022-08-30T07:32:37.2762795Z   github-token: ***

pull / linux-bionic-cuda11.6-py3.10-gcc7 / test (default, 2, 4, linux.4xlarge.nvidia.gpu) (4/5)

Step: "Test" (full log | diagnosis details)

2022-08-30T08:39:43.2976344Z [  FAILED  ] Cuda.TestRand01_CUDA

2022-08-30T08:39:43.2044845Z [ RUN      ] LLVM.CustomTarget
2022-08-30T08:39:43.2348764Z [       OK ] LLVM.CustomTarget (30 ms)
2022-08-30T08:39:43.2349175Z [ RUN      ] LLVM.CodeGenKernelFuncName
2022-08-30T08:39:43.2824304Z [       OK ] LLVM.CodeGenKernelFuncName (47 ms)
2022-08-30T08:39:43.2824775Z [----------] 150 tests from LLVM (10255 ms total)
2022-08-30T08:39:43.2824982Z 
2022-08-30T08:39:43.2825218Z [----------] Global test environment tear-down
2022-08-30T08:39:43.2975258Z [==========] 827 tests from 26 test suites ran. (44558 ms total)
2022-08-30T08:39:43.2975624Z [  PASSED  ] 826 tests.
2022-08-30T08:39:43.2975980Z [  FAILED  ] 1 test, listed below:
2022-08-30T08:39:43.2976344Z [  FAILED  ] Cuda.TestRand01_CUDA
2022-08-30T08:39:43.2976541Z 
2022-08-30T08:39:43.2976636Z  1 FAILED TEST
2022-08-30T08:39:43.2976943Z   YOU HAVE 5 DISABLED TESTS
2022-08-30T08:39:43.2977115Z 
2022-08-30T08:39:43.4563516Z 
2022-08-30T08:39:43.4583412Z ##[error]Process completed with exit code 1.
2022-08-30T08:39:43.4626863Z Prepare all required actions
2022-08-30T08:39:43.4627222Z Getting action download info
2022-08-30T08:39:43.6332507Z ##[group]Run ./.github/actions/get-workflow-job-id
2022-08-30T08:39:43.6332816Z with:

pull / linux-bionic-py3.7-clang9 / test (crossref, 1, 2, linux.2xlarge) (5/5)

Step: "Test" (full log | diagnosis details)

2022-08-30T07:35:35.6378175Z RuntimeError: test_ops failed!

2022-08-30T07:35:33.4021154Z =========================== short test summary info ============================
2022-08-30T07:35:33.4025154Z FAILED test_ops.py::TestCommonCPU::test_dtypes__refs_nn_functional_batch_norm_cpu
2022-08-30T07:35:33.4025775Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2022-08-30T07:35:33.4092580Z = 1 failed, 17277 passed, 6251 skipped, 188 xfailed, 310 warnings in 1266.95s (0:21:06) =
2022-08-30T07:35:33.6987627Z Skip info is located in the xml test reports, please either go to s3 or the hud to download them
2022-08-30T07:35:35.6373378Z Traceback (most recent call last):
2022-08-30T07:35:35.6373793Z   File "test/run_test.py", line 1065, in <module>
2022-08-30T07:35:35.6375876Z     main()
2022-08-30T07:35:35.6376284Z   File "test/run_test.py", line 1043, in main
2022-08-30T07:35:35.6377789Z     raise RuntimeError(err_message)
2022-08-30T07:35:35.6378175Z RuntimeError: test_ops failed!
2022-08-30T07:35:35.9646506Z 
2022-08-30T07:35:35.9646785Z real	21m15.815s
2022-08-30T07:35:35.9647147Z user	139m12.810s
2022-08-30T07:35:35.9647442Z sys	13m17.436s
2022-08-30T07:35:35.9687898Z ##[error]Process completed with exit code 1.
2022-08-30T07:35:35.9741462Z Prepare all required actions
2022-08-30T07:35:35.9741774Z Getting action download info
2022-08-30T07:35:36.3347450Z ##[group]Run ./.github/actions/get-workflow-job-id
2022-08-30T07:35:36.3347663Z with:
2022-08-30T07:35:36.3347978Z   github-token: ***

This comment was automatically generated by Dr. CI (expand for details).

Please report bugs/suggestions to the (internal) Dr. CI Users group.

pytorch-bot · 2022-09-13T18:22:29Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/81191

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit our office hours

Note: Links to docs will display an error until the docs builds have been completed.

❌ 7 Failures, 1 Pending

As of commit ac0598c:

The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

mruberry · 2022-09-26T20:55:47Z

torch/_refs/__init__.py

    return flip(a, (0,))


+def _repeat_if_defined(a: Optional[Tensor], sizes: int):


Add a comment for this (private) function
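
The body of `_repeat_if_defined` is not shown in this thread. The sketch below is one plausible reading, assuming the helper tiles an optional per-channel tensor so that instance norm can be expressed through batch norm on an input reshaped from (N, C, ...) to (1, N*C, ...). Plain lists stand in for 1-D tensors, and the name and docstring are illustrative only.

```python
from typing import List, Optional


def repeat_if_defined(a: Optional[List[float]], times: int) -> Optional[List[float]]:
    """Tile a per-channel vector `times` times; pass None through unchanged.

    Assumed use: weight, bias, and running statistics have length C, but
    after folding the batch into the channel dimension the normalization
    sees N * C channels, so each vector is repeated N times. Optional
    arguments that were not provided stay None.
    """
    if a is None:
        return None
    # List repetition tiles the whole vector: [w0, w1] * 2 -> [w0, w1, w0, w1],
    # which lines up with channel index n * C + c in the reshaped input.
    return a * times
```

A docstring along these lines would address the request above by recording both the None pass-through and the reason for the repetition.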

mruberry · 2022-09-27T19:50:39Z

aten/src/ATen/native/Normalization.cpp

    double momentum, double eps) {

  using accscalar_t = at::acc_type<scalar_t, false>;
+  TORCH_CHECK(input.dim() >= 1,


Since batch_norm_cpu_update_stats_template is not a user-facing function this should be an internal assertion -- maybe the check could occur earlier?

mruberry · 2022-09-27T19:51:49Z

aten/src/ATen/native/Normalization.cpp

    auto out = input.clone();
-    if (weight.defined()) out = out * weight[0];
-    if (bias.defined()) out = out + bias[0];
+    if (weight.defined() && weight.numel() > 0) {


This will need a comment for what the semantics are when weight and/or bias have no elements
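
The semantics in question can be spelled out with a small Python sketch. It assumes the hidden branch body keeps the `weight[0]` / `bias[0]` indexing of the removed lines and that `bias` receives the same guard; neither is visible in the excerpt. Lists stand in for tensors and the function name is hypothetical.

```python
from typing import List, Optional, Sequence


def apply_scalar_affine(
    out: List[float],
    weight: Optional[Sequence[float]],
    bias: Optional[Sequence[float]],
) -> List[float]:
    """Scale and shift `out` by weight[0] and bias[0].

    An operand that is undefined (None) or has zero elements is treated as
    absent: the corresponding step is skipped instead of indexing element 0
    of an empty sequence, which would fail.
    """
    if weight is not None and len(weight) > 0:
        out = [v * weight[0] for v in out]
    if bias is not None and len(bias) > 0:
        out = [v + bias[0] for v in out]
    return out
```

Under this reading, an empty `weight` behaves like a scale of 1 and an empty `bias` like a shift of 0.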

mruberry · 2022-09-27T19:53:35Z

aten/src/ATen/native/Normalization.cpp

+    auto num_features = self.sizes()[1];
+    auto options = self.options().dtype(
+        at::toAccumulateType(self.scalar_type(), /*is_cuda=*/self.is_cuda()));
+    auto save_mean = at::empty({num_features}, options);


Is it really OK to return tensors with random values here?

This PR adds nvFuser's implementation for batch_norm as there's no reference yet (#81191) and no in-place copy support (#84545). Pull Request resolved: #85562 Approved by: https://github.com/kevinstephano, https://github.com/ngimel

facebook-github-bot · 2022-10-04T00:56:06Z

/easycla

As part of the transition to the PyTorch Foundation, this project now requires contributions be covered under the new CLA. See #85559 for additional details.

This comment will trigger a new check of this PR. If you are already covered, you will simply see a new "EasyCLA" check that passes. If you are not covered, a bot will leave a new comment with a link to sign.

linux-foundation-easycla · 2022-10-04T00:56:58Z

The committers listed above are authorized under a signed CLA.

✅ login: rdspring1 / name: Ryan Spring (a9999f4, e5632fc, 21c8b3b, f5c655f, 1462a68, 34a4f9e, cc365e3, 802484c, d81416b, adb8d17, 45d3081, 60813f3, 1335af9, 8c242b2, 8965e3f, c1c6f2b, 593d128, bf69225, 6b02e6f, 393c6c4, 180eb51, 2dba31e, f3485c2, e473fda, dca27ca, 83ead54, b9914de, 2a417c5, ac3c57b, 10c9092, 83254a8, 1f0f5e2, 4edad59)

mruberry · 2022-10-14T16:35:48Z

torch/testing/_internal/common_methods_invocations.py

        ((S, S, S), {'training': True, 'momentum': 0.5, 'eps': 0.6}),
        ((3, 2, 4), {'training': False, 'momentum': -1.2}),
        ((3, 1), {'training': True, 'momentum': 0.0}),
-        ((0,), {'training': True}),


Should these be added to instance and groupnorm sample inputs, too?

mruberry

Updates from offline review:

let's separate the changes to existing operators and ensure those are tested properly
also review #85960 and see if that should be part of the PR changing the ATen operations
after the C++ changes to existing ATen operators, let's revisit the references

Add group norm reference Split from #81191 Pull Request resolved: #87054 Approved by: https://github.com/mruberry

github-actions · 2022-12-13T16:39:47Z

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

rdspring1 added 4 commits July 10, 2022 19:30

Initial batch_norm impl

a9999f4

skip bfloat16, float16, nvfuser-executor

e5632fc

Fix size-0 input handling

21c8b3b

Initial instance_norm impl

f5c655f

facebook-github-bot added the cla signed label Jul 11, 2022

pytorchbot added the open source label Jul 11, 2022

Initial group_norm implementation

1462a68

rdspring1 changed the title ~~[primTorch] Implement batch and instance norm references~~ [primTorch] Implement batch, group, and instance norm references Jul 11, 2022

Chillee reviewed Jul 12, 2022

torch/_refs/__init__.py (outdated, resolved)

rdspring1 added 8 commits July 13, 2022 19:01

Merge branch 'master' of github.com:rdspring1/pytorch into ref_normal…

34a4f9e

…ization

Add repeat reference - squashed from PR pytorch#81374

cc365e3

replace copy_ function with copy_to

802484c

Use primtorch decomposition to pass tests

d81416b

Add decomposition exception

adb8d17

Merge branch 'master' of github.com:rdspring1/pytorch into ref_normal…

45d3081

…ization

Fix mean and var handling

60813f3

Merge branch 'master' of github.com:rdspring1/pytorch into ref_normal…

1335af9

…ization

IvanYashchuk added the module: primTorch label Jul 26, 2022

rdspring1 added 3 commits August 22, 2022 18:51

Merge branch 'master' of github.com:rdspring1/pytorch into ref_normal…

8c242b2

…ization

update cpu inference shape and device for save_mean and save_rstd

8965e3f

Merge branch 'master' of github.com:rdspring1/pytorch into ref_normal…

c1c6f2b

…ization

IvanYashchuk reviewed Aug 27, 2022

torch/_refs/__init__.py (outdated, resolved)

IvanYashchuk reviewed Aug 27, 2022

torch/_refs/__init__.py (outdated, resolved)

rdspring1 added 5 commits August 29, 2022 17:48

Merge branch 'master' of github.com:rdspring1/pytorch into ref_normal…

593d128

…ization

Fix dtype cast issue with mean and var during batch norm inference

bf69225

Use prim.squeeze which supports axis list argument

fixes

6b02e6f

misc

393c6c4

Merge branch 'master' of github.com:rdspring1/pytorch into ref_normal…

180eb51

…ization

rdspring1 marked this pull request as ready for review September 23, 2022 02:03

rdspring1 requested review from mruberry and ngimel as code owners September 23, 2022 02:03

IvanYashchuk mentioned this pull request Sep 23, 2022

Add nvFuser support for torch.native_batch_norm #85562

Closed

zou3519 added the triaged label Sep 26, 2022

mruberry reviewed Sep 26, 2022

mruberry reviewed Sep 27, 2022

rdspring1 added 2 commits September 28, 2022 14:39

Merge branch 'master' of github.com:rdspring1/pytorch into ref_normal…

ac3c57b

…ization

internal assert

10c9092

Merge branch 'master' of github.com:rdspring1/pytorch into ref_normal…

83254a8

…ization

rdspring1 added 2 commits October 10, 2022 22:34

Merge branch 'master' of github.com:rdspring1/pytorch into ref_normal…

1f0f5e2

…ization

Use primtorch reference for aten decomposition

4edad59

fixes

ac0598c

mruberry reviewed Oct 14, 2022

This was referenced Oct 17, 2022

[primTorch] Implement group norm reference #87054

Closed

[PrimTorch] Implement batch_norm and instance_norm #87116

Closed

pytorchmergebot pushed a commit that referenced this pull request Nov 11, 2022

[primTorch] Implement group norm reference (#87054)

534ae6a

Add group norm reference Split from #81191 Pull Request resolved: #87054 Approved by: https://github.com/mruberry

kulinseth pushed a commit to kulinseth/pytorch that referenced this pull request Dec 10, 2022

[primTorch] Implement group norm reference (pytorch#87054)

f230699

Add group norm reference Split from pytorch#81191 Pull Request resolved: pytorch#87054 Approved by: https://github.com/mruberry

github-actions bot added the Stale label Dec 13, 2022

github-actions bot closed this Jan 12, 2023
