[primTorch] Implement batch, group, and instance norm references#81191

Closed
rdspring1 wants to merge 34 commits into pytorch:master from rdspring1:ref_normalization

rdspring1 (Contributor) commented Jul 11, 2022

facebook-github-bot (Contributor) commented Jul 11, 2022

❌ 5 New Failures

As of commit 393c6c4 (more details on the Dr. CI page):

  • 5/5 failures introduced in this PR

🕵️ 5 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages

See CircleCI build macos-12-py3-x86-64-test-1-2-default (1/5)

Step: "Test"

RuntimeError: test_ops failed!
=========================== short test summary info ============================
FAILED test_ops.py::TestCommonCPU::test_dtypes__refs_nn_functional_batch_norm_cpu
!!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
= 1 failed, 17280 passed, 6250 skipped, 186 xfailed, 266 warnings in 1569.62s (0:26:09) =
Skip info is located in the xml test reports, please either go to s3 or the hud to download them
Traceback (most recent call last):
  File "test/run_test.py", line 1065, in <module>
    main()
  File "test/run_test.py", line 1043, in main
    raise RuntimeError(err_message)
RuntimeError: test_ops failed!


Exited with code exit status 1

See GitHub Actions build pull / linux-bionic-py3.7-clang9 / test (default, 1, 2, linux.2xlarge) (2/5)

Step: "Test"

2022-08-30T07:31:55.1363406Z RuntimeError: test_ops failed!
2022-08-30T07:31:52.8925708Z =========================== short test summary info ============================
2022-08-30T07:31:52.8926165Z FAILED test_ops.py::TestCommonCPU::test_dtypes__refs_nn_functional_batch_norm_cpu
2022-08-30T07:31:52.8927800Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2022-08-30T07:31:52.8994938Z = 1 failed, 17277 passed, 6251 skipped, 188 xfailed, 311 warnings in 1054.02s (0:17:34) =
2022-08-30T07:31:53.2056655Z Skip info is located in the xml test reports, please either go to s3 or the hud to download them
2022-08-30T07:31:55.1358909Z Traceback (most recent call last):
2022-08-30T07:31:55.1359208Z   File "test/run_test.py", line 1065, in <module>
2022-08-30T07:31:55.1360975Z     main()
2022-08-30T07:31:55.1361384Z   File "test/run_test.py", line 1043, in main
2022-08-30T07:31:55.1362986Z     raise RuntimeError(err_message)
2022-08-30T07:31:55.1363406Z RuntimeError: test_ops failed!
2022-08-30T07:31:55.4192843Z 
2022-08-30T07:31:55.4193154Z real	17m41.866s
2022-08-30T07:31:55.4193394Z user	113m2.804s
2022-08-30T07:31:55.4194966Z sys	11m17.971s
2022-08-30T07:31:55.4234634Z ##[error]Process completed with exit code 1.
2022-08-30T07:31:55.4274211Z Prepare all required actions
2022-08-30T07:31:55.4274514Z Getting action download info
2022-08-30T07:31:55.6992601Z ##[group]Run ./.github/actions/get-workflow-job-id
2022-08-30T07:31:55.6992814Z with:
2022-08-30T07:31:55.6993155Z   github-token: ***

See GitHub Actions build pull / linux-bionic-py3.7-clang9 / test (dynamo, 1, 2, linux.2xlarge) (3/5)

Step: "Test"

2022-08-30T07:32:36.7438807Z RuntimeError: test_ops failed!
2022-08-30T07:32:34.5655614Z =========================== short test summary info ============================
2022-08-30T07:32:34.5656027Z FAILED test_ops.py::TestCommonCPU::test_dtypes__refs_nn_functional_batch_norm_cpu
2022-08-30T07:32:34.5657831Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2022-08-30T07:32:34.5721088Z = 1 failed, 17277 passed, 6251 skipped, 188 xfailed, 307 warnings in 1079.09s (0:17:59) =
2022-08-30T07:32:34.8815341Z Skip info is located in the xml test reports, please either go to s3 or the hud to download them
2022-08-30T07:32:36.7434151Z Traceback (most recent call last):
2022-08-30T07:32:36.7434434Z   File "test/run_test.py", line 1065, in <module>
2022-08-30T07:32:36.7436228Z     main()
2022-08-30T07:32:36.7436501Z   File "test/run_test.py", line 1043, in main
2022-08-30T07:32:36.7438522Z     raise RuntimeError(err_message)
2022-08-30T07:32:36.7438807Z RuntimeError: test_ops failed!
2022-08-30T07:32:37.0377417Z 
2022-08-30T07:32:37.0377720Z real	18m7.547s
2022-08-30T07:32:37.0378003Z user	118m10.981s
2022-08-30T07:32:37.0380331Z sys	10m36.879s
2022-08-30T07:32:37.0418249Z ##[error]Process completed with exit code 1.
2022-08-30T07:32:37.0458071Z Prepare all required actions
2022-08-30T07:32:37.0458363Z Getting action download info
2022-08-30T07:32:37.2762245Z ##[group]Run ./.github/actions/get-workflow-job-id
2022-08-30T07:32:37.2762468Z with:
2022-08-30T07:32:37.2762795Z   github-token: ***

See GitHub Actions build pull / linux-bionic-cuda11.6-py3.10-gcc7 / test (default, 2, 4, linux.4xlarge.nvidia.gpu) (4/5)

Step: "Test"

2022-08-30T08:39:43.2976344Z [  FAILED  ] Cuda.TestRand01_CUDA
2022-08-30T08:39:43.2044845Z [ RUN      ] LLVM.CustomTarget
2022-08-30T08:39:43.2348764Z [       OK ] LLVM.CustomTarget (30 ms)
2022-08-30T08:39:43.2349175Z [ RUN      ] LLVM.CodeGenKernelFuncName
2022-08-30T08:39:43.2824304Z [       OK ] LLVM.CodeGenKernelFuncName (47 ms)
2022-08-30T08:39:43.2824775Z [----------] 150 tests from LLVM (10255 ms total)
2022-08-30T08:39:43.2824982Z 
2022-08-30T08:39:43.2825218Z [----------] Global test environment tear-down
2022-08-30T08:39:43.2975258Z [==========] 827 tests from 26 test suites ran. (44558 ms total)
2022-08-30T08:39:43.2975624Z [  PASSED  ] 826 tests.
2022-08-30T08:39:43.2975980Z [  FAILED  ] 1 test, listed below:
2022-08-30T08:39:43.2976344Z [  FAILED  ] Cuda.TestRand01_CUDA
2022-08-30T08:39:43.2976541Z 
2022-08-30T08:39:43.2976636Z  1 FAILED TEST
2022-08-30T08:39:43.2976943Z   YOU HAVE 5 DISABLED TESTS
2022-08-30T08:39:43.2977115Z 
2022-08-30T08:39:43.4563516Z 
2022-08-30T08:39:43.4583412Z ##[error]Process completed with exit code 1.
2022-08-30T08:39:43.4626863Z Prepare all required actions
2022-08-30T08:39:43.4627222Z Getting action download info
2022-08-30T08:39:43.6332507Z ##[group]Run ./.github/actions/get-workflow-job-id
2022-08-30T08:39:43.6332816Z with:

See GitHub Actions build pull / linux-bionic-py3.7-clang9 / test (crossref, 1, 2, linux.2xlarge) (5/5)

Step: "Test" (full log | diagnosis details)

2022-08-30T07:35:35.6378175Z RuntimeError: test_ops failed!
2022-08-30T07:35:33.4021154Z =========================== short test summary info ============================
2022-08-30T07:35:33.4025154Z FAILED test_ops.py::TestCommonCPU::test_dtypes__refs_nn_functional_batch_norm_cpu
2022-08-30T07:35:33.4025775Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2022-08-30T07:35:33.4092580Z = 1 failed, 17277 passed, 6251 skipped, 188 xfailed, 310 warnings in 1266.95s (0:21:06) =
2022-08-30T07:35:33.6987627Z Skip info is located in the xml test reports, please either go to s3 or the hud to download them
2022-08-30T07:35:35.6373378Z Traceback (most recent call last):
2022-08-30T07:35:35.6373793Z   File "test/run_test.py", line 1065, in <module>
2022-08-30T07:35:35.6375876Z     main()
2022-08-30T07:35:35.6376284Z   File "test/run_test.py", line 1043, in main
2022-08-30T07:35:35.6377789Z     raise RuntimeError(err_message)
2022-08-30T07:35:35.6378175Z RuntimeError: test_ops failed!
2022-08-30T07:35:35.9646506Z 
2022-08-30T07:35:35.9646785Z real	21m15.815s
2022-08-30T07:35:35.9647147Z user	139m12.810s
2022-08-30T07:35:35.9647442Z sys	13m17.436s
2022-08-30T07:35:35.9687898Z ##[error]Process completed with exit code 1.
2022-08-30T07:35:35.9741462Z Prepare all required actions
2022-08-30T07:35:35.9741774Z Getting action download info
2022-08-30T07:35:36.3347450Z ##[group]Run ./.github/actions/get-workflow-job-id
2022-08-30T07:35:36.3347663Z with:
2022-08-30T07:35:36.3347978Z   github-token: ***

This comment was automatically generated by Dr. CI (expand for details).

Please report bugs/suggestions to the (internal) Dr. CI Users group.


rdspring1 changed the title from "[primTorch] Implement batch and instance norm references" to "[primTorch] Implement batch, group, and instance norm references" on Jul 11, 2022
pytorch-bot bot commented Sep 13, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/81191

Note: Links to docs will display an error until the docs builds have been completed.

❌ 7 Failures, 1 Pending

As of commit ac0598c:

The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

rdspring1 marked this pull request as ready for review on September 23, 2022, 02:03
zou3519 added the triaged label on Sep 26, 2022
return flip(a, (0,))


def _repeat_if_defined(a: Optional[Tensor], sizes: int):
Review comment (Collaborator):

Add a comment for this (private) function

double momentum, double eps) {

using accscalar_t = at::acc_type<scalar_t, false>;
TORCH_CHECK(input.dim() >= 1,
Review comment (Collaborator):

Since batch_norm_cpu_update_stats_template is not a user-facing function this should be an internal assertion -- maybe the check could occur earlier?
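
The split the reviewer suggests (validate input at the user-facing entry point, and only assert invariants inside private helpers) can be sketched in Python. This is an illustration only: `batch_norm_entry` and `_update_stats` are hypothetical names, not functions from this PR, and plain lists stand in for tensors.

```python
from typing import List, Sequence, Tuple


def _update_stats(rows: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Private helper (hypothetical): per-column mean and biased variance.

    Callers are expected to have validated the input already, so a violation
    here indicates a bug in our own code rather than bad user input.
    """
    assert len(rows) >= 1, "internal error: caller must validate input"
    n = len(rows)
    cols = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(cols)]
    variances = [sum((r[j] - means[j]) ** 2 for r in rows) / n for j in range(cols)]
    return means, variances


def batch_norm_entry(rows: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """User-facing entry point (hypothetical): fail early with a clear error."""
    if len(rows) < 1:
        raise ValueError("expected input with at least one row")
    return _update_stats(rows)
```

With this layout, a malformed call surfaces as a `ValueError` from the public function, while the assertion in the helper only fires if an internal caller skips validation.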

auto out = input.clone();
if (weight.defined()) out = out * weight[0];
if (bias.defined()) out = out + bias[0];
if (weight.defined() && weight.numel() > 0) {
Review comment (Collaborator):

This needs a comment explaining the semantics when weight and/or bias have no elements.

auto num_features = self.sizes()[1];
auto options = self.options().dtype(
at::toAccumulateType(self.scalar_type(), /*is_cuda=*/self.is_cuda()));
auto save_mean = at::empty({num_features}, options);
Review comment (Collaborator):

Is it really OK to return tensors with random values here?

pytorchmergebot pushed a commit that referenced this pull request Oct 3, 2022
This PR adds nvFuser's implementation for batch_norm as there's no reference yet (#81191) and no in-place copy support (#84545).

Pull Request resolved: #85562
Approved by: https://github.com/kevinstephano, https://github.com/ngimel
facebook-github-bot (Contributor) commented:

/easycla

As part of the transition to the PyTorch Foundation, this project now requires contributions be covered under the new CLA. See #85559 for additional details.

This comment will trigger a new check of this PR. If you are already covered, you will simply see a new "EasyCLA" check that passes. If you are not covered, a bot will leave a new comment with a link to sign.

linux-foundation-easycla bot commented Oct 4, 2022

CLA Signed

The committers listed above are authorized under a signed CLA.

((S, S, S), {'training': True, 'momentum': 0.5, 'eps': 0.6}),
((3, 2, 4), {'training': False, 'momentum': -1.2}),
((3, 1), {'training': True, 'momentum': 0.0}),
((0,), {'training': True}),
Review comment (Collaborator):

Should these also be added to the instance norm and group norm sample inputs?
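
For context on what the keyword arguments in these sample inputs exercise, here is a framework-free sketch of batch norm over a 2-D input (N rows, C channels). It is an illustration on plain Python lists, not the reference added by this PR; the name `batch_norm_2d` and its return convention are assumptions. It follows the usual convention: in training mode the batch statistics normalize the input and `momentum` blends them into the running statistics, while in evaluation mode the running statistics are used and `momentum` has no effect. That is why an unusual value such as -1.2 is harmless when `training` is False.

```python
import math
from typing import List, Sequence, Tuple


def batch_norm_2d(
    x: Sequence[Sequence[float]],
    running_mean: Sequence[float],
    running_var: Sequence[float],
    training: bool,
    momentum: float = 0.1,
    eps: float = 1e-5,
) -> Tuple[List[List[float]], List[float], List[float]]:
    """Return (output, new_running_mean, new_running_var) for an N x C input."""
    n = len(x)
    c = len(running_mean)
    if training:
        if n < 2:
            raise ValueError("this sketch needs at least two rows in training mode")
        mean = [sum(row[j] for row in x) / n for j in range(c)]
        # Biased variance (divide by n) is used to normalize the batch.
        var = [sum((row[j] - mean[j]) ** 2 for row in x) / n for j in range(c)]
        # Unbiased variance (divide by n - 1) feeds the running estimate.
        unbiased = [v * n / (n - 1) for v in var]
        new_mean = [(1 - momentum) * rm + momentum * m
                    for rm, m in zip(running_mean, mean)]
        new_var = [(1 - momentum) * rv + momentum * u
                   for rv, u in zip(running_var, unbiased)]
    else:
        # Evaluation mode: normalize with the running statistics; momentum is unused.
        mean, var = list(running_mean), list(running_var)
        new_mean, new_var = list(running_mean), list(running_var)
    out = [[(row[j] - mean[j]) / math.sqrt(var[j] + eps) for j in range(c)]
           for row in x]
    return out, new_mean, new_var
```

Calling it with `training=True, momentum=0.0` on a 3 x 1 input, mirroring the `(3, 1)` sample above, normalizes with the batch statistics but leaves the running statistics unchanged.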

mruberry (Collaborator) left a comment:

Updates from offline review:

  • let's separate the changes to existing operators and ensure those are tested properly
  • also review #85960 and see if that should be part of the PR changing the ATen operations
  • after the C++ changes to existing ATen operators, let's revisit the references

jjsjann123 pushed a commit to jjsjann123/nvfuser that referenced this pull request Oct 29, 2022
This PR adds nvFuser's implementation for batch_norm as there's no reference yet (pytorch/pytorch#81191) and no in-place copy support (pytorch/pytorch#84545).

Pull Request resolved: pytorch/pytorch#85562
Approved by: https://github.com/kevinstephano, https://github.com/ngimel
jjsjann123 pushed a commit to jjsjann123/nvfuser that referenced this pull request Nov 10, 2022
This PR adds nvFuser's implementation for batch_norm as there's no reference yet (pytorch/pytorch#81191) and no in-place copy support (pytorch/pytorch#84545).

Pull Request resolved: pytorch/pytorch#85562
Approved by: https://github.com/kevinstephano, https://github.com/ngimel
pytorchmergebot pushed a commit that referenced this pull request Nov 11, 2022
Add group norm reference
Split from #81191
Pull Request resolved: #87054
Approved by: https://github.com/mruberry
kulinseth pushed a commit to kulinseth/pytorch that referenced this pull request Dec 10, 2022
Add group norm reference
Split from pytorch#81191
Pull Request resolved: pytorch#87054
Approved by: https://github.com/mruberry
github-actions (Contributor) commented:

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

github-actions bot added the Stale label on Dec 13, 2022
github-actions bot closed this on Jan 12, 2023

Labels: cla signed, module: primTorch, open source, Stale, triaged (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

7 participants