Adds reference vs. noncontiguous OpInfo test by mruberry · Pull Request #67434 · pytorch/pytorch

mruberry · 2021-10-28T10:09:38Z

This PR adds a new test, test_noncontiguous_samples, that runs ops forward and backward and compares their outputs and grads between "normal" contiguous SampleInputs and noncontiguous SampleInputs. This test should preclude the need for noncontiguous SampleInputs going forward.

The test was added by generalizing the .numpy() transform on SampleInputs to support a new .noncontiguous() transform and copying forward/backward patterns from other tests in test_ops.py. It also discovered that many SampleInputs were incorrectly reusing tensors, so those have been revised. SampleInputs creating noncontiguous tensors for testing have also been altered to no longer do so.
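To make the idea concrete, here is a minimal sketch of the kind of comparison described above. It is not the code in test_ops.py: `make_noncontiguous` and `compare_contiguous_vs_noncontiguous` are hypothetical helpers written for illustration, and the sketch assumes the input has at least one dimension.

```python
import torch


def make_noncontiguous(t):
    """Return a tensor with the same values as `t` but a strided layout.

    Hypothetical helper for illustration; assumes `t` has at least one dim.
    """
    # Duplicate every element along the last dim, then keep every other one:
    # the values match `t`, but the last-dim stride becomes 2.
    doubled = torch.repeat_interleave(t.detach(), 2, dim=-1)
    return doubled[..., ::2]


def compare_contiguous_vs_noncontiguous(op, t):
    """Run `op` forward and backward on both layouts; True if they agree."""
    x = t.detach().clone().requires_grad_(True)
    x_nc = make_noncontiguous(t).detach().requires_grad_(True)
    assert x.is_contiguous() and not x_nc.is_contiguous()

    out, out_nc = op(x), op(x_nc)
    out.sum().backward()
    out_nc.sum().backward()

    # Outputs and input grads should not depend on the memory layout.
    return torch.allclose(out, out_nc) and torch.allclose(x.grad, x_nc.grad)


base = torch.arange(12, dtype=torch.float64).reshape(3, 4)
agrees = compare_contiguous_vs_noncontiguous(torch.sin, base)
```

An op with a layout-dependent bug would make `agrees` come out False even though both inputs hold the same values.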

In addition, this test discovered the following high priority silent correctness issues:

- #67432
- #67517
- #67513
- #67512
- #67470

It also identified the following issues:

- std_mean and var_mean fail test_noncontiguous_samples under ASAN #67539

The pow OpInfo also incorrectly specified that pow supported the bool datatype, and this has been fixed. Its SampleInputs were written in a way that made requests for boolean SampleInputs return type-promoting inputs that never actually tried to compute pow in bool.
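As a rough illustration of how type promotion can hide the dtype under test, the sketch below uses `torch.result_type` to show which dtype a binary op would compute in. It does not reproduce the old pow SampleInputs; the tensors here are made up for the example.

```python
import torch

bool_base = torch.tensor([True, False])

# A bool tensor combined with a Python integer promotes to the default
# integer dtype, so the computation does not happen in bool.
promoted_with_int = torch.result_type(bool_base, 2)

# A bool tensor combined with a float tensor promotes to that float dtype.
promoted_with_float = torch.result_type(bool_base, torch.tensor([2.0]))

# Only when every operand is bool does the result type stay bool.
stays_bool = torch.result_type(bool_base, torch.tensor([True, True]))
```

A sample whose second operand is not bool therefore exercises the promoted dtype, not bool itself.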

This PR suggests we should add the following guidance for writing SampleInputs:

- ensure that all SampleInputs are independent of each other (don't reuse tensors)
- ensure that all SampleInput tensors have no grad or backward functions (no autograd history) -- they should be leaves
- prefer keeping sample inputs simple where possible; a good set of handwritten samples that test interesting cases may be better than an exhaustive but hard to read and maintain programmatic enumeration
- keep code readable by using functools.partial and writing simple inline helpers; break up large statements into a more readable series of smaller statements; especially don't write complicated generator expressions with a for at the end!
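The guidance above could look roughly like the following sketch. The function name `sample_inputs_add_sketch` and the shape pairs are hypothetical, and plain tuples stand in for the real SampleInput class so the example stays self-contained; it assumes a PyTorch version that exposes `torch.testing.make_tensor`.

```python
from functools import partial

import torch
from torch.testing import make_tensor


def sample_inputs_add_sketch(device="cpu", dtype=torch.float32, requires_grad=True):
    """Hypothetical sample-input function; returns (input, other) tensor pairs."""
    # Bind the arguments shared by every tensor once.
    make_arg = partial(make_tensor, device=device, dtype=dtype,
                       requires_grad=requires_grad)

    # A few handwritten shape pairs covering same-shape and broadcasting cases.
    shape_pairs = [
        ((3, 4), (3, 4)),
        ((3, 4), (4,)),
        ((2, 1), (1, 3)),
    ]

    samples = []
    for input_shape, other_shape in shape_pairs:
        # Fresh tensors for every sample: nothing is shared across samples,
        # and each tensor is a leaf with no autograd history.
        samples.append((make_arg(input_shape), make_arg(other_shape)))
    return samples


samples = sample_inputs_add_sketch()
```

The plain loop is longer than a single generator expression, but each step can be read and modified on its own.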

fyi @kshitij12345 @krshrimali @pmeier @anjali411 @saketh-are @zou3519 @dagitses

pytorch-probot · 2021-10-28T10:09:41Z

CI Flow Status

⚛️ CI Flow

Ruleset - Version: v1
Ruleset - File: https://github.com/pytorch/pytorch/blob/52b510985c87796e29864a6d6a31770ac10939f1/.github/generated-ciflow-ruleset.json
PR ciflow labels: ciflow/slow-gradcheck,ciflow/all

Workflows	Labels (bold enabled)	Status
Triggered Workflows
caffe2-linux-xenial-py3.6-gcc5.4	`ciflow/all`, `ciflow/cpu`, `ciflow/linux`	✅ triggered
docker-builds	`ciflow/all`	✅ triggered
libtorch-linux-xenial-cuda10.2-py3.6-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/libtorch`, `ciflow/linux`	✅ triggered
libtorch-linux-xenial-cuda11.3-py3.6-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/libtorch`, `ciflow/linux`	✅ triggered
linux-bionic-cuda10.2-py3.9-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/linux`, `ciflow/slow`	✅ triggered
linux-bionic-py3.6-clang9	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/noarch`, `ciflow/xla`	✅ triggered
linux-vulkan-bionic-py3.6-clang9	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/vulkan`	✅ triggered
linux-xenial-cuda11.3-py3.6-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/default`, `ciflow/linux`	✅ triggered
linux-xenial-py3-clang5-mobile-build	`ciflow/all`, `ciflow/default`, `ciflow/linux`, `ciflow/mobile`	✅ triggered
linux-xenial-py3-clang5-mobile-code-analysis	`ciflow/all`, `ciflow/linux`, `ciflow/mobile`	✅ triggered
linux-xenial-py3-clang5-mobile-custom-build-dynamic	`ciflow/all`, `ciflow/default`, `ciflow/linux`, `ciflow/mobile`	✅ triggered
linux-xenial-py3-clang5-mobile-custom-build-static	`ciflow/all`, `ciflow/default`, `ciflow/linux`, `ciflow/mobile`	✅ triggered
linux-xenial-py3.6-clang7-asan	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/sanitizers`	✅ triggered
linux-xenial-py3.6-clang7-onnx	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/onnx`	✅ triggered
linux-xenial-py3.6-gcc5.4	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`	✅ triggered
linux-xenial-py3.6-gcc7	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`	✅ triggered
linux-xenial-py3.6-gcc7-bazel-test	`ciflow/all`, `ciflow/bazel`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`	✅ triggered
parallelnative-linux-xenial-py3.6-gcc5.4	`ciflow/all`, `ciflow/cpu`, `ciflow/linux`	✅ triggered
periodic-libtorch-linux-xenial-cuda11.1-py3.6-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/libtorch`, `ciflow/linux`, `ciflow/scheduled`	✅ triggered
periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck	`ciflow/all`, `ciflow/cuda`, `ciflow/linux`, `ciflow/scheduled`, `ciflow/slow`, `ciflow/slow-gradcheck`	✅ triggered
periodic-linux-xenial-cuda11.1-py3.6-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/linux`, `ciflow/scheduled`	✅ triggered
periodic-win-vs2019-cuda11.1-py3	`ciflow/all`, `ciflow/cuda`, `ciflow/scheduled`, `ciflow/win`	✅ triggered
win-vs2019-cpu-py3	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/win`	✅ triggered
win-vs2019-cuda11.3-py3	`ciflow/all`, `ciflow/cuda`, `ciflow/default`, `ciflow/win`	✅ triggered
Skipped Workflows

You can add a comment to the PR and tag @pytorchbot with the following commands:

# ciflow rerun, "ciflow/default" will always be added automatically
@pytorchbot ciflow rerun

# ciflow rerun with additional labels "-l <ciflow/label_name>", which is equivalent to adding these labels manually and trigger the rerun
@pytorchbot ciflow rerun -l ciflow/scheduled -l ciflow/slow

For more information, please take a look at the CI Flow Wiki.

facebook-github-bot · 2021-10-28T10:09:43Z

🔗 Helpful links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/67434
📄 Preview docs built from this PR
📄 Preview C++ docs built from this PR
🔧 Opt-in to CIFlow to control what jobs run on your PRs

💊 CI failures summary and remediations

As of commit 52b5109 (more details on the Dr. CI page):

💚 💚 Looks good so far! There are no failures yet. 💚 💚

mruberry · 2021-10-28T10:17:15Z

This will still require a follow-up issue to evaluate whether noncontiguous sample inputs should be used in more contexts (like when testing autograd) and if they can be removed from existing sample input functions.

facebook-github-bot · 2021-10-29T04:48:53Z

@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

ngimel · 2021-10-29T04:56:38Z

-                                requires_grad=requires_grad),
-                    args=(make_tensor(second_shape, device, dtype,
-                                      requires_grad=requires_grad),)))
+        SampleInput(make_arg(first_shape).requires_grad_(requires_grad),


nit: make_arg(first_shape, requires_grad=requires_grad) would work here; partial accepts that keyword.

mruberry: Nice catch -- fixed.
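For readers unfamiliar with the point in this thread, a small stdlib-only sketch: keyword arguments passed when calling a `functools.partial` object are forwarded to the wrapped function. `make_tensor_stub` is a hypothetical stand-in for a tensor factory, not a PyTorch function.

```python
from functools import partial


def make_tensor_stub(shape, device, dtype, requires_grad=False):
    """Hypothetical stand-in for a tensor factory; returns its arguments."""
    return {"shape": shape, "device": device, "dtype": dtype,
            "requires_grad": requires_grad}


# Bind the arguments shared by every sample once.
make_arg = partial(make_tensor_stub, device="cpu", dtype="float32")

# Keyword arguments supplied at call time are forwarded to the wrapped
# function, so there is no need to set the flag in a separate step.
plain = make_arg((3, 4))
with_grad = make_arg((3, 4), requires_grad=True)
```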

ngimel · 2021-10-29T04:58:55Z

-                   for tensor, idx, source, a in product([t, t_nonctg], [idx, idx_nonctg], [s, s_nonctg], [-1, 0, 2]))
+
+    samples = [SampleInput(t.detach().clone().requires_grad_(requires_grad),
+                           args=(1, idx.detach().clone(), s.detach().clone()))]


Did you lose s's requires_grad here? Would autograd tests normally check all inputs that require grad?

mruberry: Great catch -- yes, I lost it on the s (source) tensor.
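A short sketch of the behavior behind this exchange, using a made-up `source` tensor in place of the real sample: `detach().clone()` yields an independent leaf that no longer requires grad, so the flag has to be restored explicitly.

```python
import torch

source = torch.ones(3, requires_grad=True)

# detach() cuts the tensor out of the autograd graph, and clone() of the
# detached tensor does not restore the flag.
copy_without_grad = source.detach().clone()

# Re-enable the flag explicitly so autograd tests still check this input.
copy_with_grad = source.detach().clone().requires_grad_(source.requires_grad)
```

Both copies own their memory, so later in-place changes to one sample cannot leak into another.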

facebook-github-bot · 2021-10-29T16:57:26Z

@mruberry merged this pull request in ddc9bd3.

Summary:

Fixes pytorch#63341.

This PR adds a new test, `test_noncontiguous_samples`, that runs ops forward and backward and compares their outputs and grads between "normal" contiguous SampleInputs and noncontiguous SampleInputs. This test should preclude the need for noncontiguous SampleInputs going forward.

The test was added by generalizing the `.numpy()` transform on SampleInputs to support a new `.noncontiguous()` transform and copying forward/backward patterns from other tests in test_ops.py. It also discovered that many SampleInputs were incorrectly reusing tensors, so those have been revised. SampleInputs creating noncontiguous tensors for testing have also been altered to no longer do so.

In addition, this test discovered the following high priority silent correctness issues:

- pytorch#67432
- pytorch#67517
- pytorch#67513
- pytorch#67512
- pytorch#67470

It also identified the following issues:

- pytorch#67539

The pow OpInfo also incorrectly specified that pow supported the bool datatype, and this has been fixed. Its SampleInputs were written in a way that made requests for boolean SampleInputs return type-promoting inputs that never actually tried to compute pow in bool.

This PR suggests we should add the following guidance for writing SampleInputs:

- ensure that all SampleInputs are independent of each other (don't reuse tensors)
- ensure that all SampleInput tensors have no grad or backward functions (no autograd history) -- they should be leaves
- prefer keeping sample inputs simple where possible; a good set of handwritten samples that test interesting cases may be better than an exhaustive but hard to read and maintain programmatic enumeration
- keep code readable by using functools.partial and writing simple inline helpers; break up large statements into a more readable series of smaller statements; especially don't write complicated generator expressions with a `for` at the end!

fyi kshitij12345 krshrimali pmeier anjali411 saketh-are zou3519 dagitses

Pull Request resolved: pytorch#67434
Reviewed By: ngimel
Differential Revision: D32014557
Pulled By: mruberry
fbshipit-source-id: b17e19adc1d41e24441f0765af13d381fef5e3c1

noncontiguous test

7e4d5d0

pytorch-probot Bot added the ciflow/default label Oct 28, 2021

facebook-github-bot added the cla signed label Oct 28, 2021

mruberry requested a review from ngimel October 28, 2021 10:09

Expands with backward test

8a78d1e

mruberry added ciflow/all and removed ciflow/default labels Oct 29, 2021

ngimel reviewed Oct 29, 2021

ngimel approved these changes Oct 29, 2021

Mike Ruberry added 4 commits October 29, 2021 07:54

stashes

bc4a803

Merge branch 'master' of ssh://github.com/pytorch/pytorch into noncontiguous_tests

33a82f9

fixes per review and merge

354c2f1

ci lint

d247e88

asan skips + lint fix

1b7f3db

removes noqa

8c6ec61

Mike Ruberry added 2 commits October 29, 2021 14:22

skips windows

5f81b01

Merge branch 'noncontiguous_tests' of ssh://github.com/pytorch/pytorch into noncontiguous_tests

52b5109

facebook-github-bot closed this in ddc9bd3 Oct 29, 2021

facebook-github-bot added the Merged label Oct 29, 2021

crcrpar mentioned this pull request Nov 6, 2021

TestCommonCUDA.test_noncontiguous_samples_linalg_pinv_hermitian_cuda_float32 fails when TF32 is enabled #67947

Closed

mruberry deleted the noncontiguous_tests branch January 14, 2022 20:09
