
optests improvements based on torchvision usage on nms#108929

Closed
ezyang wants to merge 10 commits into gh/ezyang/2333/base from gh/ezyang/2333/head

Conversation

@ezyang
Contributor

@ezyang ezyang commented Sep 9, 2023

Stack from ghstack (oldest at bottom):

  • Update cross-ref FakeMode test to use ShapeEnv. Dynamic ops can now
    return an unbacked SymInt. We always accept this as equal to whatever
    the real value was.
  • Relax test so it works on all classes, not just unittest.TestCase
  • Properly wrap the original method, so things like
    pytest.mark.parametrize are carried over
  • Support dynamic shapes by default for make_fx tracing_mode="fake" without symbolifying everything else
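The method-wrapping bullet can be sketched with `functools.wraps`. This is a minimal illustration, not the actual PyTorch helper (`wrap_test_method` is an invented name): pytest stores marks such as `pytest.mark.parametrize` in the function's `__dict__` (as `pytestmark`), and `functools.wraps` copies that dict onto the wrapper, so the marks survive.

```python
import functools

def wrap_test_method(original):
    """Sketch: wrap a test method so pytest marks on the original carry over."""
    @functools.wraps(original)  # copies __name__, __doc__, and __dict__ (incl. pytestmark)
    def wrapper(*args, **kwargs):
        # an optest harness would run its extra checks around this call
        return original(*args, **kwargs)
    return wrapper
```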

Fixes #108927

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

@pytorch-bot

pytorch-bot bot commented Sep 9, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/108929

Note: Links to docs will display an error until the docs builds have been completed.

⏳ 1 Pending, 3 Unrelated Failures

As of commit 77e14cf with merge base a46df6e:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@ezyang ezyang requested a review from zou3519 September 9, 2023 01:03
ezyang added a commit that referenced this pull request Sep 10, 2023
…optest"


Richard, I'm curious to see what you think of this. I'm trying to use optest on the torchvision test suite, and after hacking up pytest support in #108929 I noticed that this was 5x'ing the test time... for no good reason.

* torchvision nms tests before optests: 60 passed, 4 skipped, 1206 deselected in 11.47s
* after optests: 300 passed, 20 skipped, 1206 deselected in 49.85s

There's no good reason for it: torchvision has parametrized the tests to get a spread of various random generation, but for checking schema or fake tensor, we don't actually need to test for different values.

This PR hacks up the codegen to replace pytest parametrize markers so that, instead of sampling many values, by default we only sample the first value. There are more bells and whistles we could add; for example, I could add an extra custom pytest marker to let you say "no no, please run each parametrization in the optest". I could also not do this.

With this PR:

* reduced optests: 88 passed, 4 skipped, 1206 deselected in 13.89s

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

[ghstack-poisoned]
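The first-value sampling described in the commit message above can be sketched as a tiny helper. The name `trim_parametrize_args` is illustrative only; the actual change rewrites pytest parametrize markers during codegen.

```python
def trim_parametrize_args(argnames, argvalues, keep=1):
    """Sketch: keep only the first `keep` parameter set(s) out of a full
    pytest.mark.parametrize sweep, since schema/fake-tensor checks don't
    depend on the particular sampled values."""
    return argnames, list(argvalues)[:keep]
```

For example, `trim_parametrize_args("iou", [0.3, 0.5, 0.7])` yields `("iou", [0.3])`, collapsing a three-way sweep to a single optest run.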
ezyang added a commit that referenced this pull request Sep 10, 2023
ghstack-source-id: 1aee977
Pull Request resolved: #108929
ezyang added a commit that referenced this pull request Sep 11, 2023
ghstack-source-id: fde16df
Pull Request resolved: #108929
@pytorch-bot pytorch-bot bot added the release notes: fx release notes category label Sep 11, 2023
Comment on lines +86 to +89
# TODO: We should check that the symbols are consistent
# with each other
if isinstance(y, torch.SymInt):
    continue
Contributor


What's the idea here?

Contributor Author


If I run a data-dependent op, I will end up with an unbacked symint on the fake tensor. I cannot directly compare the unbacked symint with the real symint. So I just say "well, whatever, if the fake value is symint, let's just assume it's fine." It's definitely possible to do a more strict check but I was lazy.
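The acceptance rule described in this answer can be sketched as follows. `UnbackedSymInt` here is a hypothetical stand-in for `torch.SymInt` so the sketch stays self-contained; the real check compares fake-mode outputs against the concrete outputs of the real run.

```python
class UnbackedSymInt:
    """Hypothetical stand-in for an unbacked torch.SymInt (illustration only)."""

def outputs_consistent(real_outs, fake_outs):
    """Sketch: treat any unbacked symbolic value on the fake side as
    matching whatever the real value was; compare everything else exactly."""
    for real, fake in zip(real_outs, fake_outs):
        if isinstance(fake, UnbackedSymInt):
            continue  # cannot directly compare an unbacked symbol with a concrete value
        if real != fake:
            return False
    return True
```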

):
    try:
        with FakeTensorMode() as fake_mode:
            # enable_python_dispatcher() here
Contributor

@zou3519 zou3519 Sep 11, 2023


Is this a TODO? Or are you saying that FakeTensorMode automatically applies enable_python_dispatcher()?

Contributor Author


whoops sorry, is a TODO

Contributor

@zou3519 zou3519 left a comment


Code looks good, test failures might be real (did we intentionally change the behavior of make_fx(tracing_mode=fake) on dynamic output shape operations?)

@ezyang
Contributor Author

ezyang commented Sep 11, 2023

Code looks good, test failures might be real (did we intentionally change the behavior of make_fx(tracing_mode=fake) on dynamic output shape operations?)

So, I am finding that I have to fix latent bugs now that I have enabled data-dependent output shapes.

The main issue is that dynamic_only=True doesn't work anymore: previously, we would just test if an exception was raised, but now no exception is raised so the test as written doesn't actually make sense. Along the way, it seems the meta for NMS in testing isn't right lol.
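The staleness of an exception-based `dynamic_only` test can be sketched like this. `trace_under_fake_mode` is a hypothetical callable standing in for tracing a data-dependent op with `make_fx(tracing_mode="fake")`; once unbacked SymInts are supported, the call succeeds and the old "expect an exception" test silently stops checking anything.

```python
def old_dynamic_only_test(trace_under_fake_mode):
    """Sketch: the old test pattern only asserted that tracing raised."""
    try:
        trace_under_fake_mode()
    except RuntimeError:
        return "failed as expected"  # pre-unbacked-SymInt behavior
    return "no exception raised: the test asserts nothing"
```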

ezyang added a commit that referenced this pull request Sep 13, 2023
ghstack-source-id: f43afb9
Pull Request resolved: #108929
@ezyang
Contributor Author

ezyang commented Sep 13, 2023

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team


@pytorchmergebot
Collaborator

Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team Raised by workflow job

Failing merge rule: Core Maintainers

@ezyang
Contributor Author

ezyang commented Sep 13, 2023

@pytorchbot merge -r

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Tried to rebase and push PR #108929, but it was already up to date. Try rebasing against main by issuing:
@pytorchbot rebase -b main

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team


@pytorchmergebot
Collaborator

Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team Raised by workflow job

Failing merge rule: Core Maintainers

@ezyang
Contributor Author

ezyang commented Sep 13, 2023

@pytorchbot merge -f "known master breakage"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team


ezyang added a commit that referenced this pull request Sep 13, 2023
…when generating optest"


ezyang added a commit that referenced this pull request Sep 13, 2023
…optest"


ezyang added a commit that referenced this pull request Sep 13, 2023
…enerating optest"


ezyang added a commit that referenced this pull request Sep 13, 2023
ezyang added a commit that referenced this pull request Sep 14, 2023
…enerating optest"


Richard, I'm curious to see what you think of this. I'm trying to use optest on the torchvision test suite, and after hacking up pytest support in #108929 I noticed that this was 5x'ing the test time... for no good reason.

* torchvision nms tests before optests: 60 passed, 4 skipped, 1206 deselected in 11.47s
* after optests: 300 passed, 20 skipped, 1206 deselected in 49.85s

There's no good reason for it: torchvision has parametrized the tests to get a spread of various random generation, but for checking schema or fake tensor, we don't actually need to test for different values.

This PR hacks up the codegen to replace pytest parametrize markers so that, instead of sampling many values, we sample only one value if you mark it with `opcheck_only_one`. There's a carveout for device parametrization, where we always run all those variants.

With this PR:

* reduced optests: 88 passed, 4 skipped, 1206 deselected in 13.89s

Companion torchvision PR which uses this at pytorch/vision#7961

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

[ghstack-poisoned]
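The selection rule from the commit message above, with its device carveout, can be sketched as a small function. Names are illustrative only; the real change operates on pytest parametrize markers in codegen, driven by the `opcheck_only_one` marker.

```python
def select_argvalues(argname, argvalues, opcheck_only_one):
    """Sketch: under `opcheck_only_one`, keep only the first value of each
    parametrization axis, except the `device` axis, which always runs in full."""
    if argname == "device" or not opcheck_only_one:
        return list(argvalues)
    return list(argvalues)[:1]
```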
ezyang added a commit that referenced this pull request Sep 14, 2023
pytorchmergebot pushed a commit that referenced this pull request Sep 14, 2023
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: #108936
Approved by: https://github.com/zou3519
@facebook-github-bot facebook-github-bot deleted the gh/ezyang/2333/head branch September 16, 2023 14:22

Labels

ciflow/inductor ciflow/trunk Trigger trunk jobs on your pull request Merged release notes: fx release notes category
