
optests improvements based on torchvision usage on nms#108929

Closed
ezyang wants to merge 10 commits into gh/ezyang/2333/base from gh/ezyang/2333/head

Conversation

@ezyang
Contributor

@ezyang ezyang commented Sep 9, 2023

Stack from ghstack (oldest at bottom):

  • Update cross-ref FakeMode test to use ShapeEnv. Dynamic ops can now
    return an unbacked SymInt. We always accept this as equal to whatever
    the real value was.
  • Relax test so it works on all classes, not just unittest.TestCase
  • Properly wrap the original method, so things like
    pytest.mark.parametrize are carried over
  • Support dynamic shapes by default for make_fx tracing_mode="fake" without symbolifying everything else
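The method-wrapping bullet can be sketched with `functools.wraps`. This is a minimal illustration, not the actual PyTorch helper (`wrap_test_method` is an invented name): pytest stores marks such as `pytest.mark.parametrize` in the function's `__dict__` (as `pytestmark`), and `functools.wraps` copies that dict onto the wrapper, so the marks survive.

```python
import functools

def wrap_test_method(original):
    """Sketch: wrap a test method so pytest marks on the original carry over."""
    @functools.wraps(original)  # copies __name__, __doc__, and __dict__ (incl. pytestmark)
    def wrapper(*args, **kwargs):
        # an optest harness would run its extra checks around this call
        return original(*args, **kwargs)
    return wrapper
```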

Fixes #108927

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

@pytorch-bot

pytorch-bot bot commented Sep 9, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/108929

Note: Links to docs will display an error until the docs builds have been completed.

⏳ 1 Pending, 3 Unrelated Failures

As of commit 77e14cf with merge base a46df6e:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@ezyang ezyang requested a review from zou3519 September 9, 2023 01:03
ezyang added a commit that referenced this pull request Sep 10, 2023
…optest"


Richard, I'm curious to see what you think of this. I'm trying to use optest on the torchvision test suite, and after hacking up pytest support in #108929 I noticed that this was 5x'ing the test time... for no good reason.

* torchvision nms tests before optests: 60 passed, 4 skipped, 1206 deselected in 11.47s
* after optests: 300 passed, 20 skipped, 1206 deselected in 49.85s

There's no good reason for it: torchvision has parametrized the tests to get a spread of various random generation, but for checking schema or fake tensor, we don't actually need to test for different values.

This PR hacks up the codegen to replace pytest parametrize markers so that, instead of sampling many values, by default we only sample the first value. There are more bells and whistles we could add; for example, I could add an extra custom pytest marker to let you say "no no, please run each parametrization in the optest". I could also not do this.

With this PR:

* reduced optests: 88 passed, 4 skipped, 1206 deselected in 13.89s

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

[ghstack-poisoned]
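The first-value sampling described in the commit message above can be sketched as a tiny helper. The name `trim_parametrize_args` is illustrative only; the actual change rewrites pytest parametrize markers during codegen.

```python
def trim_parametrize_args(argnames, argvalues, keep=1):
    """Sketch: keep only the first `keep` parameter set(s) out of a full
    pytest.mark.parametrize sweep, since schema/fake-tensor checks don't
    depend on the particular sampled values."""
    return argnames, list(argvalues)[:keep]
```

For example, `trim_parametrize_args("iou", [0.3, 0.5, 0.7])` yields `("iou", [0.3])`, collapsing a three-way sweep to a single optest run.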
ezyang added a commit that referenced this pull request Sep 10, 2023
ghstack-source-id: 1aee977
Pull Request resolved: #108929
ezyang added a commit that referenced this pull request Sep 11, 2023
ghstack-source-id: fde16df
Pull Request resolved: #108929
@pytorch-bot pytorch-bot bot added the release notes: fx release notes category label Sep 11, 2023
Comment on lines +86 to +89
# TODO: We should check that the symbols are consistent
# with each other
if isinstance(y, torch.SymInt):
    continue
Contributor


What's the idea here?

Contributor Author


If I run a data-dependent op, I will end up with an unbacked symint on the fake tensor. I cannot directly compare the unbacked symint with the real symint. So I just say "well, whatever, if the fake value is symint, let's just assume it's fine." It's definitely possible to do a more strict check but I was lazy.
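The acceptance rule described in this answer can be sketched as follows. `UnbackedSymInt` here is a hypothetical stand-in for `torch.SymInt` so the sketch stays self-contained; the real check compares fake-mode outputs against the concrete outputs of the real run.

```python
class UnbackedSymInt:
    """Hypothetical stand-in for an unbacked torch.SymInt (illustration only)."""

def outputs_consistent(real_outs, fake_outs):
    """Sketch: treat any unbacked symbolic value on the fake side as
    matching whatever the real value was; compare everything else exactly."""
    for real, fake in zip(real_outs, fake_outs):
        if isinstance(fake, UnbackedSymInt):
            continue  # cannot directly compare an unbacked symbol with a concrete value
        if real != fake:
            return False
    return True
```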

):
    try:
        with FakeTensorMode() as fake_mode:
            # enable_python_dispatcher() here
Contributor

@zou3519 zou3519 Sep 11, 2023


Is this a TODO? Or are you saying that FakeTensorMode automatically applies enable_python_dispatcher()?

Contributor Author


whoops sorry, is a TODO

Contributor

@zou3519 zou3519 left a comment


Code looks good, test failures might be real (did we intentionally change the behavior of make_fx(tracing_mode=fake) on dynamic output shape operations?)

@ezyang
Contributor Author

ezyang commented Sep 11, 2023

Code looks good, test failures might be real (did we intentionally change the behavior of make_fx(tracing_mode=fake) on dynamic output shape operations?)

So, I am finding that I have to fix latent bugs now that I have enabled data-dependent output shapes.

The main issue is that dynamic_only=True doesn't work anymore: previously, we would just test if an exception was raised, but now no exception is raised so the test as written doesn't actually make sense. Along the way, it seems the meta for NMS in testing isn't right lol.
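The staleness of an exception-based `dynamic_only` test can be sketched like this. `trace_under_fake_mode` is a hypothetical callable standing in for tracing a data-dependent op with `make_fx(tracing_mode="fake")`; once unbacked SymInts are supported, the call succeeds and the old "expect an exception" test silently stops checking anything.

```python
def old_dynamic_only_test(trace_under_fake_mode):
    """Sketch: the old test pattern only asserted that tracing raised."""
    try:
        trace_under_fake_mode()
    except RuntimeError:
        return "failed as expected"  # pre-unbacked-SymInt behavior
    return "no exception raised: the test asserts nothing"
```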

ezyang added a commit that referenced this pull request Sep 13, 2023
ghstack-source-id: f43afb9
Pull Request resolved: #108929
@ezyang
Contributor Author

ezyang commented Sep 13, 2023

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team


@pytorchmergebot
Collaborator

Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team Raised by workflow job

Failing merge rule: Core Maintainers

@ezyang
Contributor Author

ezyang commented Sep 13, 2023

@pytorchbot merge -r

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Tried to rebase and push PR #108929, but it was already up to date. Try rebasing against main by issuing:
@pytorchbot rebase -b main

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team


@pytorchmergebot
Collaborator

Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team Raised by workflow job

Failing merge rule: Core Maintainers

@ezyang
Contributor Author

ezyang commented Sep 13, 2023

@pytorchbot merge -f "known master breakage"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team


ezyang added a commit that referenced this pull request Sep 13, 2023
…when generating optest"


ezyang added a commit that referenced this pull request Sep 13, 2023
…optest"


ezyang added a commit that referenced this pull request Sep 13, 2023
…enerating optest"


ezyang added a commit that referenced this pull request Sep 13, 2023
ezyang added a commit that referenced this pull request Sep 14, 2023
…enerating optest"


Richard, I'm curious to see what you think of this. I'm trying to use optest on the torchvision test suite, and after hacking up pytest support in #108929 I noticed that this was 5x'ing the test time... for no good reason.

* torchvision nms tests before optests: 60 passed, 4 skipped, 1206 deselected in 11.47s
* after optests: 300 passed, 20 skipped, 1206 deselected in 49.85s

There's no good reason for it: torchvision has parametrized the tests to get a spread of various random generation, but for checking schema or fake tensor, we don't actually need to test for different values.

This PR hacks up the codegen to replace pytest parametrize markers so that, instead of sampling many values, we sample only one value if you mark it with `opcheck_only_one`. There's a carveout for device parametrization, where we always run all those variants.

With this PR:

* reduced optests: 88 passed, 4 skipped, 1206 deselected in 13.89s

Companion torchvision PR which uses this at pytorch/vision#7961

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

[ghstack-poisoned]
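The selection rule from the commit message above, with its device carveout, can be sketched as a small function. Names are illustrative only; the real change operates on pytest parametrize markers in codegen, driven by the `opcheck_only_one` marker.

```python
def select_argvalues(argname, argvalues, opcheck_only_one):
    """Sketch: under `opcheck_only_one`, keep only the first value of each
    parametrization axis, except the `device` axis, which always runs in full."""
    if argname == "device" or not opcheck_only_one:
        return list(argvalues)
    return list(argvalues)[:1]
```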
ezyang added a commit that referenced this pull request Sep 14, 2023
pytorchmergebot pushed a commit that referenced this pull request Sep 14, 2023
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: #108936
Approved by: https://github.com/zou3519
@facebook-github-bot facebook-github-bot deleted the gh/ezyang/2333/head branch September 16, 2023 14:22

Labels

ciflow/inductor ciflow/trunk Trigger trunk jobs on your pull request Merged release notes: fx release notes category
