optests improvements based on torchvision usage on nms #108929

ezyang wants to merge 10 commits into gh/ezyang/2333/base
Conversation
- Update cross-ref FakeMode test to use ShapeEnv. Dynamic ops can now return an unbacked SymInt. We always accept this as equal to whatever the real value was.
- Relax test so it works on all classes, not just unittest.TestCase
- Properly wrap the original method, so things like pytree.mark.parametrize are carried over

Fixes #108927

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

[ghstack-poisoned]
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/108929

Note: Links to docs will display an error until the docs builds have been completed. ⏳ 1 Pending, 3 Unrelated Failures as of commit 77e14cf with merge base a46df6e. BROKEN TRUNK - The following jobs failed but were present on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.

This comment was automatically generated by Dr. CI and updates every 15 minutes.
Richard, I'm curious to see what you think of this. I'm trying to use optest on the torchvision test suite, and after hacking up pytest support in #108929 I noticed that this was 5x'ing the test time... for no good reason.

* torchvision nms tests before optests: 60 passed, 4 skipped, 1206 deselected in 11.47s
* after optests: 300 passed, 20 skipped, 1206 deselected in 49.85s

It's for no good reason because torchvision has parametrized the tests to get a spread of various random generation, but for checking schema or fake tensor, we don't actually need to test different values. This PR hacks up the codegen to replace pytest parametrize markers so that, instead of sampling many values, by default we only sample the first value. There are more bells and whistles we could add; for example, I could add an extra custom pytest marker to let you say "no no, please run each parametrization in the optest". I could also not do this. With this PR:

* reduced optests: 88 passed, 4 skipped, 1206 deselected in 13.89s

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

[ghstack-poisoned]
    # TODO: We should check that the symbols are consistent
    # with each other
    if isinstance(y, torch.SymInt):
        continue
If I run a data-dependent op, I will end up with an unbacked symint on the fake tensor. I cannot directly compare the unbacked symint with the real symint. So I just say "well, whatever, if the fake value is symint, let's just assume it's fine." It's definitely possible to do a more strict check but I was lazy.
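The relaxed comparison described here can be sketched in isolation. This is a stand-in, not the PR's code: `SymIntStandIn` is a placeholder for `torch.SymInt`, and `sizes_match` is a hypothetical name for the cross-ref size check.

```python
class SymIntStandIn:
    """Placeholder for torch.SymInt: an unbacked symbol such as u0
    produced by a data-dependent op under FakeTensorMode."""
    def __init__(self, name):
        self.name = name

def sizes_match(real_sizes, fake_sizes):
    """Compare real output sizes against fake-tensor output sizes."""
    if len(real_sizes) != len(fake_sizes):
        return False
    for real, fake in zip(real_sizes, fake_sizes):
        # TODO (as in the PR): check that the symbols are consistent
        # with each other instead of accepting any SymInt unconditionally.
        if isinstance(fake, SymIntStandIn):
            continue  # unbacked symbol: assume it equals the real value
        if real != fake:
            return False
    return True
```

A stricter version would record which real value each unbacked symbol was matched against and reject contradictory bindings.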
torch/_subclasses/fake_utils.py (Outdated)
    ):
        try:
            with FakeTensorMode() as fake_mode:
                # enable_python_dispatcher() here
Is this a TODO? Or are you saying that FakeTensorMode automatically applies enable_python_dispatcher()?
whoops sorry, is a TODO
zou3519
left a comment
Code looks good, test failures might be real (did we intentionally change the behavior of make_fx(tracing_mode="fake") on dynamic output shape operations?)
So, I am finding that I have to fix latent bugs now that I have enabled data-dependent output shapes. The main issue is that
- Update cross-ref FakeMode test to use ShapeEnv. Dynamic ops can now return an unbacked SymInt. We always accept this as equal to whatever the real value was.
- Relax test so it works on all classes, not just unittest.TestCase
- Properly wrap the original method, so things like pytree.mark.parametrize are carried over
- Support dynamic shapes by default for make_fx `tracing_mode="fake"` without symbolifying everything else

Fixes #108927

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

[ghstack-poisoned]
@pytorchbot merge

Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.

Merge failed. Reason: 1 mandatory check(s) failed. Dig deeper by viewing the failures on hud.
@pytorchbot merge -r
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here.

Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.

Merge failed. Reason: 1 mandatory check(s) failed. Dig deeper by viewing the failures on hud.

@pytorchbot merge -f "known master breakage"

Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Richard, I'm curious to see what you think of this. I'm trying to use optest on the torchvision test suite, and after hacking up pytest support in #108929 I noticed that this was 5x'ing the test time... for no good reason.

* torchvision nms tests before optests: 60 passed, 4 skipped, 1206 deselected in 11.47s
* after optests: 300 passed, 20 skipped, 1206 deselected in 49.85s

It's for no good reason because torchvision has parametrized the tests to get a spread of various random generation, but for checking schema or fake tensor, we don't actually need to test different values. This PR hacks up the codegen to replace pytest parametrize markers so that, instead of sampling many values, we sample only one value if you mark it with `opcheck_only_one`. There's a carveout for device parametrization, where we always run all those variants. With this PR:

* reduced optests: 88 passed, 4 skipped, 1206 deselected in 13.89s

Companion torchvision PR which uses this at pytorch/vision#7961

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: #108936
Approved by: https://github.com/zou3519
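The filtering rule described above (one value per `opcheck_only_one`-marked axis, all values for the device axis) can be sketched as a small pure function. Names here are illustrative, not the PR's actual API.

```python
def filter_parametrizations(parametrizations, only_one_args):
    """Reduce parametrize axes for generated optests.

    parametrizations: list of (argname, values) pairs collected from
        pytest.mark.parametrize markers.
    only_one_args: set of argument names tagged `opcheck_only_one`.
    """
    out = []
    for argname, values in parametrizations:
        if argname in only_one_args and argname != "device":
            # Schema / fake-tensor checks don't depend on the sampled
            # value, so one representative is enough.
            out.append((argname, values[:1]))
        else:
            # Carveout: always run every device variant (cpu, cuda, ...).
            out.append((argname, list(values)))
    return out
```

Applied to a torchvision-style test parametrized over random seeds and devices, only the seed axis collapses, which matches the reported drop from 300 generated tests to 88.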
Stack from ghstack (oldest at bottom):

- Update cross-ref FakeMode test to use ShapeEnv. Dynamic ops can now return an unbacked SymInt. We always accept this as equal to whatever the real value was.
- Relax test so it works on all classes, not just unittest.TestCase
- Properly wrap the original method, so things like pytree.mark.parametrize are carried over
- Support dynamic shapes by default for make_fx `tracing_mode="fake"` without symbolifying everything else

Fixes #108927

Signed-off-by: Edward Z. Yang <ezyang@meta.com>