Enable automatic_dynamic_shapes by default #103623
ezyang wants to merge 20 commits into gh/ezyang/2191/base
Conversation
Signed-off-by: Edward Z. Yang <ezyang@meta.com> [ghstack-poisoned]
🔗 Helpful Links 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/103623
Note: Links to docs will display an error until the docs builds have been completed. ✅ No Failures as of commit 53e4893. This comment was automatically generated by Dr. CI and updates every 15 minutes.
```diff
  assert not self.outputs_weakrefs
  for out, static_output_tensor in zip(outputs, self.static_output_tensors):
-     if out is None or static_output_tensor is not None:
+     if not isinstance(out, torch.Tensor) or static_output_tensor is not None:
```
Maybe add a test case for when we would return a dynamic-shape output in cudagraph trees?
I don't think we're hitting the third invocation, the reconstruct-output path: pytorch/torch/_inductor/cudagraph_trees.py, lines 918 to 963 in 072be12.
Unfortunately, I already have this test case 🤔
```python
@config.patch({"triton.cudagraphs": True})
@dynamo_config.patch(
    automatic_dynamic_shapes=True,
    assume_static_by_default=False,
)
def test_dynamic_to_static_cudagraphs(self):
    for b in [False, True]:
        with config.patch({"triton.cudagraph_trees": b}):

            @torch._dynamo.optimize("inductor")
            def fn(x, y):
                r = x + y
                return r, r.size(0)

            # first call compiles for the static (5, 5) shape
            inputs = (
                torch.randn((5, 5), device="cuda"),
                torch.randn((5, 5), device="cuda"),
            )
            self.assertTrue(same(fn(*inputs), (inputs[0] + inputs[1], 5)))
            # the new (6, 6) shape should trigger the dynamic recompile
            inputs = (
                torch.randn((6, 6), device="cuda"),
                torch.randn((6, 6), device="cuda"),
            )
            self.assertTrue(same(fn(*inputs), (inputs[0] + inputs[1], 6)))
```
So I don't understand why this is not exercising these cases...
I spent a while trying to understand the num_fixed setting in inductor and... I give up. I'm going to argue that yolov3 is good enough test coverage, and also that these changes are "obviously OK" and we can easily keep point-fixing these until we've got all the sites.
If you want to argue that we need clearer invariants about the data structures, I'm fine with that, but we should use mypy to work it out, and then apply uniform annotations all throughout cudagraph_trees.py.
I wonder if it's because we're adding the symints at the front of the calling convention, and that is messing up the num_fixed logic: https://github.com/pytorch/pytorch/blob/main/torch/_functorch/aot_autograd.py#L2990-L2992. Do we have a test case that exercises the saved symints?
I think it's unlikely mypy is going to fix our bug here, although I can add more annotations to cudagraph_trees.
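For concreteness, a sketch of the kind of uniform annotation being suggested; the class and field names here are hypothetical stand-ins, not the module's actual definitions:

```python
from typing import List, Optional

import torch


class NodeOutputs:
    # Hypothetical container mirroring the static-output bookkeeping;
    # cudagraph_trees.py's real data structures may be shaped differently.
    static_output_tensors: List[Optional[torch.Tensor]]

    def __init__(self, static_output_tensors: List[Optional[torch.Tensor]]) -> None:
        self.static_output_tensors = static_output_tensors

    def skip_reconstruction(self, outputs: List[object]) -> List[bool]:
        # Mirrors the check from the diff above: non-tensor outputs
        # (e.g. SymInts) and already-static outputs are not reconstructed.
        return [
            not isinstance(out, torch.Tensor) or static is not None
            for out, static in zip(outputs, self.static_output_tensors)
        ]
```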
@eellison I also thought about this, but it doesn't seem like that should be the cause. Input SymInts are passed in as example values, so they should not "count" as fixed.
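For what it's worth, a minimal sketch of what a saved-symint test might look like; this is an illustration, not a test from the PR. It assumes a CUDA device, and that `reshape`'s backward needing the original sizes is enough to plumb a SymInt from forward to backward under dynamic shapes:

```python
import torch
import torch._dynamo
import torch._inductor.config as inductor_config

torch._dynamo.config.automatic_dynamic_shapes = True
torch._dynamo.config.assume_static_by_default = False
inductor_config.triton.cudagraphs = True
inductor_config.triton.cudagraph_trees = True


@torch.compile(backend="inductor")
def fn(x):
    # reshape's backward needs the original sizes, so under dynamic
    # shapes a SymInt should flow from forward to backward
    return x.reshape(-1).cos().sum()


x = torch.randn(5, 5, device="cuda", requires_grad=True)
fn(x).backward()
# a second shape forces the dynamic path, exercising the saved symint
y = torch.randn(7, 7, device="cuda", requires_grad=True)
fn(y).backward()
```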
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Hi @ezyang, it seems like this PR is causing an XLA dynamo test to fail. The purpose of this unit test was to ensure that Dynamo in XLA recompiles when the input shape changes. However, now it seems like Dynamo is just passing the new input shape to XLA without causing recompilation. And from the error message, we can see that Dynamo is now passing a scalar to XLA as an input. While we investigate more and try to put out a fix, would it be okay to revert this PR to prevent more related changes coming in? cc @JackCaoG
For context, PyTorch/XLA expects Dynamo to pass us a list of tensors.
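To illustrate the mismatch, a small sketch using a toy custom backend rather than the real XLA bridge: once a dimension goes dynamic, Dynamo lifts the size into the graph inputs, so the backend's `example_inputs` can contain a plain int alongside the tensors.

```python
import torch

torch._dynamo.config.automatic_dynamic_shapes = True


def inspecting_backend(gm, example_inputs):
    # With a dynamic dim, example_inputs contains a size (int/SymInt)
    # in addition to the tensors -- the scalar XLA is now seeing.
    print([type(a).__name__ for a in example_inputs])
    return gm.forward


@torch.compile(backend=inspecting_backend)
def f(x):
    return x + 1


f(torch.randn(4))  # first call: static, tensor-only inputs
f(torch.randn(8))  # recompile with dim 0 dynamic: an int appears
```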
Sorry about the delay. For XLA, I would advise you to change this configuration variable to False when XLA Dynamo is being used. Are you able to do so easily?
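A minimal sketch of that workaround, mirroring the `patch` style used in the tests above; `compiled_xla_model` and `inputs` here are hypothetical stand-ins for an XLA-backed compiled callable and its arguments:

```python
import torch._dynamo.config as dynamo_config

# Flip the new default off around any XLA-backed execution
with dynamo_config.patch(automatic_dynamic_shapes=False):
    out = compiled_xla_model(inputs)
```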
Thanks for the advice, Ed. We've temporarily disabled the `automatic_dynamic_shapes` config for the XLA backend.
Stack from ghstack (oldest at bottom):
For safety, we don't turn this on in fbcode. The goal is to flush out bugs in OSS nightlies first. Some notes:

- I now manually turn off `_generate` jobs from running with cudagraphs, as it is unrealistic to expect to cudagraph autoregressive generation up to the max sequence length; this would imply compiling the entire unrolled sequence generation. Concretely, cm3leon_generate was timing out after this change, likely due to the compile-time slowdown of dynamic shapes on top of accidentally unrolling all the loops.
- A few torch._dynamo.reset calls are tactically inserted to force recompiles on tests that expected them.
- expectedFailureAutomaticDynamic flips into patching automatic_dynamic_shapes=False.

Signed-off-by: Edward Z. Yang ezyang@meta.com
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @ngimel @yf225 @chenyang78 @kadeng @anijain2305 @msaroufim
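For readers landing here later, a minimal sketch of the behavior this change enables by default; this is an illustration of the heuristic described above (recompile with the changed dim dynamic rather than re-specializing on every new shape), not code from the PR:

```python
import torch

# automatic_dynamic_shapes is now the default; shown explicitly here
torch._dynamo.config.automatic_dynamic_shapes = True


@torch.compile
def f(x):
    return x.sin() * x.size(0)


f(torch.randn(4))   # compile 1: specialized to size 4
f(torch.randn(8))   # size changed: recompile with dim 0 dynamic
f(torch.randn(16))  # served by the dynamic graph, no new compile
```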