[dynamo] handle unbacked SymInt data in Tensor.new_tensor #176390
vvvdwbvvv wants to merge 9 commits into pytorch:main
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/176390
Note: links to docs will display an error until the docs builds have completed. ❌ 4 New Failures, 2 Unrelated Failures as of commit e1a4bb2 with merge base 7643509.
NEW FAILURES - the following jobs have failed.
FLAKY - the following jobs failed but were likely due to flakiness present on trunk.
This comment was automatically generated by Dr. CI and updates every 15 minutes. |
This PR needs a
|
Lucaskabela left a comment:
Before we are ready to review, we need to add a unit test (the repro is a good starting point; given that this is quite a hot path, we may also want to think about a few other paths that might cause this graph break and test those as well)
|
Please re-request review once ready. |
|
@Lucaskabela Thanks for replying. Yes, I am writing a test for it. I am also surveying whether there is a better way to fix it. If you have a better implementation for the fix, please let me know. |
|
Now reproducing with TORCH_LOGS="+dynamo" to check the full log; here is the important part of the selected log. Finding from the explain() path: this might be related to https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0036.html cc @Lucaskabela. What do you suggest for the workaround? Can I handle this directly? |
|
Why aren't we just making the repro:

```python
def f1(a):
    num = a.nonzero().squeeze(-1).numel()
    return torch.tensor([num])  # works

def f2(a):
    num = a.nonzero().squeeze(-1).numel()
    return a.new_tensor([num])  # fails before this change

a = torch.tensor([1, 0])
torch.compile(f1, fullgraph=True)(a)
torch.compile(f2, fullgraph=True)(a)
```

into a unit test case? This should be done and added to our unit tests (i.e., checked into the code) first, as a prerequisite before we go any further. Please make this a formal PyTorch test case and add it to one of our unit test files. |
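For reference, the eager-mode behavior the compiled path has to match can be sketched as follows (a minimal illustration, not the PR's actual test):

```python
import torch

def f(a):
    # In eager mode numel() returns a plain Python int, so new_tensor just
    # works; under torch.compile it becomes an unbacked SymInt, which is
    # the hard case this PR addresses.
    num = a.nonzero().squeeze(-1).numel()
    return a.new_tensor([num])  # inherits a's dtype and device

a = torch.tensor([1, 0])
out = f(a)
```

A compiled unit test would run the same function under `torch.compile(..., fullgraph=True)` and compare against this eager reference.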
torch/_dynamo/variables/tensor.py (outdated)

```python
# are common keys.
all_tensor_attrs = torch._C.TensorBase.__dict__ | torch.Tensor.__dict__


def _contains_unspec_tensor_data(x):
```

nit: needs type hints, like:

```python
def _contains_unspec_tensor_data(x: VariableTracker) -> bool:
```
torch/_dynamo/variables/tensor.py (outdated)

```python
return self.call_method(tx, "new_empty", args, kwargs)
return None


def method_new_tensor(self, *args, **kwargs):
```

nit: needs type hints (see above for example)
torch/_dynamo/variables/tensor.py (outdated)

```python
from ..symbolic_convert import InstructionTranslator

tx = InstructionTranslator.current_tx()
```

Why aren't we just passing tx in as an arg?
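As a general illustration of the suggestion (toy code, not dynamo internals; `Ctx` stands in for `InstructionTranslator`): fetching a "current" context from global state hides the dependency, while taking it as a parameter makes it explicit in the signature and easier to test.

```python
class Ctx:
    # toy stand-in for InstructionTranslator; "current" mimics current_tx()
    current = None

def helper_implicit() -> "Ctx":
    # Hidden dependency: callers cannot tell this reads global state.
    return Ctx.current

def helper_explicit(tx: "Ctx") -> "Ctx":
    # Explicit dependency: the context is visible to every caller.
    return tx
```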
|
Understood. I’ll first turn this into a formal PyTorch unit test case, add it to the appropriate test file, and make sure it is checked in before proceeding further. |
|
Added a unit test. |
```python
res = opt_f(x)
self.assertEqual(ref, res)


@torch._dynamo.config.patch(
```

We should also add a multi-tensor test case, like:

```python
return a.new_tensor([n, n + 1, n * 2])
```

I believe this particular logic will fail for that example.
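In eager mode the suggested multi-element case behaves as below (a small sketch to pin down the expected semantics; the compiled test would compare its output against this reference):

```python
import torch

def f(a):
    n = a.nonzero().squeeze(-1).numel()
    # Every element of the data list is derived from n, which becomes an
    # unbacked SymInt under torch.compile with dynamic output shapes, so
    # the new_tensor lowering must handle more than a single scalar.
    return a.new_tensor([n, n + 1, n * 2])

a = torch.tensor([1, 0, 1])
out = f(a)
```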
torch/_dynamo/variables/tensor.py (outdated)

```python
all_tensor_attrs = torch._C.TensorBase.__dict__ | torch.Tensor.__dict__


def _contains_unspec_tensor_data(x: VariableTracker) -> bool:
```

Since this is only used in method_new_tensor, let's just do the check inside method_new_tensor; there is no need for this helper function up here, so far from the logic where it is used.
torch/_dynamo/variables/tensor.py (outdated)

```python
new_kwargs["device"] = kwargs.get("device", self.var_getattr(tx, "device"))

# Build scalar_tensor
scalar = variables.TorchInGraphFunctionVariable(
```

The logic below needs refactoring to support the more general case: we don't really need unsqueeze (we can use stack instead).
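The suggestion can be illustrated at the eager level (toy code, not the actual dynamo variable-tracker logic): instead of unsqueezing each scalar to shape (1,) and concatenating, the 0-d scalar tensors can be stacked directly, which handles the single- and multi-element cases uniformly.

```python
import torch

vals = [2, 3, 4]  # stand-ins for the (possibly symbolic) scalar data values

# unsqueeze-based construction: scalar -> shape (1,) -> concatenate
a = torch.cat(
    [torch.scalar_tensor(v, dtype=torch.int64).unsqueeze(0) for v in vals]
)

# stack-based construction: stack the 0-d scalars into one 1-D tensor
b = torch.stack([torch.scalar_tensor(v, dtype=torch.int64) for v in vals])
```

Both produce the same 1-D tensor; stack simply avoids the per-element reshaping step.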
Lucaskabela left a comment:

Thanks for iterating on this - I think this looks ready to me!
|
@pytorchmergebot merge |
|
Thank you!🙏 |
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team. |
Merge failed. Reason: 1 job has failed; the first few are: trunk / macos-py3-arm64 / test (openreg, 1, 1, macos-m1-stable). Details for the Dev Infra team: raised by workflow job. |
|
@pytorchmergebot merge -i |
Merge started. Your change will be merged while ignoring the following 1 check: trunk / macos-py3-arm64 / test (openreg, 1, 1, macos-m1-stable). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team. |
Merge failed. Reason: 1 job has failed; the first few are: trunk / macos-py3-arm64 / test (mps, 1, 1, macos-m1-14). Details for the Dev Infra team: raised by workflow job. |
|
@claude explain this test failure: "Build left local git repository checkout dirty" |
|
Claude finished @Lucaskabela's task in 1m 41s. View job.

Analysis of the "Build left local git repository checkout dirty" failure

This failure is unrelated to the PR changes; both failing jobs show the same issue.

What happened: after all tests complete, PyTorch CI runs a … This is a known macOS CI infrastructure issue; the … The actual tests themselves passed successfully (the logs show …).

Resolution: this is already being handled correctly: @Lucaskabela used … Since the … |
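The shape of such a post-run cleanliness check can be sketched as follows (an assumption about what the CI step does, not its actual implementation; `git status --porcelain` prints nothing for a clean checkout):

```python
import subprocess

def checkout_is_dirty(repo_dir: str) -> bool:
    # Non-empty porcelain output means untracked or modified files remain
    # after the build/test run, i.e. the checkout was left dirty.
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return bool(result.stdout.strip())
```

A CI job of this shape fails when the function returns True, regardless of whether the tests themselves passed, which matches the symptom described above.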
|
Is there anything I can help with? |
|
@pytorchmergebot merge -i |
I will ping if so - pretty sure these are just flaky test signals though |
Merge started. Your change will be merged while ignoring the following 6 checks: trunk / macos-py3-arm64 / test (openreg, 1, 1, macos-m1-stable), trunk / macos-py3-arm64 / test (mps, 1, 1, macos-m1-14), trunk / macos-py3-arm64 / test (default, 3, 3, macos-m1-stable), trunk / macos-py3-arm64 / test (default, 1, 3, macos-m1-stable), trunk / macos-py3-arm64 / test (mps, 1, 1, macos-m2-15), trunk / macos-py3-arm64 / test (default, 2, 3, macos-m1-stable). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team. |
…6390)

## PR Summary

Fixes pytorch#176067, a `torch.compile(..., fullgraph=True)` failure for `Tensor.new_tensor(...)`.

## Repro

```python
import torch

def f1(a):
    num = a.nonzero().squeeze(-1).numel()
    return torch.tensor([num])  # works

def f2(a):
    num = a.nonzero().squeeze(-1).numel()
    return a.new_tensor([num])  # fails before this change

a = torch.tensor([1, 0])
torch.compile(f1, fullgraph=True)(a)
torch.compile(f2, fullgraph=True)(a)
```

Prior to this change, `f2` could fail with:

```text
torch._dynamo.exc.UserError: Could not extract specialized integer from data-dependent expression u0 (unhinted: u0). (Size-like symbols: u0)
```

related issue: pytorch#176067

Pull Request resolved: pytorch#176390
Approved by: https://github.com/Lucaskabela
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @kadeng @chauhang @amjames @Lucaskabela @jataylo @mlazos