functionalization: skip meta reference compute for aot autograd #87108
bdhirsh wants to merge 4 commits into gh/bdhirsh/328/base
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/87108
Note: Links to docs will display an error until the docs builds have been completed. ❌ 1 Failure as of commit f24231a. This comment was automatically generated by Dr. CI and updates every 15 minutes.
…ograd" [ghstack-poisoned]
|
@ezyang I'm seeing some interesting UBSAN errors that seem to have been uncovered around The error is: Some googling around led me here, and I was wondering if something funky is going on around our subclassing of
The 0xBEBEBEBE pattern in the address suggests that you're reading out of some garbage memory that we memset to 0xBE, but I don't actually see anywhere in our codebase where we use the 0xBE bit pattern. Are you sure this isn't functionalization related? I expect the aotdispatch tests will be calling functionalization. You should put your heads together with @anjali411 and @albanD, who were investigating another memory error; I wonder if this is the same root cause.
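As an aside, the reason a repeated poison byte like 0xBE is easy to spot is that any multi-byte read from such memory yields a recognizable value. A minimal illustration (standalone Python, not PyTorch code):

```python
import struct

# Memory poisoned by filling every byte with 0xBE, as an allocator or
# debug build might do. Reading four of those bytes as a 32-bit integer
# produces the telltale 0xBEBEBEBE value seen in the reported address.
poison = bytes([0xBE] * 4)
value = struct.unpack("<I", poison)[0]
print(hex(value))  # 0xbebebebe
```

Seeing that value in a pointer is a strong hint that uninitialized or freed poisoned memory was dereferenced.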
I don't think the ASAN failure is actually related to this change (the original hypothesis from Horace's previous PR was that this PR might fix the skipped test), so I'm going to leave the test skipped and land this PR in the interest of fixing a few dynamic shapes models.
…ograd" The context is that historically, XLA/LTC tensors haven't had accurate stride information, and functionalization would run "reference" meta kernels for view ops on the side to properly compute strides. This is more complicated in symint tracing world - we have a `FunctionalTensorWrapper()` that wraps the underlying tensor and has its own set of sizes/strides metadata, but we never create proxy objects for the sizes/strides of the wrapper. In symint tracing world with aot autograd, we're guaranteed that our underlying strides are accurate anyway, since aot autograd uses fake tensors to perform tracing. We encountered a few bugs with symint's from the `FunctionalTensorWrapper` making their way into `__torch_dispatch__`. To side-step that area of bugs completely (and marginally improve perf), this PR disables the meta tensor tracing for non XLA/LTC use cases. [ghstack-poisoned]
Oh, the segfault shows up in
…ograd" The context is that historically, XLA/LTC tensors haven't had accurate stride information, and functionalization would run "reference" meta kernels for view ops on the side to properly compute strides. This is more complicated in symint tracing world - we have a `FunctionalTensorWrapper()` that wraps the underlying tensor and has its own set of sizes/strides metadata, but we never create proxy objects for the sizes/strides of the wrapper. In symint tracing world with aot autograd, we're guaranteed that our underlying strides are accurate anyway, since aot autograd uses fake tensors to perform tracing. We encountered a few bugs with symint's from the `FunctionalTensorWrapper` making their way into `__torch_dispatch__`. To side-step that area of bugs completely (and marginally improve perf), this PR disables the meta tensor tracing for non XLA/LTC use cases. [ghstack-poisoned]
|
@pytorchbot merge -f "unrelated failure"
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Hey @bdhirsh.