Test that TORCH_FEATURE_VERSION guards are used where needed #167962

mikaylagawarecki wants to merge 4 commits into gh/mikaylagawarecki/383/base
Conversation
[ghstack-poisoned]
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/167962
Note: Links to docs will display an error until the docs builds have been completed. ✅ You can merge normally! (1 Unrelated Failure) As of commit f11e191 with merge base 1c04a43. UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
[ghstack-poisoned]
@@ -0,0 +1,40 @@
// This is duplicated from the libtorch_agnostic_2_9_extension
mv_tensor_accessor_cpu was added in 2_10, no?
yea hmmm should this code be in 2_10 from the start? Or no, because it's header only?
This is a bit of a confusing counterexample because it is headeronly so it retroactively works, but a more clear example might be something that was definitely landed in 2_9?
It works in 2.9 because it only uses things in headeronly.
There's nothing being passed across the shim boundary, and stable::Tensor.sizes() and stable::Tensor.strides() also work in 2.9, is my understanding.
My workflow in the below PR ensures that this test is able to run on 2.9
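As an aside on comparing target versions like 2.9 and 2.10: a plain string comparison orders "2.10" before "2.9", so any tooling around these checks needs numeric comparison. A minimal sketch (hypothetical helper names, not code from this PR):

```python
# Hypothetical sketch: compare dotted version strings such as "2.9.0" and
# "2.10.0" numerically. A lexicographic string comparison would wrongly
# treat "2.10.0" as older than "2.9.0" because '1' < '9'.
def version_tuple(version: str) -> tuple:
    """Parse a dotted version string into a comparable tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def target_is_at_least(target: str, minimum: str) -> bool:
    """True if the build's target version is >= a feature's minimum version."""
    return version_tuple(target) >= version_tuple(minimum)
```

For example, `target_is_at_least("2.10.0", "2.9.0")` is true, while the raw string comparison `"2.10.0" >= "2.9.0"` is false.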
> This is a bit of a confusing counterexample because it is headeronly so it retroactively works, but a more clear example might be something that was definitely landed in 2_9?
I agree that it is confusing, but I disagree that it's not a good sanity-check test; this function demonstrates the exact case that we want this test class to catch:
- Current version is 2.x; user adds a test and adds it to `libtorch_agnostic_2_x_extension`, thinking this is a 2.x feature
- We want this test to catch this (due to compilation succeeding) and prompt the user to either add version guards or move this test to `libtorch_agnostic_2_(x-1)_extension`
- If the `libtorch_agnostic_targetting` workflow succeeds, we've confirmed the test belongs in `libtorch_agnostic_2_(x-1)_extension`; otherwise we have signal for the user that they missed version guards
wdyt, does that motivate the rationale for choosing this kernel better? Also happy to add others in a followup if you disagree
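The decision rule in that workflow can be sketched as a small pure function (illustrative only; the names are hypothetical, not the PR's actual API). A kernel placed in the 2.10 extension is compiled with the target version set to 2.9, and the outcome determines what the user should do:

```python
# Hypothetical sketch of the decision rule described above: a kernel placed
# in the 2.10 extension is compiled targeting 2.9; unexpected success means
# it either needs version guards or belongs in the 2.9 extension.
def classify_kernel(compiles_under_2_9: bool, placed_in_2_10: bool) -> str:
    if placed_in_2_10 and compiles_under_2_9:
        # The test class should catch this case and prompt the user.
        return "flag: add version guards or move to the 2.9 extension"
    if placed_in_2_10 and not compiles_under_2_9:
        return "ok: genuinely requires 2.10"
    # Kernels already in the 2.9 extension are negative tests: compilation
    # targeting 2.9 is expected to succeed.
    return "ok: 2.9 kernel, compilation expected to succeed"
```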
@@ -0,0 +1,47 @@
// This is duplicated from the libtorch_agnostic_2_9_extension
using torch::stable::Tensor;

// Declare my__foreach_mul (defined in my__foreach_mul.cpp)
extern std::vector<Tensor> my__foreach_mul(
Is this code resilient to linking order? e.g., if this code is linked before my__foreach_mul.cpp, will it break?
Hm, good question. I didn't run into any issues with this so far; according to Claude it should not matter :)

- **setup.py** compiles all `.cpp` files in the `csrc` directory as separate object files
- The files are linked together into a single shared library (`_C.so`)
- The linker resolves the symbol `my__foreach_mul` regardless of which object file comes first in the link order
test/cpp_extensions/libtorch_agnostic_2_10_extension/test_version_compatibility.py
relevant_errors = self._extract_relevant_errors(error_msg)
if relevant_errors:
    print("\n Unexpected compilation errors for mv_tensor_accessor_cpu:")
    for err in relevant_errors[:10]:
yea removed all these
success,
f"mv_tensor_accessor_cpu.cpp failed to compile with TORCH_TARGET_VERSION=2.9.0. "
f"This file is expected to work with 2.9.0 since it doesn't use 2.10+ features. "
f"Error: {error_msg[:500]}",
and 500 here too--what's the normal max length you'd expect?
return relevant_errors


# Dynamically create test methods for each .cpp and .cu file
Can we still run an individual test with -k for dynamically created test methods? (I'm guessing yes)
Yep, I was able to run
python test/cpp_extensions/libtorch_agnostic_2_10_extension/test_version_compatibility.py -k test_my_view_requires_2_10
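For reference, the usual pattern that keeps dynamically generated `unittest` methods selectable with `-k` is to attach them to the class with `setattr` before test discovery runs. A minimal sketch (file names other than `my_view.cpp` are hypothetical; the real test compiles each file rather than checking its name):

```python
# Sketch: generate one test method per source file so each is individually
# addressable with unittest's -k pattern matching.
import unittest

SOURCE_FILES = ["my_view.cpp", "my_reshape.cpp"]  # assumed example file names


class FunctionVersionCompatibilityTest(unittest.TestCase):
    pass


def _make_test(filename):
    # Factory function so each generated method captures its own filename
    # (a bare lambda in the loop would capture the loop variable by reference).
    def test(self):
        # The real test would compile `filename` with TORCH_TARGET_VERSION=2.9.0
        # and assert on the outcome; here we only illustrate the wiring.
        self.assertTrue(filename.endswith(".cpp"))
    return test


for f in SOURCE_FILES:
    name = f"test_{f.rsplit('.', 1)[0]}_requires_2_10"
    setattr(FunctionVersionCompatibilityTest, name, _make_test(f))
```

Because the methods exist on the class before `unittest.main()` runs, `-k test_my_view_requires_2_10` matches them like any hand-written test.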
test/cpp_extensions/libtorch_agnostic_2_10_extension/test_version_compatibility.py
ghstack-source-id: b5c8ce0
Pull Request resolved: pytorch/pytorch#167962
Splits each torch library registration in the 2.10 folder into its own file -- I had a script that parsed kernel.cpp to do this, but I felt like forcing this responsibility on the user might be less error-prone. Compiles each file targeting 2.9 and asserts that compilation fails. (There are two 2.9 kernels we use as negative tests where compilation is expected to succeed.) [ghstack-poisoned]
This is tested by #167962, which ensures we get compilation errors when using functions that convert Device/HeaderOnlyArrayRef to StableIValue and target 2.9. [ghstack-poisoned]
@pytorchbot merge
Merge started: Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
This is tested by #167962, which ensures we get compilation errors when using functions that convert Device/HeaderOnlyArrayRef to StableIValue and target 2.9.
Pull Request resolved: #167802
Approved by: https://github.com/janeyx99
ghstack dependencies: #168025
This broke ROCm, but a forward fix is incoming.
Unclear which PR in the ghstack caused the ROCm failure. Stack was (oldest at bottom):
- #167962
- #167804
- #167803
- #167802
- #168025

Fixes the following test:
PYTORCH_TEST_WITH_ROCM=1 python test/cpp_extensions/libtorch_agnostic_2_10_extension/test_version_compatibility.py FunctionVersionCompatibilityTest.test_mv_tensor_accessor_cuda_works_with_2_9

Pull Request resolved: #168087
Approved by: https://github.com/jeffdaily, https://github.com/janeyx99
Co-authored-by: Jeff Daily <jeff.daily@amd.com>
Co-authored-by: Jane (Yuan) Xu <31798555+janeyx99@users.noreply.github.com>
~This PR does change the semantics of the >> operator by using STD_TORCH_CHECK to throw the error instead of TORCH_CHECK. Jane (who is writing this message) thinks it is okay because it is the error case when an invalid MemoryFormat or Layout is getting passed into >>, so the UX benefits of TORCH_CHECK over STD_TORCH_CHECK there are not significant enough to warrant making a new copy of Layout and MemoryFormat's >> APIs.~

Never mind! We shouldn't change TORCH_CHECK to STD_TORCH_CHECK for core usage ever, because the traceback info and c10::Error are very much desired!! So the solution is to not migrate the >>s. I pushed new commits to the stack to remove the >> code, but for reference, 8a30179 has all the code that I ended up deleting.

Pull Request resolved: #168034
Approved by: https://github.com/janeyx99
ghstack dependencies: #168025, #167802, #167803, #167804, #167962
Co-authored-by: Jane Xu <janeyx@meta.com>
ghstack-source-id: 311fb1b
Pull Request resolved: pytorch/pytorch#167962
Splits each torch library registration in the 2.10 folder into its own file -- I had a script that parsed kernel.cpp to do this, but I felt like forcing this responsibility on the user might be less error-prone.

Compiles each file targeting 2.9 and asserts that compilation fails. (There are two 2.9 kernels we use as negative tests where compilation is expected to succeed.)
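The per-file expectation described above can be summarized as a small check (illustrative sketch only; the helper name and set literal are mine, though `mv_tensor_accessor_cpu.cpp`/`mv_tensor_accessor_cuda.cpp` are the negative tests this PR discusses): every file in the 2.10 extension must fail to compile when targeting 2.9, except the designated 2.9 kernels, which must succeed.

```python
# Sketch of the expected-outcome rule: 2.10-only files must FAIL to compile
# with TORCH_TARGET_VERSION=2.9.0; the two 2.9 negative-test kernels must
# SUCCEED. `check_file` returns True when the observed result matches.
NEGATIVE_TESTS = {"mv_tensor_accessor_cpu.cpp", "mv_tensor_accessor_cuda.cpp"}


def check_file(filename: str, compile_succeeded: bool) -> bool:
    """True if the observed compile result matches what the test expects."""
    expected_success = filename in NEGATIVE_TESTS
    return compile_succeeded == expected_success
```

A file like `my_view.cpp` (which uses 2.10 features) passes the check only when its compilation fails, which is exactly the signal that its version guards, or its placement in the 2.10 extension, are correct.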
Stack from ghstack (oldest at bottom):