
Test that TORCH_FEATURE_VERSION guards are used where needed#167962

Closed
mikaylagawarecki wants to merge 4 commits into gh/mikaylagawarecki/383/base from gh/mikaylagawarecki/383/head

Conversation

@mikaylagawarecki
Contributor

@mikaylagawarecki mikaylagawarecki commented Nov 17, 2025

Splits each torch library registration in the 2.10 folder into its own file -- I had a script that parsed kernel.cpp to do this, but I felt like forcing this responsibility on the user might be less error-prone.

Compiles each file targeting 2.9 and asserts that compilation fails. (There are two 2.9 kernels we use as negative tests where compilation is expected to succeed.)
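The per-file check described above can be sketched as follows (a minimal illustration; `classify_result` and its argument names are hypothetical, not the actual harness code in this PR):

```python
# Hypothetical sketch of the per-file decision: each kernel .cpp from the
# 2.10 extension is compiled with TORCH_TARGET_VERSION pinned to 2.9.0,
# and the expected outcome depends on whether the file is a negative test.

def classify_result(compiled_ok: bool, expects_2_9_success: bool) -> str:
    """Decide what a compile attempt against 2.9 means for one file."""
    if expects_2_9_success:
        # Negative-test kernels use only 2.9-era APIs, so targeting 2.9
        # must succeed; a failure here means the harness itself is broken.
        return "ok" if compiled_ok else "harness-error"
    # Kernels exercising 2.10 features must fail to compile when the
    # target is pinned to 2.9; success means a missing version guard
    # (or the kernel belongs in the 2.9 extension instead).
    return "missing-guard" if compiled_ok else "ok"

# A 2.10-only kernel that still compiles against 2.9 is exactly the bug
# this test exists to catch:
print(classify_result(compiled_ok=True, expects_2_9_success=False))  # → missing-guard
```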

Stack from ghstack (oldest at bottom):

@pytorch-bot

pytorch-bot bot commented Nov 17, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/167962

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit f11e191 with merge base 1c04a43:

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@@ -0,0 +1,40 @@
// This is duplicated from the libtorch_agnostic_2_9_extension
Contributor

mv_tensor_accessor_cpu was added in 2_10, no?

Contributor

yea hmmm should this code be in 2_10 from the start? Or no, because it's header only?

This is a bit of a confusing counterexample because it is headeronly so it retroactively works, but a more clear example might be something that was definitely landed in 2_9?

Contributor Author

It works in 2.9 because it only uses things in headeronly

There's nothing being passed across the shim boundary, and stable::Tensor.sizes() and stable::Tensor.strides() also work in 2.9, is my understanding

My workflow in the below PR ensures that this test is able to run on 2.9

Contributor Author

@mikaylagawarecki mikaylagawarecki Nov 18, 2025

@janeyx99

This is a bit of a confusing counterexample because it is headeronly so it retroactively works, but a more clear example might be something that was definitely landed in 2_9?

I agree that it is confusing, but I disagree that it's not a good sanity-check test; this function demonstrates the exact case that we want this test class to catch:

  • Current version is 2.x, user adds test and adds it to libtorch_agnostic_2_x_extension thinking this is a 2.x feature
  • We want this test to catch this (due to compilation succeeding) and prompt the user to either add version guards or move this test to libtorch_agnostic_2_(x-1)_extension
  • If the libtorch_agnostic_targetting workflow succeeds, we've confirmed the test belongs in libtorch_agnostic_2_(x-1)_extension, otherwise we have signal for the user that they missed version guards

wdyt, does that motivate the rationale for choosing this kernel better? Also happy to add others in a followup if you disagree

@@ -0,0 +1,47 @@
// This is duplicated from the libtorch_agnostic_2_9_extension
Contributor

same

using torch::stable::Tensor;

// Declare my__foreach_mul (defined in my__foreach_mul.cpp)
extern std::vector<Tensor> my__foreach_mul(
Contributor

Is this code resilient to linking order? e.g., if this code is linked before my__foreach_mul.cpp, will it break?

Contributor Author

Hm good question, I didn't run into any issues with this so far, according to claude it should not matter :)

  • **setup.py** compiles all .cpp files in the csrc directory as separate object files
  • The files are linked together into a single shared library (_C.so)
  • The linker resolves the symbol **my__foreach_mul** regardless of which object file comes first in the link order


relevant_errors = self._extract_relevant_errors(error_msg)
if relevant_errors:
    print("\n Unexpected compilation errors for mv_tensor_accessor_cpu:")
    for err in relevant_errors[:10]:
Contributor

is the 10 arbitrary haha

Contributor Author

yea removed all these

success,
f"mv_tensor_accessor_cpu.cpp failed to compile with TORCH_TARGET_VERSION=2.9.0. "
f"This file is expected to work with 2.9.0 since it doesn't use 2.10+ features. "
f"Error: {error_msg[:500]}",
Contributor

and 500 here too--what's the normal max length you'd expect?

Contributor Author

removed

return relevant_errors


# Dynamically create test methods for each .cpp and .cu file
Contributor

Can we still run an individual test with -k for dynamically created test methods? (I'm guessing yes)

Contributor Author

Yep, I was able to run

python test/cpp_extensions/libtorch_agnostic_2_10_extension/test_version_compatibility.py -k test_my_view_requires_2_10
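The dynamic test-method generation discussed above can be sketched like this (class, helper, and file names here are illustrative, not the actual harness code): methods attached to the class via `setattr` before discovery look like ordinary named tests, which is why `-k` filtering works.

```python
# Illustrative sketch: generate one test method per kernel source file.
import unittest

class FunctionVersionCompatibilityTest(unittest.TestCase):
    pass

def _make_test(filename):
    def test(self):
        # The real harness would compile `filename` with
        # TORCH_TARGET_VERSION=2.9.0 and assert on the result;
        # here we just stand in with a trivial check.
        self.assertTrue(filename.endswith(".cpp"))
    return test

for name in ["my_view.cpp", "my_hypothetical_kernel.cpp"]:
    test_name = f"test_{name.removesuffix('.cpp')}_requires_2_10"
    # Attaching before test discovery means unittest's -k pattern
    # matching sees each generated method as a normal named test.
    setattr(FunctionVersionCompatibilityTest, test_name, _make_test(name))

print(hasattr(FunctionVersionCompatibilityTest, "test_my_view_requires_2_10"))  # → True
```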

@mikaylagawarecki mikaylagawarecki added the ciflow/trunk Trigger trunk jobs on your pull request label Nov 17, 2025
Khanaksahu pushed a commit to Khanaksahu/pytorch-fork that referenced this pull request Nov 17, 2025
mikaylagawarecki added a commit that referenced this pull request Nov 18, 2025
This is tested by #167962 which ensures we get compilation errors when using functions that convert Device/HeaderOnlyArrayRef to StableIValue and target 2.9




[ghstack-poisoned]
mikaylagawarecki added a commit that referenced this pull request Nov 18, 2025
@mikaylagawarecki
Contributor Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

pytorchmergebot pushed a commit that referenced this pull request Nov 18, 2025
This is tested by #167962 which ensures we get compilation errors when using functions that convert Device/HeaderOnlyArrayRef to StableIValue and target 2.9

Pull Request resolved: #167802
Approved by: https://github.com/janeyx99
ghstack dependencies: #168025
@jeffdaily
Collaborator

This broke ROCm, but forward fix incoming.

jeffdaily added a commit to ROCm/pytorch that referenced this pull request Nov 18, 2025
Unclear which PR in the ghstack caused the ROCm failure.
Stack was (oldest at bottom):
 - pytorch#167962
 - pytorch#167804
 - pytorch#167803
 - pytorch#167802
 - pytorch#168025
pytorchmergebot pushed a commit that referenced this pull request Nov 19, 2025
Unclear which PR in the ghstack caused the ROCm failure. Stack was (oldest at bottom):
 - #167962
 - #167804
 - #167803
 - #167802
 - #168025

Fixes the following test:

PYTORCH_TEST_WITH_ROCM=1 python test/cpp_extensions/libtorch_agnostic_2_10_extension/test_version_compatibility.py FunctionVersionCompatibilityTest.test_mv_tensor_accessor_cuda_works_with_2_9

Pull Request resolved: #168087
Approved by: https://github.com/jeffdaily, https://github.com/janeyx99

Co-authored-by: Jeff Daily <jeff.daily@amd.com>
Co-authored-by: Jane (Yuan) Xu <31798555+janeyx99@users.noreply.github.com>
pytorchmergebot pushed a commit that referenced this pull request Nov 20, 2025
~This PR does change the semantics of the >> operator by using STD_TORCH_CHECK to throw the error instead of TORCH_CHECK. Jane (who is writing this message) thinks it is okay because it is the error case when an invalid MemoryFormat or Layout is getting passed into >>, so the UX benefits of TORCH_CHECK over STD_TORCH_CHECK there are not significant enough to warrant making a new copy of Layout and MemoryFormat's >> APIs.~

Never mind! We shouldn't change TORCH_CHECK to STD_TORCH_CHECK for core usage ever, cuz the traceback info and c10::Error is very much desired!! So the solution is to not migrate the >>s. I pushed new commits to the stack to remove the >> code, but for reference, 8a30179 has all the code that I ended up deleting.

Pull Request resolved: #168034
Approved by: https://github.com/janeyx99
ghstack dependencies: #168025, #167802, #167803, #167804, #167962

Co-authored-by: Jane Xu <janeyx@meta.com>
JacobSzwejbka pushed a commit that referenced this pull request Dec 8, 2025
tiendatngcs pushed a commit to tiendatngcs/pytorch-Dec25 that referenced this pull request Dec 10, 2025
@github-actions github-actions bot deleted the gh/mikaylagawarecki/383/head branch December 19, 2025 02:20

Labels

ciflow/trunk (Trigger trunk jobs on your pull request) · Merged · topic: not user facing

4 participants