Refactor TensorAccessor for headeronly. #166855
pearu wants to merge 33 commits into gh/pearu/150/base from
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/166855
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (1 Unrelated Failure)
As of commit 966eea7 with merge base d01a7b0. UNSTABLE - the following job is marked as unstable, possibly due to flakiness on trunk.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
janeyx99
left a comment
Thank you! This looks pretty good to me, but let's rename the headeronly APIs so they don't share names, to minimize confusion.
 /// at::TensorAccessor, for instance) when `final` specifier is used.
 template <typename T>
-class ArrayRef final : public HeaderOnlyArrayRef<T> {
+class ArrayRef : public HeaderOnlyArrayRef<T> {
The reason is given in the note before the class definition:
/// NOTE: ArrayRef cannot be derived from. Normally, we would use
/// `final` specifier to force this constraint at compile time.
/// However, Intel compiler does not recognize ArrayRef as a class
/// template (which is required in the definition of
/// at::TensorAccessor, for instance) when `final` specifier is
/// used. So, we cannot define ArrayRef as final because of the Intel
/// compiler issue.

When ArrayRef is defined as final, the xpu CI fails with:
2025-11-12T12:58:10.7402385Z [7169/8004] Building SYCL (Device) object torch_xpu_ops_gen_SparseSoftmaxKernels.cpp.o
2025-11-12T12:58:10.7405002Z FAILED: caffe2/aten_xpu/src/CMakeFiles/torch_xpu_ops.dir/ATen/native/sparse/xpu/sycl/torch_xpu_ops_gen_SparseSoftmaxKernels.cpp.o /var/lib/jenkins/workspace/build/caffe2/aten_xpu/src/CMakeFiles/torch_xpu_ops.dir/ATen/native/sparse/xpu/sycl/torch_xpu_ops_gen_SparseSoftmaxKernels.cpp.o
2025-11-12T12:58:10.7409903Z cd /var/lib/jenkins/workspace/build/caffe2/aten_xpu/src/CMakeFiles/torch_xpu_ops.dir/ATen/native/sparse/xpu/sycl && /opt/conda/envs/py_3.10/lib/python3.10/site-packages/cmake/data/bin/cmake -E make_directory /var/lib/jenkins/workspace/build/caffe2/aten_xpu/src/CMakeFiles/torch_xpu_ops.dir/ATen/native/sparse/xpu/sycl/. && /opt/conda/envs/py_3.10/lib/python3.10/site-packages/cmake/data/bin/cmake -D verbose:BOOL=OFF -D generated_file:STRING=/var/lib/jenkins/workspace/build/caffe2/aten_xpu/src/CMakeFiles/torch_xpu_ops.dir/ATen/native/sparse/xpu/sycl/./torch_xpu_ops_gen_SparseSoftmaxKernels.cpp.o -P /var/lib/jenkins/workspace/build/caffe2/aten_xpu/src/CMakeFiles/torch_xpu_ops.dir/ATen/native/sparse/xpu/sycl/torch_xpu_ops_gen_SparseSoftmaxKernels.cpp.o.Release.cmake
2025-11-12T12:58:10.7413388Z In file included from <command-line>:
2025-11-12T12:58:10.7414602Z /tmp/icx-8814ad46d2/SparseSoftmaxKernels-header-221199.h:1621:248: error: ‘c10::ArrayRef’ is not a template
2025-11-12T12:58:10.7416885Z 1621 | template <> struct KernelInfo<::sycl::detail::RoundedRangeKernel<::sycl::item<1, true>, 1, ::at::native::xpu::MaxRowKernelFunctor<double, long *, ::torch::headeronly::detail::GenericPackedTensorAccessor<::torch::headeronly::detail::TensorAccessor<::c10::ArrayRef<long>, double, 1, torch::headeronly::DefaultPtrTraits, long>, ::at::IndexBoundsCheck<2, long>, double, 2, torch::headeronly::DefaultPtrTraits, long>, double *>>> {
2025-11-12T12:58:10.7418863Z | ^~
cc @swolchok does the final specifier matter to keep around?
Sounds like a compiler bug, but no, I don't think `final` is gating optimization opportunities in the absence of virtual methods.
janeyx99
left a comment
Can you also add a kernel use case in libtorch_agnostic_extension in kernel.cpp that uses the HeaderOnlyPackedTensorAccessor in a more real-world way?
Yes, I have added mv_tensor_accessor with CPU and CUDA kernels using headeronly tensor accessors.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
ghstack-source-id: a901b1d Pull Request resolved: pytorch/pytorch#166855
This PR moves the implementations of Tensor accessor classes to headeronly with the following modifications:
- Add ArrayRef and IndexBoundsCheck template parameters to refactor out the usages of `IntArrayRef` and `TORCH_CHECK_INDEX` from Tensor accessor implementations.
- Eliminate usage of `c10::irange` as it is not headeronly-compatible.
- Introduce `torch::headeronly::{TensorAccessorBase, TensorAccessor, GenericPackedTensorAccessorBase, GenericPackedTensorAccessor}` that are headeronly-equivalent to `at::{TensorAccessorBase, TensorAccessor, GenericPackedTensorAccessorBase, GenericPackedTensorAccessor}`. Both these sets of template classes use the original implementations from `torch::headeronly::detail`, which gain new template parameters `ArrayRefCls` and `IndexBoundsCheck` to facilitate the `at` and `torch::headeronly` implementations of ArrayRef and index checking.
TODO:
- ~when pytorch#164991 lands, eliminate the placeholder class HeaderOnlyArrayRef~ UPDATE: done.
Pull Request resolved: pytorch#166855
Approved by: https://github.com/janeyx99
Stack from ghstack (oldest at bottom):
cc @jbschlosser