[7/n] Migrate pos_encoding and norm kernels to libtorch stable ABI #38783
mikaylagawarecki wants to merge 31 commits into
Code Review
This pull request introduces a new _C_stable_libtorch extension to support a stable ABI, enabling better compatibility across different PyTorch versions and environments. It refactors several core kernels and quantization operations to use this stable ABI, including layernorm, positional encoding, and various quantization kernels. Additionally, it enables this stable extension for both CUDA and HIP backends. I have identified a potential compilation issue where the hadacore_transform declaration is placed outside the appropriate conditional compilation block, which may cause build failures on non-CUDA backends.
torch::stable::Tensor hadacore_transform(torch::stable::Tensor& x,
                                         bool inplace);
The hadacore_transform function is compiled only for CUDA, but its declaration is outside the #ifdef VLLM_CUDA block. This will lead to compilation errors when building for other backends like ROCm/HIP. This declaration should be moved inside the #ifdef VLLM_CUDA block, before the #endif on line 156.
This was pre-existing before this stack; see line 296 in 08ed2b9.
30e40eb to 59af75d
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: Mikayla Gawarecki <mikaylagawarecki@gmail.com>
Pure move, no code changes. Preparatory step for stable ABI migration. Signed-off-by: Mikayla Gawarecki <mikaylagawarecki@gmail.com>
Pure move, no code changes. Preparatory step for stable ABI migration. Signed-off-by: Mikayla Gawarecki <mikaylagawarecki@gmail.com>
Restructure the stable ABI extension build so it compiles on both CUDA and HIP:
- Widen outer guard to include HIP
- Move CUDA-only sources (CUTLASS, FP4, AWQ, permute_cols) into a CUDA-conditional block
- Gate USE_CUDA / CUTLASS_ENABLE_DIRECT_CUDA_DRIVER_CALL to CUDA; define USE_ROCM for HIP
- Link PyTorch's bundled libamdhip64.so on ROCm to avoid a dual HIP runtime (from 985769a)
- Enable _C_stable_libtorch in setup.py for HIP builds

Signed-off-by: Mikayla Gawarecki <mikaylagawarecki@gmail.com>
Move 9 basic activation ops (silu_and_mul, mul_and_silu, gelu_and_mul, gelu_tanh_and_mul, fatrelu_and_mul, swigluoai_and_mul, gelu_new, gelu_fast, gelu_quick) from the _C extension to _C_stable_libtorch.

Convert ATen types/APIs to stable ABI equivalents:
- torch::Tensor -> torch::stable::Tensor
- ATen device guard/stream -> stable accelerator APIs
- VLLM_DISPATCH_FLOATING_TYPES -> VLLM_STABLE_DISPATCH_FLOATING_TYPES
- data_ptr -> mutable_data_ptr

Quantized activation ops (silu_and_mul_quant, persistent_masked_m_silu_mul_quant) remain in _C.

Signed-off-by: Mikayla Gawarecki <mikaylagawarecki@gmail.com>
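The identifier-level renames in the commit message above can be pictured as a small lookup table. The sketch below is illustrative only: `stable_abi_renames` and `to_stable` are hypothetical helper names that do not exist in vLLM or PyTorch, and the table holds just the three renames the message spells out. The device guard/stream change is left out because the message describes it only in general terms.

```cpp
#include <map>
#include <string>

// Hypothetical lookup table (not part of vLLM or PyTorch): the identifier
// renames named in the commit message, ATen spelling -> stable ABI spelling.
inline const std::map<std::string, std::string>& stable_abi_renames() {
  static const std::map<std::string, std::string> table = {
      {"torch::Tensor", "torch::stable::Tensor"},
      {"VLLM_DISPATCH_FLOATING_TYPES", "VLLM_STABLE_DISPATCH_FLOATING_TYPES"},
      {"data_ptr", "mutable_data_ptr"},
  };
  return table;
}

// Returns the stable ABI spelling for an identifier, or the identifier
// unchanged when the commit message lists no rename for it.
inline std::string to_stable(const std::string& name) {
  const auto& table = stable_abi_renames();
  const auto it = table.find(name);
  return it == table.end() ? name : it->second;
}
```

Op names such as silu_and_mul pass through unchanged, which matches the commit's framing: the ops keep their names and only the types and accessors they are written against change.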
59af75d to 4b5c459
…ch stable ABI Signed-off-by: Mikayla Gawarecki <mikaylagawarecki@gmail.com>
…ed_add_rms_norm_static_fp8_quant) to torch stable ABI Signed-off-by: Mikayla Gawarecki <mikaylagawarecki@gmail.com>
…libtorch_stable Signed-off-by: Mikayla Gawarecki <mikaylagawarecki@gmail.com>
11661d8 to 8754a42
@@ -91,12 +91,6 @@ void rms_norm(torch::Tensor& out, torch::Tensor& input, torch::Tensor& weight,
void fused_add_rms_norm(torch::Tensor& input, torch::Tensor& residual,
yep
vllm/csrc/cpu/torch_bindings.cpp
Lines 189 to 192 in 188defb
#include <torch/headeronly/util/Half.h>

#ifndef USE_ROCM
#include <cuda.h>
I think torch/all.h or some other torch include used to pull this in, but now we need to include it explicitly for CUDA_VERSION, which is used below on line 50.
"csrc/libtorch_stable/fused_qknorm_rope_kernel.cu"
"csrc/libtorch_stable/layernorm_kernels.cu"
"csrc/libtorch_stable/layernorm_quant_kernels.cu"
"csrc/libtorch_stable/quantization/fused_kernels/fused_layernorm_dynamic_per_token_quant.cu")
so cleannnn cleanest cmake change so far!
Gaps identified by comparing tool output against a real 1,858-line manual migration PR (vllm-project/vllm#38783):

Rules.h:
- Add torch::k* scalar type shorthands (kFloat, kBFloat16, kInt8, kInt32, etc.)
- Add c10::/at:: scalar type rewrites (Half, BFloat16, Float8_e4m3fn, etc.)
- Add CUDA check macro rules (C10_CUDA_CHECK, AT_CUDA_CHECK, C10_CUDA_KERNEL_LAUNCH_CHECK)
- Add TORCH_CHECK_NOT_IMPLEMENTED → STD_TORCH_CHECK_NOT_IMPLEMENTED
- Add more method-to-free-function rules (sum, pad, new_zeros, permute, slice, index_select, repeat, expand)

AstCallbacks.cpp:
- Register new type names and scalar type shorthands in AST matchers
- Register new method names for method-to-free-function conversion

Verifier.cpp:
- Detect torch::k* shorthands as unstable
- Detect C10_CUDA_CHECK, AT_CUDA_CHECK, C10_CUDA_KERNEL_LAUNCH_CHECK
- Detect TORCH_CHECK_NOT_IMPLEMENTED
- Detect .dtype() usage (unstable caffe2::TypeMeta, use .scalar_type())
- Detect torch::TensorOptions (needs decomposition into explicit args)
- Detect at::Half, c10::Half, c10::BFloat16, c10::Float8_* types
- Detect at::elementSize (use tensor.element_size())
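To make the Verifier.cpp checks in the commit message above concrete, here is a minimal sketch of a scanner that flags the listed spellings in a source string. It is an assumption-laden illustration, not the tool's implementation: `is_ident_char`, `contains_token`, and `find_unstable_tokens` are hypothetical names, and plain substring matching stands in for whatever matching the real verifier uses, which this page does not show.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: true for characters that can appear in a C++ identifier.
inline bool is_ident_char(char c) {
  return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_';
}

// True if `pattern` occurs in `source`. When the pattern starts with an
// identifier character, the match must not continue a longer identifier, so
// STD_TORCH_CHECK_NOT_IMPLEMENTED is not reported as TORCH_CHECK_NOT_IMPLEMENTED.
inline bool contains_token(const std::string& source, const std::string& pattern) {
  const bool needs_boundary = is_ident_char(pattern.front());
  std::size_t pos = source.find(pattern);
  while (pos != std::string::npos) {
    const bool at_boundary = pos == 0 || !is_ident_char(source[pos - 1]);
    if (!needs_boundary || at_boundary) return true;
    pos = source.find(pattern, pos + 1);
  }
  return false;
}

// Returns the unstable spellings (taken from the commit message) found in `source`.
inline std::vector<std::string> find_unstable_tokens(const std::string& source) {
  static const std::vector<std::string> patterns = {
      "torch::k",       "C10_CUDA_CHECK", "AT_CUDA_CHECK",
      "C10_CUDA_KERNEL_LAUNCH_CHECK",     "TORCH_CHECK_NOT_IMPLEMENTED",
      ".dtype()",       "torch::TensorOptions",
      "at::Half",       "c10::Half",      "c10::BFloat16",
      "c10::Float8_",   "at::elementSize",
  };
  std::vector<std::string> hits;
  for (const auto& pattern : patterns) {
    if (contains_token(source, pattern)) hits.push_back(pattern);
  }
  return hits;
}
```

A text scan like this can misfire on comments and string literals. The AstCallbacks.cpp entries in the same commit message suggest the tool works on the AST for rewrites, which avoids that class of false positive.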
Superseded by newer PRs.
Purpose
Stacked on #38757; commits to review: https://github.com/vllm-project/vllm/pull/38783/changes/deea6618c38afb4735b442c61e2697c273654292..8754a4250584115db08113e0889313c939d85eb6
Note: some declarations are not deleted from csrc/ops.h despite being moved to csrc/libtorch_stable/ops.h. This is because the CPU build also uses these declarations. These are
Test Plan
Test Result