Follow-up to #1747. Here is what we plan to do to make torchao ABI compatible and closer to Python-only. Once this is done, torchao will be compatible with all PyTorch versions and we will no longer need to worry about #2919.

Please feel free to pick up a task by adding your name to the Status column.

Status	Assignee	File	Description	Plan
Deleted #3520	@howardzhang-cv	torchao/csrc/cuda/fp6_llm/fp6_linear.cu	FP6 linear layer	delete
Deleted #3612	@howardzhang-cv	torchao/csrc/cuda/marlin_qqq/marlin_qqq_kernel.cu	Marlin QQQ quantization	delete
Deleted #3744	@howardzhang-cv	torchao/csrc/cuda/activation24/sparse_gemm.cu	2:4 sparse GEMM	delete
Deleted #3744	@howardzhang-cv	torchao/csrc/cuda/activation24/sparsify24.cu	2:4 sparsification	delete
Deleted #3613	@howardzhang-cv	torchao/csrc/cuda/sparse_marlin/marlin_kernel_nm.cu	N:M sparse Marlin	delete
Deleted #3722	@howardzhang-cv	torchao/csrc/cuda/tensor_core_tiled_layout/tensor_core_tiled_layout.cu	Tensor core tiled layout	delete
ABI stable #3610	@andrewor14 @danielvegamyhre	torchao/csrc/cuda/mx_kernels/mxfp8_cuda.cu	MXFP8 CUDA kernels	Make ABI compatible
ABI stable #3610	@andrewor14 @danielvegamyhre	torchao/csrc/cuda/mx_kernels/mxfp8_extension.cpp	MXFP8 CUDA kernels	Make ABI compatible
No need to change	@andrewor14 @danielvegamyhre	torchao/csrc/cuda/mx_kernels/mx_block_rearrange_2d_M_groups.cu	MXFP8 CUDA kernels	Make ABI compatible
Deleted #3723	@jerryzh168	torchao/csrc/cuda/rowwise_scaled_linear_cutlass/rowwise_scaled_linear_cutlass_s4s4.cu	S4S4 row-wise scaled linear	delete
Deleted #3723	@jerryzh168	torchao/csrc/cuda/rowwise_scaled_linear_cutlass/rowwise_scaled_linear_cutlass_s8s4.cu	S8S4 row-wise scaled linear	delete
ABI stable #3725	@andrewor14	torchao/csrc/cuda/rowwise_scaled_linear_sparse_cutlass/rowwise_scaled_linear_sparse_cutlass_e4m3e4m3.cu	Sparse E4M3xE4M3	Make ABI compatible, not built by default
ABI stable #3725	@andrewor14	torchao/csrc/cuda/rowwise_scaled_linear_sparse_cutlass/rowwise_scaled_linear_sparse_cutlass_e4m3e5m2.cu	Sparse E4M3xE5M2	Make ABI compatible, not built by default
ABI stable #3725	@andrewor14	torchao/csrc/cuda/rowwise_scaled_linear_sparse_cutlass/rowwise_scaled_linear_sparse_cutlass_e5m2e4m3.cu	Sparse E5M2xE4M3	Make ABI compatible, not built by default
ABI stable #3725	@andrewor14	torchao/csrc/cuda/rowwise_scaled_linear_sparse_cutlass/rowwise_scaled_linear_sparse_cutlass_e5m2e5m2.cu	Sparse E5M2xE5M2	Make ABI compatible, not built by default
ABI stable #3725	@andrewor14	torchao/csrc/cuda/rowwise_scaled_linear_sparse_cutlass/rowwise_scaled_linear_sparse_cutlass_f8f8.cu	Sparse FP8xFP8	Make ABI compatible, not built by default
Done #3727	@jerryzh168	torchao/csrc/cuda/to_sparse_semi_structured_cutlass_sm9x/to_sparse_semi_structured_cutlass_sm9x_f8.cu	Semi-structured sparse FP8	Make ABI compatible, not built by default
No need to change		torchao/_models/sam2/csrc/connected_components.cu	Connected components (SAM2)	Move sam2 somewhere else? We can probably delete this. It shouldn't block ABI stability, I think, since I don't think it is built by default

Status	Assignee	File	Description	Plan
Deleted #3520	@howardzhang-cv	torchao/csrc/cuda/fp6_llm/utils_core.cuh	FP6 core utilities	delete
Deleted #3520	@howardzhang-cv	torchao/csrc/cuda/fp6_llm/kernel_reduction.cuh	FP6 reduction kernel	delete
Deleted #3520	@howardzhang-cv	torchao/csrc/cuda/fp6_llm/ptx_mma.cuh	FP6 PTX MMA	delete
Deleted #3520	@howardzhang-cv	torchao/csrc/cuda/fp6_llm/kernel_matmul.cuh	FP6 matmul kernel	delete
Deleted #3520	@howardzhang-cv	torchao/csrc/cuda/fp6_llm/utils_gmem.cuh	FP6 global memory utils	delete
Deleted #3520	@howardzhang-cv	torchao/csrc/cuda/fp6_llm/utils_parallel_dequant.cuh	FP6 parallel dequant utils	delete
Deleted #3520	@howardzhang-cv	torchao/csrc/cuda/fp6_llm/ptx_cp.async.cuh	FP6 PTX async copy	delete
Deleted #3723	@jerryzh168	torchao/csrc/cuda/rowwise_scaled_linear_sparse_cutlass/rowwise_scaled_linear_cutlass.cuh	CUTLASS header	delete
ABI stable #3725	@andrewor14	torchao/csrc/cuda/rowwise_scaled_linear_sparse_cutlass/rowwise_scaled_linear_sparse_cutlass.cuh	Sparse CUTLASS header	Make ABI compatible
Done #3727	@jerryzh168	torchao/csrc/cuda/to_sparse_semi_structured_cutlass_sm9x/to_sparse_semi_structured_cutlass_sm9x.cuh	Semi-structured sparse header	Make ABI compatible
No need to change	@andrewor14	torchao/csrc/cuda/mx_kernels/mxfp8_quantize.cuh	MXFP8 quantize header	Make ABI compatible (has no torch C++ anyway)
No need to change	@andrewor14	torchao/csrc/cuda/mx_kernels/ptx.cuh	MX PTX header	Make ABI compatible

After the above is done, we can explore making torchao Python-only through:

Compiling on demand, as in https://github.com/NVIDIA/Megatron-LM/blob/v2.0/megatron/fused_kernels/__init__.py
Moving kernels to PyTorch core or mslk
For CUTLASS kernels, using the CuTe DSL, as mentioned by @tonyf
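To make the compile-on-demand option more concrete, here is a minimal sketch of how a kernel could be built lazily on first use instead of at install time. It assumes `torch.utils.cpp_extension.load` as the JIT builder; `LazyExtension`, the extension name, and the source path in the usage note are hypothetical and only illustrate the idea, not how torchao would necessarily do it.

```python
from typing import Any, Callable, Optional, Sequence


def _torch_jit_builder(name: str, sources: Sequence[str]) -> Any:
    # torch is imported lazily so that importing this module stays cheap
    # and does not require a compiler toolchain up front.
    from torch.utils.cpp_extension import load

    return load(name=name, sources=list(sources), verbose=False)


class LazyExtension:
    """Hypothetical helper: builds an extension the first time it is used."""

    def __init__(
        self,
        name: str,
        sources: Sequence[str],
        builder: Optional[Callable[[str, Sequence[str]], Any]] = None,
    ) -> None:
        self._name = name
        self._sources = list(sources)
        # The builder is injectable so the caching logic can be exercised
        # without a compiler; by default it JIT-compiles through PyTorch.
        self._builder = builder or _torch_jit_builder
        self._module = None

    def get(self) -> Any:
        # Build once, then reuse the loaded module for every later call.
        if self._module is None:
            self._module = self._builder(self._name, self._sources)
        return self._module

    def __getattr__(self, attr: str) -> Any:
        # Only reached for names not found on the wrapper itself, so
        # kernel functions are forwarded to the compiled module.
        return getattr(self.get(), attr)
```

A caller would create something like `LazyExtension("demo_ext", ["path/to/kernel.cu"])` at import time (both names are placeholders) and pay the compilation cost only when a kernel function is first accessed. The trade-off is that users need a working compiler toolchain at runtime, and the first call is slow.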

Making torchao ABI stable and moving closer to python only #3516
