Add UIntxBitPackedTensor, UIntxWeightOnlyConfig, and Int8DynamicActivationUIntxWeightConfig #4082

jerryzh168 merged 17 commits into main
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/4082
Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures — as of commit 62b4c14 with merge base 6e5ea54.

This comment was automatically generated by Dr. CI and updates every 15 minutes.
@claude review this PR
Claude finished @vkuzo's task in 6m 21s - View job

**Review Summary**

This PR adds a v2 tensor subclass, `UIntxBitPackedTensor` …

**Issues**

1. The …

2. Shallow copy of the gemlite kwargs:

   ```python
   gemlite_kwargs = self.gemlite_kwargs.copy()
   ```

   This is a shallow copy. Mutable nested objects (like …) are shared between the original and the copy.

3. Broad exception handling around the gemlite import:

   ```python
   except Exception:
       gemlite = None
   ```

   Using `except Exception:` …

4. Hard-coded `.cuda()` calls:

   ```python
   int_data = int_data.cuda()  # line 127
   self.packed_weight.cuda()  # line 253
   ```

   These force data onto the default CUDA device …

**Suggestions (non-blocking)**

5. No serialization or … tests. The test file covers quantization, forward pass, SQNR, slicing, FQN support, and non-standard shapes, which is good. However, there are no tests for: … These are important for production use. Other tensor subclasses in the codebase are typically tested with these. Consider adding them in a follow-up.

6. The old …

7. Duplicated validation logic between the configs and … Both …

8. The …

9. The …

10. Slice doesn't update … After slicing, …

**What looks good**

…
| weight | activation | description |
| --- | --- | --- |
| mxfp4 | mxfp4 | {class}`~torchao.prototype.mx_formats.MXDynamicActivationMXWeightConfig` (prototype): Applies mxfp8 or mxfp4 dynamic quantization to activations and weights. Requires NVIDIA SM100+ (Blackwell) or AMD MI350+. |
| intx | bf16 | {class}`~torchao.quantization.IntxWeightOnlyConfig`: Applies intx (1-8 bit) weight-only quantization. Supports groupwise and per-channel. Works with Linear and Conv2D. |
| intx | int8 | {class}`~torchao.quantization.Int8DynamicActivationIntxWeightConfig`: Applies int8 dynamic per-token activation and intx (1-8 bit) weight quantization. CPU optimized. |
| uintx (4/8-bit) | bf16 | {class}`~torchao.prototype.quantization.UIntxWeightOnlyConfig` (prototype): Applies 4-bit (asymmetric, grouped) or 8-bit (symmetric, per-channel) weight-only quantization using gemlite (https://github.com/dropbox/gemlite) Triton kernels. Supports packing bit widths 8, 16, 32. Requires CUDA and gemlite. Optimized for A100 and H100 GPUs. |
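A usage sketch for the two prototype configs in the last rows of the table. The import paths follow the table's {class} references; the constructor keyword names (`bit_width`, `group_size`) are assumptions based on this PR's test matrix, not a confirmed API:

```python
import torch
from torchao.quantization import quantize_
# Prototype import path taken from the {class} references in the table above.
from torchao.prototype.quantization import (
    Int8DynamicActivationUIntxWeightConfig,
    UIntxWeightOnlyConfig,
)

model = torch.nn.Sequential(torch.nn.Linear(1024, 1024)).to(torch.bfloat16).cuda()

# 4-bit asymmetric grouped weight-only quantization; keyword names assumed.
quantize_(model, UIntxWeightOnlyConfig(bit_width=4, group_size=128))

# Alternatively: int8 dynamic per-token activations + 4-bit uintx weights.
# quantize_(model, Int8DynamicActivationUIntxWeightConfig(bit_width=4, group_size=128))

x = torch.randn(8, 1024, dtype=torch.bfloat16, device="cuda")
y = model(x)
```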
vkuzo
left a comment
thank you! looks good if CI is green
merging for now. we can also add a …
Stack from ghstack (oldest at bottom):

Add v2 tensor subclass UIntxBitPackedTensor(TorchAOBaseTensor) using gemlite bit-packing and Triton GEMM kernels, replacing the old AQT-based GemliteUIntXWeightOnlyConfig path.

- UIntxBitPackedTensor: tensor subclass with from_hp(), dequantize(), and aten.linear/t/slice dispatch implementations
- UIntxWeightOnlyConfig: weight-only quantization (4-bit/8-bit)
- Int8DynamicActivationUIntxWeightConfig: int8 dynamic activation + uintx weight
- Tests for both configs covering 4-bit, 8-bit, slice, and non-standard shapes

Test Plan:

- python test/prototype/test_uintx_bit_packed_tensor.py
- Tests cover UIntxWeightOnlyConfig: 4-bit (group64/128, pack32/8), 8-bit (per-channel, pack32/8)
- Tests cover Int8DynamicActivationUIntxWeightConfig: same bit_width/group_size/packing combos
- Tests cover slice dim0/dim1 for tensor parallelism
- Tests cover non-standard shapes (1024x1025)
- Verified backward compat: old GemliteUIntXWeightOnlyConfig still works

Addressing #3891
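A sketch of the tensor-subclass surface described above: quantize a high-precision weight with from_hp(), then round-trip through dequantize() to measure SQNR, as the test plan does. The import path and keyword names are assumptions inferred from the PR summary and test matrix:

```python
import torch
# Module path is an assumption; the PR places the class under torchao's prototype area.
from torchao.prototype.quantization import UIntxBitPackedTensor

w = torch.randn(1024, 1024, dtype=torch.bfloat16, device="cuda")

# 4-bit with group size 128, as exercised by the test plan; keyword names assumed.
wq = UIntxBitPackedTensor.from_hp(w, bit_width=4, group_size=128)

# Round-trip back to high precision and measure signal-to-quantization-noise ratio.
w_dq = wq.dequantize()
sqnr = 10 * torch.log10(w.float().pow(2).mean() / (w - w_dq).float().pow(2).mean())
print(f"SQNR: {sqnr.item():.1f} dB")
```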