Introduce IntxOpaqueTensor to replace PackedInt8DynamicActivationIntxWeightLayout in AQT #2742
Conversation
    new_bias = None  # bias is packed with weights
else:
    assert packing_format == PackingFormat.UNPACKED_TO_INT8
    new_weight = to_linear_activation_quantized(
btw, we are planning to move away from to_linear_activation_quantized as well. To reduce the abstractions, we are implementing dynamic activation quantization in the tensor itself; see ao/torchao/quantization/quantize_/workflows/float8/float8_tensor.py, lines 244 to 247 (commit 72b35bf).
this can be a separate PR though
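As a rough sketch of that direction (all names below are illustrative stand-ins mirroring the float8_tensor.py pattern, not the actual torchao API):

import torch
from dataclasses import dataclass
from typing import Optional

@dataclass
class QuantizeTensorToInt8Kwargs:
    # Placeholder recipe object; the real tensor would carry a richer config.
    granularity: str = "per_token"

class IntxWeightSketch:
    """Toy stand-in for a weight tensor subclass that stores its own
    activation-quantization recipe, mirroring float8_tensor.py."""
    def __init__(self, qdata, scale, act_quant_kwargs: Optional[QuantizeTensorToInt8Kwargs]):
        self.qdata = qdata  # int8 weight values
        self.scale = scale  # per-channel weight scales
        self.act_quant_kwargs = act_quant_kwargs

def linear_sketch(x, w: IntxWeightSketch, bias=None):
    # Dynamic activation quantization happens inside the linear op itself,
    # driven by the kwargs stored on the weight, so the module no longer needs
    # to be wrapped with to_linear_activation_quantized.
    if w.act_quant_kwargs is not None:
        x_scale = x.abs().amax(dim=-1, keepdim=True).clamp_min(1e-6) / 127.0
        x = torch.clamp((x / x_scale).round(), -128, 127) * x_scale
    w_dq = w.qdata.to(x.dtype) * w.scale
    return torch.nn.functional.linear(x, w_dq, bias)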
Ok, let me change that then
_FLOAT_TYPES: List[torch.dtype] = [torch.float16, torch.bfloat16, torch.float32]


class ActivationQuantization(enum.Enum):
why are we doing this? the recommended way is ao/torchao/quantization/quantize_/workflows/float8/float8_tensor.py, lines 243 to 247 (commit af2cf1e).
The quantization code is inlined in linear in lines 233-249.
The enum was to support applying activation quant (ActivationQuantization.DYNAMIC_INT8_ASYMMETRIC_PER_TOKEN) vs. weight only (None). I made it an enum so it could be extended.
I can use _choose_quant_func_and_quantize_tensor instead, but out of curiosity, why is the design to have _choose_quant_func_and_quantize_tensor in a common directory? It seems its implementation would just have many if/else branches based on the various tensor subclasses?
the reason for _choose_quant_func_and_quantize_tensor having if/else was readability and removing indirections; the alternative is to have a config, I think
but this is not always required, you can also call the activation quantization in the linear op directly, without using this function, especially when you only have a single possible type of activation quantization
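For that single-scheme case, a minimal sketch of quantizing the activation directly inside the linear implementation (the helper below is made up for illustration; it is not torchao's actual quant function):

import torch

def _int8_asymm_per_token_quant(x):
    # Illustrative asymmetric per-token int8 activation quantization.
    x_min = x.amin(dim=-1, keepdim=True)
    x_max = x.amax(dim=-1, keepdim=True)
    scale = (x_max - x_min).clamp_min(1e-6) / 255.0
    zero_point = (-128 - x_min / scale).round()
    x_q = torch.clamp((x / scale + zero_point).round(), -128, 127).to(torch.int8)
    return x_q, scale, zero_point

def linear_with_inline_act_quant(x, w_q, w_scale, bias=None):
    # With only one supported activation scheme, the linear op can call the
    # quant function directly instead of dispatching through
    # _choose_quant_func_and_quantize_tensor or an ActivationQuantization enum.
    x_q, x_scale, x_zp = _int8_asymm_per_token_quant(x)
    x_dq = (x_q.to(x.dtype) - x_zp) * x_scale  # reference (dequantized) math
    w_dq = w_q.to(x.dtype) * w_scale
    return torch.nn.functional.linear(x_dq, w_dq, bias)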
class ComputeTarget(enum.Enum):
    """
    This packs the tensor for PyTorch CPU kernels in ATen.
might be good to add some description on how this differs from KernelPreference
also why is this change here? should this be a separate PR?
This PR is about updating int8_dynamic_activation_intx_weight to version 2. Most of the work to do that is creating v2 of the tile_packed tensor, but it also required updating the unpacked tensor to support dynamic activation quant.
I can move the unpacked changes to a separate PR, though.
@jerryzh168 I've updated this PR to use the new opaque tensor.
# Create packed tensor
if packing_format == PackingFormat.OPAQUE:
    assert compute_target is not None, (
        "Must specify a compute target for PackingFormat.TILE_PACKED"
jerryzh168 left a comment:
lg, is this the last change before we can bump the version?
should be. Just need to do the packing format refactor you mentioned.
This adds IntxOpaqueTensor to replace the AQT tensor with PackedInt8DynamicActivationIntxWeightLayout, since AQT will be removed.
The test plan is the new unit tests.
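For context, the end-user flow this migration targets looks roughly like the sketch below; the config name is based on the existing int8_dynamic_activation_intx_weight flow, and the exact import paths, argument names, and version value that selects the IntxOpaqueTensor path are assumptions, not the final v2 API.

# Hedged usage sketch; import paths, argument names, and version value are
# assumptions based on the existing int8_dynamic_activation_intx_weight flow.
import torch
from torchao.quantization import quantize_
from torchao.quantization.quant_api import Int8DynamicActivationIntxWeightConfig
from torchao.quantization.granularity import PerGroup

model = torch.nn.Sequential(torch.nn.Linear(256, 256))
quantize_(
    model,
    Int8DynamicActivationIntxWeightConfig(
        weight_dtype=torch.int4,
        weight_granularity=PerGroup(32),
        version=2,  # assumed: selects the new IntxOpaqueTensor path instead of AQT
    ),
)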