
Introduce IntxOpaqueTensor to replace PackedInt8DynamicActivationIntxWeightLayout in AQT #2742

Merged: metascroy merged 23 commits into main from intx-packed on Aug 27, 2025
Conversation

@metascroy (Contributor) commented Aug 12, 2025

This adds IntxOpaqueTensor to replace the AQT tensor with PackedInt8DynamicActivationIntxWeightLayout since AQT will be removed.

The test plan is the new unit tests.

pytorch-bot commented Aug 12, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2742

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 9 Pending

As of commit 1157903 with merge base 8722c0c:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla Bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Aug 12, 2025
@metascroy metascroy marked this pull request as ready for review August 19, 2025 01:15
@metascroy metascroy added the module: not user facing Use this tag if you don't want this PR to show up in release notes label Aug 19, 2025
@metascroy metascroy requested a review from jerryzh168 August 19, 2025 01:15
@metascroy metascroy changed the title Refactor packed intx tensor to remove AQT Introduce IntxTilePackedTensor to replace PackedInt8DynamicActivationIntxWeightLayout in AQT Aug 19, 2025
Comment thread (outdated): torchao/quantization/quant_api.py
Comment thread (outdated): torchao/quantization/quantize_/common/packing_format.py
new_bias = None # bias is packed with weights
else:
assert packing_format == PackingFormat.UNPACKED_TO_INT8
new_weight = to_linear_activation_quantized(
@jerryzh168 (Contributor) commented Aug 19, 2025

btw, we are planning to move away from to_linear_activation_quantized as well. To reduce the abstractions, we are implementing dynamic activation quantization in the tensor itself; see

if act_quant_kwargs is not None:
input_tensor = _choose_quant_func_and_quantize_tensor(
input_tensor, act_quant_kwargs
)
for example. also api: https://docs.pytorch.org/ao/main/quantization_overview.html#dynamic-activation-and-weight-quantization

this can be a separate PR though
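The pattern in the snippet above can be illustrated with a small, torch-free sketch: the weight object carries `act_quant_kwargs`, and the linear op quantizes the incoming activation itself instead of wrapping it with to_linear_activation_quantized (weight-only when the kwargs are None). All names and the int8 per-token scheme below are illustrative assumptions, not torchao code.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class ActQuantKwargs:
    # Hypothetical stand-in for torchao's act_quant_kwargs.
    dtype: str = "int8"
    granularity: str = "per_token"

@dataclass
class QuantizedWeight:
    qdata: List[List[int]]  # int8 weight values, one row per output feature
    scale: float            # single weight scale, for simplicity
    act_quant_kwargs: Optional[ActQuantKwargs] = None

def _quantize_per_token(row: List[float]) -> Tuple[List[int], float]:
    # Symmetric int8 quantization of one activation row (token).
    amax = max(abs(v) for v in row) or 1.0
    scale = amax / 127.0
    return [round(v / scale) for v in row], scale

def linear(x: List[List[float]], w: QuantizedWeight) -> List[List[float]]:
    out = []
    for row in x:
        if w.act_quant_kwargs is not None:
            # Dynamic activation quantization happens inside the op itself,
            # with no separate activation-quantized wrapper tensor.
            q, s = _quantize_per_token(row)
        else:
            q, s = [int(v) for v in row], 1.0  # weight-only path
        out.append([
            s * w.scale * sum(qi * wi for qi, wi in zip(q, wrow))
            for wrow in w.qdata
        ])
    return out
```

The point of the design is that there is one tensor subclass and one linear kernel path; whether activations get quantized is just a field on the weight.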

metascroy (Contributor, Author) replied:

Ok, let me change that then

Comment thread torchao/quantization/quant_api.py Outdated
_FLOAT_TYPES: List[torch.dtype] = [torch.float16, torch.bfloat16, torch.float32]


class ActivationQuantization(enum.Enum):
Contributor commented:

why are we doing this? the recommended way is

# quantizing activation, if `act_quant_kwargs` is specified
if act_quant_kwargs is not None:
input_tensor = _choose_quant_func_and_quantize_tensor(
input_tensor, act_quant_kwargs
)
or just inline the quantization code in linear

metascroy (Contributor, Author) replied:

The quantization code is inlined in linear in lines 233-249.

The enum was to support applying activation quant (ActivationQuantization.DYNAMIC_INT8_ASYMMETRIC_PER_TOKEN) vs. weight-only (None). I made it an enum so it could be extended.

I can use _choose_quant_func_and_quantize_tensor instead, but out of curiosity, why is the design to have _choose_quant_func_and_quantize_tensor in a common directory? It seems its implementation would just be many if/else branches over the various tensor subclasses?

@jerryzh168 (Contributor) commented Aug 20, 2025

the reason for _choose_quant_func_and_quantize_tensor having if/else was readability and removing indirections; the alternative, I think, is to have a config

but this is not always required: you can also call the activation quantization in the linear op directly, without using this function, especially when you only have a single possible type of activation quantization
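For illustration, a minimal torch-free sketch of the if/else dispatch style described here: one visible branch per supported act_quant_kwargs type, with no config or registry indirection. The kwargs classes and quant routines below are hypothetical stand-ins, not the actual torchao implementation.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Int8DynamicActKwargs:
    # Hypothetical kwargs type selecting int8 dynamic activation quant.
    symmetric: bool = True

@dataclass
class Float8ActKwargs:
    # Hypothetical kwargs type selecting float8 activation quant.
    pass

def _quantize_int8(xs: List[float]) -> List[int]:
    # Symmetric int8 quantization over the whole row.
    amax = max(abs(v) for v in xs) or 1.0
    scale = amax / 127.0
    return [round(v / scale) for v in xs]

def _quantize_float8(xs: List[float]) -> List[float]:
    # Placeholder: a real float8 cast would clamp/round to e4m3 values.
    return [float(v) for v in xs]

def choose_quant_func_and_quantize_tensor(xs, act_quant_kwargs):
    # Readability-first dispatch: each supported kwargs type maps to
    # exactly one quantization routine, visible at a glance.
    if isinstance(act_quant_kwargs, Int8DynamicActKwargs):
        return _quantize_int8(xs)
    elif isinstance(act_quant_kwargs, Float8ActKwargs):
        return _quantize_float8(xs)
    raise NotImplementedError(f"unsupported kwargs: {type(act_quant_kwargs)}")
```

The trade-off being discussed: the branches grow with each new kwargs type, but a reader can follow the entire dispatch without chasing a config object through layers of indirection.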


class ComputeTarget(enum.Enum):
"""
This packs the tensor for PyTorch CPU kernels in ATen.
Contributor commented:

might be good to add some description on how this differs from KernelPreference
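As a sketch of what the requested description could look like: the only grounded detail from the excerpt above is that one value packs for PyTorch CPU kernels in ATen; the second member and the wording distinguishing it from KernelPreference are assumptions for illustration, not the actual torchao code.

```python
import enum

class ComputeTarget(enum.Enum):
    """Which backend the tensor is packed for.

    Unlike KernelPreference, which selects among kernels that can all
    consume one packed layout, ComputeTarget determines the packed layout
    itself, so the resulting tensors are not interchangeable.
    (Distinction above is an assumed reading, pending the maintainers' answer.)
    """

    # Packs the tensor for PyTorch CPU kernels in ATen (from the excerpt).
    ATEN = "aten"
    # Assumed second member: packs for torchao's own lowbit CPU kernels.
    TORCHAO = "torchao"
```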

Contributor commented:

also why is this change here? should this be a separate PR?

metascroy (Contributor, Author) replied:

This PR is about updating int8_dynamic_activation_intx_weight to version 2. Most of the work to do that is creating v2 of the tile_packed tensor. But it also required updating the unpacked tensor to support dynamic activation quant.

I can move the unpacked tensor changes to a separate PR, though.

@metascroy metascroy changed the title Introduce IntxTilePackedTensor to replace PackedInt8DynamicActivationIntxWeightLayout in AQT Introduce IntxOpaqueTensor to replace PackedInt8DynamicActivationIntxWeightLayout in AQT Aug 26, 2025
@metascroy (Contributor, Author) commented:

@jerryzh168 I've updated this PR to use the new opaque tensor

Comment thread torchao/quantization/quant_api.py Outdated
# Create packed tensor
if packing_format == PackingFormat.OPAQUE:
assert compute_target is not None, (
"Must specify a compute target for PackingFormat.TILE_PACKED"
Contributor commented:

nit: update comment

@jerryzh168 (Contributor) left a comment

lg, is this the last change before we can bump the version?

@metascroy (Contributor, Author) replied:

> lg, is this the last change before we can bump the version?

Should be. Just need to do the packing format refactor you mentioned.

@metascroy metascroy merged commit 8669213 into main Aug 27, 2025
18 checks passed