Add Int4TilePackedTo4dTensor by jerryzh168 · Pull Request #2791 · pytorch/ao

jerryzh168 · 2025-08-18T20:43:45Z

This commit introduces Int4TilePackedTo4dTensor, a new tensor subclass for int4 weight-only quantization using tensor core tiled packing format.

Key features:

- Implements tensor core tiled packing for efficient computation on tensor cores
- Supports PackingFormat.TILE_PACKED_TO_4D in Int4WeightOnlyConfig version 2
- Optimized for tinygemm int4mm kernel (_weight_int4pack_mm)
- Includes comprehensive test suite

The implementation follows the same pattern as other int4 tensor subclasses but uses
a specialized packing format optimized for tensor core matrix multiplication performance.

Changes:

- Add Int4TilePackedTo4dTensor implementation
- Update Int4WeightOnlyConfig version 2 to support TILE_PACKED_TO_4D packing format
- Add TILE_PACKED_TO_4D to PackingFormat enum
- Add comprehensive tests including serialization, different group sizes, and error conditions
- Update __init__.py files to export new tensor class

Test:
python test/quantization/quantize_/workflows/int4/test_int4_tile_packed_to_4d_tensor.py
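For readers unfamiliar with int4 weight-only quantization, the general idea can be sketched in plain Python: each group of weights is mapped to 4-bit integers with a per-group scale and offset, and two 4-bit values are stored per byte. This is only an illustration under stated assumptions. The function names below are hypothetical, and the byte layout is a simple low-nibble-first packing, not the tiled 4D layout that Int4TilePackedTo4dTensor and the tinygemm kernel use.

```python
from typing import List, Tuple


def quantize_group_int4(values: List[float]) -> Tuple[List[int], float, float]:
    """Asymmetric 4-bit quantization of one group (hypothetical helper).

    Maps each value to an integer in [0, 15] using a per-group scale and
    minimum, so that value ~= lo + scale * q.
    """
    lo, hi = min(values), max(values)
    # Fall back to 1.0 when the group is constant to avoid dividing by zero.
    scale = (hi - lo) / 15 or 1.0
    q = [min(15, max(0, round((v - lo) / scale))) for v in values]
    return q, scale, lo


def pack_nibbles(q: List[int]) -> bytes:
    """Pack pairs of 4-bit values into bytes, low nibble first."""
    if len(q) % 2:
        raise ValueError("need an even number of 4-bit values")
    return bytes(q[i] | (q[i + 1] << 4) for i in range(0, len(q), 2))


def unpack_nibbles(packed: bytes) -> List[int]:
    """Inverse of pack_nibbles."""
    out: List[int] = []
    for b in packed:
        out.append(b & 0xF)
        out.append(b >> 4)
    return out


def dequantize_group(q: List[int], scale: float, lo: float) -> List[float]:
    """Recover approximate floating-point values from 4-bit codes."""
    return [lo + scale * x for x in q]
```

Packing halves the storage relative to one byte per value, and the round-trip error of each weight is bounded by about half the group's scale.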

pytorch-bot · 2025-08-18T20:43:48Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2791

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 22b937f with merge base 15a6de6:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

vkuzo · 2025-08-27T12:00:39Z

+    """
+    tile_packed_to_4d is referring to the format used by tensor core tiled kernels for int4 quantization
+    """
+    TILE_PACKED_TO_4D = "tile_packed_to_4d"


lgtm, but in a separate PR would be good to delete this enum since we determined that PLAIN is the only format which is reused, and all the others are tensor-specific

yeah will do

we also want to delete global PackingFormat as well right?

vkuzo · 2025-08-27T12:01:15Z

+        )
+
+        original_shape = hp_tensor.shape
+        # use a fixed value to simplify api


what does this mean?

use fixed inner_k_tiles here to have a shorter arg list, I didn't see people change it anywhere
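To make the role of inner_k_tiles concrete, here is a small shape calculation. It assumes the packed layout described in PyTorch's int4mm kernel, where an (n, k) int4 weight is stored as an int32 tensor of shape [n / 8, k / (inner_k_tiles * 16), 32, inner_k_tiles / 2]. The helper name, the allowed values (2, 4, 8), and the default of 8 are assumptions for illustration; this thread does not state which fixed value the PR uses.

```python
import math
from typing import Tuple


def tile_packed_shape(n: int, k: int, inner_k_tiles: int = 8) -> Tuple[int, int, int, int]:
    """Assumed 4D shape of a tile-packed int4 weight (hypothetical helper).

    Assumed layout: [n / 8, k / (inner_k_tiles * 16), 32, inner_k_tiles / 2],
    with each int32 element holding eight 4-bit values.
    """
    if inner_k_tiles not in (2, 4, 8):
        raise ValueError("inner_k_tiles is assumed to be 2, 4, or 8")
    if n % 8 or k % (inner_k_tiles * 16):
        raise ValueError("n must be a multiple of 8 and k of inner_k_tiles * 16")
    return (n // 8, k // (inner_k_tiles * 16), 32, inner_k_tiles // 2)


shape = tile_packed_shape(256, 1024)
# Sanity check: the packed tensor holds exactly n * k four-bit values.
assert math.prod(shape) * 8 == 256 * 1024
```

Under this assumed layout, changing inner_k_tiles only redistributes elements between the second and fourth dimensions; the total storage stays the same, which is consistent with fixing the value to keep the argument list short.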

meta-cla Bot added the CLA Signed label Aug 18, 2025

jerryzh168 changed the title ~~Add Int4TensorCoreTilePackedTensor for tensor core tiled int4 quantiz…~~ Add Int4TensorCoreTilePackedTensor Aug 18, 2025

jerryzh168 force-pushed the int4wo-tinygemm branch from 0b18c16 to e73434e August 18, 2025 20:46

jerryzh168 requested review from drisspg, liangel-02, metascroy and vkuzo August 18, 2025 20:47

jerryzh168 added the module: not user facing label Aug 18, 2025

jerryzh168 requested a review from Xia-Weiwen August 18, 2025 21:45

jerryzh168 mentioned this pull request Aug 13, 2025

Migrating from AffineQuantizedTensor + Layouts to new structure of tensor subclasses #2752

Closed

17 tasks

jerryzh168 force-pushed the int4wo-tinygemm branch 2 times, most recently from ed800a1 to b878ee5 August 20, 2025 03:08

metascroy reviewed Aug 20, 2025

Comment thread test/quantization/quantize_/workflows/int4/test_int4_tensor_core_tile_packed_tensor.py Outdated

metascroy reviewed Aug 20, 2025

Comment thread torchao/quantization/quantize_/workflows/int4/__init__.py

metascroy reviewed Aug 20, 2025

Comment thread torchao/quantization/quantize_/workflows/int4/int4_tensor_core_tile_packed_tensor.py Outdated

metascroy reviewed Aug 20, 2025

Comment thread test/quantization/quantize_/workflows/int4/test_int4_tensor_core_tile_packed_tensor.py Outdated

jerryzh168 force-pushed the int4wo-tinygemm branch from b878ee5 to 1922aaf August 20, 2025 05:31

vkuzo reviewed Aug 20, 2025

Comment thread torchao/quantization/quantize_/workflows/int4/int4_tensor_core_tile_packed_tensor.py Outdated

jerryzh168 force-pushed the int4wo-tinygemm branch 7 times, most recently from 1be7de0 to 4046cc0 August 26, 2025 01:02

jerryzh168 changed the title ~~Add Int4TensorCoreTilePackedTensor~~ Add Int4TilePackedTo4dTensor Aug 26, 2025

jerryzh168 requested review from andrewor14 and jainapurva August 26, 2025 17:53

jerryzh168 force-pushed the int4wo-tinygemm branch from 4046cc0 to f0b97a6 August 27, 2025 01:25

vkuzo reviewed Aug 27, 2025

Comment thread torchao/quantization/__init__.py Outdated

vkuzo reviewed Aug 27, 2025

Comment thread torchao/quantization/quantize_/workflows/int4/int4_tile_packed_to_4d_tensor.py Outdated

jerryzh168 force-pushed the int4wo-tinygemm branch 4 times, most recently from 6411b23 to b2885aa August 27, 2025 20:42

jerryzh168 requested review from metascroy and vkuzo August 28, 2025 20:17

metascroy approved these changes Aug 28, 2025

jerryzh168 force-pushed the int4wo-tinygemm branch from b2885aa to 99bdf5d August 29, 2025 01:35

jerryzh168 force-pushed the int4wo-tinygemm branch from 99bdf5d to 22b937f August 29, 2025 02:00

jerryzh168 merged commit 6176322 into pytorch:main Aug 29, 2025
18 checks passed

jerryzh168 merged 1 commit into pytorch:main from jerryzh168:int4wo-tinygemm

3 participants
