Add Int4TilePackedTo4dTensor #2791
Merged
jerryzh168 merged 1 commit into pytorch:main on Aug 29, 2025
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2791
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 22b937f with merge base 15a6de6.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
metascroy
reviewed
Aug 20, 2025
vkuzo
reviewed
Aug 20, 2025
vkuzo
reviewed
Aug 27, 2025
    """
    tile_packed_to_4d is referring to the format used by tensor core tiled kernels for int4 quantization
    """
    TILE_PACKED_TO_4D = "tile_packed_to_4d"
Contributor
lgtm, but in a separate PR it would be good to delete this enum, since we determined that PLAIN is the only format that is reused and all the others are tensor-specific
Contributor
Author
we want to delete the global PackingFormat as well, right?
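For context on the enum under discussion, here is a minimal sketch of a string-valued packing-format enum. The class below is a hypothetical stand-in, not the torchao definition: only the `TILE_PACKED_TO_4D = "tile_packed_to_4d"` member appears in the diff above, and the `"plain"` value and the `parse_packing_format` helper are assumptions made for illustration.

```python
from enum import Enum


class PackingFormat(str, Enum):
    """Hypothetical stand-in for the enum discussed in this thread."""

    # Assumed value; the thread only says PLAIN is the format that is reused.
    PLAIN = "plain"
    # Taken from the diff context above.
    TILE_PACKED_TO_4D = "tile_packed_to_4d"


def parse_packing_format(value):
    """Illustrative helper: map a config string to an enum member.

    Raises ValueError for strings that are not a known format.
    """
    return PackingFormat(value)
```

Because the members subclass `str`, a config can accept either the member or its plain string value, which is one reason a tensor-specific format could later move out of a shared enum without changing the string users pass.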
vkuzo
reviewed
Aug 27, 2025
    )

    original_shape = hp_tensor.shape
    # use a fixed value to simplify api
Contributor
Author
This uses a fixed inner_k_tiles to keep the arg list shorter; I didn't see people change it anywhere
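To illustrate what fixing inner_k_tiles pins down, the sketch below computes the 4D shape of a tile-packed int4 weight. It assumes the layout described in the PyTorch int4 packing kernel source, `[n / 8][k / (inner_k_tiles * 16)][32][inner_k_tiles / 2]` with int32 elements, and it assumes a fixed value of 8; the thread does not state the value. `tile_packed_shape` is a hypothetical helper, not a torchao function.

```python
# Assumed fixed value; the review thread does not say which value is used.
INNER_K_TILES = 8


def tile_packed_shape(n, k, inner_k_tiles=INNER_K_TILES):
    """Return the assumed 4D packed shape for an (n, k) int4 weight.

    Each int32 element of the packed tensor holds eight 4-bit values.
    """
    if inner_k_tiles not in (2, 4, 8):
        raise ValueError("inner_k_tiles must be 2, 4, or 8")
    if n % 8 != 0:
        raise ValueError("n must be divisible by 8")
    if k % (inner_k_tiles * 16) != 0:
        raise ValueError("k must be divisible by inner_k_tiles * 16")
    return (n // 8, k // (inner_k_tiles * 16), 32, inner_k_tiles // 2)
```

With the value fixed, callers only need to supply the weight shape, which matches the shorter argument list described in the comment above.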
metascroy
approved these changes
Aug 28, 2025
… 4d packing

This commit introduces Int4TilePackedTo4dTensor, a new tensor subclass for int4 weight-only quantization using tensor core tiled packing format.

Key features:
- Implements tensor core tiled packing for efficient computation on tensor cores
- Supports PackingFormat.TILE_PACKED_TO_4D in Int4WeightOnlyConfig version 2
- Optimized for tinygemm int4mm kernel (_weight_int4pack_mm)
- Includes comprehensive test suite

The implementation follows the same pattern as other int4 tensor subclasses but uses a specialized packing format optimized for tensor core matrix multiplication performance.

Changes:
- Add Int4TilePackedTo4dTensor implementation
- Update Int4WeightOnlyConfig version 2 to support TILE_PACKED_TO_4D packing format
- Add TILE_PACKED_TO_4D to PackingFormat enum
- Add comprehensive tests including serialization, different group sizes, and error conditions
- Update __init__.py files to export new tensor class

Test:
python test/quantization/quantize_/workflows/int4/test_int4_tile_packed_to_4d_tensor.py
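Since the tests cover different group sizes, a small pure-Python sketch of group-wise 4-bit quantization may help show what a group size controls. This is a generic asymmetric scheme chosen for illustration, not the code in this PR: the function names, the min/scale parameterization, and the rounding are all assumptions.

```python
def quantize_groupwise_uint4(row, group_size):
    """Quantize a list of floats to values in [0, 15], one (scale, min) per group."""
    if len(row) % group_size != 0:
        raise ValueError("row length must be divisible by group_size")
    qvals, scales, mins = [], [], []
    for start in range(0, len(row), group_size):
        group = row[start:start + group_size]
        lo, hi = min(group), max(group)
        scale = (hi - lo) / 15
        if scale == 0:
            scale = 1.0  # constant group: any nonzero scale works
        scales.append(scale)
        mins.append(lo)
        for x in group:
            q = round((x - lo) / scale)
            qvals.append(min(15, max(0, q)))  # clamp to the 4-bit range
    return qvals, scales, mins


def dequantize_groupwise_uint4(qvals, scales, mins, group_size):
    """Map 4-bit values back to floats using each group's scale and min."""
    return [
        q * scales[i // group_size] + mins[i // group_size]
        for i, q in enumerate(qvals)
    ]
```

A smaller group size stores more scale and min values but tracks the local value range more closely, so the round-trip error per element stays within half a quantization step of that group.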