Delete deprecated TensorCoreTiledLayout and related code #4153

Merged
jerryzh168 merged 47 commits into main from gh/jerryzh168/69/head on Apr 1, 2026

jerryzh168 (Contributor) commented Mar 23, 2026

Stack from ghstack (oldest at bottom):

  • Delete torchao/dtypes/uintx/tensor_core_tiled_layout.py
  • Remove TensorCoreTiledLayout dispatch from AQT dispatch table
  • Remove TensorCoreTiledLayout from all public exports
  • Remove LAYOUT_TO_ZERO_POINT_DOMAIN and LAYOUT_TO_PRESERVE_ZEROS dicts
  • Remove TensorCoreTiledLayout padding logic from AffineQuantizedTensor.dequantize
  • Simplify HQQ path in AffineQuantizedTensor to always use raw_output=True
  • Remove dead v1 code path in fake_quantize_config.py
  • Update autoround to use Int4TilePackedTo4dTensor directly

… paths

- Remove PlainLayout class from dtypes/utils.py
- Delete torchao/dtypes/uintx/plain_layout.py
- Remove int8 weight and int8 dynamic activation dispatch from AQT dispatch table
- Remove AQT embedding dispatch (used PlainAQTTensorImpl)
- Change Int8WeightOnlyConfig default to version=2 (removes v1 AQT path)
- Change Int8DynamicActivationInt8WeightConfig default to version=2 (removes v1 AQT path)
- Remove PlainLayout from public exports
- Update tests to use v2 tensor types

[ghstack-poisoned]
- Copy bitpacking.py from torchao/dtypes/uintx/ to torchao/prototype/dtypes/uintx/
  (original was deleted in the previous commit)
- Add uintx_utils.py with UintxTensor and UintxLayout classes needed by
  autoround and codebook features until they migrate to the new quantization design

[ghstack-poisoned]
pytorch-bot Bot commented Mar 23, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/4153

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 7415455 with merge base 3e6c496:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

jerryzh168 added a commit that referenced this pull request Mar 23, 2026
- Delete torchao/dtypes/uintx/tensor_core_tiled_layout.py
- Remove TensorCoreTiledLayout dispatch from AQT dispatch table
- Remove TensorCoreTiledLayout from all public exports
- Remove LAYOUT_TO_ZERO_POINT_DOMAIN and LAYOUT_TO_PRESERVE_ZEROS dicts
- Remove TensorCoreTiledLayout padding logic from AffineQuantizedTensor.dequantize
- Simplify HQQ path in AffineQuantizedTensor to always use raw_output=True
- Remove dead v1 code path in fake_quantize_config.py
- Update autoround to use Int4TilePackedTo4dTensor directly

ghstack-source-id: 7132050
Pull Request resolved: #4153
meta-cla bot added the CLA Signed label Mar 23, 2026
jerryzh168 added a commit that referenced this pull request Mar 23, 2026
ghstack-source-id: c23e77a
Pull Request resolved: #4153
jerryzh168 added a commit that referenced this pull request Mar 23, 2026
ghstack-source-id: 863d91a
Pull Request resolved: #4153
@jerryzh168 (Contributor, Author)

@claude review

claude Bot commented Mar 28, 2026

Claude finished @jerryzh168's task in 2m 20s


Review: Delete deprecated TensorCoreTiledLayout and related code

  • Gather context and read changed files
  • Review deleted tensor_core_tiled_layout.py
  • Review changes to affine_quantized_tensor.py and ops
  • Review changes to autoround/core.py
  • Review remaining changes (exports, init files, etc.)
  • Post review summary

Overall this is a clean deprecation removal. The changes are straightforward and consistent across the codebase. A few observations:

Looks Good

  1. Complete removal of TensorCoreTiledLayout references — grep confirms zero remaining references to TensorCoreTiledLayout or tensor_core_tiled in the codebase (aside from one stale comment, noted below).

  2. affine_quantized_tensor.py simplifications — The HQQ path simplification at line 278 (raw_output=not isinstance(_layout, PlainLayout)) is correct. Since TensorCoreTiledLayout is removed, the original condition not isinstance(_layout, (TensorCoreTiledLayout, PlainLayout)) simplifies correctly. The padding removal in dequantize() is also correct since only TensorCoreTiledLayout needed that post-dequant narrow.

  3. affine_quantized_tensor_ops.py — Clean removal of the dispatch entry for _linear_bf16_act_uint4_weight_check/impl.

  4. fake_quantize_config.py — Dead version == 1 code path removed. The remaining code only supports version == 2, which is the right state.

  5. quant_api.py: LAYOUT_TO_ZERO_POINT_DOMAIN and LAYOUT_TO_PRESERVE_ZEROS dicts cleanly removed. No remaining consumers.

  6. autoround core.py migration — The migration from TensorCoreTiledLayout-based to_affine_quantized_intx_static to directly constructing Int4TilePackedTo4dTensor looks correct. The padding, packing (_convert_weight_to_int4pack), and pack_tinygemm_scales_and_zeros logic faithfully reproduces what TensorCoreTiledLayout.from_plain + pre_process used to do.
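The pad-then-narrow round trip mentioned above can be sketched in plain Python. This is an illustration, not the deleted code. find_multiple, pad_2d and narrow_2d are hypothetical helpers. The multiples used (8 along out_features, 1024 along in_features) are an assumption about what TensorCoreTiledLayout's pre_process used.

```python
def find_multiple(n: int, k: int) -> int:
    """Smallest multiple of k that is >= n (hypothetical helper)."""
    return n if n % k == 0 else n + k - (n % k)


def pad_2d(weight, out_multiple=8, in_multiple=1024, fill=0.0):
    """Zero-pad a row-major [out_features][in_features] weight.

    The default multiples are an assumption, not taken from torchao.
    """
    rows, cols = len(weight), len(weight[0])
    new_rows = find_multiple(rows, out_multiple)
    new_cols = find_multiple(cols, in_multiple)
    padded = [list(row) + [fill] * (new_cols - cols) for row in weight]
    padded += [[fill] * new_cols for _ in range(new_rows - rows)]
    return padded


def narrow_2d(weight, rows, cols):
    """Undo the padding after dequantization by keeping the original extent."""
    return [row[:cols] for row in weight[:rows]]


weight = [[float(r * 10 + c) for c in range(3)] for r in range(5)]  # 5 x 3
padded = pad_2d(weight)
print(len(padded), len(padded[0]))  # 8 1024
restored = narrow_2d(padded, len(weight), len(weight[0]))
print(restored == weight)  # True
```

Without the final narrow, a 5x3 weight would dequantize as 8x1024. That is why AffineQuantizedTensor.dequantize() carried a narrow step that only this layout needed.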

Minor Issues

  1. Stale comment in quant_primitives.py:1333 — There's still a reference to TensorCoreTiledFormat in a comment:

    # this is not preserving zero_point, this is converting to TensorCoreTiledFormat

    This should be updated to reference "tinygemm format" to match the updated comment at line 1982.

  2. torchao/dtypes/uintx/__init__.py should be deleted — Per andrewor14's review comment (already acknowledged by the author): the file now only contains __all__ = [] and nothing imports from torchao.dtypes.uintx as a package (existing imports go directly to torchao.dtypes.uintx.plain_layout). This file can be safely deleted.

  3. Unused import removal — In torchao/prototype/autoround/core.py, the import from torchao.dtypes import to_affine_quantized_intx_static at line 14 is no longer used by the to_int4_tinygemm_weight function (which now constructs Int4TilePackedTo4dTensor directly). However, it's still used by the to_uintx_weight path at line 205, so this is fine. Disregard.

Summary

The PR is a clean removal of deprecated code. The two actionable items are the stale comment at quant_primitives.py:1333 and deleting the now-empty uintx/__init__.py file (already agreed upon in prior review).


@jerryzh168 (Contributor, Author)

@claude review again

claude Bot commented Mar 31, 2026

Claude finished @jerryzh168's task in 5m 10s


Review: Delete deprecated TensorCoreTiledLayout and related code

  • Gather context and read PR diff
  • Review deleted tensor_core_tiled_layout.py
  • Review changes to affine_quantized_tensor.py and ops
  • Review changes to autoround/core.py
  • Review remaining changes (exports, init files, quant_api, etc.)
  • Check for stale references to TensorCoreTiledLayout
  • Post review summary

This is a clean deprecation removal with complete reference cleanup. All stale comments from the previous review (TensorCoreTiledFormat references) have been addressed. One issue carries over from the prior review, and I found a new potential correctness bug in the autoround migration.

Potential Correctness Bug

torchao/prototype/autoround/core.py:270-274 — Missing quantization step in to_int4_tinygemm_weight

The new code replaces the old to_affine_quantized_intx_static(... _layout=TensorCoreTiledLayout(...)) call with manual packing, but the quantization step was lost in translation.

Old code path:

return to_affine_quantized_intx_static(
    input_float=input_float,      # qdq float weight from auto-round
    scale=pad_scale,
    zero_point=pad_shifted_zero_point,
    ...
    zero_point_domain=ZeroPointDomain.FLOAT,
    _layout=TensorCoreTiledLayout(...),
)

This called _quantize_affine_tinygemm which does:

mid_point = (quant_max + quant_min + 1) / 2   # 8 for uint4
min_val = zero_point - scale * mid_point
quant = clamp(round((input - min_val) / scale), quant_min, quant_max)

New code:

int_data = torch.clamp(
    torch.round(input_float_padded).to(torch.int32),
    quant_min,
    quant_max,
)

input_float is the dequantized float weight from auto-round (typical float values), not integer values. Simply rounding these floats does not re-quantize them — the division by scale and offset by zero_point is missing. The new code should call _quantize_affine_tinygemm (or equivalent math) on the padded input with the padded scale/zero_point, similar to what from_hp_to_intx_static does.
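Below is a plain-Python sketch of the two formulas above. It is applied to one group of uint4 codes that share a scale and a float zero_point. The helper names are hypothetical. The dequantize step is simply the algebraic inverse of the quantize formula quoted above. This illustrates the math only and is not torchao's _quantize_affine_tinygemm.

Re-quantizing a qdq weight with its scale and zero_point recovers the original codes. Rounding the floats directly collapses them.

```python
# Tinygemm-style affine math with a float zero_point, for one group of values.
QUANT_MIN, QUANT_MAX = 0, 15  # uint4 range


def mid_point(quant_min=QUANT_MIN, quant_max=QUANT_MAX):
    # (15 + 0 + 1) / 2 == 8.0 for uint4
    return (quant_max + quant_min + 1) / 2


def clamp(q, lo=QUANT_MIN, hi=QUANT_MAX):
    return min(max(q, lo), hi)


def quantize_tinygemm(values, scale, zero_point):
    """Correct path: shift by min_val and divide by scale before rounding."""
    min_val = zero_point - scale * mid_point()
    return [clamp(round((v - min_val) / scale)) for v in values]


def dequantize_tinygemm(codes, scale, zero_point):
    """Inverse of the formula above: (q - mid_point) * scale + zero_point."""
    return [(q - mid_point()) * scale + zero_point for q in codes]


def round_only(values):
    """Buggy path: rounds the float weights directly, ignoring scale/zero_point."""
    return [clamp(round(v)) for v in values]


scale, zero_point = 0.05, 0.1
codes = [0, 3, 8, 15]
qdq_weight = dequantize_tinygemm(codes, scale, zero_point)  # float weights, like auto-round's output
print(quantize_tinygemm(qdq_weight, scale, zero_point))  # [0, 3, 8, 15]
print(round_only(qdq_weight))  # [0, 0, 0, 0]
```

Python's round, like torch.round, breaks ties by rounding half to even, so exact .5 cases behave the same way in this sketch.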

This path is not covered by existing tests (test/prototype/test_autoround.py has no int4 tinygemm tests), so CI won't catch this.

Looks Good

  1. Complete removal: grep confirms zero remaining references to TensorCoreTiled or tensor_core_tiled in the codebase.

  2. affine_quantized_tensor.py simplifications — The HQQ raw_output simplification at line 278 is correct: raw_output=not isinstance(_layout, PlainLayout) correctly simplifies the old not isinstance(_layout, (TensorCoreTiledLayout, PlainLayout)). The dequantize padding removal is correct since only TensorCoreTiledLayout used that post-dequant narrow.

  3. affine_quantized_tensor_ops.py — Clean removal of the _linear_bf16_act_uint4_weight_check/impl dispatch entry.

  4. fake_quantize_config.py — Dead version == 1 code path removed; remaining else now correctly raises for unknown versions.

  5. quant_api.py: LAYOUT_TO_ZERO_POINT_DOMAIN and LAYOUT_TO_PRESERVE_ZEROS dicts cleanly removed with no remaining consumers.

  6. quant_primitives.py — Comments updated from "TensorCoreTiledFormat" to "tinygemm format". Dead TODO comments about moving conversion into TensorCoreTiledLayout.from_plain removed.

  7. utils.py — Comment updated to remove specific TensorCoreTiledLayout/TensorCoreTiledAQTTensorImpl references.

  8. float8/inference.py: import reordering plus a reformatted assert. The reformatted assert has the same Python operator precedence as the original, so the pre-existing and/or precedence behavior is unchanged.

  9. test_float8_tensor.py — Trivial import reordering (from unittest.mock import patch moved to proper stdlib position).
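The raw_output simplification in item 2 can be checked with stand-in classes. The classes below are hypothetical placeholders for torchao's layout types. The point is that the old and new expressions disagree only on TensorCoreTiledLayout instances, and none can exist once that class is deleted.

```python
# Hypothetical stand-ins for torchao layout classes, used only to compare
# the old and new raw_output expressions.
class Layout:
    pass


class PlainLayout(Layout):
    pass


class SomeOtherLayout(Layout):  # hypothetical: any layout that is neither of the two
    pass


class TensorCoreTiledLayout(Layout):  # models the "before" state only
    pass


def raw_output_before(layout: Layout) -> bool:
    return not isinstance(layout, (TensorCoreTiledLayout, PlainLayout))


def raw_output_after(layout: Layout) -> bool:
    return not isinstance(layout, PlainLayout)


# Layouts that still exist after the PR: both expressions agree.
for layout in (PlainLayout(), SomeOtherLayout()):
    assert raw_output_before(layout) == raw_output_after(layout)

# The only disagreement is on the deleted layout.
tiled = TensorCoreTiledLayout()
print(raw_output_before(tiled), raw_output_after(tiled))  # False True
```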

Remaining Item from Prior Review

torchao/dtypes/uintx/__init__.py should be deleted — The file now contains only __all__ = []. No imports use the torchao.dtypes.uintx package (existing imports go directly to torchao.dtypes.uintx.plain_layout). This was agreed upon in the prior review.


@jerryzh168 (Contributor, Author)

@claude review

claude Bot commented Apr 1, 2026

Claude finished @jerryzh168's task in 3m 39s


Review: Delete deprecated TensorCoreTiledLayout and related code

  • Gather context and read PR diff
  • Review deleted tensor_core_tiled_layout.py
  • Review changes to affine_quantized_tensor.py and ops
  • Review changes to autoround/core.py
  • Review remaining changes (exports, init files, quant_api, etc.)
  • Check for stale references to TensorCoreTiledLayout
  • Post review summary

Clean deprecation removal. All previous review issues have been addressed: the correctness bug in autoround quantization is fixed, and the stale TensorCoreTiledFormat comments are updated. No remaining issues found.

Verified Changes

  1. autoround/core.py — quantization bug fixed: The previous review identified that torch.round(input_float_padded) was missing the re-quantization step. This is now fixed at lines 271-289 with the correct formula matching _quantize_affine_tinygemm:

    min_val = zp_for_quant - scale_for_quant * mid_point
    int_data = clamp(round((input_for_quant - min_val) / scale_for_quant), ...)

    The Int4TilePackedTo4dTensor constructor call at line 302 correctly passes qdata, scale_and_zero, block_size, and shape, matching the class signature.

  2. affine_quantized_tensor.py simplifications — HQQ raw_output at line 278: raw_output=not isinstance(_layout, PlainLayout) correctly simplifies the old not isinstance(_layout, (TensorCoreTiledLayout, PlainLayout)). Dequantize narrow removal is correct since only TensorCoreTiledLayout used post-dequant padding.

  3. affine_quantized_tensor_ops.py — Clean removal of _linear_bf16_act_uint4_weight_check/impl dispatch entry.

  4. fake_quantize_config.py — Dead version == 1 code path for Int4WeightOnlyConfig removed. The remaining else correctly raises for unknown versions.

  5. quant_api.py: LAYOUT_TO_ZERO_POINT_DOMAIN and LAYOUT_TO_PRESERVE_ZEROS dicts removed with no remaining consumers.

  6. quant_primitives.py — Comments updated from "TensorCoreTiledFormat" to "tinygemm format". Dead TODOs about moving conversion into TensorCoreTiledLayout.from_plain removed.

  7. utils.py — Comment updated to remove specific TensorCoreTiledLayout/TensorCoreTiledAQTTensorImpl references.

  8. Zero stale references: grep confirms no remaining references to TensorCoreTiled or tensor_core_tiled in the codebase.

One Remaining Note

torchao/dtypes/uintx/__init__.py still contains only __all__ = []. Per prior discussion, this will be deleted in a separate PR.


@jerryzh168 (Contributor, Author)

@claude regarding "torchao/dtypes/uintx/__init__.py should be deleted — The file now contains only __all__ = []. No imports use the torchao.dtypes.uintx package (existing imports go directly to torchao.dtypes.uintx.plain_layout). This was agreed upon in the prior review (#4153 (comment))." I tried to remove it, but some imports failed, so I'm going to do it later.

claude Bot commented Apr 1, 2026

Claude finished @jerryzh168's task in 34s


Acknowledging feedback on uintx/__init__.py deletion

  • Read and understand the context
  • Respond to the comment

Understood — deferring the torchao/dtypes/uintx/__init__.py deletion to a follow-up PR makes sense if it causes import failures. The file is harmless as-is (just __all__ = []), so there's no urgency.

From the previous reviews, the other actionable items (stale TensorCoreTiledFormat comments in quant_primitives.py, and the potential correctness issue in autoround/core.py regarding the missing quantization step in to_int4_tinygemm_weight) appear to have been addressed in the latest revision. No remaining blockers from my side.


jerryzh168 changed the base branch from gh/jerryzh168/69/base to main on April 1, 2026 16:55
jerryzh168 merged commit 3e0da46 into main on Apr 1, 2026
36 checks passed
jerryzh168 added a commit to jerryzh168/torchtune that referenced this pull request Apr 3, 2026
Replace deleted TensorCoreTiledLayout with Int4PackingFormat.TILE_PACKED_TO_4D
in torchtune's Int4WeightOnlyQuantizer to fix the import error caused by the
upstream torchao change in pytorch/ao#4153.
Priyjain-amd pushed a commit to Priyjain-amd/ao that referenced this pull request May 26, 2026
* Delete deprecated BlockSparseLayout and related code

- Delete torchao/dtypes/uintx/tensor_core_tiled_layout.py
- Remove TensorCoreTiledLayout dispatch from AQT dispatch table
- Remove TensorCoreTiledLayout from all public exports
- Remove LAYOUT_TO_ZERO_POINT_DOMAIN and LAYOUT_TO_PRESERVE_ZEROS dicts
- Remove TensorCoreTiledLayout padding logic from AffineQuantizedTensor.dequantize
- Simplify HQQ path in AffineQuantizedTensor to always use raw_output=True
- Remove dead v1 code path in fake_quantize_config.py
- Update autoround to use Int4TilePackedTo4dTensor directly

[ghstack-poisoned]
Labels

CLA Signed, module: not user facing

2 participants