TrainingWeightWrapperTensor base class; subclasses for FP8/MXFP8 with grouped_mm and linear overrides by danielvegamyhre · Pull Request #3968 · pytorch/ao

danielvegamyhre · 2026-02-28T02:08:18Z

Tensor subclass changes

- TrainingWeightWrapperBaseTensor: base class containing the common logic for FSDP, __torch_dispatch__, subclass initialization, etc. (not to be used directly).
  - The common base class also enables common model conversion / param wrapping code.
- MXFP8 and FP8 tensor subclasses inherit from it and override __torch_function__ with their specific grouped_mm and linear overrides, dispatching to the appropriate autograd functions wrapping our kernels.
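To make the layout above concrete, here is a minimal torch-free sketch of the idea: one base class with the shared plumbing, thin subclasses that only supply their op override, and a single param-wrapping routine that works for any subclass. All names here (`WeightWrapperBase`, `FP8Weight`, `MXFP8Weight`, `wrap_params`, `linear_override`) are hypothetical stand-ins, not the classes or functions in this PR, and plain Python lists stand in for tensors.

```python
class WeightWrapperBase:
    """Stand-in for the shared base: common init plus a hook subclasses fill in."""

    def __init__(self, data, config=None):
        # The PR says the base is not to be used directly; this sketch enforces it.
        if type(self) is WeightWrapperBase:
            raise TypeError("WeightWrapperBase is abstract; use a subclass")
        self.data = data
        self.config = config

    def linear_override(self, x):
        raise NotImplementedError


class FP8Weight(WeightWrapperBase):
    def linear_override(self, x):
        # A real subclass would call its FP8 autograd function here.
        return ("fp8_linear", x, self.data)


class MXFP8Weight(WeightWrapperBase):
    def linear_override(self, x):
        # A real subclass would call its MXFP8 autograd function here.
        return ("mxfp8_linear", x, self.data)


def wrap_params(params, wrapper_cls, config=None, filter_fn=lambda name: True):
    """One conversion routine shared by every subclass of the base."""
    if not issubclass(wrapper_cls, WeightWrapperBase):
        raise TypeError("wrapper_cls must derive from WeightWrapperBase")
    return {
        name: wrapper_cls(value, config) if filter_fn(name) else value
        for name, value in params.items()
    }


# Hypothetical parameter names; only the "experts" weight gets wrapped.
params = {"experts.w1": [1.0, 2.0], "norm.weight": [1.0]}
wrapped = wrap_params(params, MXFP8Weight, filter_fn=lambda n: n.startswith("experts"))
```

Swapping `MXFP8Weight` for `FP8Weight` in the last line is the only change needed to convert with the other recipe, which is the benefit the common base class is meant to provide.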

Autograd function changes

Add new _to_mxfp8_then_scaled_mm autograd func to support linear op overrides. Supports wgrad_with_hp as well.
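As a rough illustration of what a quantize-then-matmul linear with a wgrad_with_hp switch could look like, here is a torch-free sketch. It assumes wgrad_with_hp means "compute the weight gradient from the unquantized (high-precision) operands"; that reading, the helper names, and the `fake_quant` rounding (a stand-in that snaps values to a 0.25 grid, not MXFP8) are all assumptions, not code from this PR.

```python
def matmul(a, b):
    """Plain nested-list matrix multiply: (M, K) @ (K, N) -> (M, N)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def fake_quant(m, step=0.25):
    """Stand-in quantizer: round every value to the nearest multiple of `step`."""
    return [[round(v / step) * step for v in row] for row in m]


def linear_forward(x, w):
    # y = q(x) @ q(w)^T, with x of shape (M, K) and w of shape (N, K).
    return matmul(fake_quant(x), transpose(fake_quant(w)))


def linear_backward(grad_out, x, w, wgrad_with_hp=False):
    # Input gradient: q(grad_out) @ q(w), shape (M, K).
    grad_x = matmul(fake_quant(grad_out), fake_quant(w))
    if wgrad_with_hp:
        # Weight gradient from the original operands: grad_out^T @ x, shape (N, K).
        grad_w = matmul(transpose(grad_out), x)
    else:
        # Weight gradient from quantized operands.
        grad_w = matmul(transpose(fake_quant(grad_out)), fake_quant(x))
    return grad_x, grad_w


x = [[1.1, 2.0]]      # 1.1 is off the 0.25 grid, so quantization changes it
w = [[0.5, 0.5]]
grad_out = [[1.0]]
y = linear_forward(x, w)
_, grad_w_hp = linear_backward(grad_out, x, w, wgrad_with_hp=True)
_, grad_w_lp = linear_backward(grad_out, x, w, wgrad_with_hp=False)
```

In this toy case the high-precision weight gradient keeps the 1.1 entry of `x`, while the quantized path sees it rounded to 1.0.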

Other

Delete MXLinear and MXLinearConfig so we don't have two diverging ways of doing mxfp8 dense training. This also removes MXFP4 training support, but as far as we know nobody is using it, so avoiding tech debt is preferable.

Tests

./test/prototype/moe_training/test_everything.sh
pytest test/prototype/mx_formats/test_mx_linear.py
pytest test/prototype/mx_formats/test_mx_tensor.py

pytorch-bot · 2026-02-28T02:08:21Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3968

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit 6bcfb53 with merge base 4ae435e:

NEW FAILURE - The following job has failed:

Run Regression Tests / test-nightly (CPU Nightly, linux.4xlarge, --pre torch --index-url https://download.pytorch.org/wh... / linux-job (gh)
test/quantization/pt2e/test_x86inductor_fusion.py::DynamicShapesCppWrapperCpuTests::test_qlinear_add_int8_mixed_bf16_use_relu_True_is_qat_True_is_dynamic_True_dynamic_shapes_cpp_wrapper

This comment was automatically generated by Dr. CI and updates every 15 minutes.

vkuzo · 2026-03-02T19:45:52Z


-class GroupedMMConfig(AOBaseConfig):
-    """Base configuration for grouped matrix multiplication. Not intended to be used directly."""
+class TrainingBaseConfig(AOBaseConfig):


the name is very generic, how about TrainingOpBaseConfig to clarify this is for a single op

vkuzo · 2026-03-02T19:46:18Z


 @dataclass
-class FP8GroupedMMConfig(GroupedMMConfig):
+class FP8GroupedMMConfig(TrainingBaseConfig):


Float8 instead of Fp8, to match PyTorch naming for float8?

vkuzo · 2026-03-02T19:47:13Z

 @register_as_pytree_constant
 @dataclass
-class MXFP8GroupedMMConfig(GroupedMMConfig):
+class MXFP8TrainingConfig(TrainingBaseConfig):


MXFP8OpTrainingConfig?

vkuzo · 2026-03-02T19:50:36Z

+    @classmethod
+    def __torch_function__(cls, func, types, args, kwargs={}):
+        # grouped_mm op override
+        if func.__name__ == cls.grouped_mm_func_name:


this is confusing, can this just state the op directly since we are already inside the float8 wrapper?

vkuzo · 2026-03-02T19:51:01Z

+    @classmethod
+    def __torch_function__(cls, func, types, args, kwargs={}):
+        # grouped_mm op override
+        if func.__name__ == cls.grouped_mm_func_name:


just say the op directly?

vkuzo · 2026-03-02T19:51:28Z

+                )
+
+        # linear op override
+        elif func.__name__ in cls.mm_func_names:


just put the ops here? making the code reader jump around to know which ops go here is confusing

vkuzo · 2026-03-02T19:53:41Z

looks good, I care about cleaning up the func.__name__ == cls.grouped_mm_func_name and elif func.__name__ in cls.mm_func_names the most from my nit comments, thank you!
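For readers following the nit above, here is a small torch-free sketch of the two styles. The functions `grouped_mm`, `linear`, and `mm` are hypothetical stand-ins for the real ops, and `ByName` / `Direct` are illustrative classes, not the PR's tensor subclasses; the actual op list in the PR is not shown in this thread.

```python
def grouped_mm(a, b):
    return "grouped_mm"


def linear(x, w):
    return "linear"


def mm(a, b):
    return "mm"


class ByName:
    """Style flagged in review: op names live in class attributes defined elsewhere."""

    grouped_mm_func_name = "grouped_mm"
    mm_func_names = {"linear", "mm"}

    @classmethod
    def route(cls, func):
        if func.__name__ == cls.grouped_mm_func_name:
            return "grouped_mm override"
        elif func.__name__ in cls.mm_func_names:
            return "linear override"
        return "fallthrough"


class Direct:
    """Style requested in review: the ops are named right where they are handled."""

    @classmethod
    def route(cls, func):
        if func is grouped_mm:
            return "grouped_mm override"
        elif func in (linear, mm):
            return "linear override"
        return "fallthrough"


def make_unrelated_mm():
    # A different function that merely shares the name "mm".
    def mm(a, b):
        return "unrelated"
    return mm


unrelated = make_unrelated_mm()
```

Both styles route the three stand-in ops identically. One difference, which is an observation of this sketch rather than a point raised in the review: name matching also catches `unrelated`, because only its `__name__` is compared, whereas the direct comparison lets it fall through.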

vkuzo

lg if CI passes and you are sure this does not regress anything

danielvegamyhre · 2026-03-02T22:38:48Z

addressed comments, will land once CI green

…torchao (#2520)

## Summary
- Refactor MXFP8 model converter
  - Previous: 2 separate converters (linear, grouped_mm)
  - New: 1 unified converter for linear and grouped_mm ops
  - Details: pytorch/ao#3968
- Add `pad_token_groups_for_grouped_mm` config option to use dynamic per group padding kernels for MXFP8 grouped mm in torchao, so we can delete padding code from torchtitan (context: #2255)
- torchao PR stack (must land first): pytorch/ao#4021

## Tests
- TODO: manually test this change prior to landing and update PR
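To illustrate what per group padding means in the commit message above, here is a minimal pure-Python sketch. It assumes each token group must be rounded up to a multiple of 32 rows and that groups are described by cumulative end offsets; both are assumptions of this sketch, and `pad_group_sizes` / `pad_tokens` are hypothetical helpers, not the torchao kernels or the behavior of `pad_token_groups_for_grouped_mm` itself.

```python
def pad_group_sizes(group_sizes, alignment=32):
    """Round each group's token count up to a multiple of `alignment`.

    Returns the padded sizes and their cumulative end offsets.
    """
    if alignment <= 0:
        raise ValueError("alignment must be positive")
    # -(-size // alignment) is ceiling division for non-negative sizes.
    padded = [-(-size // alignment) * alignment for size in group_sizes]
    offsets, total = [], 0
    for size in padded:
        total += size
        offsets.append(total)
    return padded, offsets


def pad_tokens(tokens, group_sizes, alignment=32, pad_row=None):
    """Insert padding rows after each group so every group is aligned."""
    padded_sizes, _ = pad_group_sizes(group_sizes, alignment)
    out, start = [], 0
    for size, padded in zip(group_sizes, padded_sizes):
        out.extend(tokens[start:start + size])   # the group's real rows
        out.extend([pad_row] * (padded - size))  # padding up to the aligned size
        start += size
    return out


# Three expert groups with 5, 32, and 40 tokens.
padded_sizes, end_offsets = pad_group_sizes([5, 32, 40])
```

Because the group sizes are only known at runtime, the amount of padding varies per group and per step, which is presumably why the commit message calls the kernels dynamic.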

vkuzo · 2026-04-17T10:20:06Z

-        ScaleCalculationMode.RCEIL,
-    ],
-)
-def test_linear_compile(


@danielvegamyhre was this moved over? i can't find it

danielvegamyhre · 2026-04-17

Sort of: the linear tests were replaced with the training test cases here. The "shared_experts" FQN is a linear layer, and there is also a compile boolean parameterization, so it tests mxfp8 linear both in eager mode and with compile.

danielvegamyhre added 2 commits February 27, 2026 18:05

[mxfp8 training] unified tensor subclass for training

e65aacc

[mxfp8 training] remove mxfp8 from MXLinear and MXLinearConfig

dc73e74

meta-cla Bot added the CLA Signed label Feb 28, 2026

danielvegamyhre force-pushed the traintensor branch 3 times, most recently from d0025e8 to 248405c Compare March 2, 2026 17:48

danielvegamyhre added module: training quantize_ api training flow moe labels Mar 2, 2026

danielvegamyhre changed the title ~~[WIP] unified tensor subclass for training~~ TorchAOTrainingTensor base class; MXFP8TrainingTensor and FP8TrainingTensor subclasses with grouped_mm and linear overrides Mar 2, 2026

danielvegamyhre requested a review from vkuzo March 2, 2026 18:11

danielvegamyhre force-pushed the traintensor branch from 248405c to 7fb1c2c Compare March 2, 2026 18:34

vkuzo reviewed Mar 2, 2026

vkuzo approved these changes Mar 2, 2026

danielvegamyhre force-pushed the traintensor branch from 7fb1c2c to cf26d1d Compare March 2, 2026 21:03

danielvegamyhre changed the title ~~TorchAOTrainingTensor base class; MXFP8TrainingTensor and FP8TrainingTensor subclasses with grouped_mm and linear overrides~~ TrainingWeightWrapperTensor base class; subclasses for FP8/MXFP8 with grouped_mm and linear overrides Mar 2, 2026

danielvegamyhre force-pushed the traintensor branch 5 times, most recently from e1743e3 to 3ff088b Compare March 2, 2026 21:52

danielvegamyhre mentioned this pull request Mar 2, 2026

[mxfp8 training] refactor to use single MXFP8 training converter pytorch/torchtitan#2470

Closed

danielvegamyhre force-pushed the traintensor branch from 3ff088b to 49ab85c Compare March 2, 2026 23:58

danielvegamyhre force-pushed the traintensor branch 3 times, most recently from 5610df4 to 79de41d Compare March 3, 2026 19:00

[moe training] unified tensor subclass for training

381abbf

danielvegamyhre force-pushed the traintensor branch 2 times, most recently from 3d258d2 to 89c1a8d Compare March 3, 2026 22:30

delete MXLinear and MXLinearConfig entirely

6bcfb53

danielvegamyhre force-pushed the traintensor branch from 89c1a8d to 6bcfb53 Compare March 3, 2026 23:02

danielvegamyhre merged commit b8708a2 into main Mar 4, 2026
25 of 26 checks passed

danielvegamyhre mentioned this pull request Mar 7, 2026

[mxfp8] refactor model converter; use token group padding kernels in torchao pytorch/torchtitan#2520

Merged

vkuzo reviewed Apr 17, 2026

danielvegamyhre merged 4 commits into main from traintensor
