[mxfp8] refactor model converter; use token group padding kernels in torchao by danielvegamyhre · Pull Request #2520 · pytorch/torchtitan

danielvegamyhre · 2026-03-07T06:14:55Z

Summary

- Refactor MXFP8 model converter
  - Previous: 2 separate converters (linear, grouped_mm)
  - New: 1 unified converter for linear and grouped_mm ops
  - Details: TrainingWeightWrapperTensor base class; subclasses for FP8/MXFP8 with grouped_mm and linear overrides (ao#3968)
- Add `pad_token_groups_for_grouped_mm` config option to use dynamic per-group padding kernels for MXFP8 grouped mm in torchao, so we can delete padding code from torchtitan (context: Remove unnecessary token padding for MoE in BF16 mode, #2255)
  - torchao PR stack (must land first): [mxfp8 training] cuda kernel for unpadding token groups (ao#4021)

Tests

TODO: manually test this change prior to landing and update PR

danielvegamyhre · 2026-03-07T06:24:06Z

FYI @tianyu-l @rakkit, this will unblock deleting the token group padding logic from torchtitan (for everything, including mxfp8).

To clarify, the torchao `_to_mxfp8_then_scaled_grouped_mm` API still expects the tokens to be grouped by expert rather than by remote/source rank. It just no longer has alignment requirements: I've added kernels to pad inputs and unpad outputs accordingly.

So in torchtitan, for `_permute`, the tokens go from:

[from rank0 for e0, from rank0 for e1, from rank1 for e0, from rank1 for e1]

To:

[from rank0 for e0, from rank1 for e0, from rank0 for e1, from rank1 for e1]
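The reordering above can be illustrated with a minimal pure-Python sketch. This is not torchtitan's actual `_permute` implementation; the function name `permute_rank_major_to_expert_major` and the flat list-of-chunks layout are assumptions made only for illustration.

```python
def permute_rank_major_to_expert_major(chunks, num_ranks, num_experts):
    """Reorder token chunks from rank-major to expert-major order.

    Hypothetical helper: chunks[r * num_experts + e] is assumed to hold the
    tokens received from source rank r for local expert e.
    """
    assert len(chunks) == num_ranks * num_experts
    # Iterate experts in the outer loop so each expert's tokens are contiguous.
    return [
        chunks[r * num_experts + e]
        for e in range(num_experts)
        for r in range(num_ranks)
    ]


# Mirrors the 2-rank, 2-expert example in the comment above.
rank_major = ["rank0/e0", "rank0/e1", "rank1/e0", "rank1/e1"]
expert_major = permute_rank_major_to_expert_major(rank_major, num_ranks=2, num_experts=2)
print(expert_major)
```

After this step all tokens for a given expert are contiguous, which is the grouping the grouped mm expects.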

Then, when `torch._grouped_mm` executes and dispatches to torchao, we pad the groups and unpad the outputs.
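As a rough illustration of per-group padding and unpadding, here is a pure-Python sketch operating on flat lists. It is not the torchao kernel code: the helper names, the `alignment` default of 32, and the zero pad value are all assumptions for illustration.

```python
def round_up(n, multiple):
    """Round n up to the nearest multiple (0 stays 0)."""
    return -(-n // multiple) * multiple


def pad_token_groups(tokens, group_sizes, alignment=32, pad_value=0):
    """Pad each contiguous token group to a multiple of `alignment`.

    Hypothetical helper; alignment=32 is an assumed value, not taken from the PR.
    """
    assert sum(group_sizes) == len(tokens)
    padded, padded_sizes = [], []
    start = 0
    for size in group_sizes:
        target = round_up(size, alignment)
        padded.extend(tokens[start:start + size])
        padded.extend([pad_value] * (target - size))  # fill up to the aligned size
        padded_sizes.append(target)
        start += size
    return padded, padded_sizes


def unpad_token_groups(padded, group_sizes, padded_sizes):
    """Drop the padding rows, keeping only the original tokens of each group."""
    out = []
    start = 0
    for size, padded_size in zip(group_sizes, padded_sizes):
        out.extend(padded[start:start + size])
        start += padded_size
    return out


tokens = [1, 2, 3, 4, 5, 6, 7]  # two expert groups: 3 tokens and 4 tokens
group_sizes = [3, 4]
padded, padded_sizes = pad_token_groups(tokens, group_sizes, alignment=4)
restored = unpad_token_groups(padded, group_sizes, padded_sizes)
```

Because padding is computed from the actual group sizes at call time, the caller only needs to supply tokens grouped by expert, with no alignment guarantee.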

tianyu-l

sg, one nit comment

tianyu-l · 2026-03-08T20:15:07Z

-
-    filter_fqns: list[str]
-    mx_config: Any  # MXLinearConfig type when imported
+class MXFP8Converter(Configurable):


Inherit QuantizationConverter


danielvegamyhre requested review from fegin, tianyu-l, wconstab and wwwjn as code owners March 7, 2026 06:14

pytorch-bot Bot added the ciflow/8gpu label Mar 7, 2026

meta-cla Bot added the CLA Signed This label is managed by the Meta Open Source bot. label Mar 7, 2026

danielvegamyhre force-pushed the mar6 branch from d38afa6 to ccb739a Compare March 7, 2026 06:15

tianyu-l approved these changes Mar 8, 2026


tianyu-l requested a review from pianpwk March 8, 2026 20:16

danielvegamyhre force-pushed the mar6 branch from ccb739a to aa01755 Compare March 10, 2026 16:13

[mxfp8 training] token group padding for mxfp8 grouped mm in torchao

fa4bf8b

danielvegamyhre force-pushed the mar6 branch from aa01755 to fa4bf8b Compare March 10, 2026 16:52

tianyu-l merged commit 2b976ee into main Mar 10, 2026
27 of 32 checks passed

tianyu-l deleted the mar6 branch March 10, 2026 19:59

danielvegamyhre mentioned this pull request Apr 29, 2026

[Quantization] MXFP8LinearConverter should offer filter_fqns instead of fqns #3150

Open
