[mxpf8] Make mxfp8 dim1 cast kernel configurable by danielvegamyhre · Pull Request #1401 · pytorch/torchtitan

danielvegamyhre · 2025-07-15T21:58:00Z

Summary

We recently developed a CUDA kernel in torchao that performs mxfp8 casting with scaling along dim1. It is ~1.4x faster than the previous Triton implementation, which yields an e2e training speedup of 1.5% - 2.5% for torchtitan Llama3 8b with FSDP=4/8: Add CUDA kernel for MXFP8 dim1 casting ao#2513
The integration into torchao is finished (integration of new mxfp8 casting cuda kernel ao#2564), so we need to update torchtitan to make the kernel choice for the mxfp8 dim1 cast configurable: "triton", "cuda", or "torch".
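
As a rough illustration of what a configurable kernel choice could look like on the config side, here is a minimal sketch. The class name `MXConfigSketch`, the default value, and the validation logic are assumptions made for illustration; they are not torchtitan's or torchao's actual code. Only the field name `mxfp8_dim1_cast_kernel_choice` and the three allowed values come from this PR.

```python
from dataclasses import dataclass

# The three kernel names listed in the PR summary.
VALID_DIM1_CAST_KERNELS = ("triton", "cuda", "torch")


@dataclass
class MXConfigSketch:
    """Hypothetical stand-in for the `mx` config section (not the real class)."""

    recipe_name: str = "mxfp8"
    # Field name mirrors the CLI flag --mx.mxfp8_dim1_cast_kernel_choice.
    # The default shown here is an assumption, not taken from the PR.
    mxfp8_dim1_cast_kernel_choice: str = "triton"

    def __post_init__(self) -> None:
        # Normalize case so "CUDA" and "cuda" are treated the same,
        # then reject anything outside the known set early.
        choice = self.mxfp8_dim1_cast_kernel_choice.lower()
        if choice not in VALID_DIM1_CAST_KERNELS:
            raise ValueError(
                f"mxfp8_dim1_cast_kernel_choice must be one of "
                f"{VALID_DIM1_CAST_KERNELS}, got {choice!r}"
            )
        self.mxfp8_dim1_cast_kernel_choice = choice
```

Validating the string when the config is constructed means a mistyped kernel name fails before any model conversion starts, rather than partway through a training run.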

Test plan

Triton: NGPU=8 CONFIG_FILE="./torchtitan/models/llama3/train_configs/llama3_8b.toml" ./run_train.sh --training.steps=100 --model.converters="mx" --mx.recipe_name="mxfp8" --training.compile --mx.mxfp8_dim1_cast_kernel_choice="triton"
CUDA: NGPU=8 CONFIG_FILE="./torchtitan/models/llama3/train_configs/llama3_8b.toml" ./run_train.sh --training.steps=100 --model.converters="mx" --mx.recipe_name="mxfp8" --training.compile --mx.mxfp8_dim1_cast_kernel_choice="cuda"

danielvegamyhre · 2025-07-15T22:00:46Z

cc @tianyu-l @vkuzo

danielvegamyhre requested review from fegin, tianyu-l, wconstab and wwwjn as code owners July 15, 2025 21:58

facebook-github-bot added the CLA Signed label Jul 15, 2025

danielvegamyhre marked this pull request as draft July 15, 2025 21:58

danielvegamyhre mentioned this pull request Jul 15, 2025

integrate mxfp8 dim1 cast kernel choice enum into MXLinear pytorch/ao#2554

Closed

danielvegamyhre force-pushed the mxcuda branch from d420d93 to 8bf1a65 Compare July 15, 2025 22:00

make mxfp8 dim1 cast kernel configurable

affe1a8

danielvegamyhre force-pushed the mxcuda branch from 8bf1a65 to affe1a8 Compare July 16, 2025 05:41

update api name

5e84ec7

danielvegamyhre mentioned this pull request Jul 16, 2025

integration of new mxfp8 casting cuda kernel pytorch/ao#2564

Merged

danielvegamyhre marked this pull request as ready for review July 18, 2025 15:37

danielvegamyhre closed this Jul 24, 2025

danielvegamyhre wants to merge 2 commits into main from mxcuda
