[release/2.7] Enable mx fp8 support on ROCm #2199

Merged
pruthvistony merged 4 commits into release/2.7 from rel_2.7_mx_f8 on Jun 4, 2025
Conversation

@jagadish-amd commented May 28, 2025

Ported the mx fp8 support from #2046.

Current test stats (counting only the blockwise-scale tests):
PYTORCH_TEST_WITH_ROCM=1 python test/test_matmul_cuda.py -k test_blockwise -v

Ran 225 tests in 8.256s
FAILED (failures=1, skipped=150)
74 tests passed

Sample mx fp8 test case:
test_blockwise_mxfp8_numerics_test_case_name_data_random_scales_one_fast_accum_True_512_128_256_cuda (__main__.TestFP8MatmulCudaCUDA)
hipblaslt-bench --api_method c -m 256 -n 512 -k 128 --lda 128 --ldb 128 --ldc 256 --ldd 256 --stride_a 0 --stride_b 0 --stride_c 0 --stride_d 0 --alpha 1 --beta 0 --transA T --transB N --batch_count 1 --scaleA 3 --scaleB 3 --a_type f8_r --b_type f8_r --c_type bf16_r --d_type bf16_r --compute_type f32_r --algo_method index --solution_index -2146957310 --rotating 0 --cold_iters 0 --iters 0
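For context on the shapes involved: MX formats (per the OCP Microscaling spec) pair fp8 element data with one shared e8m0 scale per 32-element block along the reduction dimension. A minimal sketch of that scale-tensor bookkeeping for the sample case above (M=512, K=128, N=256), assuming the standard 1x32 block convention; this is illustrative arithmetic only, not code from this PR:

```python
# Sketch (not from this PR): per-block scale-tensor shapes for an MX fp8
# matmul like the sample test above. Assumes the OCP MX convention of one
# shared e8m0 scale per 32-element block along the reduction (K) dimension.

BLOCK = 32  # MX block size: 32 fp8 elements share one e8m0 scale


def mx_scale_shape(rows: int, k: int, block: int = BLOCK) -> tuple:
    """Shape of the per-block scale tensor for a (rows, k) operand."""
    assert k % block == 0, "K must be divisible by the MX block size"
    return (rows, k // block)


# Sample test case above: A is 512x128, B yields a 512x256 output (K=128)
M, K, N = 512, 128, 256
scale_a = mx_scale_shape(M, K)  # scales for A, blocked along K
scale_b = mx_scale_shape(N, K)  # scales for B, also blocked along K
print(scale_a, scale_b)  # (512, 4) (256, 4)
```

With K=128 each row of an operand carries 128/32 = 4 scales, which is why blockwise-scale tests constrain K to multiples of 32.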

Signed-off-by: Jagadish Krishnamoorthy <jagadish.krishnamoorthy@amd.com>
@jagadish-amd (Author) commented May 28, 2025

release/2.7 does not support mx fp4, so this PR adds the ROCm support needed to enable mx fp8.
cc @pruthvistony

@jagadish-amd changed the title from "Enable mx f8 support on ROCm" to "Enable mx fp8 support on ROCm" on May 28, 2025
@jagadish-amd changed the title from "Enable mx fp8 support on ROCm" to "[release/2.7] Enable mx fp8 support on ROCm" on May 30, 2025
@pruthvistony pruthvistony merged commit d17e222 into release/2.7 Jun 4, 2025
@pruthvistony pruthvistony deleted the rel_2.7_mx_f8 branch June 4, 2025 16:40
pragupta pushed a commit that referenced this pull request Jul 21, 2025
(cherry picked from commit d17e222)
pragupta pushed a commit to pragupta/pytorch that referenced this pull request Jul 21, 2025
(cherry picked from commit d17e222)
pragupta pushed a commit that referenced this pull request Jul 22, 2025
(cherry picked from commit d17e222)
jithunnair-amd pushed a commit that referenced this pull request Jul 22, 2025
(cherry picked from commit d17e222)
pragupta pushed a commit that referenced this pull request Jul 29, 2025
(cherry picked from commit d17e222)