Skip to content

[mxfp8] fix exp2f_rcp to handle special case of unbiased exponent of 127#4117

Merged
danielvegamyhre merged 3 commits into
mainfrom
numericfix
Mar 21, 2026
Merged

[mxfp8] fix exp2f_rcp to handle special case of unbiased exponent of 127#4117
danielvegamyhre merged 3 commits into
mainfrom
numericfix

Conversation

@danielvegamyhre

@danielvegamyhre danielvegamyhre commented Mar 19, 2026

Copy link
Copy Markdown
Contributor

Copying updated exp2f_rcp from here to properly handle case of unbiased exponent of 127.

Build test

  • USE_CPP=1 pip install --no-build-isolation -e ~/ao -v

@pytorch-bot

pytorch-bot Bot commented Mar 19, 2026

Copy link
Copy Markdown

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/4117

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit a1b96e4 with merge base eb64bfb (image):

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla Bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Mar 19, 2026
@danielvegamyhre danielvegamyhre added mx module: training quantize_ api training flow module: core changes affecting multiple modules, e.g. base config/tensor, observers, quant ops and removed CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. module: training quantize_ api training flow labels Mar 19, 2026

@slayton58 slayton58 left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@vkuzo

vkuzo commented Mar 19, 2026

Copy link
Copy Markdown
Contributor

can we have a test?

@meta-cla meta-cla Bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Mar 19, 2026
@danielvegamyhre

danielvegamyhre commented Mar 19, 2026

Copy link
Copy Markdown
Contributor Author

can we have a test?

@vkuzo @slayton58 correct me if i'm wrong but i don't think this bug is reachable from our pytorch code? to reach a biased scale of 254 we need a very large value with exponent bits of 254, and we only can pass bf16 or fp32 (same dynamic range, so let's use fp32).

the max fp32 value is (2 - (2**-23)) * (2**127) which has exponent 254. However, when using RCEIL scaling (only path that uses exp2f_rcp) this is divided by 448 and converted with the cvt.sat.finite inline ptx and this results in scale.view(uint8) of 247, which is below the target value of 254 to hit this code path.

tl;dr We can't create large enough test inputs for the kernel to hit this path since they aren't representable in fp32 or bf16.

@danielvegamyhre

danielvegamyhre commented Mar 19, 2026

Copy link
Copy Markdown
Contributor Author

@vkuzo @slayton58 ok i determined we can pass "inf" which will result in biased exponent of 254 and returns scales.view(uint8) == 254. So i guess this case isn't reachable via finite values but is special case for infinity?

@slayton58

Copy link
Copy Markdown

@vkuzo @slayton58 ok i determined we can pass "inf" which will result in biased exponent of 254 and returns scales.view(uint8) == 254. So i guess this case isn't reachable via finite values but is special case for infinity?

Sounds about right - even looking at MXFP4 we'd still only hit 252, which is still < 254. I think we'd need a format where max(fmt) == 1 to hit this, which we don't have.

@danielvegamyhre danielvegamyhre merged commit b45b6bf into main Mar 21, 2026
22 of 23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. module: core changes affecting multiple modules, e.g. base config/tensor, observers, quant ops mx

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants