Fixing PyTorch RMS norm implementation #133085
Conversation
✅ No failures as of commit 6458021 with merge base b7bcfda.
I have to process the CLA through the company's legal department. Unfortunately, it will probably be done early next week.
@kkontny would you mind if I open this PR again instead? |
@mayank31398 Please go ahead; I tried to hurry them, but with no effect. It is taking much more time than it should...
This PR is a replacement for #133085, pushing a quick fix for RMSNorm. The original author is @kkontny.

Previous PR summary: Since FP16 has a quite small dynamic range, it is very easy to overflow while computing `at::pow(input, 2)`, and this happens in real-world computation. I tried to use the fused `nn.RMSNorm` implementation instead of `LlamaRMSNorm` inside the `transformers` implementation of Llama (`src/transformers/models/llama/modeling_llama.py`). It started to give wrong answers in FP16 while still giving good ones in FP32. I figured out this happens due to overflow while computing the square of the input tensor. The original `LlamaRMSNorm` implementation upcasts the input to FP32 to prevent this and give better numerical stability:

```
class LlamaRMSNorm(nn.Module):
    def __init__(self, hidden_size, eps=1e-6):
        """
        LlamaRMSNorm is equivalent to T5LayerNorm
        """
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.variance_epsilon = eps

    def forward(self, hidden_states):
        input_dtype = hidden_states.dtype
        hidden_states = hidden_states.to(torch.float32)
        variance = hidden_states.pow(2).mean(-1, keepdim=True)
        hidden_states = hidden_states * torch.rsqrt(variance + self.variance_epsilon)
        return self.weight * hidden_states.to(input_dtype)
```

The proposed commit fixes the issue. FP16 in RMSNorm has to be treated in a special way to be usable in real-world implementations.

Pull Request resolved: #134106 Approved by: https://github.com/mikaylagawarecki, https://github.com/eqy
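The overflow is easy to reproduce: the largest finite FP16 value is 65504, so squaring any input with magnitude above roughly 256 already exceeds it. A minimal sketch (with a hypothetical input value and epsilon, using the public `torch` API) of both the failure mode and the upcast workaround described above:

```python
import torch

# FP16's max finite value is 65504, so 300**2 = 90000 overflows to inf.
x = torch.tensor([300.0], dtype=torch.float16)
sq_fp16 = x.pow(2)          # inf in fp16

# Upcasting to fp32 before squaring, as LlamaRMSNorm does, avoids the overflow.
x32 = x.to(torch.float32)
variance = x32.pow(2).mean(-1, keepdim=True)
y = (x32 * torch.rsqrt(variance + 1e-6)).to(torch.float16)  # finite, back in input dtype
print(sq_fp16, y)
```

For this single-element example the normalized output is simply `x / sqrt(x**2)`, i.e. 1.0, while the naive fp16 square is already `inf`.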
Fixed via #134106