Conversation
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
|
Hi, this makes sense.
|
Should I run the slow tests, or can this be merged as-is?
|
Might be better to run the slow tests on the Granite class @NielsRogge
|
Seems like the slow tests are failing (cc @ydshieh), but I assume it's safe to merge this PR since the following passes:

```python
from torch import nn
import torch


class GraniteRMSNorm(nn.Module):
    def __init__(self, hidden_size, eps=1e-6):
        """
        GraniteRMSNorm is equivalent to T5LayerNorm
        """
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.variance_epsilon = eps

    def forward(self, hidden_states):
        input_dtype = hidden_states.dtype
        hidden_states = hidden_states.to(torch.float32)
        variance = hidden_states.pow(2).mean(-1, keepdim=True)
        hidden_states = hidden_states * torch.rsqrt(variance + self.variance_epsilon)
        return self.weight * hidden_states.to(input_dtype)

    def extra_repr(self):
        return f"{tuple(self.weight.shape)}, eps={self.variance_epsilon}"


a = torch.nn.RMSNorm(10)
b = GraniteRMSNorm(10)
assert a.weight.shape == b.weight.shape

c = torch.randn(1, 10)
assert torch.allclose(a(c), b(c))
```
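For a stricter comparison, here is a minimal sketch (not part of the PR) that pins both modules to the same epsilon, so the result does not depend on `torch.nn.RMSNorm`'s default epsilon handling:

```python
# Sketch only: compare the two implementations with an explicit, matching eps.
# Assumes GraniteRMSNorm from the snippet above is in scope.
import torch

a = torch.nn.RMSNorm(10, eps=1e-6)   # torch.nn.RMSNorm is available from PyTorch 2.4 onwards
b = GraniteRMSNorm(10, eps=1e-6)

x = torch.randn(4, 10)
torch.testing.assert_close(a(x), b(x))
```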
|
When will this be released to fix Transformers?
|
@ArthurZucker can you merge this? |
|
Okay |
|
cc @ydshieh if you see failures on this, it's expected!
|
Thanks @ArthurZucker, I'll fix this test in a new PR.
* Add GraniteRMSNorm
* [run_slow] granite
What does this PR do?
This PR is a follow-up of #31502, which broke Transformers for PyTorch < 2.4.
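For context, `torch.nn.RMSNorm` only exists from PyTorch 2.4 onwards, so referencing it directly fails on older installs. Below is an illustrative sketch of a version guard; the PR itself simply restores the local `GraniteRMSNorm` implementation, and the `RMSNormImpl` name here is hypothetical:

```python
# Illustrative sketch only: pick an RMSNorm implementation based on what
# the installed PyTorch provides. Not the approach taken in this PR.
import torch

if hasattr(torch.nn, "RMSNorm"):       # PyTorch >= 2.4
    RMSNormImpl = torch.nn.RMSNorm
else:                                  # PyTorch < 2.4
    RMSNormImpl = GraniteRMSNorm       # local implementation shown earlier in the thread
```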