Fix RecurrentGemma device_map #30273
Conversation
```diff
  indices = (slicing + to_shift[-1].int() - 1) % self.config.attention_window_size
- k_out, v_out = self.key_states, self.value_states
+ k_out, v_out = self.key_states.to(key_states.device), self.value_states.to(value_states.device)
```
Due to `_setup_cache`, `self.key_states` and `self.value_states` are initialized on the device of the hidden states that we pass to the model in `generate` (e.g. `cuda:0`). However, this layer might not be on the same device as those hidden states if we use multiple GPUs. Hence, we need to make sure that `self.key_states` is on the same device as `key_states`, and likewise for `value_states`.
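The pattern behind the fix can be illustrated with a small standalone sketch. The class and shapes below are illustrative only, not the actual RecurrentGemma cache code:

```python
import torch

class SlidingCache:
    """Toy stand-in for the per-layer cache created in _setup_cache."""

    def __init__(self, shape, device):
        # Like _setup_cache: buffers start on the device of the first hidden state.
        self.key_states = torch.zeros(shape, device=device)
        self.value_states = torch.zeros(shape, device=device)

    def update(self, key_states, value_states):
        # The fix: move the cached tensors to the device of the incoming states,
        # so a layer that device_map placed on another GPU still works.
        k_out = self.key_states.to(key_states.device)
        v_out = self.value_states.to(value_states.device)
        return k_out, v_out

cache = SlidingCache((1, 4), device="cpu")
k, v = cache.update(torch.ones(1, 4), torch.ones(1, 4))
```

On a single device `.to(...)` is a no-op, so the extra call only costs anything when the cache and the layer actually live on different GPUs.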
```python
contextualized_states = recurrent_gate.type(acc_dtype) * recurrent_states[:, None].to(
    recurrent_gate.device
)
```
Same issue with `recurrent_gate`, which is initialized in `_setup_cache`.
```diff
  contextualized_states = torch.zeros_like(hidden_states)
  for t in range(hidden_states.shape[1]):
-     recurrent_states = recurrent_gate[:, t].type(acc_dtype) * recurrent_states
+     recurrent_states = recurrent_gate[:, t].type(acc_dtype) * recurrent_states.to(recurrent_gate.device)
```
```python
self.register_buffer(
    "normalizer", torch.tensor(self.config.hidden_size**0.5, dtype=torch.bfloat16), persistent=False
)
```
We don't need this buffer to be persistent. This also fixes an issue that we get with accelerate.
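The effect of `persistent=False` can be checked in isolation. The module below is a toy, not the model's actual class:

```python
import torch
from torch import nn

class Norm(nn.Module):
    def __init__(self, hidden_size=8):
        super().__init__()
        # A non-persistent buffer stays on the module at runtime but is
        # excluded from the state_dict, so checkpoint loading (including
        # accelerate's sharded loading) never expects it on disk.
        self.register_buffer(
            "normalizer", torch.tensor(hidden_size**0.5, dtype=torch.bfloat16), persistent=False
        )

m = Norm()
assert "normalizer" not in m.state_dict()  # not serialized
assert m.normalizer.dtype == torch.bfloat16  # still usable in forward
```

Since `normalizer` is deterministically recomputed from the config at `__init__` time, nothing is lost by leaving it out of checkpoints.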
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
ArthurZucker left a comment
Thanks! Couldn't the device issue be fixed by placing them on the same device as `self.key_states`, rather than on the device that was passed?
Also a tad scared of the slowdown from doing it there? But LGTM otherwise.
I think it will slow down if we place them on the same device as `self.key_states`, for example. Let's say …
* Switch to non-persistent buffer
* Fix device mismatch issue due to cache
* Style
What does this PR do?
This PR makes RecurrentGemma compatible with multi-GPU `device_map`. To try it out:
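The exact snippet isn't in the thread; a minimal reproduction might look like the following (the checkpoint id, prompt, and helper name are my assumptions, not taken from the PR):

```python
def run_multi_gpu_generation(model_id="google/recurrentgemma-2b", prompt="Hello"):
    """Hypothetical repro helper: generate with the model sharded across GPUs."""
    # Imports are local so the sketch can be defined without transformers installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # device_map="auto" lets accelerate shard the layers across all visible GPUs.
    model = AutoModelForCausalLM.from_pretrained(
        model_id, device_map="auto", torch_dtype=torch.bfloat16
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=20)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```

Running the same helper with `CUDA_VISIBLE_DEVICES=0` versus several GPUs is one way to compare single-GPU and sharded outputs.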
I get the same output in the single-GPU and multi-GPU setups.