fix gemma3 grad acc by SunMarc · Pull Request #37208 · huggingface/transformers

SunMarc · 2025-04-02T13:41:27Z

What does this PR do?

This PR fixes the gradient accumulation issue with the gemma3 model. The issue was that the model's forward accepts **kwargs, so we assumed it was passing **loss_kwargs (and with it num_items_in_batch) through to the loss calculation, which it was not. @ArthurZucker, I'm not sure what the best general fix is, as this will probably happen again. Maybe set accepts_loss_kwargs to False by default and set it to True only for the models we have fixed? I'm also fine with just setting it to False for models that don't use the kwargs for the loss.
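To make the scaling problem concrete, here is a minimal pure-Python sketch. The per-token losses are made-up numbers, and the three cases describe my reading of how the training loop treats the loss (skipping the division by the accumulation steps when it believes the model uses num_items_in_batch); it is an illustration under those assumptions, not the Trainer's actual code.

```python
# Hypothetical per-token losses for two micro-batches of different lengths.
micro_batches = [[2.0, 4.0], [1.0, 3.0, 5.0, 9.0]]
grad_acc_steps = len(micro_batches)
num_items_in_batch = sum(len(mb) for mb in micro_batches)


def mean(values):
    return sum(values) / len(values)


# Reference: the loss over all tokens as one large batch.
reference = mean([tok for mb in micro_batches for tok in mb])

# Case 1: the model really uses num_items_in_batch (sum / total items),
# and the training loop does not rescale. This matches the reference.
honoured = sum(sum(mb) / num_items_in_batch for mb in micro_batches)

# Case 2: the model ignores num_items_in_batch and returns a plain mean,
# but the training loop still skips the rescaling. The loss is inflated.
ignored_unscaled = sum(mean(mb) for mb in micro_batches)

# Case 3: the model is flagged as not using the loss kwargs, so the loop
# divides each micro-batch loss by the number of accumulation steps.
# This is a mean of means, so it can differ slightly from the reference
# when micro-batches hold different numbers of tokens.
ignored_scaled = sum(mean(mb) / grad_acc_steps for mb in micro_batches)

print(reference, honoured, ignored_unscaled, ignored_scaled)
```

Case 2 is the situation described above: the accumulated loss grows roughly with the number of accumulation steps instead of staying comparable to the single-batch value.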

As for why I didn't use the loss function: the code filters the logits/labels, so I decided to simply not use num_items_in_batch to calculate the loss. Otherwise, the loss would not be calculated correctly for one of the cases.
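A rough sketch of that choice, in pure Python: a toy model that filters positions with a mask before averaging the loss, accepts extra keyword arguments, and declares that it does not use them. The class ToyFilteredLossModel and its loss method are hypothetical; only the accepts_loss_kwargs name comes from the description above, and the real model code is not reproduced here.

```python
import math


class ToyFilteredLossModel:
    """Hypothetical stand-in for a model that filters logits/labels itself."""

    # Signals that extra loss kwargs (e.g. num_items_in_batch) are NOT used.
    accepts_loss_kwargs = False

    def loss(self, logits, labels, attention_mask, **kwargs):
        # kwargs are accepted for signature compatibility but ignored.
        kept = [
            (row, label)
            for row, label, keep in zip(logits, labels, attention_mask)
            if keep == 1
        ]
        # Cross-entropy per kept position: log-sum-exp minus the target logit.
        per_token = [
            math.log(sum(math.exp(v) for v in row)) - row[label]
            for row, label in kept
        ]
        # Mean over the positions that survived the filter.
        return sum(per_token) / len(per_token)


model = ToyFilteredLossModel()
logits = [[0.0, 0.0], [5.0, 0.0], [0.0, 0.0]]
labels = [0, 1, 1]
mask = [1, 0, 1]  # the middle position is filtered out

loss_plain = model.loss(logits, labels, mask)
loss_with_kwarg = model.loss(logits, labels, mask, num_items_in_batch=100)
print(loss_plain, loss_with_kwarg)
```

Because the denominator is the number of positions left after filtering, a caller-supplied num_items_in_batch would not necessarily match it, which is one way to read the concern above.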

I also fixed a PEFT-related issue: when the model was a PEFT model, we couldn't access that attribute on it.

To reproduce

winglian script
https://gist.github.com/winglian/569924fe154824c8ce148f6e185cd4cd

After fix

Configurations compared: gradient accumulation 2 with batch size 1, and gradient accumulation 1 with batch size 2.

Fixes #37197

cc @winglian

github-actions · 2025-04-02T13:41:40Z

Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. The CI will be paused while the PR is in draft mode. When it is ready for review, please click the Ready for review button (at the bottom of the PR page). This will assign reviewers and trigger CI.

HuggingFaceDocBuilderDev · 2025-04-02T14:21:57Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

muellerzr

Nice job! Looks good to me, is get_base_model an old enough api we don’t need to worry about breaking stuff?

SunMarc · 2025-04-07T12:52:42Z

> Nice job! Looks good to me, is get_base_model an old enough api we don’t need to worry about breaking stuff?

It should be. We are already using it here:

    model_forward = (
        unwrapped_model.forward
        if not _is_peft_model(unwrapped_model)
        else unwrapped_model.get_base_model().forward
    )
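The same unwrapping pattern can be applied when reading a class-level flag such as accepts_loss_kwargs. The sketch below uses toy classes only: ToyPeftWrapper, is_toy_peft_model, and resolve_accepts_loss_kwargs are hypothetical names, and the wrapper deliberately does not forward attribute access so the failure mode is visible. It makes no claim about how the real PEFT wrapper behaves.

```python
class ToyBaseModel:
    # Flag defined on the underlying model class.
    accepts_loss_kwargs = False

    def forward(self, x):
        return x * 2


class ToyPeftWrapper:
    """Toy wrapper (assumption: it does not forward attribute lookups)."""

    def __init__(self, base):
        self._base = base

    def get_base_model(self):
        return self._base

    def forward(self, x):
        return self._base.forward(x)


def is_toy_peft_model(model):
    # Stand-in for a helper like _is_peft_model in the snippet above.
    return isinstance(model, ToyPeftWrapper)


def resolve_accepts_loss_kwargs(model, default=True):
    # Look the flag up on the base model when the model is wrapped.
    target = model.get_base_model() if is_toy_peft_model(model) else model
    return getattr(target, "accepts_loss_kwargs", default)


wrapped = ToyPeftWrapper(ToyBaseModel())
print(hasattr(wrapped, "accepts_loss_kwargs"))   # the wrapper hides the flag
print(resolve_accepts_loss_kwargs(wrapped))      # resolved via the base model
```

Falling back to a default when the attribute is missing keeps the lookup safe for models that never define the flag.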

SunMarc · 2025-04-07T12:54:03Z

cc @ArthurZucker gentle ping

SunMarc · 2025-05-06T15:52:51Z

ping @ArthurZucker

ArthurZucker

Thanks let's make it go green and merge

ArthurZucker · 2025-06-24T13:31:34Z

@bot /style

ArthurZucker

oups sorry

setup.py

ArthurZucker · 2025-06-24T14:19:26Z

@bot /style

github-actions · 2025-06-24T14:21:00Z

Style fixes have been applied. View the workflow run here.

fix gemma3 grad acc

f5f635f

github-actions bot marked this pull request as draft April 2, 2025 13:41

fix

26f261d

SunMarc marked this pull request as ready for review April 2, 2025 13:42

Merge branch 'main' into fix-gemma3-grad-acc

8b84839

github-actions bot requested review from ArthurZucker and Rocketknight1 April 2, 2025 13:42

SunMarc mentioned this pull request Apr 2, 2025

Gemma3 Gradient Accumulation loss #37197

Closed

4 tasks

SunMarc added 3 commits April 2, 2025 15:54

fix

228d4fa

Merge remote-tracking branch 'upstream/fix-gemma3-grad-acc' into fix-…

a5c7cb5

…gemma3-grad-acc

fix

331ceaf

SunMarc and others added 3 commits April 2, 2025 16:32

fix

9aababf

Merge branch 'main' into fix-gemma3-grad-acc

0546cb2

rmv print

945be74

SunMarc requested a review from muellerzr April 2, 2025 14:43

SunMarc added 2 commits April 2, 2025 16:44

Merge remote-tracking branch 'upstream/fix-gemma3-grad-acc' into fix-…

e51c6fd

…gemma3-grad-acc

rm

52811d4

muellerzr approved these changes Apr 3, 2025

SunMarc requested review from ArthurZucker and removed request for ArthurZucker April 10, 2025 09:13

ArthurZucker approved these changes Jun 24, 2025

Merge branch 'main' into fix-gemma3-grad-acc

5aa30df

ArthurZucker reviewed Jun 24, 2025

Update setup.py

16799c9

Apply style fixes

d277ca8

propagate the changes

246d28f

ArthurZucker merged commit 3c322c9 into main Jun 25, 2025
21 checks passed

ArthurZucker deleted the fix-gemma3-grad-acc branch June 25, 2025 14:28

Tcc0403 mentioned this pull request Jul 10, 2025

Loss scaling is incorrect when using gradient_accumulation_steps > 1 linkedin/Liger-Kernel#802

Closed

iMountTai mentioned this pull request Aug 21, 2025

fix qwen25-vl grad acc #40333

Merged

SunMarc mentioned this pull request Sep 1, 2025

Loss Scales Incorrectly with Gradient Accumulation Steps in Trainer (Gemma3 and other models) #40564

Closed

4 tasks

fix gemma3 grad acc #37208
ArthurZucker merged 15 commits into main from fix-gemma3-grad-acc

4 participants
