Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
    loss = self.loss_function(
        shift_logits.view(batch_size * seq_length, vocab_size),
        shift_labels.view(batch_size * seq_length),
        vocab_size=vocab_size,
        **kwargs,
    )
A bit weird; the refactor here should mean you only have to pass the inputs, and the shifting will happen inside the loss function.
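A minimal sketch of what "the shifts happen inside" could look like: a causal-LM loss that shifts logits and labels internally, so callers pass the raw tensors. The function name and signature here are illustrative, not the library's actual API.

```python
import torch
import torch.nn as nn

def causal_lm_loss(logits, labels, vocab_size, ignore_index=-100, **kwargs):
    # Shift internally: the logit at position t predicts the token at t+1,
    # so drop the last logit and the first label before cross-entropy.
    shift_logits = logits[..., :-1, :].contiguous()
    shift_labels = labels[..., 1:].contiguous()
    return nn.functional.cross_entropy(
        shift_logits.view(-1, vocab_size),
        shift_labels.view(-1),
        ignore_index=ignore_index,
    )
```

With this shape, model code can call `loss = self.loss_function(logits, labels, vocab_size=..., **kwargs)` without doing any shifting itself.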
Is it normal that some checks were not successful?
    if hasattr(self, "_loss_function"):
        return self._loss_function
@ArthurZucker this needed to be added for a few models that don't need everything the loss function was doing. The case was xglm.
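The getter/setter pattern being discussed can be sketched as follows. This is a hypothetical illustration (the class and helper names are made up): the model falls back to a shared default loss unless `_loss_function` has been set, which lets individual models like xglm install a specific loss.

```python
import torch.nn as nn

def default_loss(logits, labels, **kwargs):
    # Shared default used when no override has been set.
    return nn.functional.cross_entropy(logits, labels)

class MyModelForCausalLM(nn.Module):
    # Illustrative sketch: expose loss_function as a property so a model
    # subclass (or a user) can override the default by plain assignment.
    def __init__(self):
        super().__init__()
        self._loss_function = None

    @property
    def loss_function(self):
        if self._loss_function is not None:
            return self._loss_function
        return default_loss

    @loss_function.setter
    def loss_function(self, value):
        self._loss_function = value
```

Usage: `model.loss_function = my_custom_loss` swaps in a different loss without touching the model's forward.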
Force-pushed from cbddbc9 to 6b380f4
Finally ready to go, sorry it took me a bit, lots of models to triple check 😓
Taking this comment into account: #34191 (comment)
cc @bauwenst the getter and setter for self._loss_function should be of help!
I need to review, but I do think it helps to be able to set self._loss_function. Now the question is whether or not we want to do it explicitly in all of our models.
Force-pushed from d93121a to 038dc55
* Save state
* Make a failing test
* Better test
* mpt -> done, many more to go
* Rm extranious
* Bamba
* Bert
* big_bird
* biogpt
* bloom
* codegen
* ctrl
* data2vec
* dbrx
* Through up to Dbrx
* electra
* ernie
* falcon
* Fuyu/persimmon
* Include noop kwargs to base models
* Rebase
* Skip musigen
* Refactor/skip mllama
* Revert makefile
* Rm file
* Fix PT failing, need to modify rest of loss funcs to not resize
* Propagate some
* Continue
* More
* More options
* Mostly fixed
* Proved that it's the same
* Bloom is good
* Make ability to override loss func possible
* Fixup
* Clean
* Fix xglm
* Quality tests
* Skip OCR2
* Make specific loss for xglm
* Make order the same/line up 1:1
* xglm
* Skip fx output loss bloom model
* Didn't pass in pad_token_id
* Fix quality
Thanks @eljandoubi; to be more specific, it's encoder/decoder models.
I got this error on my AutoModelForSequenceClassification: TypeError: WiegthedTrainer.compute_loss() got an unexpected keyword argument 'num_items_in_batch'


What does this PR do?
Adds unused **kwargs to particular models so that num_items_in_batch can work as intended.
Fixes #35838
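The idea can be illustrated with a toy model (everything here is hypothetical, not an actual transformers model): the trailing **kwargs in forward soaks up loss-related keywords such as num_items_in_batch that the Trainer may forward, instead of raising a TypeError.

```python
import torch
import torch.nn as nn

class TinyForCausalLM(nn.Module):
    # Hypothetical minimal model. The **kwargs in forward accepts extra
    # keywords (e.g. num_items_in_batch) without using them, so callers
    # that pass loss kwargs through the model don't crash.
    def __init__(self, vocab_size=16, hidden=8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.lm_head = nn.Linear(hidden, vocab_size)
        self.vocab_size = vocab_size

    def forward(self, input_ids, labels=None, **kwargs):
        logits = self.lm_head(self.embed(input_ids))
        loss = None
        if labels is not None:
            # Standard causal-LM shift before cross-entropy.
            shift_logits = logits[..., :-1, :].contiguous()
            shift_labels = labels[..., 1:].contiguous()
            loss = nn.functional.cross_entropy(
                shift_logits.view(-1, self.vocab_size),
                shift_labels.view(-1),
            )
        return loss, logits
```

Without the **kwargs, calling `model(input_ids, labels=labels, num_items_in_batch=10)` would raise the same kind of "unexpected keyword argument" TypeError reported above.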
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@ArthurZucker