Idefics: fix position ids #33907
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
ArthurZucker left a comment
Thanks! There might be a test worth adding for this, no? (We break that often for PEFT 😄)
```python
# create position_ids on the fly for batch generation
position_ids = attention_mask.long().cumsum(-1) - 1
position_ids.masked_fill_(attention_mask == 0, 1)
position_ids = position_ids[:, -seq_length:]
```
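To make the snippet concrete, here is a self-contained sketch of the same logic; the tensor values and `seq_length` are illustrative, not taken from the PR:

```python
import torch

# Illustrative left-padded batch: row 0 has two padding tokens.
attention_mask = torch.tensor([[0, 0, 1, 1, 1],
                               [1, 1, 1, 1, 1]])
seq_length = 3  # hypothetical: only the last 3 positions are fed to the model

# create position_ids on the fly, as in the diff above
position_ids = attention_mask.long().cumsum(-1) - 1
position_ids.masked_fill_(attention_mask == 0, 1)
position_ids = position_ids[:, -seq_length:]

print(position_ids.tolist())  # [[0, 1, 2], [2, 3, 4]]
```

Padding positions are filled with 1 (an arbitrary but valid index), and real tokens count up from 0 regardless of how much padding precedes them.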
The attention mask is of shape `seq_length_with_past`, so that makes sense!
When can this branch be merged?
Will take a look at your comment and merge after seeing what is wrong with the labels. TBH, labels should not be affected by position ids, but I'll dig into it.
@ArthurZucker can you take one more look pls? I added a fix for the labels.
Maybe PEFT would be a better place to add this test. If so, LMK and I can work on something (probably based on this code). Since that would only test a single architecture, it wouldn't be super useful IMO, but still better than nothing.
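For reference, a minimal sketch of what such a regression test could look like; the helper below just mirrors the snippet from the diff, and the function names are made up, not actual transformers or PEFT test code:

```python
import torch

def make_position_ids(attention_mask, seq_length):
    # mirrors the position-id logic from the diff above
    position_ids = attention_mask.long().cumsum(-1) - 1
    position_ids.masked_fill_(attention_mask == 0, 1)
    return position_ids[:, -seq_length:]

def test_position_ids_cropped_to_input_length():
    # mask covers past + current tokens (seq_length_with_past = 10),
    # but only 1 new token is processed, as in cached generation
    attention_mask = torch.ones(1, 10, dtype=torch.long)
    position_ids = make_position_ids(attention_mask, seq_length=1)
    assert position_ids.shape == (1, 1)
    assert position_ids[0, 0].item() == 9  # newest token's position

test_position_ids_cropped_to_input_length()
```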
ArthurZucker left a comment
Just missing info on the shift. We broke this recently (I think we never had this). I would be in favor of doing less work for the users, but documenting it properly!
```python
# we use the input attention mask to shift the logits and labels, because it is 2D.
# we also crop the attn mask in case it is longer, which happens in PrefixTuning with peft
shift_attention_mask = attention_mask[:, -(logits.shape[1] - 1):].to(logits.device)
logger.warning_once(
    "The final logits were masked using attention mask before calculating the loss. "
    "This behavior will be removed in v4.48, you should be masking the `labels` with `-100` "
    "in data collator before training starts."
)
```
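As a rough illustration of the cropping under discussion, here is a self-contained sketch; all shapes and the virtual-token count are invented for the example, and the real Idefics loss code may differ:

```python
import torch
import torch.nn.functional as F

batch, seq_len, vocab = 2, 5, 11
logits = torch.randn(batch, seq_len, vocab)
labels = torch.randint(0, vocab, (batch, seq_len))

# With PrefixTuning, the attention mask carries extra virtual tokens,
# so it is longer than the logits along the sequence dimension.
num_virtual_tokens = 4  # hypothetical PEFT prefix length
attention_mask = torch.ones(batch, seq_len + num_virtual_tokens)

# crop so the mask lines up with the shifted logits/labels
shift_attention_mask = attention_mask[:, -(logits.shape[1] - 1):]
shift_logits = logits[:, :-1][shift_attention_mask.bool()]
shift_labels = labels[:, 1:][shift_attention_mask.bool()]
loss = F.cross_entropy(shift_logits, shift_labels)
```

The alternative that the warning recommends is to set ignored label positions to `-100` in the data collator, since `F.cross_entropy` skips those via its default `ignore_index=-100`.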
That's the only thing I am curious about: we are introducing something new that we know we are going to deprecate. Not sure this makes a lot of sense to me! I would either not add this, or just not propagate it to models other than Idefics.
Yes, I don't know how this got introduced, as none of our LLMs mask out labels. We can also not deprecate it and just stop propagating, yes. I'll take care of it when new models are added.
So, in that case we remove deprecation and leave the fix for PEFT tuning
> So, in that case we remove deprecation and leave the fix for PEFT tuning
Exactly! We already get bashed enough for the plethora of warnings we produce, let's not add one more! 😉
Thanks for addressing the issue in this PR.
> So, in that case we remove deprecation and leave the fix for PEFT tuning
What fix are we talking about here?
Oh, that's more of a transformers-side fix. We are talking about the attention mask, which gets extra virtual tokens with PEFT, so we have to crop them off to match the shape of the labels for the CE loss.
Got it, thanks, I thought I might have to fix something in PEFT.
* fix position ids
* fix labels also
* fix copies
* oops, not that one
* dont deprecate
What does this PR do?
Fixes #33852 by cropping position ids to the length of the inputs, and cleans up a bit how inputs are prepared for generation.
TF equivalence tests are failing for me locally and are also failing on main, so those failures are not related to this PR.