
Idefics: fix position ids#33907

Merged
zucchini-nlp merged 6 commits into huggingface:main from zucchini-nlp:idefics-fix-position-ids
Oct 11, 2024

Conversation

@zucchini-nlp
Member

@zucchini-nlp zucchini-nlp commented Oct 3, 2024

What does this PR do?

Fixes #33852 by cropping the position ids to the length of the inputs, and cleans up the way inputs are prepared for generation.

The TF equivalence tests are failing for me locally and are also failing on main, so those failures are not related to this PR.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Collaborator

@ArthurZucker ArthurZucker left a comment

Thanks, there might be a test worth adding for this no? (we break that often for peft 😄 )

# create position_ids on the fly for batch generation
position_ids = attention_mask.long().cumsum(-1) - 1
position_ids.masked_fill_(attention_mask == 0, 1)
position_ids = position_ids[:, -seq_length:]
Collaborator

The attention mask is of shape seq_length_with_past, so that makes sense!
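The cropping discussed above can be exercised in isolation. A minimal sketch, with purely illustrative shapes (a batch of 2, a mask of length 8 covering past/virtual tokens, and current inputs of length 1 — these values are hypothetical, not from the PR):

```python
import torch

# Hypothetical setup: the attention mask covers seq_length_with_past = 8
# (e.g. extra prefix-tuning tokens), while the current inputs only have
# seq_length = 1. The first sequence is left-padded by 2 tokens.
attention_mask = torch.ones(2, 8, dtype=torch.long)
attention_mask[0, :2] = 0  # left-padding on the first sequence
seq_length = 1

# create position_ids on the fly for batch generation (as in the PR)
position_ids = attention_mask.long().cumsum(-1) - 1
position_ids.masked_fill_(attention_mask == 0, 1)
# the fix: crop the position ids to the length of the current inputs
position_ids = position_ids[:, -seq_length:]

print(position_ids.tolist())  # [[5], [7]]
```

Without the final crop, `position_ids` would keep the full mask length (8) and no longer match the inputs, which is exactly the shape mismatch hit with prefix tuning.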

@Kamichanw

When can this branch be merged?

@zucchini-nlp
Member Author

Will take a look at your comment and merge after seeing what is wrong with the labels. TBH, labels should not be affected by position ids, but I'll dig into it.

@zucchini-nlp
Member Author

@ArthurZucker can you take one more look pls? I added a fix for labels. It seems one of the very first VLMs introduced it, and all the others simply copied from that one. I don't see any attention mask applied in LLMs, and I don't think we should be doing it under the hood without the user knowing. So I deprecated that for 2 minor releases.

@BenjaminBossan
Member

Thanks, there might be a test worth adding for this no? (we break that often for peft 😄 )

Maybe PEFT would be a better place to add this test. If so, LMK and I can work on something (probably based on this code). Since that would only test a single architecture, it wouldn't be super useful IMO but still better than nothing.

Collaborator

@ArthurZucker ArthurZucker left a comment

Just missing info on the shift; we broke this recently (I think we never had this). I would be in favor of doing less work for the users, but properly documenting it!

Comment on lines +1629 to +1636
# we use the input attention mask to shift the logits and labels, because it is 2D.
# we also crop attn mask in case it is longer, which happens in PrefixTuning with peft
shift_attention_mask = attention_mask[:, -(logits.shape[1] - 1) :].to(logits.device)
logger.warning_once(
"The final logits were masked using attention mask before calculating the loss. "
"This behavior will be removed in v4.48, you should be masking the `labels` with `-100` "
"in data collator before training starts."
)
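The approach the warning recommends instead (masking the `labels` with `-100` in the data collator before training starts) can be sketched as follows, with hypothetical token ids and shapes:

```python
import torch

# Illustrative example: 0 is assumed to be the pad token id here.
input_ids = torch.tensor([[5, 6, 7, 0, 0]])
attention_mask = torch.tensor([[1, 1, 1, 0, 0]])

# Mask padded positions in the labels with -100 in the data collator;
# CrossEntropyLoss ignores -100 by default (ignore_index=-100), so no
# attention-mask masking of the logits is needed at loss time.
labels = input_ids.clone()
labels[attention_mask == 0] = -100

print(labels.tolist())  # [[5, 6, 7, -100, -100]]
```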
Collaborator

that's the only thing I am curious about: we are introducing something new that we know we are going to deprecate. Not sure this makes a lot of sense to me! I would either not add this, or just not propagate it to models other than Idefics.

Member Author

Yes, I don't know how this got introduced, as none of our LLMs mask out labels. We can also not deprecate and just stop propagating, yes. I'll take care of it when new models are added.

So, in that case we remove deprecation and leave the fix for PEFT tuning

Collaborator

So, in that case we remove deprecation and leave the fix for PEFT tuning

exactly! We already get bashed enough for the plethora of warnings we produce, let's not add one more! 😉

Member

Thanks for addressing the issue in this PR.

So, in that case we remove deprecation and leave the fix for PEFT tuning

What fix are we talking about here?

Member Author

oh, that's more of a transformers-side fix. We are talking about the attention mask, which gets extra virtual tokens with PEFT, so we have to crop them off to match the shape of the labels for CELoss.
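A minimal sketch of that shape mismatch and the crop, with hypothetical sizes (4 virtual tokens, sequence length 10, vocab size 32 — none of these numbers come from the PR):

```python
import torch

# With prefix tuning, PEFT prepends virtual tokens: the attention mask
# grows by num_virtual_tokens while the logits/labels keep the original
# sequence length.
num_virtual_tokens = 4
logits = torch.randn(1, 10, 32)                       # (batch, seq, vocab)
attention_mask = torch.ones(1, 10 + num_virtual_tokens)

# Crop off the virtual-token positions so the mask lines up with the
# shifted logits/labels used for the cross-entropy loss.
shift_attention_mask = attention_mask[:, -(logits.shape[1] - 1):]

print(shift_attention_mask.shape)  # torch.Size([1, 9])
```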

Member

Got it, thanks, I thought I might have to fix something in PEFT.

@zucchini-nlp zucchini-nlp merged commit be9aeba into huggingface:main Oct 11, 2024
BernardZach pushed a commit to BernardZach/transformers that referenced this pull request Dec 5, 2024
* fix position ids

* fix labels also

* fix copies

* oops, not that one

* dont deprecate


Successfully merging this pull request may close these issues.

IDEFICS cannot work with past_key_values when I'm using prefix tuning from peft library.
