[ProcessingIdefics] Attention mask bug with padding #29449

amyeroberts merged 16 commits into huggingface:main

Conversation
Thanks for the PR! Reviewing asap
amyeroberts left a comment
Hi @bri25yu, thanks for working on this!
Making the padding logic consistent within the processor is a good idea, but I don't think we should change the default behaviour.
```diff
 self,
 prompts: Union[List[TextInput], List[List[TextInput]]],
-padding: Union[bool, str, PaddingStrategy] = False,
+padding: Union[bool, str, PaddingStrategy] = "longest",
```
I don't think we should change the default here for two reasons:
- It doesn't match the default behaviour for most processing classes
- It changes the default behaviour, which can be considered a breaking change
bri25yu
You may have tagged the wrong person 🙃
It doesn't match the default behaviour for most processing classes
Does the idefics model support non-padded inputs? From my understanding of the original issue, it seems they expect some padding even when the argument is not passed.
It changes the default behaviour, which can be considered a breaking change
I'm not sure if this is a bug, but the default behavior appears to have been inaccurate to begin with. Even if the user passes in padding=False, lines 347-354 seem to be forcibly padding the text input to the maximum sequence length anyway.
You may have tagged the wrong person 🙃
Woops, yes, sorry about that
Does the idefics model support non-padded inputs? From my understanding of the original issue, it seems they expect some padding even when the argument is not passed.
No, but no model supports non-padded inputs when batch_size > 1 and the input sequences are of different lengths; even so, processors and tokenizers do not pad inputs by default.
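For illustration, "longest" padding plus attention-mask construction can be sketched in plain Python. This is a minimal sketch, not the processor's actual implementation; the `pad_id` value and the left/right switch are illustrative (the Idefics default tokenizer left-pads):

```python
def pad_to_longest(sequences, pad_id=0, side="right"):
    """Pad lists of token ids to the longest length in the batch.

    Returns the padded ids and a matching attention mask
    (1 for real tokens, 0 for padding).
    """
    max_len = max(len(s) for s in sequences)
    ids, mask = [], []
    for s in sequences:
        pad = [pad_id] * (max_len - len(s))
        keep = [1] * len(s)
        if side == "right":
            ids.append(s + pad)
            mask.append(keep + [0] * len(pad))
        else:  # left padding, as the Idefics default tokenizer uses
            ids.append(pad + s)
            mask.append([0] * len(pad) + keep)
    return ids, mask
```

With padding=False a batch of different-length sequences stays ragged and cannot be turned into a single tensor, which is why a "longest" strategy (or explicit user padding) is needed for batch_size > 1.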
I'm not sure if this is a bug, but the default behavior appears to have been inaccurate to begin with. Even if the user passes in padding=False, lines 347-354 seem to be forcibly padding the text input to the maximum sequence length anyway.
In this case, the forcible padding when padding=False should be removed
In this case, the forcible padding when padding=False should be removed
Yes, this behavior was removed as part of this PR. After this PR, padding=False or not setting padding will not pad the input.
I have modified the PR to default to padding=False, and the unit tests (and one of the integration tests) to explicitly specify padding='longest'. I guess my only concern was that by changing this behavior, a user who was relying on the default behavior would find their code broken overnight.
I guess my only concern was that by changing this behavior, a user who was relying on the default behavior would find their code broken overnight.
@byi8220 Hmm, yes, this is tricky and that's a good point. OK, in this case, I think your original solution of setting padding='longest' as the default is best, ideally with a comment linking to this issue to explain why the default is different, and adding a description for the False option in the docstring.
amyeroberts left a comment
Thanks for working on this and adding tests!
What does this PR do?
Fixes #28591
This PR does the following:
- Changes `IdeficsProcessor.__call__()`'s `padding` param default to `'longest'` instead of `False`.
- Adds `IdeficsProcessorTest.test_tokenizer_padding`.
- Adds `IdeficsProcessorTest.test_tokenizer_left_padding`, which is a copy of `IdeficsProcessorTest` except using the default tokenizer (which left pads). This could have been parameterized, but it felt fine to just duplicate it.

Context
IIUC, it seems like this code currently always hand-performs an additional padding pass after the tokenizer performs its own padding, according to this code: https://github.com/huggingface/transformers/blob/main/src/transformers/models/idefics/processing_idefics.py#L348-L354
And since the output tensors are stacked together in this code, they all have to be of the same size, i.e. padded or otherwise normalized.
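The stacking constraint can be illustrated with a minimal pure-Python stand-in (the real code stacks torch tensors; `stack` here is a hypothetical helper used only for illustration):

```python
def stack(batch):
    """Minimal stand-in for torch.stack over token-id lists:
    refuses ragged batches, just as tensor stacking does."""
    if len({len(row) for row in batch}) > 1:
        raise ValueError("stack expects every sequence to have the same length")
    return batch

# Unpadded sequences of different lengths cannot be stacked...
try:
    stack([[101, 102], [101, 103, 104, 102]])
except ValueError:
    print("ragged batch rejected")

# ...so the processor must pad (or otherwise normalize) them first.
padded = stack([[0, 0, 101, 102], [101, 103, 104, 102]])
```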
Note: Since this changes the default padding, anyone who calls this function with an explicit value of `padding=False` will likely encounter an error.

Testing

Unit tests run with `pytest ./tests/models/idefics/` pass (129 passed, 115 skipped, 8 warnings in 19.22s). Did not run the slow tests, since running all tests with `RUN_SLOW=yes` enabled crashes my workstation.

Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@ArthurZucker
@amyeroberts