Prepend bos token to Blip generations #29642

Merged
gante merged 5 commits into huggingface:main from zucchini-nlp:gen_max_length
Mar 21, 2024
Conversation

@zucchini-nlp
Member

What does this PR do?

Fixes a failing test reported in our internal Slack. In the previous PR #28994, we removed BOS initialization when embeds are passed as input to generate, and initialized with an empty placeholder instead. As a temporary workaround, I suggest artificially prepending the BOS token after generation.
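As a minimal, framework-free sketch of that workaround (the function name is illustrative, not a transformers API; plain lists stand in for token-id tensors):

```python
# Illustrative sketch of the workaround described above: after generate()
# returns sequences whose BOS was replaced by an empty placeholder, prepend
# the BOS token id to every returned sequence. Names are hypothetical.
def prepend_bos(sequences, bos_token_id):
    """Prepend bos_token_id to each generated sequence (lists of token ids)."""
    return [[bos_token_id] + list(seq) for seq in sequences]

generated = [[5, 17, 2], [9, 4, 2]]
print(prepend_bos(generated, bos_token_id=0))  # [[0, 5, 17, 2], [0, 9, 4, 2]]
```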

For the long term, I suggest we reconsider how we handle multimodal LLMs in generation, since we will likely have more of these models in the future. I analyzed the models we have in more detail, and what I want to do is use the current Llava-style workflow for generation, but with one change. We will need a method similar to "prepare_inputs_for_generation", responsible for taking inputs in all modalities and concatenating them in the correct way. The method would be called in generate while preparing model inputs. I think calling "concat_modalities" from within generate would make it easier for us:

  • not to worry about mismatched attention and input lengths or lost padded attention masks, since "concat" will already give us the correct multimodal length. All we do later is simply append one more id per new token
  • to do assisted decoding with this workflow, which I have been trying to make work for a while
  • to always have the prompt as part of the generated text, which is intuitive for decoder-only models. I guess all VLMs can be thought of as decoder-only
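The proposed hook could be sketched roughly like this (pure Python, with lists standing in for embedding tensors; the name "concat_modalities" and its signature come only from the idea above, not from an existing transformers method):

```python
# Rough sketch of the proposed "concat_modalities" hook: merge per-modality
# embeddings and attention masks so generate() works on one sequence of the
# correct multimodal length. Lists stand in for embedding tensors here.
def concat_modalities(image_embeds, text_embeds, text_attention_mask):
    embeds = list(image_embeds) + list(text_embeds)  # image tokens first
    # image positions are never padding, so their mask entries are all 1
    attention_mask = [1] * len(image_embeds) + list(text_attention_mask)
    return embeds, attention_mask
```

With the concatenation done once up front, generate() only needs to append one more position per new token, with no modality-specific length bookkeeping.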

I made a board for visual explanation here and a proof-of-concept that it works here.

@gante I want your thoughts on this

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Contributor

@gante gante left a comment

Regarding the diff in this PR: it can work as a temporary post-generation fix, if we decrement all set length arguments (max_length, max_new_tokens, min_length, min_new_tokens) by 1 to ensure the final length stays consistent with the user's parameterization
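That bookkeeping could look like this (a hypothetical helper, not part of the PR's diff):

```python
# Sketch of the length bookkeeping described above: if a BOS token is
# prepended after generation, decrement every user-set length argument by 1
# so the final sequence length matches the user's parameterization.
LENGTH_KEYS = ("max_length", "max_new_tokens", "min_length", "min_new_tokens")

def adjust_length_kwargs(generation_kwargs):
    for key in LENGTH_KEYS:
        if generation_kwargs.get(key) is not None:
            generation_kwargs[key] -= 1
    return generation_kwargs

print(adjust_length_kwargs({"max_length": 25, "min_new_tokens": 5}))
# {'max_length': 24, 'min_new_tokens': 4}
```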

Comment on lines +872 to +873
# max_length for BLIP includes prompt length, so we need more than 20 to reach EOS here
predictions = model.generate(**inputs, max_length=25)
Contributor

Let's use the max_new_tokens argument, which prevents prompt length-based issues :)
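To illustrate the difference with a toy calculation (not transformers code): max_length budgets the prompt plus new tokens, while max_new_tokens budgets the new tokens directly.

```python
# Toy illustration of why max_new_tokens avoids prompt-length issues:
# with max_length, the room left for new tokens shrinks as the prompt grows.
def new_token_budget(prompt_len, max_length=None, max_new_tokens=None):
    if max_new_tokens is not None:
        return max_new_tokens
    return max(max_length - prompt_len, 0)

print(new_token_budget(5, max_length=25))       # 20
print(new_token_budget(15, max_length=25))      # 10 -- shrinks with the prompt
print(new_token_budget(15, max_new_tokens=20))  # 20 -- prompt-independent
```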

Member Author

@zucchini-nlp zucchini-nlp Mar 15, 2024

This uncovered a new problem with BLIP 🙃 Apparently the code was not working properly with max_new_tokens.

That was fixed in PR #29585 , so this should be merged only after #29585 is merged and main is rebased.

EDIT: turns out it was a slow test, so the CI is green now

@gante
Contributor

gante commented Mar 14, 2024

Regarding the generate changes for VLMs: we actually should go beyond that! The key idea is to rewrite generate in a way that it becomes more composable, with a few blocks expected to be overwritten at a model level (like the function you're proposing). Let's chat about that offline, to figure out the requirements and make a plan, so we can open a GH issue with the roadmap for it :)

Contributor

@gante gante left a comment

LGTM 👍

(ping me again when this PR's dependency gets merged/approved, so we can merge both)

@gante gante requested a review from amyeroberts March 18, 2024 15:15
Contributor

@amyeroberts amyeroberts left a comment

Thanks for digging into this and fixing!

The only requests before merging are:

  • Run slow model tests
  • Make sure there are generation tests which cover this behaviour. Just skimming test_modeling_instructblip.py, it doesn't look like GenerationTesterMixin is used

zucchini-nlp and others added 2 commits March 19, 2024 13:26
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
@zucchini-nlp
Member Author

@amyeroberts

  1. Yes, all BLIP tests are passing now.
  2. Do you mean a test for the BOS token? Maybe we do not really need to check for that, since this PR is actually fixing a failing test. That means there is a test, even though it doesn't check the BOS explicitly.

@amyeroberts
Contributor

@zucchini-nlp

  • OK, just to confirm, that's for BLIP-2 and InstructBLIP? It would be surprising if there's no effect on the integration tests for BLIP-2, as this is adding a new behaviour.
  • No, not necessarily a specific BOS test (although that would be nice). What I meant is to make sure generation behaviour is as expected. For each model, there's specific integration tests which call .generate, but the GenerationTesterMixin isn't used, which I'm concerned means we don't have proper coverage

@zucchini-nlp
Member Author

@amyeroberts

  1. Yes, only these two models. It is not adding new behavior: it was the default behavior for BLIP before a previous PR broke it, apparently unnoticed because we did not run slow tests on VLMs. So this PR fixes it.
  2. Oh, I see now. Agreed, I do not know why it was not added. I will check whether it works with GenerationTesterMixin, and otherwise write custom tests.

@zucchini-nlp
Member Author

I added GenerationTesterMixin for BLIP. But it turns out none of the VLMs had tests for generation, because they take both text and pixel values as input, while the Mixin is not ready for that. Fortunately, BLIP is running correctly now, because BLIP can generate from text only if no image is passed.

@gante , I think we can update generation tests more, to support model kwargs as input. Then check that all generative models have the GenerationTesterMixin. I can do it in a separate PR 😄

@gante
Contributor

gante commented Mar 20, 2024

@gante , I think we can update generation tests more, to support model kwargs as input. Then check that all generative models have the GenerationTesterMixin. I can do it in a separate PR 😄

👍 Instead of the GenerationTesterMixin's current _get_input_ids_and_config, we could have an input preparation function like the ModelTesterMixin's _prepare_for_class -- depending on the auto class mapping, prepare the correct inputs for the model, to then feed to generate. Looking forward to that rework @zucchini-nlp !
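A rough sketch of that dispatch idea (the names below are hypothetical, not transformers APIs): prepare the generate() inputs according to the model's category, the way _prepare_for_class adapts test inputs per auto class.

```python
# Hypothetical sketch of the suggested input-preparation helper: pick the
# generate() inputs a model needs based on its (auto-class-like) category,
# instead of assuming text-only inputs. Names here are illustrative.
def prepare_generation_inputs(model_category, inputs_dict):
    if model_category == "vision2seq":  # VLMs also need pixel values
        keys = ("input_ids", "attention_mask", "pixel_values")
    else:  # text-only decoder models
        keys = ("input_ids", "attention_mask")
    return {k: inputs_dict[k] for k in keys if k in inputs_dict}
```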

Contributor

@amyeroberts amyeroberts left a comment

Thanks for working on this fix!

Multimodal generation tests sound great ❤️ Looking forward to the future PR!

@gante gante merged commit b469ebc into huggingface:main Mar 21, 2024