Prepend bos token to Blip generations #29642
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
gante
left a comment
Regarding the diff in this PR: it can work as a temporary post-generation fix, if we decrement all set length arguments (max_length, max_new_tokens, min_length, min_new_tokens) by 1 to ensure the final length stays consistent with the user's parameterization
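A rough sketch of that accounting (adjust_length_args and LENGTH_ARGS are hypothetical names for this discussion, not transformers API):

```python
# Hypothetical sketch: if generate() prepends bos after the fact, each
# user-set length argument must be decremented by 1 so the final output
# length matches the user's parameterization. The argument names mirror
# GenerationConfig fields; this helper itself does not exist in transformers.
LENGTH_ARGS = ("max_length", "max_new_tokens", "min_length", "min_new_tokens")

def adjust_length_args(generation_kwargs):
    adjusted = dict(generation_kwargs)
    for name in LENGTH_ARGS:
        if adjusted.get(name) is not None:
            # leave room for the bos token prepended after generation
            adjusted[name] = adjusted[name] - 1
    return adjusted
```

So a user call with `max_length=25` would internally generate with `max_length=24`, and the prepended bos token brings the output back to 25 tokens.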
```python
# max_length for BLIP includes prompt length, so we need more than 20 to reach EOS here
predictions = model.generate(**inputs, max_length=25)
```
Let's use the max_new_tokens argument, which prevents prompt length-based issues :)
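The difference between the two arguments can be shown with plain arithmetic (generated_budget is an illustrative helper, not transformers code):

```python
# Illustrative arithmetic: max_length caps the total sequence
# (prompt + generated tokens), while max_new_tokens caps only the newly
# generated tokens, so it is robust to the prompt's length.
def generated_budget(prompt_len, max_length=None, max_new_tokens=None):
    if max_new_tokens is not None:
        return max_new_tokens
    return max(max_length - prompt_len, 0)
```

With a 20-token prompt, `max_length=25` leaves room for only 5 new tokens, and a 25-token prompt would leave none, which is exactly the prompt length-based issue max_new_tokens avoids.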
Regarding the generate changes for VLMs: we should actually go beyond that! The key idea is to rewrite generate so that it becomes more composable, with a few blocks expected to be overridden at the model level (like the function you're proposing). Let's chat about that offline to figure out the requirements and make a plan, so we can open a GH issue with the roadmap for it :)
gante
left a comment
LGTM 👍
(ping me again when this PR's dependency gets merged/approved, so we can merge both)
amyeroberts
left a comment
Thanks for digging into this and fixing!
Only requests before merging:
- Run the slow model tests
- Make sure there are generation tests which cover this behaviour. Skimming test_modeling_instructblip.py, it doesn't look like GenerationTesterMixin is used
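For context, the pattern those shared tests rely on looks roughly like this (a dependency-free toy, not transformers' actual GenerationTesterMixin):

```python
import unittest

# Toy illustration of the mixin pattern: shared generation checks live in a
# mixin, and any model test class that inherits it gets them for free.
# All names below are illustrative, not transformers' actual test API.
class GenerationChecksMixin:
    def test_generate_starts_with_bos(self):
        output_ids = self.model.generate()
        self.assertEqual(output_ids[0], self.model.bos_token_id)

class ToyBlipModel:
    bos_token_id = 0

    def generate(self):
        # A real model returns decoded token ids; here we return a fixed
        # sequence that already has bos prepended, as this PR's fix ensures.
        return [self.bos_token_id, 7, 8, 2]

class ToyBlipModelTest(GenerationChecksMixin, unittest.TestCase):
    model = ToyBlipModel()
```

Adding the real mixin to the InstructBLIP test class would make regressions like this one fail loudly in CI.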
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
I added @gante. I think we can update the generation tests further to support model kwargs as input, and then check that all generative models have the GenerationTesterMixin. I can do it in a separate PR 😄
👍 Instead of the
amyeroberts
left a comment
Thanks for working on this fix!
Multimodal generation tests sound great ❤️ Looking forward to the future PR!
What does this PR do?
Fixes a failing test reported in our internal Slack. In the previous PR #28994, we removed bos initialization when embeds are passed as input to generate, and instead initialized with an empty placeholder. As a temporary workaround, I suggest artificially prepending the "bos" token after generation.

For the long term, I suggest reconsidering how we handle multimodal LLMs in generation, since we will likely get more of these models in the future. I analyzed the models we have in more detail, and what I want to do is use the current Llava-style workflow for generation, but with one change. We will need a method similar to "prepare_inputs_for_generation", which will be responsible for taking inputs in all modalities and concatenating them in the correct way. The method will be called in generate while preparing model inputs. I think calling "concat_modalities" from within generate would make it easier for us:
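A minimal sketch of how that hook could look (concat_modalities is the placeholder name from this description; the class and call shapes are assumptions, not existing transformers code):

```python
# Hypothetical sketch of the proposed model-level hook. Lists of vectors
# stand in for embedding tensors so the example stays dependency-free;
# none of these names exist in transformers today.
class MultimodalGenerationMixin:
    def concat_modalities(self, text_embeds, image_embeds):
        # Default Llava-style merge: image embeddings come before the text.
        # generate() would call this while preparing model inputs.
        return image_embeds + text_embeds

class LlavaLikeModel(MultimodalGenerationMixin):
    # Uses the default merge order.
    pass

class SuffixImageModel(MultimodalGenerationMixin):
    # A model that expects images after the prompt overrides the hook.
    def concat_modalities(self, text_embeds, image_embeds):
        return text_embeds + image_embeds
```

The point is that generate stays modality-agnostic: each model only overrides the small block that decides how its modalities are stitched together.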
I made a board for visual explanation here and a proof-of-concept that it works here.
@gante I'd like your thoughts on this.