Generate: visit non-llm `prepare_inputs_for_generation` (#34199)
gante merged 5 commits into huggingface:main
@zucchini-nlp / @ylacombe: I've tagged you both so that Yoach can double-check the audio models, and Raushan the others 🤗
```python
device: torch.device,
cache_position: torch.Tensor,
batch_size: int,
**kwargs,
```
Some models have extra kwargs. With `**kwargs` we can generalize in `GenerationMixin.prepare_inputs_for_generation` :)
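The pass-through idea being described can be sketched in plain Python (lists stand in for tensors; this is an illustrative sketch, not the actual `GenerationMixin` code):

```python
def prepare_inputs_for_generation(input_ids, cache_position=None, **kwargs):
    # With a cache, only the tokens at the still-unprocessed positions are kept.
    if cache_position is not None:
        input_ids = [[row[p] for p in cache_position] for row in input_ids]
    model_inputs = {"input_ids": input_ids, "cache_position": cache_position}
    # Model-specific extras (e.g. pixel_values) pass straight through,
    # so models with extra kwargs need no per-model override.
    model_inputs.update(kwargs)
    return model_inputs
```

For example, `prepare_inputs_for_generation([[1, 2, 3]], cache_position=[2], pixel_values="img")` keeps only the token at position 2 and forwards `pixel_values` untouched.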
```diff
- dtype=self.get_output_embeddings().weight.dtype,
+ dtype=self.dtype,
```
For my own understanding, is this needed for a multi-GPU setting?
This change is needed because Moshi doesn't have `get_output_embeddings()`, and creating a `get_output_embeddings()` there would be a bit ambiguous.
`self.dtype` works just as well and is more versatile :)
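The versatility point can be illustrated with a plain-Python sketch (hypothetical classes, tensors elided): a model-level `dtype` attribute resolves for every model, while `get_output_embeddings()` is a method some models simply don't implement:

```python
class ModelWithLMHead:
    # A model that does expose output embeddings.
    dtype = "float16"

    def get_output_embeddings(self):
        return self.lm_head  # only meaningful when an LM head exists


class MoshiLike:
    # No get_output_embeddings() at all, yet the dtype route still works.
    dtype = "float16"


# `self.dtype` resolves uniformly for both model shapes.
dtypes = [model.dtype for model in (ModelWithLMHead(), MoshiLike())]
```

In the real library, `PreTrainedModel.dtype` is derived from the model's parameters, so no extra accessor needs to exist on the model.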
```python
# Overwritten -- there are mutually exclusive inputs (if the logic to make `image_hidden_states` take
# precedence is moved to the model, we can remove this fn)
```
I think most VLMs do something similar when we pass pixel values only in the pre-fill stage. For Idefics it could also be a check on whether `cache_position[0] == 0`, since we don't support multi-turn dialogues. So I think we can find a way to generalize this for VLMs in a subsequent PR :)
About moving the logic to the modeling code, I think we want to discourage anyone from passing both
Next I will be working on separating prefill from non-prefill.
Perhaps if I add a flag `prefill: bool`, we can sort out most VLMs!
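The `cache_position[0] == 0` prefill check discussed above could look roughly like this (illustrative names, plain lists standing in for tensors; not the real Idefics code):

```python
def prepare_vlm_inputs(input_ids, pixel_values=None, cache_position=None):
    # Prefill is the very first forward pass: position 0 is not cached yet.
    is_prefill = cache_position is None or cache_position[0] == 0
    model_inputs = {"input_ids": input_ids, "cache_position": cache_position}
    # Pixel values are only needed once, during prefill; later decoding
    # steps reuse the image features already encoded in the cache.
    if is_prefill and pixel_values is not None:
        model_inputs["pixel_values"] = pixel_values
    return model_inputs
```

A `prefill: bool` flag, as suggested, would simply replace the `cache_position`-based inference with an explicit argument.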
ylacombe left a comment:
LGTM, thanks for this @gante!
Most of the audio models' `prepare_inputs_for_generation` were untouched, but for the ones that were changed, it looks like they'll work.
Special thanks for Moshi: the new `prepare_inputs_for_generation` covers every small edge case (no base model, no `get_output_embeddings`, post-processing). It's much cleaner to read now.
Are you planning to run slow tests for every model, btw?
Yes :) Now that it's approved, I will run the slow tests before merging @ylacombe
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Ran slow tests on all models with significant changes, no regressions -- merging :)
Squashed commits (#34199):
* tmp
* all visited
* test all
* Update src/transformers/models/moshi/modeling_moshi.py
* delete another one :D
What does this PR do?
Closes #32685 🙌
This PR does a final pass over the remaining `prepare_inputs_for_generation` overrides.

👉 After this PR, let's aim to overwrite `prepare_inputs_for_generation` as few times as possible, so we can quickly roll out model-agnostic upgrades 🏎️ and minimize bugs 🐛