IDEFICS: support inputs embeds #34043
Conversation
# generating the first new token or not, and we only want to use the embeddings for the first new token)
if not self.config.is_encoder_decoder and model_input_name == "inputs_embeds":
    model_kwargs["use_cache"] = True
    generation_config.use_cache = True
The subsequent step of preparing the cache checks the generation config, so we would miss the step of adding a `Cache` class. This fails only for IDEFICS because it does not prepare the cache in `forward` when `use_cache` is set, but instead assumes that a correct cache has already been provided.
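To illustrate why `use_cache` must be forced on when generating from `inputs_embeds`, here is a minimal sketch (not the actual transformers implementation; the function signature and inputs are simplified assumptions). The embeddings are consumed only on the first step; every later step passes just the newest token id and relies on the cache to carry the prefix:

```python
# Minimal sketch (NOT the real transformers code) of why generating from
# `inputs_embeds` requires `use_cache=True`: the embeddings are used only on
# the first step; later steps pass just the last token id and depend on the
# cache holding the prefix.
def prepare_inputs_for_generation(input_ids, inputs_embeds=None,
                                  past_key_values=None, use_cache=True):
    if inputs_embeds is not None and past_key_values is None:
        # First step: use the caller-provided embeddings.
        model_inputs = {"inputs_embeds": inputs_embeds}
    else:
        # Subsequent steps: only the last generated token id.
        model_inputs = {"input_ids": input_ids[-1:]}
    model_inputs.update(past_key_values=past_key_values, use_cache=use_cache)
    return model_inputs
```

Without a cache, the second step would have neither the embeddings nor the prefix token ids, which is exactly the assumption IDEFICS makes in its `forward`.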
This change updates `generation_config`, which means we would have two variables containing the source of truth for this flag (`generation_config.use_cache` and `model_kwargs["use_cache"]`). More than one source of truth usually leads to bugs 🐛 We also know that we need `model_kwargs["use_cache"]` for the forward pass.
To avoid multiple sources of truth, I'd suggest one of the following:
- Let's use `model_kwargs["use_cache"]` in all places after this if/else, instead of also using `generation_config.use_cache`.
- (I have a preference for this one) Let's use `generation_config.use_cache` everywhere, removing `model_kwargs["use_cache"]` from most places in the `generate` function. Before calling the decoding methods, let's add `model_kwargs["use_cache"] = generation_config.use_cache`, since we need this for the forward pass and `generation_config` barely gets used in the decoding methods.
(I know this issue predates your change 🤗 But since we're touching it, let's do the most sustainable change)
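The preferred option above (keep `generation_config.use_cache` as the single source of truth and sync it into `model_kwargs` once, just before decoding) could look roughly like this sketch; the class and function bodies here are illustrative assumptions, not the real `generate` code:

```python
# Sketch of the "single source of truth" pattern suggested above
# (illustrative only, not the actual `generate` implementation).
class GenerationConfig:
    def __init__(self, use_cache=True):
        self.use_cache = use_cache

def generate(generation_config, model_kwargs):
    # All decisions mutate only generation_config...
    if model_kwargs.get("inputs_embeds") is not None:
        generation_config.use_cache = True
    # ...and it is copied into model_kwargs exactly once, right before the
    # decoding loop, because the forward pass needs it there.
    model_kwargs["use_cache"] = generation_config.use_cache
    return model_kwargs
```

With this shape, no later code can read a stale `model_kwargs["use_cache"]` that disagrees with the config.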
ArthurZucker
left a comment
It's always the same models causing problems 👁️ @leot13 👁️
I am trying to make sure that IDEFICS models start using library standards, and hopefully after we enable generation tests it will be easier to catch those bugs when adding the model 🤗
gante
left a comment
LGTM, except for that nit in the generate body :)
Thank you for fixing 💪
* support embeds
* use cache from config
* style...
* fix tests after rebase
What does this PR do?
Fixes #34033 and enables generation tests for VLMs. Previously, these tests were all skipped because we had a hard check against the CausalLM mapping.
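A hedged sketch of the kind of gate described here (the mapping contents and helper name are hypothetical, invented for illustration): generation tests ran only for model classes found in a causal-LM mapping, which silently excluded VLM classes like IDEFICS:

```python
# Hypothetical illustration of a "hard check" against a model mapping;
# the mapping contents and helper name are invented for this sketch.
CAUSAL_LM_MAPPING = {"LlamaForCausalLM", "GPT2LMHeadModel"}

def should_run_generation_tests(model_class_name, mapping=CAUSAL_LM_MAPPING):
    # VLM classes absent from the mapping were skipped entirely,
    # so their generation bugs went unnoticed.
    return model_class_name in mapping
```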