Bart: new cache format #35314
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
cc @BenjaminBossan I am running slow tests from the transformers side, and the current state of the PR should be almost ready for review. So we might need to run the PEFT tests now.
Thanks for the ping. I skimmed the PR and if I'm not mistaken, of all the models that were changed, Bart is the only one that is covered in the PEFT test suite. Therefore, running tests with
Cool, the code owners tagged all relevant people. Ready for review! Slow tests for the text models that now support the cache class are passing on my end.
In general LGTM. A few minor nits, hence the approval.
I'm assuming slow tests were run for all touched models, and there are no regressions with respect to `main` 🔍
(I've reviewed /generation, /models/bart, /tests/generation, and /tests/models/bart. I'm assuming other models follow the same pattern)
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
@zucchini-nlp
@ducviet00 sorry, but the PR is blocked by another one. These got a bit stale since we had some higher-priority releases recently. I will be back on Bart next week and get it merged, thanks.
Hi @zucchini-nlp
It is currently blocked by another PR (#35786). cc @ArthurZucker, can you review it again, please?
Awesome @zucchini-nlp, thank you so much!
ArthurZucker
left a comment
LGTM, but let's refactor the `EncoderDecoderCache` to hide the complicated legacy logic!
position_ids = cache_position.unsqueeze(0)
position_ids = self.embed_positions(input, past_key_values_length, position_ids=position_ids)
position_ids = position_ids.to(inputs_embeds.device)
The unsqueeze can be done inside `embed_positions`, no?
Oh yeah, in the case of Bart we could. I wanted the module to expect correct 2D position ids, to account for padding. But Bart apparently never used padded positions.
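For readers following along, the reviewer's suggestion can be sketched like this: let the positional-embedding module accept a raw 1D `cache_position` and add the batch dimension itself, instead of calling `unsqueeze(0)` at every call site. This is a hypothetical simplification, not the actual transformers implementation:

```python
import numpy as np

class LearnedPositionalEmbedding:
    """Toy learned positional embedding that unsqueezes 1D inputs internally."""

    OFFSET = 2  # Bart historically offsets positions by 2

    def __init__(self, num_positions: int, embed_dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.weight = rng.standard_normal((num_positions + self.OFFSET, embed_dim))

    def __call__(self, position_ids: np.ndarray) -> np.ndarray:
        if position_ids.ndim == 1:                 # e.g. a raw cache_position
            position_ids = position_ids[None, :]   # add the batch dim here, once
        return self.weight[position_ids + self.OFFSET]

emb = LearnedPositionalEmbedding(num_positions=1024, embed_dim=16)
out = emb(np.arange(4))  # 1D input, as produced during generation
print(out.shape)         # (1, 4, 16)
```

With this shape-handling inside the module, 2D (batched, possibly padded) position ids still pass through unchanged, which is the case the author wanted to keep supporting.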
if return_legacy_cache:
    next_cache = past_key_values.to_legacy_cache()

if not return_dict:
`can_return_tuple` would be welcome as well, but no worries.
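For context on the `if not return_dict:` branch above: transformers models conventionally fall back to returning a plain tuple when `return_dict=False`. A minimal toy sketch of that pattern, with hypothetical names rather than the actual Bart code:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ToyDecoderOutput:
    """Toy stand-in for a ModelOutput subclass (names are illustrative)."""
    last_hidden_state: object
    past_key_values: Optional[object] = None

    def to_tuple(self):
        # Drop None fields, loosely mirroring ModelOutput.to_tuple()
        return tuple(
            v for v in (self.last_hidden_state, self.past_key_values) if v is not None
        )

def toy_forward(hidden_states, past_key_values=None, return_dict=True):
    output = ToyDecoderOutput(hidden_states, past_key_values)
    if not return_dict:
        return output.to_tuple()  # legacy tuple return path
    return output
```

The structured output is the default; the tuple path exists only for backward compatibility with callers that index outputs positionally.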
src/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py
ArthurZucker
left a comment
Very nice let's go!
return_legacy_cache = True
logger.warning_once(
    "Passing a tuple of `past_key_values` is deprecated and will be removed in Transformers v4.58.0. "
    "You should pass an instance of `EncoderDecoderCache` instead, e.g. "
    "`past_key_values=EncoderDecoderCache.from_legacy_cache(past_key_values)`."
)
Let's put this warning inside `from_legacy_cache` directly!
I don't think we can put a specific version inside the cache class. Each model is deprecated until a different release, because we update them slowly, at lower priority than other tasks.
Also, `from_legacy_cache` itself isn't deprecated and will still be available; the warning applies only to the model's forward pass.
cc @BenjaminBossan, totally forgot to ping you. Bart is basically the same as T5, hope it won't cause a red CI on PEFT 😅
What does this PR do?
As per the title, this enables the new cache format in Bart and several models copied from Bart. Since there are too many models copying attention from Bart, I decided not to touch the audio ones and changed their "Copied from" statements instead.
TODO: