[WhisperForCausalLM] Add WhisperForCausalLM for speculative decoding #27195
patrickvonplaten merged 10 commits into `main` from `whisper_decoder_only`
Conversation
The failing tests seem to be unrelated:
amyeroberts
left a comment
Thanks for adding this!
Just some nits on formatting, docstrings and tests. Would like to have 👍 from @gante before merging to check the changes to generation logic.
```python
# PT-only test: TF doesn't support assisted decoding yet.
# Bart subclass with a kwarg that distorts the output
class FakeBart(BartForConditionalGeneration):
    def forward(self, input_ids, foo=False, **kwargs):
```
Can we use a more descriptive name than `foo` here? Or maybe just clarify in the comment that `foo` is said kwarg, e.g. `# Bart subclass with kwarg 'foo' that distorts the output`
LysandreJik
left a comment
Impressive PR, it's very well contained!
The skeleton and API look good to me, I'm out of my depth for the generation logic.
```python
    config: WhisperConfig
    """

    main_input_name = "input_ids"
```
Was this an oversight in the implementation?
WhisperDecoder was never tested as a standalone model before. It also doesn't really make sense to use it alone, because one always needs the encoded audio features.
Merging as I think Joao is off today and the changes in assisted generation are quite minimal IMO. @gante it would be great if you could nevertheless take a look once you're back :-)
@patrickvonplaten All good on the generation front 👍
Cool strategy for handling the case of a shared encoder. I hope people realize that encoder-decoder LLMs may be viable (and faster) for input-grounded tasks.

What does this PR do?
This PR enables speculative decoding for all cases where the assistant model is stripped of its encoder weights as they are shared with the teacher model. For now, Distil-Whisper is the main use case here.
In addition, a `WhisperForCausalLM` is added, as it didn't exist yet for Distil-Whisper. The following code should therefore be enabled: