if output is tuple like facebook/hf-seamless-m4t-medium, waveform is …#29722
if output is tuple like facebook/hf-seamless-m4t-medium, waveform is …#29722ArthurZucker merged 3 commits intohuggingface:mainfrom
Conversation
|
test code |
|
|
@amyeroberts please have a review. |
|
Hmmm, this is funny, as it indicates either unexpected outputs from the model, it shouldn't be used with this pipeline or that it wasn't properly tested for the pipeline @ylacombe could you please take a look into this? |
…the first element Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
|
could anyone help review the pr? |
ArthurZucker
left a comment
There was a problem hiding this comment.
THanks for the fix. Could you add a test?
What is weird is that this model has a vocoder. (hifi_gan): SeamlessM4THifiGan but the pipeline does not. So it's not properly integrated (or incompatible with SpeechT5Vocoder?You are right, the output ofSeamlessM4TForTextToSpeech` is a wave and length. The fix should be enough
We need a test to make sure the waveform generated is correct.
ylacombe
left a comment
There was a problem hiding this comment.
Hey @sywangyi, thanks for spotting this! Could you also add a test to test_pipeline_text_to_audio.py so that we make sure it works ?
What is weird is that this model has a vocoder. (hifi_gan): SeamlessM4THifiGan but the pipeline does not. So it's not properly integrated (or incompatible with SpeechT5Vocoder?
This would be the case if it was in MODEL_FOR_TEXT_TO_SPECTROGRAM_MAPPING_NAMES here!
Since the vocoder here is specific to M4T, I made the choice to add the full model (+ vocoder) to MODEL_FOR_TEXT_TO_WAVEFORM_MAPPING_NAMES
|
test added and also found some issue when test batch_size=2 @ylacombe |
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
| def postprocess(self, waveform): | ||
| output_dict = {} | ||
|
|
||
| if isinstance(waveform, tuple): |
There was a problem hiding this comment.
It would probably make more sense to use return_dict=True to make sure we rely on naming rather than indexing?
This supposed the first is always a waveform
There was a problem hiding this comment.
not return_dict for semaless_m4v case see https://github.com/huggingface/transformers/blob/main/src/transformers/models/seamless_m4t/modeling_seamless_m4t.py#L3520. also if the class is returned. postprocess in text_to_audio.py is lack of disposal of class input.
There was a problem hiding this comment.
This is a shortcoming of the modeling code of M4T (and M4Tv2) that doesn't use return_dict=True:
transformers/src/transformers/models/seamless_m4t/modeling_seamless_m4t.py
Lines 3520 to 3528 in 416711c
Would you add to add return_dict to M4T modeling code @sywangyi ? This would look like something like this where relevant:
transformers/src/transformers/models/roc_bert/modeling_roc_bert.py
Lines 1053 to 1063 in 33288ff
If not, I can take care of this, but the current PR relies on this fix to have batch_size > 2 working
There was a problem hiding this comment.
Hi, @ylacombe actually return_dict_in_generate should be used. because this is used in generate() func. speech t5 also has such issue. see the return of speech t5. tuple or tensor will be returned. https://github.com/huggingface/transformers/blob/main/src/transformers/models/speecht5/modeling_speecht5.py#L2561-L2593
There was a problem hiding this comment.
so I think pipeline should support all format output like dict, tuple, tensor, WDYT?
| def postprocess(self, waveform): | ||
| output_dict = {} | ||
|
|
||
| if isinstance(waveform, tuple): |
There was a problem hiding this comment.
This is a shortcoming of the modeling code of M4T (and M4Tv2) that doesn't use return_dict=True:
transformers/src/transformers/models/seamless_m4t/modeling_seamless_m4t.py
Lines 3520 to 3528 in 416711c
Would you add to add return_dict to M4T modeling code @sywangyi ? This would look like something like this where relevant:
transformers/src/transformers/models/roc_bert/modeling_roc_bert.py
Lines 1053 to 1063 in 33288ff
If not, I can take care of this, but the current PR relies on this fix to have batch_size > 2 working
|
I also add "return_intermediate_token_ids" in the testcase and fix the issue in postprocess. |
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
ArthurZucker
left a comment
There was a problem hiding this comment.
A good catch, thankful that you added a small test! 🤗
| if isinstance(waveform, dict): | ||
| waveform = waveform["waveform"] | ||
| elif isinstance(waveform, tuple): | ||
| waveform = waveform[0] |
…the first element