Description
Environment info
- transformers version: 4.17.0.dev0
- Platform: Linux-5.13.0-27-generic-x86_64-with-glibc2.34
- Python version: 3.9.7
- PyTorch version (GPU?): 1.10.1+cu102 (True)
- Tensorflow version (GPU?): 2.7.0 (False)
- Flax version (CPU?/GPU?/TPU?): 0.3.6 (cpu)
- Jax version: 0.2.26
- JaxLib version: 0.1.75
- Using GPU in script?: yes
- Using distributed or parallel set-up in script?: no
Who can help
Information
In EncoderDecoder models one can pass encoder_outputs as a tuple of Tensors. However, if you do that, this line will fail with
AttributeError: 'tuple' object has no attribute 'last_hidden_state', since the tuple isn't modified in the forward method.
So if encoder_outputs is a tuple, it could maybe be wrapped in a ModelOutput class or something similar, or the tuple could be handled explicitly.
On a slight tangent
I made a SpeechEncoderDecoderModel for the robust speech challenge: https://huggingface.co/jsnfly/wav2vec2-large-xlsr-53-german-gpt2. I found that adding the position embeddings of the decoder model to the outputs of the encoder model improved performance significantly (basically didn't work without it).
This needs small modifications to the __init__ and forward methods of the SpeechEncoderDecoderModel.
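Roughly, the forward modification amounts to adding the decoder's position embedding for each position to the corresponding encoder hidden state. A sketch with plain Python lists standing in for torch tensors of shape (seq_len, hidden_size); the function name is hypothetical:

```python
def add_decoder_position_embeddings(encoder_hidden_states, position_embeddings):
    """Element-wise add a position embedding vector to the encoder hidden
    state at each sequence position (stand-in for a torch tensor add)."""
    return [
        [h + p for h, p in zip(hidden, pos)]
        for hidden, pos in zip(encoder_hidden_states, position_embeddings)
    ]

# Two sequence positions with hidden size 2.
hidden = [[1.0, 2.0], [3.0, 4.0]]
pos = [[10.0, 10.0], [20.0, 20.0]]
print(add_decoder_position_embeddings(hidden, pos))  # -> [[11.0, 12.0], [23.0, 24.0]]
```

In the real SpeechEncoderDecoderModel this would be a single tensor addition of the decoder's position embedding table (looked up in __init__) to the encoder output before it is fed to the decoder's cross-attention.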
At the moment this seems too much of a "hack" to me to add to the SpeechEncoderDecoderModel class in general (for example via a flag), because it may differ between decoder models and probably needs more verification. @patrickvonplaten showed some interest in including this in Transformers nonetheless. What do you think?