Description
Environment info
- transformers version: 4.17.0.dev0
- Platform: Linux-5.13.0-27-generic-x86_64-with-glibc2.34
- Python version: 3.9.7
- PyTorch version (GPU?): 1.10.1+cu102 (True)
- Tensorflow version (GPU?): 2.7.0 (False)
- Flax version (CPU?/GPU?/TPU?): 0.3.6 (cpu)
- Jax version: 0.2.26
- JaxLib version: 0.1.75
- Using GPU in script?: yes
- Using distributed or parallel set-up in script?: no
Who can help
Information
In EncoderDecoder models one can pass encoder_outputs as a tuple of Tensors. However, if you do that, this line will fail with
AttributeError: 'tuple' object has no attribute 'last_hidden_state', since the tuple isn't modified in the forward method.
So if encoder_outputs is a tuple, it could maybe be wrapped in a ModelOutput class or something similar, or the tuple could be handled explicitly.
On a slight tangent
I made a SpeechEncoderDecoderModel for the robust speech challenge: https://huggingface.co/jsnfly/wav2vec2-large-xlsr-53-german-gpt2. I found that adding the position embeddings of the decoder model to the outputs of the encoder model improved performance significantly (basically didn't work without it).
This needs small modifications to the __init__ and forward methods of the SpeechEncoderDecoderModel.
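Roughly, the forward modification amounts to adding the decoder's position embedding for each position to the corresponding encoder hidden state. A sketch with plain Python lists standing in for torch tensors of shape (seq_len, hidden_size); the function name is hypothetical:

```python
def add_decoder_position_embeddings(encoder_hidden_states, position_embeddings):
    """Element-wise add a position embedding vector to the encoder hidden
    state at each sequence position (stand-in for a torch tensor add)."""
    return [
        [h + p for h, p in zip(hidden, pos)]
        for hidden, pos in zip(encoder_hidden_states, position_embeddings)
    ]

# Two sequence positions with hidden size 2.
hidden = [[1.0, 2.0], [3.0, 4.0]]
pos = [[10.0, 10.0], [20.0, 20.0]]
print(add_decoder_position_embeddings(hidden, pos))  # -> [[11.0, 12.0], [23.0, 24.0]]
```

In the real SpeechEncoderDecoderModel this would be a single tensor addition of the decoder's position embedding table (looked up in __init__) to the encoder output before it is fed to the decoder's cross-attention.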
At the moment this seems too much of a "hack" to me to add to the SpeechEncoderDecoderModel class in general (for example via a flag), because it may differ between decoder models and probably needs more verification. @patrickvonplaten showed some interest in including this in Transformers nonetheless. What do you think?