Whisper: incorrect list of non-speech tokens #20123

### System Info

- `transformers` version: 4.24.0
- Platform: Linux-5.15.0-52-generic-x86_64-with-glibc2.35
- Python version: 3.10.6
- Huggingface_hub version: 0.10.1
- PyTorch version (GPU?): 1.12.1+cu102 (True)

### Who can help?

@ArthurZucker 

### Information

- [ ] The official example scripts
- [X] My own modified scripts

### Tasks

- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)

### Reproduction

The lists `NON_SPEECH_TOKENS` and `NON_SPEECH_TOKENS_MULTI` both contain tokens 6 and 12, which the [reference implementation](https://github.com/openai/whisper/) does not suppress by default.

Consider the following example using the reference `whisper` module:

```python
import transformers
from whisper.tokenizer import get_tokenizer

tokenizer = get_tokenizer(multilingual=True, task="transcribe", language="fr")

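# Token ids built from the reference tokenizer: the non-speech tokens
# plus four special tokens. `sorted` already returns a list.
suppress_tokens = sorted(
    tokenizer.non_speech_tokens
    + (tokenizer.sot, tokenizer.sot_prev, tokenizer.sot_lm, tokenizer.no_speech)
)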

config = transformers.WhisperConfig.from_pretrained("openai/whisper-tiny")
print(suppress_tokens == config.suppress_tokens)  # prints False

config.suppress_tokens.remove(6)
config.suppress_tokens.remove(12)
print(suppress_tokens == config.suppress_tokens)  # prints True
```
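The equality check above only says whether the two lists match. To see which ids differ, a set difference can help. The following is a minimal sketch: `diff_token_lists` is a hypothetical helper, not part of `transformers` or `whisper`, and the sample ids are illustrative rather than the real Whisper lists.

```python
def diff_token_lists(reference, candidate):
    """Return (extra, missing): ids only in candidate, ids only in reference."""
    ref, cand = set(reference), set(candidate)
    return sorted(cand - ref), sorted(ref - cand)


# Illustrative ids only, not the real Whisper lists.
reference_ids = [1, 2, 7, 8]
candidate_ids = [1, 2, 6, 7, 8, 12]

extra, missing = diff_token_lists(reference_ids, candidate_ids)
print("only in candidate:", extra)
print("only in reference:", missing)
```

With the variables from the reproduction, `diff_token_lists(suppress_tokens, config.suppress_tokens)` should, if the report above holds, list 6 and 12 as the only extra ids.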

### Expected behavior

The list of suppressed tokens should match the reference implementation.
