
convert_tokens_to_string does not conform to its signature #16525

@inspiralpatterns

Description


Environment info

  • transformers version: 4.17.0
  • Platform: macOS-11.6.4-x86_64-i386-64bit
  • Python version: 3.9.10
  • PyTorch version (GPU?): 1.11.0 (False)
  • Tensorflow version (GPU?): 2.7.0 (False)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: False
  • Using distributed or parallel set-up in script?: False

Who can help

@SaulLu

Information

Model I am using (Bert, XLNet ...): AutoModelForQuestionAnswering

The problem arises when using:

  • the official example scripts: (give details below)
  • my own modified scripts: (give details below)

The task I am working on is:

  • an official GLUE/SQuAD task: Question Answering
  • my own task or dataset: (give details below)

To reproduce

Steps to reproduce the behavior:

Using the official example script (omitted here; only the output is posted):

Question: How many pretrained models are available in 🤗 Transformers?
Answer: ['over', ' 32', ' +']
Question: What does 🤗 Transformers provide?
Answer: ['general', ' -', ' purpose', ' architecture', 's']
Question: 🤗 Transformers provides interoperability between which frameworks?
Answer: ['tensor', 'flow', ' 2', '.', ' 0', ' and', ' p', 'yt', 'or', 'ch']

Using the model in our context:

from transformers import AutoTokenizer, AutoModelForQuestionAnswering
import torch

tokenizer = AutoTokenizer.from_pretrained("bert-large-uncased-whole-word-masking-finetuned-squad")
model = AutoModelForQuestionAnswering.from_pretrained("bert-large-uncased-whole-word-masking-finetuned-squad")

text = "Hello my browser is not working, I need help."
questions = [
    "What is the issue?",
    "What is the request?",
]


def extract_answer_idxs(start_logits, end_logits):
    # Highest-scoring start index and (exclusive) end index of the answer span.
    answer_start = torch.argmax(start_logits)
    answer_end = torch.argmax(end_logits) + 1
    return answer_start, answer_end

# Batch both questions against the same context.
text = [text] * len(questions)
inputs = tokenizer(questions, text, add_special_tokens=True, return_tensors="pt", max_length=512, truncation=True)
input_ids = inputs["input_ids"].tolist()
outputs = model(**inputs)

# Pair each sequence's logits into (start, end) index tuples, then decode each span.
idxs = map(extract_answer_idxs, outputs.start_logits, outputs.end_logits)
answers = [
    tokenizer.convert_tokens_to_string(tokenizer.convert_ids_to_tokens(ids[start:end]))
    for ids, (start, end) in zip(input_ids, idxs)
]

print(f"Questions: {questions}")
print(f"Answers: {answers}")

Result:

Questions: ['What is the issue?', 'What is the request?']
Answers: [['my', ' browser', ' is', ' not', ' working'], ['help']]

(I also tried running the questions in a loop instead of as a batch and got the identical result.)

Expected behavior

Questions: ['What is the issue?', 'What is the request?']
Answers: ['my browser is not working', 'help']

As the docs show, I expect a string, not a list of tokens.
Please also notice how whitespace is somehow introduced into some of the tokens.
Furthermore, some words are split across pieces, e.g. ['tensor', 'flow', ' 2', '.', ' 0', ' and', ' p', 'yt', 'or', 'ch'].

I expect convert_tokens_to_string to return a str, as it was previously.
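Until this is resolved, a minimal pure-Python workaround is possible: the pieces in the returned list already carry their own leading spaces (as the output above shows), so simply concatenating them reconstructs the sentence. The helper below is my own sketch, not part of the library API:

```python
def pieces_to_text(pieces):
    """Join decoded pieces back into a single string.

    convert_tokens_to_string is documented to return a str; when it
    returns a list of already-detokenized pieces instead, each piece
    carries its own leading space, so a plain join reconstructs the text.
    """
    if isinstance(pieces, str):  # already the documented behavior
        return pieces
    return "".join(pieces).strip()


print(pieces_to_text(['my', ' browser', 'is', ' not', ' working'][:2] + [' is', ' not', ' working'][1:]))
```

For example, `pieces_to_text(['my', ' browser', ' is', ' not', ' working'])` gives `'my browser is not working'`, matching the expected output above.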
