Closed
Description
Environment info
- `transformers` version: 4.3.0
- Platform: Colab
- Python version: 3.9
- PyTorch version (GPU?): No
- Tensorflow version (GPU?): No
- Using GPU in script?: No
- Using distributed or parallel set-up in script?: No
Information
I am using the DeBERTa tokenizer (`DebertaTokenizer`). Its `convert_ids_to_tokens()` method does not return readable tokens.
The problem arises when using:
- my own modified scripts: (give details below)
The task I am working on is:
- an official GLUE/SQuAD task: (give the name)
- my own task or dataset
To reproduce
Steps to reproduce the behavior:
- Get the DeBERTa tokenizer:
from transformers import DebertaTokenizer
deberta_tokenizer = DebertaTokenizer.from_pretrained('microsoft/deberta-base')
- Encode an example with the tokenizer:
example = "Hi I am Bhadresh. I found an issue in Deberta Tokenizer"
encoded_example = deberta_tokenizer.encode(example)
- Convert the IDs to tokens:
deberta_tokenizer.convert_ids_to_tokens(encoded_example)
"""
Output: ['[CLS]', '17250', '314', '716', '16581', '324', '3447', '13', '314', '1043', '281', '2071', '287', '1024', '4835', '64', '29130', '7509', '[SEP]']
"""
Expected behavior
It should return readable tokens, for example:
['[CLS]', 'hi', 'i', 'am', 'b', '##had', '##resh', '.', 'i', 'found', 'an', 'issue', 'in', 'de', '##bert', '##a', 'token', '##izer', '[SEP]']
rather than simply returning each integer ID converted to a string, as it currently does.
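The digit strings in the output (for example '17250', '314', '13') suggest a two-level vocabulary: the tokenizer's own vocabulary may store each token under the *string form* of its ID in an underlying GPT-2-style byte-level BPE vocabulary, so a plain ID-to-token lookup returns those digit strings. The sketch below illustrates that assumption with pure Python. Both tables, the model IDs, the BPE-token spellings, and the helper `resolve_bpe_tokens` are hypothetical toy values for illustration, not the contents of the real `microsoft/deberta-base` files or the `transformers` API.

```python
# Toy sketch of the ASSUMED two-level vocabulary behind the digit-string output.
# All tables below are hypothetical subsets, not the real deberta-base files.

# Hypothetical underlying BPE vocabulary: BPE token -> BPE id.
# "\u0120" (shown as "Ġ") is the GPT-2 byte-level marker for a leading space.
BPE_VOCAB = {"Hi": 17250, "\u0120I": 314, "\u0120am": 716, ".": 13}
BPE_ID_TO_TOKEN = {i: t for t, i in BPE_VOCAB.items()}

# Hypothetical tokenizer vocabulary: token string -> model input id.
# Apart from the special tokens, the keys are stringified BPE ids.
MODEL_VOCAB = {"[CLS]": 1, "[SEP]": 2, "17250": 20, "314": 21, "716": 22, "13": 23}
MODEL_ID_TO_TOKEN = {i: t for t, i in MODEL_VOCAB.items()}


def convert_ids_to_tokens(ids):
    """Plain vocabulary lookup: returns the stored token strings unchanged."""
    return [MODEL_ID_TO_TOKEN[i] for i in ids]


def resolve_bpe_tokens(tokens):
    """Hypothetical helper: map digit-only tokens back through the BPE vocabulary.

    Special tokens such as "[CLS]" are not digit strings and pass through as-is.
    """
    return [BPE_ID_TO_TOKEN[int(t)] if t.isdigit() else t for t in tokens]


encoded = [1, 20, 21, 22, 23, 2]
raw = convert_ids_to_tokens(encoded)
print(raw)                      # ['[CLS]', '17250', '314', '716', '13', '[SEP]']
print(resolve_bpe_tokens(raw))  # ['[CLS]', 'Hi', 'ĠI', 'Ġam', '.', '[SEP]']
```

If this assumption holds, the readable form would be byte-level BPE pieces (with "Ġ" marking a leading space) rather than WordPiece-style pieces with "##" prefixes.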
Tagging SMEs for help: