DebertaTokenizer always assigns token type ID 0 #15735

@daniel-ziegler

Environment info

  • transformers version: 4.16.2
  • Platform: Linux-5.15.13-051513-generic-x86_64-with-glibc2.34
  • Python version: 3.9.7
  • PyTorch version (GPU?): 1.9.0+cu111 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: no
  • Using distributed or parallel set-up in script?: no

Who can help

@LysandreJik

Information

Model I am using (Bert, XLNet ...): microsoft/deberta-large

The problem arises when using:

  • the official example scripts: (give details below)
  • my own modified scripts: (give details below)

The task I am working on is:

  • an official GLUE/SQuAD task: (give the name)
  • my own task or dataset: (give details below)

To reproduce

Steps to reproduce the behavior:

Run this code:

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-large")
print(tokenizer("Hello", "World"))

It outputs:

{'input_ids': [1, 31414, 2, 10988, 2], 'token_type_ids': [0, 0, 0, 0, 0], 'attention_mask': [1, 1, 1, 1, 1]}

Even though I put in two sequences, all token_type_ids are 0.

Expected behavior

The tokens from the second sequence should get type ID 1. token_type_ids should be [0, 0, 0, 1, 1].
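
To make the expected result concrete, here is a minimal sketch of the usual BERT-style segment assignment. It assumes the pair layout is [CLS] A [SEP] B [SEP], which matches the input_ids printed above (1 and 2 appear where the special tokens would sit). The function name expected_token_type_ids is hypothetical and is not part of transformers; this only illustrates the output I would expect, not how the tokenizer is implemented.

```python
from typing import List, Optional


def expected_token_type_ids(
    ids_a: List[int], ids_b: Optional[List[int]] = None
) -> List[int]:
    """Return segment IDs for one sequence or a pair of sequences.

    Assumption: the encoded layout is [CLS] A [SEP] for a single
    sequence and [CLS] A [SEP] B [SEP] for a pair.
    ids_a and ids_b are token IDs WITHOUT special tokens.
    """
    # [CLS] + A + [SEP] all belong to segment 0.
    first_segment = [0] * (len(ids_a) + 2)
    if ids_b is None:
        return first_segment
    # B + trailing [SEP] belong to segment 1.
    return first_segment + [1] * (len(ids_b) + 1)


# Token IDs taken from the output above: "Hello" -> 31414, "World" -> 10988.
print(expected_token_type_ids([31414], [10988]))  # [0, 0, 0, 1, 1]
```

Comparing this list against tokenizer("Hello", "World")["token_type_ids"] shows the mismatch directly.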
