System Info
The tokenizer reports the wrong decoder type with Transformers 5.0.0rc1.
Information
Tasks
Reproduction
Run this:
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Llama-8B")
print(tokenizer.decoder)
With Transformers v5 (5.0.0rc1), you get:
Sequence(decoders=[Replace(pattern=String("▁"), content=" "), ByteFallback(), Fuse(), Strip(content=" ", start=1, stop=0)])
With Transformers 4.57.3 and earlier, you get:
ByteLevel(add_prefix_space=True, trim_offsets=True, use_regex=True)
Is it expected that this changed?
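To illustrate why the decoder type could matter, here is a minimal standard-library sketch of the two decoding styles. It is not the code of `transformers` or `tokenizers`. The helper names and the sample tokens are hypothetical, the byte-level table follows the well-known GPT-2 byte-to-character mapping, and the second function only approximates the Replace/Fuse/Strip steps printed above (ByteFallback is left out). It assumes the model's vocabulary is byte-level BPE, as the ByteLevel decoder from 4.57.3 suggests.

```python
def bytes_to_unicode():
    """GPT-2 style table: map every byte value to a printable character."""
    keep = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    table = {}
    shift = 0
    for b in range(256):
        if b in keep:
            table[b] = chr(b)
        else:
            # Non-printable bytes (including space) are moved above U+00FF.
            table[b] = chr(256 + shift)
            shift += 1
    return table


# Inverse table: printable character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def byte_level_decode(tokens):
    """Approximation of a ByteLevel decoder: map characters back to bytes."""
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in "".join(tokens))
    return raw.decode("utf-8", errors="replace")


def metaspace_style_decode(tokens):
    """Approximation of Replace("▁", " ") + Fuse + Strip(" ", start=1).

    ByteFallback is omitted here (simplifying assumption).
    """
    text = "".join(tok.replace("▁", " ") for tok in tokens)
    return text[1:] if text.startswith(" ") else text


# Hypothetical byte-level tokens: "Ġ" stands for a leading space.
tokens = ["Hello", "Ġworld"]
print(byte_level_decode(tokens))
print(metaspace_style_decode(tokens))
```

In this sketch only the byte-level variant turns "Ġ" back into a space. Whether the v5 tokenizer actually decodes text incorrectly would still need to be checked, for example by comparing `tokenizer.decode(tokenizer.encode(text))` across both versions.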
Expected behavior
The same decoder type as with Transformers 4.57.3, i.e. ByteLevel.