docs: clarify tokenizer decoder behavior in v5 (#43066) #43104
Open
kowshik-thatinati wants to merge 18 commits into huggingface:main from
Conversation
Updated the example to use a generator for batch processing of prompts.
Added an example for multilingual text generation using the text-generation pipeline and the accelerate library.
Added multilingual text-generation example (English, Hindi, Telugu)
Added a section comparing inference speed on CPU vs GPU using Hugging Face pipeline and Accelerator.
Add CPU vs GPU performance comparison example
docs: added multi-input, device-aware text-generation example
Add multilingual text generation example to tutorial (English, Telugu, Hindi)
This example demonstrates batch inference with custom post-processing for text generation using the transformers library.
Add advanced text-generation example with post-processing
Rename advanced_examples to advanced_examples.md
Added a detailed explanation of Named Entity Recognition (NER) and provided a code example using the Hugging Face transformers pipeline.
Enhance NER example in pipeline tutorial
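The commits above describe feeding a generator of prompts to a text-generation pipeline for batch inference with custom post-processing. A minimal sketch of that pattern is below; note that `fake_pipeline` is a hypothetical stand-in mimicking the output shape of a `transformers` text-generation pipeline (a real pipeline would require downloading a model), so only the iteration pattern is illustrative here.

```python
# Sketch: batch inference over a generator of prompts, with post-processing.
# fake_pipeline is a hypothetical stand-in for transformers.pipeline("text-generation").

def prompts():
    """Generator yielding prompts one at a time, so the full batch
    never needs to be held in memory."""
    for p in ["Hello", "Bonjour", "Namaste"]:
        yield p

def fake_pipeline(text):
    # Mimics the list-of-dicts output shape of a text-generation pipeline.
    return [{"generated_text": text + " ..."}]

def post_process(output):
    # Custom post-processing step: extract and strip the generated text.
    return output[0]["generated_text"].strip()

results = [post_process(fake_pipeline(p)) for p in prompts()]
print(results)  # → ['Hello ...', 'Bonjour ...', 'Namaste ...']
```

With a real pipeline, the same loop structure applies; only `fake_pipeline` would be replaced by the pipeline object.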
Contributor
View the CircleCI Test Summary for this PR: https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=43104&sha=831ce6

Author
Hi 👋 Please let me know if any changes are needed. Thanks!
ArthurZucker
reviewed
Jan 5, 2026
Collaborator
ArthurZucker
left a comment
hey! sorry there seems to be a bit too many changes for the fix!
Author
Hey, thanks for pointing that out 🙏 I'll reduce this PR to only the minimal clarification needed (no extra docs/changes) and update it accordingly.
Hey team 👋
Just adding a small clarification regarding the `decoder` property in AutoTokenizer for Transformers v5.

In v5, the decoder object may look different compared to v4 because the internal tokenizer backend has been redesigned. The output of `tokenizer.decode(...)` stays the same, so functionality isn't affected.

What this PR does:
- Adds a clarifying comment in `tokenization_utils_base.py` near the `decoder` property for reference.
- Adds a docs note (`tokenizer_notes.rst`) to explain this behavior for users.

Hope this helps avoid confusion for anyone upgrading to v5 🙂
Closes #43066
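The invariant this PR documents, that `decode()` output is stable even when the internal decoder object changes, can be sketched with a toy tokenizer. Everything below is hypothetical illustration (the class names, vocab, and decoder stand-ins are not Transformers APIs); it only shows why swapping the decoder backend need not change user-visible output.

```python
# Toy sketch: two different internal decoder backends producing identical
# decode() output. All names here are hypothetical, not Transformers APIs.

class WordJoinDecoder:
    """Stand-in for an older-style decoder: joins tokens with spaces."""
    def decode(self, tokens):
        return " ".join(tokens)

class RedesignedDecoder:
    """Stand-in for a redesigned decoder: different internals, same output."""
    def decode(self, tokens):
        pieces = []
        for t in tokens:
            pieces.append(t)
        return " ".join(pieces)

class ToyTokenizer:
    def __init__(self, decoder):
        self.vocab = {"hello": 0, "world": 1}
        self.ids_to_tokens = {i: t for t, i in self.vocab.items()}
        self.decoder = decoder  # the backend object users may inspect

    def encode(self, text):
        return [self.vocab[w] for w in text.split()]

    def decode(self, ids):
        tokens = [self.ids_to_tokens[i] for i in ids]
        return self.decoder.decode(tokens)

v4_like = ToyTokenizer(WordJoinDecoder())
v5_like = ToyTokenizer(RedesignedDecoder())
ids = v4_like.encode("hello world")
# Different decoder objects, identical decoded text:
assert v4_like.decode(ids) == v5_like.decode(ids) == "hello world"
```

The point mirrors the PR's claim: the `decoder` attribute is an implementation detail that may differ across versions, while `tokenizer.decode(...)` remains the stable contract.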