
docs: clarify tokenizer decoder behavior in v5 (#43066)#43104

Open
kowshik-thatinati wants to merge 18 commits into huggingface:main from kowshik-thatinati:fix-decoder-type-v5

Conversation

@kowshik-thatinati

Hey team 👋

Just adding a small clarification regarding the decoder property in AutoTokenizer for Transformers v5.

In short: in v5, the decoder object may look different than it did in v4, because the internal tokenizer backend has been redesigned. The output of tokenizer.decode(...) stays the same, though, so functionality isn't affected.

What this PR does:

  1. Adds a comment in tokenization_utils_base.py near the decoder property for reference.
  2. Adds a note in the docs (tokenizer_notes.rst) to explain this behavior for users.

Hope this helps avoid confusion for anyone upgrading to v5 🙂
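To illustrate the point the note makes, here is a minimal pure-Python sketch (not the actual transformers internals; the class and vocabulary names are hypothetical): two decoder backends can be entirely different objects while producing identical decoded text.

```python
# Hypothetical stand-ins for the v4 and redesigned v5 decoder backends.
# The *objects* differ (different types), but decode() output is identical,
# which is exactly the situation the docs note describes.

VOCAB = {0: "hello", 1: "world"}

class V4Decoder:
    """Stand-in for the v4-style decoder backend (hypothetical)."""
    def decode(self, ids):
        return " ".join(VOCAB[i] for i in ids)

class V5Decoder:
    """Stand-in for the redesigned v5 backend (hypothetical)."""
    def decode(self, ids):
        tokens = [VOCAB[i] for i in ids]
        return " ".join(tokens)

v4, v5 = V4Decoder(), V5Decoder()
print(type(v4) is type(v5))                     # the backends are different objects
print(v4.decode([0, 1]) == v5.decode([0, 1]))   # but decoded text is identical
```

So code that inspects or type-checks tokenizer.decoder may need updating across the v4/v5 boundary, while code that only calls tokenizer.decode(...) should be unaffected.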

Closes #43066

Commits

- Updated the example to use a generator for batch processing of prompts
- Added an example for multilingual text generation using the text-generation pipeline and the accelerate library
- Added multilingual text-generation example (English, Hindi, Telugu)
- Added a section comparing inference speed on CPU vs GPU using the Hugging Face pipeline and Accelerator
- Add CPU vs GPU performance comparison example
- docs: added multi-input, device-aware text-generation example
- Add multilingual text generation example to tutorial (English, Telugu, Hindi)
- Add advanced text-generation example with post-processing (demonstrates batch inference with custom post-processing using the transformers library)
- Rename advanced_examples to advanced_examples.md
- Added a detailed explanation of Named Entity Recognition (NER) and provided a code example using the Hugging Face transformers pipeline
@github-actions
Contributor

github-actions Bot commented Jan 5, 2026

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=43104&sha=831ce6

@kowshik-thatinati
Author

Hi 👋
This PR only adds a clarification in docs and a code comment explaining the tokenizer decoder behavior in v5.
No functional changes involved.

Please let me know if any changes are needed. Thanks!

Collaborator

@ArthurZucker left a comment


hey! sorry, there seem to be a few too many changes for the fix!

@kowshik-thatinati
Author

Hey, thanks for pointing that out 🙏
You’re right — the fix can be much smaller.

I’ll reduce this PR to only the minimal clarification needed (no extra docs / changes) and update it accordingly.
Thanks for the review!



Development

Successfully merging this pull request may close these issues.

Wrong tokenizer decoder type in Transformers v5

2 participants