docs: clarify tokenizer decoder behavior in v5 (#43066) #43104
Open
kowshik-thatinati wants to merge 18 commits into huggingface:main from
Conversation
Updated the example to use a generator for batch processing of prompts.
Added an example for multilingual text generation using the text-generation pipeline and the accelerate library.
Added multilingual text-generation example (English, Hindi, Telugu)
Added a section comparing inference speed on CPU vs GPU using Hugging Face pipeline and Accelerator.
Add CPU vs GPU performance comparison example
docs: added multi-input, device-aware text-generation example
Add multilingual text generation example to tutorial (English, Telugu, Hindi)
This example demonstrates batch inference with custom post-processing for text generation using the transformers library.
Add advanced text-generation example with post-processing
Rename advanced_examples to advanced_examples.md
Added a detailed explanation of Named Entity Recognition (NER) and provided a code example using the Hugging Face transformers pipeline.
Enhance NER example in pipeline tutorial
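The commits above describe feeding a generator of prompts to a text-generation pipeline for batch inference with custom post-processing. A minimal sketch of that pattern is below; note that `fake_pipeline` is a hypothetical stand-in mimicking the output shape of a `transformers` text-generation pipeline (a real pipeline would require downloading a model), so only the iteration pattern is illustrative here.

```python
# Sketch: batch inference over a generator of prompts, with post-processing.
# fake_pipeline is a hypothetical stand-in for transformers.pipeline("text-generation").

def prompts():
    """Generator yielding prompts one at a time, so the full batch
    never needs to be held in memory."""
    for p in ["Hello", "Bonjour", "Namaste"]:
        yield p

def fake_pipeline(text):
    # Mimics the list-of-dicts output shape of a text-generation pipeline.
    return [{"generated_text": text + " ..."}]

def post_process(output):
    # Custom post-processing step: extract and strip the generated text.
    return output[0]["generated_text"].strip()

results = [post_process(fake_pipeline(p)) for p in prompts()]
print(results)  # → ['Hello ...', 'Bonjour ...', 'Namaste ...']
```

With a real pipeline, the same loop structure applies; only `fake_pipeline` would be replaced by the pipeline object.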
Contributor
View the CircleCI Test Summary for this PR: https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=43104&sha=831ce6

Author
Hi 👋 Please let me know if any changes are needed. Thanks!
ArthurZucker
reviewed
Jan 5, 2026
Collaborator
ArthurZucker
left a comment
hey! sorry there seems to be a bit too many changes for the fix!
Author
Hey, thanks for pointing that out 🙏 I'll reduce this PR to only the minimal clarification needed (no extra docs/changes) and update it accordingly.
Hey team 👋
Just adding a small clarification regarding the `decoder` property in AutoTokenizer for Transformers v5.

In v5, the decoder object may look different compared to v4 because the internal tokenizer backend has been redesigned. The output of `tokenizer.decode(...)` stays the same, so functionality isn't affected.

What this PR does:
- Adds a clarifying comment in `tokenization_utils_base.py` near the `decoder` property for reference.
- Adds a docs note (`tokenizer_notes.rst`) to explain this behavior for users.

Hope this helps avoid confusion for anyone upgrading to v5 🙂
Closes #43066
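The invariant this PR documents, that `decode()` output is stable even when the internal decoder object changes, can be sketched with a toy tokenizer. Everything below is hypothetical illustration (the class names, vocab, and decoder stand-ins are not Transformers APIs); it only shows why swapping the decoder backend need not change user-visible output.

```python
# Toy sketch: two different internal decoder backends producing identical
# decode() output. All names here are hypothetical, not Transformers APIs.

class WordJoinDecoder:
    """Stand-in for an older-style decoder: joins tokens with spaces."""
    def decode(self, tokens):
        return " ".join(tokens)

class RedesignedDecoder:
    """Stand-in for a redesigned decoder: different internals, same output."""
    def decode(self, tokens):
        pieces = []
        for t in tokens:
            pieces.append(t)
        return " ".join(pieces)

class ToyTokenizer:
    def __init__(self, decoder):
        self.vocab = {"hello": 0, "world": 1}
        self.ids_to_tokens = {i: t for t, i in self.vocab.items()}
        self.decoder = decoder  # the backend object users may inspect

    def encode(self, text):
        return [self.vocab[w] for w in text.split()]

    def decode(self, ids):
        tokens = [self.ids_to_tokens[i] for i in ids]
        return self.decoder.decode(tokens)

v4_like = ToyTokenizer(WordJoinDecoder())
v5_like = ToyTokenizer(RedesignedDecoder())
ids = v4_like.encode("hello world")
# Different decoder objects, identical decoded text:
assert v4_like.decode(ids) == v5_like.decode(ids) == "hello world"
```

The point mirrors the PR's claim: the `decoder` attribute is an implementation detail that may differ across versions, while `tokenizer.decode(...)` remains the stable contract.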