
Fix tokenizer auto_map being ignored for custom models#43219

Merged
vasqu merged 3 commits into huggingface:main from Anri-Lombard:fix-tokenizer-auto-map-regression
Jan 22, 2026

Conversation

@Anri-Lombard
Contributor

Fixes #43202

PR #42894 introduced an early-exit to TokenizersBackend when tokenizer_class doesn't match the registered tokenizer for a model_type. However, this check was placed before the auto_map extraction, causing custom tokenizers (with trust_remote_code=True) to be ignored.

Reproduction:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("IQuestLab/IQuest-Coder-V1-40B-Loop-Instruct", trust_remote_code=True)
print(tokenizer.decode(tokenizer.encode("This is a test")))
# Expected: "This is a test"
# Actual (bug): "Th is <unk> is <unk> a <unk> te st"
```

For models with unregistered model_type (like iquestloopcoder), the condition TOKENIZER_MAPPING_NAMES.get(model_type, "") != tokenizer_config_class is always True, causing early-exit to TokenizersBackend without checking if auto_map exists.
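The always-True behavior is easy to see in isolation. A toy sketch of the mismatch check (the registry contents and class names below are hypothetical stand-ins, not the real transformers mapping):

```python
# Hypothetical stand-in for transformers' TOKENIZER_MAPPING_NAMES registry.
TOKENIZER_MAPPING_NAMES = {"llama": "LlamaTokenizerFast"}

# An unregistered model_type falls back to the default "", so the
# inequality holds regardless of what tokenizer_class the config declares.
model_type = "iquestloopcoder"              # not in the registry
tokenizer_config_class = "IQuestTokenizer"  # hypothetical class name

mismatch = TOKENIZER_MAPPING_NAMES.get(model_type, "") != tokenizer_config_class
print(mismatch)  # True: the early-exit always fires for unregistered model types
```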

This fix moves the auto_map extraction before the early-exit check and adds tokenizer_auto_map is None to the condition.
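The reordering can be sketched as a small dispatch function. This is a simplified illustration of the logic described above (toy registry, invented return values, not the actual transformers code):

```python
def select_backend(tokenizer_config: dict) -> str:
    """Simplified sketch of the fixed dispatch order described in this PR."""
    # Hypothetical stand-in for the real registry.
    TOKENIZER_MAPPING_NAMES = {"llama": "LlamaTokenizerFast"}

    model_type = tokenizer_config.get("model_type")
    tokenizer_class = tokenizer_config.get("tokenizer_class")

    # The fix: extract auto_map from the config BEFORE the early-exit check...
    auto_map = tokenizer_config.get("auto_map", {}).get("AutoTokenizer")

    # ...and add the "auto_map is None" clause, so custom tokenizers are no
    # longer swallowed by the TokenizersBackend early-exit.
    if TOKENIZER_MAPPING_NAMES.get(model_type, "") != tokenizer_class and auto_map is None:
        return "TokenizersBackend"
    if auto_map is not None:
        return "dynamic_module"  # trust_remote_code path loads the custom class
    return "registered"

# A custom tokenizer with auto_map now routes to dynamic module loading.
print(select_backend({
    "model_type": "iquestloopcoder",
    "tokenizer_class": "IQuestTokenizer",
    "auto_map": {"AutoTokenizer": ["tokenization_iquest.IQuestTokenizer", None]},
}))  # dynamic_module
```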

Added regression test test_custom_tokenizer_with_mismatched_tokenizer_class.

…3202)

PR huggingface#42894 added an early-exit to TokenizersBackend when tokenizer_class
doesn't match the registered tokenizer for a model_type. However, this
early-exit was placed before the auto_map check, causing custom tokenizers
with trust_remote_code to be ignored.

This fix moves the auto_map extraction before the early-exit check and adds
tokenizer_auto_map is None to the condition, so models with custom tokenizers
properly use the dynamic module loading path.
@vasqu
Contributor

vasqu commented Jan 12, 2026

cc @itazap @ArthurZucker

Collaborator

@ArthurZucker ArthurZucker left a comment


This looks good to me!
Thanks for also adding a test!

@awni

awni commented Jan 21, 2026

@Anri-Lombard @ArthurZucker thanks for the fix here. We are depending on it in mlx-lm and looking forward to the next RC.

@ArthurZucker
Collaborator

Can you fix the quality checks please?! 🤗

@vasqu
Contributor

vasqu commented Jan 22, 2026

@bot /style

@github-actions
Contributor

Style fix is beginning .... View the workflow run here.

@vasqu vasqu enabled auto-merge (squash) January 22, 2026 17:16
@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@vasqu vasqu merged commit 5410465 into huggingface:main Jan 22, 2026
25 checks passed
@vasqu
Contributor

vasqu commented Jan 22, 2026

I took the liberty of running the style fix myself so this could merge, thanks for contributing ❤️

SangbumChoi pushed a commit to SangbumChoi/transformers that referenced this pull request Jan 23, 2026
…3219)

* Fix tokenizer auto_map being ignored for custom models (huggingface#43202)

PR huggingface#42894 added an early-exit to TokenizersBackend when tokenizer_class
doesn't match the registered tokenizer for a model_type. However, this
early-exit was placed before the auto_map check, causing custom tokenizers
with trust_remote_code to be ignored.

This fix moves the auto_map extraction before the early-exit check and adds
tokenizer_auto_map is None to the condition, so models with custom tokenizers
properly use the dynamic module loading path.

* style

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: vasqu <antonprogamer@gmail.com>


Development

Successfully merging this pull request may close these issues.

tokenizer.decode producing bad results in some cases from 5.0.0rc1 to 5.0.0rc2

5 participants