
Conversation

@Rocketknight1 (Member) commented on Oct 15, 2025

Tokenizers return a BatchEncoding dict by default, but apply_chat_template doesn't. This is just an accident of how I wrote it originally, which we were stuck with for backward compatibility reasons. Ideally, I think apply_chat_template should return exactly the same format as tokenizers, since it also performs tokenization most of the time. It's now v5 time, so we can start making that happen 😅

This PR also updates tests, and removes very old test_tokenization_for_chat tests. These model-specific tests don't do anything useful anymore, since the apply_chat_template functionality is unified across tokenizers; they're mostly a legacy leftover from when model classes used to need custom chat tokenization functions.
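
In rough terms, the change looks like this. This is only a sketch: the checkpoint below is just an example of a model with a chat template, and the exact token ids and dict keys will vary.

from transformers import AutoTokenizer

# Example checkpoint; any tokenizer with a chat template behaves the same way.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
chat = [{"role": "user", "content": "Hello!"}]

# Explicit return_dict=False gives the old (v4-default) output: a plain list of token ids.
ids = tokenizer.apply_chat_template(chat, return_dict=False)

# New default after this PR: a BatchEncoding dict, matching what you get
# from calling the tokenizer directly.
encoded = tokenizer.apply_chat_template(chat)
print(encoded.keys())          # e.g. dict_keys(['input_ids', 'attention_mask'])
print(encoded["input_ids"])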

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@Rocketknight1 marked this pull request as ready for review on October 15, 2025 at 17:06
@Rocketknight1 (Member Author)

It's a v5 breaking change so cc @LysandreJik @Cyrilvallez @ArthurZucker for review to make sure you're okay with it

@Rocketknight1 force-pushed the v5_chat_template_return_type branch from 0f4f200 to 9330959 on October 21, 2025 at 15:48
assert self.rust_tokenizer_3b([" Sam", "Sam"]).input_ids == [[5502, 2], [5502, 2]]

@require_jinja
def test_tokenization_for_chat(self):
Member

Out of curiosity, why did you remove this test?

Member Author

It's extremely old - it's not really related to this PR, but these tests predate chat templates, and we just patched them to support chat templates after those were added. They only exist for a few models, and I don't think we want to keep them, because it's not clear what they test that the main chat template tests don't.

]

- output = self.tokenizer.apply_chat_template(conversation, tokenize=True)
+ output = self.tokenizer.apply_chat_template(conversation, tokenize=True).input_ids
Member

Nit for consistency: since tokenize=True is the default, I guess you can remove it.

Member Author

Fixed!

]
with self.assertRaises(ValueError):
-     tokenizer.encode_message_with_chat_template(conversation[0], add_generation_prompt=True)
+     tokenizer.encode_message_with_chat_template(conversation[0], add_generation_prompt=True, return_dict=False)
Member

Nit again: Not sure if this change is needed

Member Author

Fixed!

@Rocketknight1 force-pushed the v5_chat_template_return_type branch from b729fb7 to c452142 on October 30, 2025 at 16:59
@Rocketknight1 mentioned this pull request on Oct 30, 2025
@Rocketknight1 force-pushed the v5_chat_template_return_type branch from 2acbd45 to abf7158 on October 31, 2025 at 12:02
@Rocketknight1 force-pushed the v5_chat_template_return_type branch from abf7158 to 137015e on October 31, 2025 at 13:30
@github-actions (Contributor)

[For maintainers] Suggested jobs to run (before merge)

run-slow: blenderbot, bloom, cohere, gemma, gpt2, gpt_sw3, llama, voxtral

@LysandreJik (Member) left a comment

Thank you

@awni commented on Dec 16, 2025

This change is a bit unfortunate. Seems like an unnecessary API break which is going to cause a headache for a lot of people.

You can see all the places we use this in mlx-lm, and all the models uploaded to the Hugging Face Hub also have code snippets that include this (see e.g. https://huggingface.co/mlx-community/mistralai_Devstral-Small-2-24B-Instruct-2512-MLX-6Bit).

mlx_lm/tuner/datasets.py:        tokens = self.tokenizer.apply_chat_template(messages, tools=tools)
mlx_lm/tuner/datasets.py:                self.tokenizer.apply_chat_template(
mlx_lm/tuner/datasets.py:        tokens = self.tokenizer.apply_chat_template(messages, tools=tools)
mlx_lm/tuner/datasets.py:                self.tokenizer.apply_chat_template(
mlx_lm/server.py:    Convert message content to a format suitable for `apply_chat_template`.
mlx_lm/server.py:                return tokenizer.apply_chat_template(
mlx_lm/server.py:        help="""A JSON formatted string of arguments for the tokenizer's apply_chat_template, e.g. '{"enable_thinking":false}'""",
mlx_lm/generate.py:        help="Additional config for `apply_chat_template`. Should be a dictionary of"
mlx_lm/generate.py:        prompt = tokenizer.apply_chat_template(
mlx_lm/generate.py:            test_prompt = tokenizer.apply_chat_template(
mlx_lm/cache_prompt.py:        prompt = tokenizer.apply_chat_template(
mlx_lm/chat.py:        prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
mlx_lm/utils.py:            prompt = tokenizer.apply_chat_template(
mlx_lm/examples/pipeline_generate.py:    prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
mlx_lm/examples/generate_response.py:prompt = tokenizer.apply_chat_template(
mlx_lm/examples/batch_generate_response.py:    tokenizer.apply_chat_template(
mlx_lm/examples/batch_generate_response.py:    tokenizer.apply_chat_template(
mlx_lm/examples/tool_use.py:prompt = tokenizer.apply_chat_template(
mlx_lm/examples/tool_use.py:prompt = tokenizer.apply_chat_template(
mlx_lm/examples/chat.py:prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
mlx_lm/examples/chat.py:prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
mlx_lm/evaluate.py:    def apply_chat_template(self, chat_history, add_generation_prompt=True) -> str:
mlx_lm/evaluate.py:        return self.tokenizer.apply_chat_template(
mlx_lm/evaluate.py:    return apply_chat_template
mlx_lm/evaluate.py:    apply_chat_template = chat_template_fn()
mlx_lm/evaluate.py:        apply_chat_template, e.g. '{"enable_thinking":false}'""",
mlx_lm/evaluate.py:        use_chat_template=args.apply_chat_template,
mlx_lm/evaluate.py:    MLXLM.apply_chat_template = chat_template_fn(**args.chat_template_args)
mlx_lm/evaluate.py:        apply_chat_template=lm.use_chat_template,
tests/test_generate.py:        prompt = self.tokenizer.apply_chat_template(
tests/test_generate.py:        prompt = self.tokenizer.apply_chat_template(
tests/test_generate.py:        prompt = self.tokenizer.apply_chat_template(
tests/test_generate.py:        prompt = self.tokenizer.apply_chat_template(
tests/test_generate.py:            self.tokenizer.apply_chat_template(
tests/test_generate.py:            self.tokenizer.apply_chat_template(
tests/test_generate.py:            self.tokenizer.apply_chat_template(
tests/test_generate.py:            self.tokenizer.apply_chat_template(
tests/test_generate.py:                self.tokenizer.apply_chat_template(
tests/test_generate.py:                self.tokenizer.apply_chat_template(
tests/test_chat.py:        mock_tokenizer.apply_chat_template.return_value = "processed_prompt"
tests/test_chat.py:        # Verify that apply_chat_template was called with system prompt
tests/test_chat.py:        mock_tokenizer.apply_chat_template.assert_called()
tests/test_chat.py:        call_args = mock_tokenizer.apply_chat_template.call_args[0][
tests/test_chat.py:        mock_tokenizer.apply_chat_template.return_value = "processed_prompt"
tests/test_chat.py:        # Verify that apply_chat_template was called without system prompt
tests/test_chat.py:        mock_tokenizer.apply_chat_template.assert_called()
tests/test_chat.py:        call_args = mock_tokenizer.apply_chat_template.call_args[0][

@Rocketknight1 (Member Author)

It is, and I'm sorry - but the v5 update is a chance to standardize a lot of things in our API, which will hopefully make usability better in the long term. For now, you can explicitly set return_dict=False to get the old behaviour and return identical results from both v4 and v5.
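
A minimal sketch of that opt-out (the checkpoint and messages below are just examples):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")  # example checkpoint
messages = [{"role": "user", "content": "Hello!"}]

# Passing return_dict=False keeps the v4-style output (a plain list of
# token ids here, since tokenize defaults to True) on both v4 and v5,
# so downstream code can stay unchanged.
prompt_tokens = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_dict=False,
)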

@Cyrilvallez (Member)

Yes, as @Rocketknight1 said, we believe the new default makes much more sense for most users, even though it breaks some current code... We unfortunately cannot move forward without breaking a few eggs... But we always provide a way to keep the exact same behavior as before!
