System Info
Code:
import torch
from transformers import AutoModelForImageTextToText, AutoProcessor

base_model = "mistralai/Ministral-3-8B-Instruct-2512-BF16"
model = AutoModelForImageTextToText.from_pretrained(base_model, dtype=torch.bfloat16)
model = model.to("cuda:1")
# Note: despite the variable name, this is the processor returned by AutoProcessor.
tokenizer = AutoProcessor.from_pretrained(base_model)

user_prompt = "hello how are you?"
messages = [
    {"role": "user", "content": user_prompt},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text=text, return_tensors="pt").to(model.device, dtype=torch.bfloat16)
generate_ids = model.generate(**inputs, max_new_tokens=50, do_sample=False)
# Decode only the newly generated tokens (everything after the prompt).
decoded_output = tokenizer.batch_decode(generate_ids[:, inputs["input_ids"].shape[1] :], skip_special_tokens=True)[0]
print(decoded_output)
Output:
Hello!ĠðŁĺĬĠI'mĠjustĠaĠvirtualĠassistant,ĠsoĠIĠdon'tĠhaveĠfeelings,ĠbutĠI'mĠhereĠandĠreadyĠtoĠhelpĠyouĠwithĠanythingĠyouĠneed!ĠHowĠaboutĠyouâĢĶhowĠareĠ*you*ĠdoingĠtoday?ĠAnythingĠfunĠorĠinterestingĠon
Environment:
Python 3.12.7
transformers 5.0.0.dev0 (installed from main branch)
torch 2.9.0
mistral_common 1.8.6
The same code works when the tokenizer is loaded with MistralCommonBackend:
Code:
import torch
from transformers import AutoModelForImageTextToText, MistralCommonBackend

base_model = "mistralai/Ministral-3-8B-Instruct-2512-BF16"
# Only the tokenizer loading differs from the failing snippet.
tokenizer = MistralCommonBackend.from_pretrained(base_model)
model = AutoModelForImageTextToText.from_pretrained(base_model, dtype=torch.bfloat16)
model = model.to("cuda:2")

user_prompt = "hello how are you?"
messages = [
    {"role": "user", "content": user_prompt},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text=text, return_tensors="pt").to(model.device)
generate_ids = model.generate(**inputs, max_new_tokens=50, do_sample=False)
# Decode only the newly generated tokens (everything after the prompt).
decoded_output = tokenizer.batch_decode(generate_ids[:, inputs["input_ids"].shape[1] :], skip_special_tokens=True)[0]
print(decoded_output)
Output:
Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you with anything you need. How about you? How are you doing today?[😊]
Who can help?
No response
Information
Tasks
Reproduction
Run the first snippet above (the one that loads the tokenizer with AutoProcessor.from_pretrained). The decoded output contains raw byte-level BPE markers such as Ġ and ðŁĺĬ instead of spaces and emoji.
Expected behavior
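Clean decoded text, with byte-level BPE markers (for example Ġ for a space and ðŁĺĬ for 😊) converted back to the characters they encode, as in the MistralCommonBackend output above.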
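To illustrate what "handled properly" would mean, here is a minimal sketch of reversing the markers by hand. It assumes the markers follow the GPT-2-style byte-to-unicode table, which the Ġ and ðŁĺĬ sequences in the broken output are consistent with. The helper names (bytes_to_unicode, undo_byte_level_markers) are hypothetical and are not part of transformers or mistral_common; this is only a diagnostic or stopgap, not the fix for the decoder itself.

```python
def bytes_to_unicode() -> dict[int, str]:
    """GPT-2-style table: map every byte value (0-255) to a printable character."""
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    # Printable bytes map to themselves.
    table = {b: chr(b) for b in printable}
    # Remaining bytes (space, control bytes, ...) are shifted to code points >= 256.
    shift = 0
    for b in range(256):
        if b not in table:
            table[b] = chr(256 + shift)
            shift += 1
    return table


# Inverse table: marker character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def undo_byte_level_markers(text: str) -> str:
    """Hypothetical helper: turn marker characters back into bytes and decode as UTF-8."""
    raw = bytearray()
    for ch in text:
        if ch in UNICODE_TO_BYTE:
            raw.append(UNICODE_TO_BYTE[ch])
        else:
            # Character is not a byte-level marker; keep it unchanged.
            raw.extend(ch.encode("utf-8"))
    # errors="replace" tolerates output that was cut off in the middle of a character.
    return raw.decode("utf-8", errors="replace")


print(undo_byte_level_markers("Hello!ĠðŁĺĬĠI'mĠjustĠaĠvirtualĠassistant"))
```

This should only be applied to text that still contains the markers. Running it on already-clean text that includes accented Latin-1 characters would reinterpret those characters as raw bytes and corrupt them.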