
Conversation

Contributor

@kaixuanliu kaixuanliu commented Oct 14, 2025

@MekkCyber @drbh pls help review, thx!
We ran a benchmark with the following script on an Intel XPU (Data Center GPU Max 1550, PVC):

import gc
import torch
import time
import transformers
from transformers import AutoConfig, AutoProcessor, GenerationConfig, set_seed, StaticCache

device = torch.device("xpu")
data_type = torch.float16
model_id = "Qwen/Qwen3-4B-Instruct-2507"
batch_size = 8
max_input_length = 512
max_completion_length = 2048

set_seed(42)
prompt = "SUBREDDIT: r/relationships\n\nTITLE: I (f/22) have to figure out if I want to still know these girls or not and would hate to sound insulting\n\nPOST: Not sure if this belongs here but it's worth a try. \n\nBackstory:\nWhen I (f/22) went through my first real breakup 2 years ago because he needed space after a year of dating roand  it effected me more than I thought. It was a horrible time in my life due to living with my mother and finally having the chance to cut her out of my life. I can admit because of it was an emotional wreck and this guy was stable and didn't know how to deal with me. We ended by him avoiding for a month or so after going to a festival with my friends. When I think back I wish he just ended. So after he ended it added my depression I suffered but my friends helped me through it and I got rid of everything from him along with cutting contact. \n\nNow: Its been almost 3 years now and I've gotten better after counselling and mild anti depressants. My mother has been out of my life since then so there's been alot of progress. Being stronger after learning some lessons there been more insight about that time of my life but when I see him or a picture everything comes back. The emotions and memories bring me back down. \n\nHis friends (both girls) are on my facebook because we get along well which is hard to find and I know they'll always have his back. But seeing him in a picture or talking to him at a convention having a conversation is tough. Crying confront of my current boyfriend is something I want to avoid. \n\nSo I've been thinking that I have to cut contact with these girls because it's time to move on because it's healthier. It's best to avoid him as well. But will they be insulted? Will they accept it? Is there going to be awkwardness? I'm not sure if it's the right to do and could use some outside opinions.\n\nTL;DR:"

config = AutoConfig.from_pretrained(model_id)
tokenizer = AutoProcessor.from_pretrained(model_id)
architecture = getattr(transformers, config.architectures[0])
# use_kernels=True loads optimized kernels from the kernels hub,
# including the XPU rmsnorm kernel added in this PR
transformers_model = architecture.from_pretrained(model_id, use_kernels=True, device_map=device).to(data_type)
generation_kwargs = {
    "max_new_tokens": max_completion_length,
    "do_sample": True,
    "pad_token_id": tokenizer.pad_token_id,
    "bos_token_id": tokenizer.bos_token_id,
    "eos_token_id": tokenizer.eos_token_id,
    "temperature": 1.0,
    "cache_implementation": "static",
}
generation_config = GenerationConfig(**generation_kwargs)

prompts = [prompt] * batch_size
inputs = tokenizer(
    text=prompts, return_tensors="pt", padding=True, padding_side="left", add_special_tokens=False
).to(device)

warmup_steps = 2
run_steps = 5
# warm-up iterations so one-time setup cost is excluded from the measurement
for _ in range(warmup_steps):
    with torch.no_grad():
        outputs = transformers_model.generate(**inputs, generation_config=generation_config, disable_compile=True)

# timed runs: synchronize before and after so the wall-clock time
# covers all queued XPU work
for _ in range(run_steps):
    torch.xpu.synchronize()
    time_s = time.time()
    with torch.no_grad():
        outputs = transformers_model.generate(**inputs, generation_config=generation_config, disable_compile=True)
    torch.xpu.synchronize()
    time_e = time.time()
    print(f"Transformers: {time_e - time_s:.4f} seconds")
del tokenizer
del transformers_model
gc.collect()
torch.xpu.empty_cache()
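The script prints one latency per timed run; the average quoted below can be obtained by collecting those per-run timings and taking their mean. A small sketch (the timing values here are illustrative placeholders, not the actual measurements):

```python
import statistics

# illustrative per-run wall-clock timings in seconds (not the real measurements)
timings = [77.2, 76.8, 77.5, 77.1, 76.9]

avg_latency = statistics.mean(timings)
print(f"avg latency: {avg_latency:.2f} s over {len(timings)} runs")
```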

With this PR, the average latency drops from 108 s to 77 s (roughly a 1.4x speedup).
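For context, RMSNorm (the operation this PR accelerates on XPU) scales each hidden vector by the reciprocal of its root mean square and then applies a learned per-channel weight. A minimal pure-Python reference sketch of that computation; this is only an illustration of the math, not the actual fused XPU kernel:

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """Reference RMSNorm: divide x by its root mean square (plus eps for
    numerical stability), then apply the learned per-channel weight."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for w, v in zip(weight, x)]

# With unit weights, the output keeps the direction of x and has
# (approximately) unit root mean square.
out = rms_norm([1.0, 2.0, 2.0], [1.0, 1.0, 1.0])
```

The fused kernel computes the same result in a single pass over the tensor, which is where the latency win comes from.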

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
Contributor

@MekkCyber MekkCyber left a comment


Very nice! thanks for adding this

@MekkCyber MekkCyber enabled auto-merge (squash) October 14, 2025 13:25
@MekkCyber MekkCyber merged commit fd787c5 into huggingface:main Oct 14, 2025
25 checks passed
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

MekkCyber added a commit that referenced this pull request Oct 14, 2025
MekkCyber added a commit that referenced this pull request Oct 14, 2025
Revert "add rmsnorm kernels support for Intel XPU (#41563)"

This reverts commit fd787c5.
ngazagna-qc pushed a commit to ngazagna-qc/transformers that referenced this pull request Oct 23, 2025
Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
ngazagna-qc pushed a commit to ngazagna-qc/transformers that referenced this pull request Oct 23, 2025
Revert "add rmsnorm kernels support for Intel XPU (huggingface#41563)"

This reverts commit fd787c5.
