[vLLM IR][Rope] Port RotaryEmbedding and DeepseekScalingRotaryEmbedding to IR Ops by wxsIcey · Pull Request #39488 · vllm-project/vllm

wxsIcey · 2026-04-10T08:55:59Z

Purpose

Port RotaryEmbedding and DeepseekScalingRotaryEmbedding to vLLM IR. For the base RotaryEmbedding, keep the key=None path out of the IR maybe_inplace call and fall back to the existing static implementation.

This preserves cross-layer KV sharing behavior and avoids torch.library schema inference failures on optional returns.
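The fallback described above can be sketched as follows. This is a minimal illustration under stated assumptions: `ir_rotary_embedding` and `forward_static` are hypothetical stand-ins that operate on plain Python lists, not the vLLM functions, and the "+ 1.0" arithmetic is only a placeholder for the real rotation.

```python
from typing import Optional

Vector = list[float]  # stand-in for torch.Tensor


def ir_rotary_embedding(query: Vector, key: Vector) -> tuple[Vector, Vector]:
    """Hypothetical stand-in for the IR op: it always returns two tensors."""
    return [q + 1.0 for q in query], [k + 1.0 for k in key]


def forward_static(
    query: Vector, key: Optional[Vector]
) -> tuple[Vector, Optional[Vector]]:
    """Hypothetical stand-in for the existing static path, which accepts key=None."""
    new_query = [q + 1.0 for q in query]
    new_key = None if key is None else [k + 1.0 for k in key]
    return new_query, new_key


def forward(query: Vector, key: Optional[Vector]) -> tuple[Vector, Optional[Vector]]:
    # Cross-layer KV sharing passes key=None. Keeping that case out of the IR op
    # means the op's return type never has to be Optional.
    if key is None:
        return forward_static(query, key)
    return ir_rotary_embedding(query, key)
```

With this split, only the non-None path needs a schema, and the None path keeps its existing behavior.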

Test Plan

To be tested

Test Result

wxsIcey · 2026-04-10T08:56:50Z

Some test files need to be modified.

wxsIcey · 2026-04-10T08:57:44Z

+            sin,
+            is_neox_style,
+        )
+    elif cos_sin_format == "deepseek":


I'm unsure if registering with the same IR ops is the best approach.

wxsIcey · 2026-04-10T08:59:59Z

        cos_sin_cache = self._match_cos_sin_cache_dtype(query)
-        return self.forward_static(
+        if key is None:
+            return self.forward_static(


PyTorch custom op registration does not support None (i.e., Optional[Tensor]) in return types, so when key is None I fall back to the static implementation. The schema inference error looks like this:

(EngineCore_DP0 pid=115375)   File "/home/wangxiaoshuang/vllm/.venv/lib/python3.11/site-packages/torch/_library/infer_schema.py", line 71, in error_fn
(EngineCore_DP0 pid=115375)     raise ValueError(f"infer_schema(func): {what} Got func with signature {sig})")
(EngineCore_DP0 pid=115375) ValueError: infer_schema(func): Return has unsupported type tuple[torch.Tensor, torch.Tensor | None]. The valid types are: {<class 'torch.Tensor'>: 'Tensor', typing.List[torch.Tensor]: 'Tensor[]', list[torch.Tensor]: 'Tensor[]', <class 'int'>: 'SymInt', <class 'float'>: 'float', <class 'bool'>: 'bool', int | float | bool: 'Scalar'}. Got func with signature (positions: torch.Tensor, query: torch.Tensor, key: torch.Tensor | None, head_size: int, cos_sin_cache: torch.Tensor, is_neox_style: bool) -> tuple[torch.Tensor, torch.Tensor | None])
[rank0]:[W318 06:36:34.959129788 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())

gemini-code-assist

Code Review

This pull request introduces a unified rotary_embedding IR operation to replace various ad-hoc implementations across the codebase. It refactors fusion passes and platform-specific kernels (AITER, VLLM C, XPU) to register and use this new operation, while also updating platform configurations to manage operation priorities. Feedback was provided regarding missing validation checks in the kernel implementations for XPU, AITER, and VLLM C, specifically concerning the handling of offsets and partial rotations where rotary_dim might differ from head_size.

gemini-code-assist · 2026-04-10T09:01:23Z

+    if cos_sin_format == "standard":
+        torch.ops._C.rotary_embedding(
+            positions,
+            query,
+            key,
+            head_size,
+            cos_sin_cache,
+            is_neox_style,
+        )
+        return query, key


The standard format implementation for XPU kernels does not check if offsets is None. If offsets is provided with the standard format, it will be silently ignored by the underlying torch.ops._C.rotary_embedding call, leading to incorrect results. Please add a check for offsets is None in the supports_args or within the implementation.

Suggested change

-    if cos_sin_format == "standard":
-        torch.ops._C.rotary_embedding(
-            positions,
-            query,
-            key,
-            head_size,
-            cos_sin_cache,
-            is_neox_style,
-        )
-        return query, key
+    if cos_sin_format == "standard":
+        assert offsets is None, "Standard RoPE with offsets is not supported on XPU"
+        torch.ops._C.rotary_embedding(
+            positions,
+            query,
+            key,
+            head_size,
+            cos_sin_cache,
+            is_neox_style,
+        )
+        return query, key
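The alternative the comment mentions, checking offsets in supports_args rather than asserting inside the kernel wrapper, could look roughly like the sketch below. All names here (`xpu_supports_args`, `select_impl`, `CANDIDATES`) are hypothetical, and the first-match selection loop is an assumption for illustration; the actual vLLM IR dispatch mechanism is not shown in this thread and may differ.

```python
from typing import Callable, Optional

Tensor = list[float]  # stand-in for torch.Tensor
Predicate = Callable[[Optional[Tensor], str], bool]


def xpu_supports_args(offsets: Optional[Tensor], cos_sin_format: str) -> bool:
    """Reject the combination flagged in review: standard format with offsets."""
    return not (cos_sin_format == "standard" and offsets is not None)


def native_supports_args(offsets: Optional[Tensor], cos_sin_format: str) -> bool:
    """Assumed catch-all fallback that accepts every argument combination."""
    return True


# Candidates in priority order: the first predicate that accepts wins.
CANDIDATES: list[tuple[str, Predicate]] = [
    ("xpu_kernel", xpu_supports_args),
    ("native", native_supports_args),
]


def select_impl(offsets: Optional[Tensor], cos_sin_format: str) -> str:
    for name, supports in CANDIDATES:
        if supports(offsets, cos_sin_format):
            return name
    raise RuntimeError("no implementation supports these arguments")
```

Rejecting in the predicate lets the call fall through to another implementation instead of failing at runtime, whereas the assert turns silent misbehavior into a hard error.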

gemini-code-assist · 2026-04-10T09:01:23Z

+def rotary_no_offsets_16bit_only(
+    positions: Tensor,
+    query: Tensor,
+    key: Tensor,
+    head_size: int,
+    rotary_dim: int,
+    cos_sin_cache: Tensor,
+    is_neox_style: bool,
+    offsets: Tensor | None = None,
+    cos_sin_format: str = "standard",
+) -> bool:
+    return offsets is None and cos_sin_format == "standard"


The rotary_no_offsets_16bit_only check does not verify that rotary_dim == head_size. The rocm_aiter_triton_rotary_embedding kernel is called with head_size but without rotary_dim, which suggests it might not support partial rotation (where rotary_dim < head_size). If this is the case, the implementation should only be selected when rotary_dim == head_size to avoid incorrect results for models with partial RoPE.

Suggested change

-def rotary_no_offsets_16bit_only(
-    positions: Tensor,
-    query: Tensor,
-    key: Tensor,
-    head_size: int,
-    rotary_dim: int,
-    cos_sin_cache: Tensor,
-    is_neox_style: bool,
-    offsets: Tensor | None = None,
-    cos_sin_format: str = "standard",
-) -> bool:
-    return offsets is None and cos_sin_format == "standard"
+def rotary_no_offsets_16bit_only(
+    positions: Tensor,
+    query: Tensor,
+    key: Tensor,
+    head_size: int,
+    rotary_dim: int,
+    cos_sin_cache: Tensor,
+    is_neox_style: bool,
+    offsets: Tensor | None = None,
+    cos_sin_format: str = "standard",
+) -> bool:
+    return offsets is None and cos_sin_format == "standard" and rotary_dim == head_size
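To illustrate why partial rotation matters, the sketch below applies NeoX-style RoPE to a single head vector in pure Python. It is a reference formula written for this comment, not the AITER or `_C` kernel; the function name `rope_neox` and the default `base` of 10000 are assumptions.

```python
import math


def rope_neox(
    head: list[float], position: int, rotary_dim: int, base: float = 10000.0
) -> list[float]:
    """Apply NeoX-style RoPE to the first rotary_dim entries of one head vector."""
    half = rotary_dim // 2
    out = list(head)  # entries beyond rotary_dim pass through unchanged
    for i in range(half):
        angle = position * base ** (-2.0 * i / rotary_dim)
        cos, sin = math.cos(angle), math.sin(angle)
        # NeoX style pairs entry i with entry i + half of the rotated slice.
        x1, x2 = head[i], head[i + half]
        out[i] = x1 * cos - x2 * sin
        out[i + half] = x2 * cos + x1 * sin
    return out


head = [1.0, 2.0, 3.0, 4.0]
partial = rope_neox(head, position=1, rotary_dim=2)  # rotate only the first 2 entries
full = rope_neox(head, position=1, rotary_dim=4)  # rotate the whole head
```

With rotary_dim=2 the last two entries stay untouched, while with rotary_dim=4 they are rotated too, so a kernel that always rotates head_size entries would produce different values for a partial-RoPE model.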

gemini-code-assist · 2026-04-10T09:01:23Z

+def rotary_no_offsets(
+    positions: Tensor,
+    query: Tensor,
+    key: Tensor,
+    head_size: int,
+    rotary_dim: int,
+    cos_sin_cache: Tensor,
+    is_neox_style: bool,
+    offsets: Tensor | None = None,
+    cos_sin_format: str = "standard",
+) -> bool:
+    return offsets is None and cos_sin_format == "standard"


The rotary_no_offsets check should verify that rotary_dim == head_size if the underlying torch.ops._C.rotary_embedding kernel does not support partial rotation. Since the kernel is called with head_size as the 4th argument and no rotary_dim is provided, it is likely to rotate the entire head. This would be incorrect for models where rotary_dim < head_size.

Suggested change

-def rotary_no_offsets(
-    positions: Tensor,
-    query: Tensor,
-    key: Tensor,
-    head_size: int,
-    rotary_dim: int,
-    cos_sin_cache: Tensor,
-    is_neox_style: bool,
-    offsets: Tensor | None = None,
-    cos_sin_format: str = "standard",
-) -> bool:
-    return offsets is None and cos_sin_format == "standard"
+def rotary_no_offsets(
+    positions: Tensor,
+    query: Tensor,
+    key: Tensor,
+    head_size: int,
+    rotary_dim: int,
+    cos_sin_cache: Tensor,
+    is_neox_style: bool,
+    offsets: Tensor | None = None,
+    cos_sin_format: str = "standard",
+) -> bool:
+    return offsets is None and cos_sin_format == "standard" and rotary_dim == head_size

mergify · 2026-04-18T01:54:17Z

Documentation preview: https://vllm--39488.org.readthedocs.build/en/39488/

mergify · 2026-04-18T01:54:52Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @wxsIcey.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify Bot added the deepseek, nvidia, rocm, and cpu labels Apr 10, 2026

github-project-automation Bot added this to NVIDIA and AMD Apr 10, 2026

github-project-automation Bot moved this to Todo in AMD Apr 10, 2026

ProExpertProg force-pushed the luka/vllm-ir/rms-norm-inplace branch from 8b5e95e to 6b2a07f Compare April 18, 2026 01:53

mergify Bot added the intel-gpu label Apr 18, 2026

mergify Bot added the documentation label Apr 18, 2026

mergify Bot added the needs-rebase label Apr 18, 2026

ProExpertProg force-pushed the luka/vllm-ir/rms-norm-inplace branch from 6b2a07f to 7194c47 Compare April 20, 2026 20:34

Rohan138 mentioned this pull request Apr 24, 2026

[Performance] Fuse RoPE + KV cache update for MLA backends #35879

Closed

ProExpertProg added 7 commits April 27, 2026 11:58

Add maybe_inplace overload with unit tests

5008a38

Signed-off-by: Luka Govedič <lgovedic@redhat.com>

fused_add_rms_norm IR op & all implementations

c13f3cd

Signed-off-by: Luka Govedič <lgovedic@redhat.com>

Add fused add to test_layernorm

7153849

Signed-off-by: Luka Govedič <lgovedic@redhat.com>

Functionalization, clone elimination, and tests

0ba29da

Signed-off-by: Luka Govedič <lgovedic@redhat.com>

Fix custom passes & tests

b189377

Signed-off-by: Luka Govedič <lgovedic@redhat.com>

remove aiter rms_norms

2b2fc28

Signed-off-by: Luka Govedič <lgovedic@redhat.com>

Add doc

978ffe0

Signed-off-by: Luka Govedič <lgovedic@redhat.com>

ProExpertProg force-pushed the luka/vllm-ir/rms-norm-inplace branch from 7194c47 to 978ffe0 Compare April 27, 2026 16:01

ProExpertProg and others added 3 commits April 28, 2026 01:44

batch invariant infra with tests

610884f

Signed-off-by: Luka Govedič <lgovedic@redhat.com>

batch invariant triton impl for rms_norm

02e644b

Signed-off-by: Luka Govedič <lgovedic@redhat.com>

Extend comment vllm/ir/op.py

d38f120

Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>

ProExpertProg and others added 4 commits April 28, 2026 01:58

Add maybe_inplace overload with unit tests

a33cf7f

Signed-off-by: Luka Govedič <lgovedic@redhat.com>

fused_add_rms_norm IR op & all implementations

3c4fe62

Signed-off-by: Luka Govedič <lgovedic@redhat.com>

Functionalization, clone elimination, and tests

8d0fa8b

Signed-off-by: Luka Govedič <lgovedic@redhat.com>

[vLLM IR][Rope] Port RotaryEmbedding and DeepseekScalingRotaryEmbeddi…

1553195

…ng to vLLM IR Ops Signed-off-by: Icey <1790571317@qq.com>

wxsIcey force-pushed the wxs/vllm-ir/rotary-embedding branch from e7aa987 to 1553195 Compare April 28, 2026 02:22

mergify Bot removed the needs-rebase label Apr 28, 2026

reslove conflict

da5045a

Signed-off-by: Icey <1790571317@qq.com>

ProExpertProg force-pushed the luka/vllm-ir/rms-norm-inplace branch 2 times, most recently from cbdcc4f to e71bea3 Compare April 29, 2026 18:41

ProExpertProg deleted the branch vllm-project:luka/vllm-ir/rms-norm-inplace May 2, 2026 03:41

ProExpertProg closed this May 2, 2026

github-project-automation Bot moved this from Todo to Done in AMD May 2, 2026

github-project-automation Bot moved this to Done in NVIDIA May 2, 2026

wxsIcey wants to merge 15 commits into vllm-project:luka/vllm-ir/rms-norm-inplace from wxsIcey:wxs/vllm-ir/rotary-embedding


2 participants
