[DML EP] Support partial rotary embedding by PatriceVignola · Pull Request #22417 · microsoft/onnxruntime

PatriceVignola · 2024-10-12T03:17:52Z

Description

This adds support for partial RotaryEmbedding to DML. Partial RotaryEmbedding consists of doing the rotary embedding calculation on a subregion of the input tensor as if its head size was rotary_embedding_dim, while leaving the second part of the tensor (i.e. head_size - rotary_embedding_dim) alone.

To achieve this, we follow these steps:

Split the tensor into 2 parts
Run the rotary embedding algorithm on the first part, just like we were doing before on the entire tensor
Join the 2 parts back together
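
The three steps above can be sketched for a single head vector in plain Python. This is only an illustration of the split / rotate / join idea, not the DML graph this PR builds: the function names, the non-interleaved ("rotate half") layout, the `base` value, and the inline angle computation (instead of precomputed cos/sin caches) are all assumptions made for the example.

```python
import math


def rotary_embedding(vec, position, base=10000.0):
    """Rotate-half rotary embedding over an entire vector (illustrative only)."""
    dim = len(vec)
    half = dim // 2
    out = [0.0] * dim
    for i in range(half):
        # Assumed frequency schedule: base^(-2i / dim).
        angle = position * base ** (-2.0 * i / dim)
        c, s = math.cos(angle), math.sin(angle)
        x1, x2 = vec[i], vec[i + half]
        out[i] = x1 * c - x2 * s
        out[i + half] = x2 * c + x1 * s
    return out


def partial_rotary_embedding(head, position, rotary_embedding_dim):
    """Apply rotary embedding to the first rotary_embedding_dim elements only."""
    # 1. Split the head into the rotated part and the pass-through part.
    rotated_part = head[:rotary_embedding_dim]
    passthrough_part = head[rotary_embedding_dim:]
    # 2. Run the regular algorithm on the first part, as if its head size
    #    were rotary_embedding_dim.
    rotated_part = rotary_embedding(rotated_part, position)
    # 3. Join the two parts back together.
    return rotated_part + passthrough_part


head = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
result = partial_rotary_embedding(head, position=3, rotary_embedding_dim=4)
```

In this sketch the last `head_size - rotary_embedding_dim` elements come back unchanged, and passing `rotary_embedding_dim` equal to the full head size reduces to the ordinary rotary embedding.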

Since we're leaving the middle part intact, the RotaryEmbedding fusion will still be done within DML. Also, the concat at the end is essentially free because DML optimizes it out and directly allocates the result of RotaryEmbedding at the right place. The only overhead here is the splitting of the tensor at the beginning, which we should eventually make part of the RotaryEmbedding fusion within DML.

Motivation and Context

This fix allows us to correctly run models that have a partial_rotary_factor setting in Hugging Face, including Nvidia's Nemotron: https://huggingface.co/nvidia/Nemotron-Mini-4B-Instruct
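
For context, here is a rough sketch of how a partial_rotary_factor could translate into the rotary_embedding_dim used above. It assumes the common convention that the rotary dimension is the head size scaled by the factor and truncated to an integer; the helper name and the example numbers are hypothetical and are not taken from this PR or from any specific model config.

```python
def rotary_dim_from_factor(head_size, partial_rotary_factor):
    """Return (rotary_embedding_dim, untouched_dim) for a head (hypothetical helper)."""
    # Assumed convention: scale the head size by the factor and truncate.
    rotary_embedding_dim = int(head_size * partial_rotary_factor)
    if rotary_embedding_dim % 2 != 0:
        # Rotary embedding rotates elements in pairs, so an even size is needed.
        raise ValueError("rotary_embedding_dim must be even")
    return rotary_embedding_dim, head_size - rotary_embedding_dim


# Illustrative values only: a 128-wide head with half of it rotated.
rotary_dim, untouched_dim = rotary_dim_from_factor(128, 0.5)
```

A factor of 1.0 gives an untouched size of zero, which corresponds to the regular, non-partial case.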

github-actions

You can commit the suggested changes from lintrunner.

Patrice has raised a PR that fixes RotaryEmbedding for partial factor in ORT, so the workaround is not needed anymore. microsoft/onnxruntime#22417

sumitsays · 2024-10-15T23:49:59Z

If possible, can we have the ASCII DML graph? It would help in understanding the implementation.

Refers to: onnxruntime/core/providers/dml/DmlExecutionProvider/src/Operators/DmlOperatorRotaryEmbedding.cpp:27 in 7370243.

sumitsays

revoking review

It would change the existing style of the test that we didn't add. All we did was re-enable a DML test.

snnn · 2025-09-05T21:26:51Z

This PR has been cherry-picked into the rel-1.20.0 branch in PR #22526. Removing the release:1.20.0 label.

Support partial rotary embedding

9a43d42

PatriceVignola requested review from fdwr and sumitsays October 12, 2024 03:17

github-actions Bot previously requested changes Oct 12, 2024

Comment thread onnxruntime/test/contrib_ops/rotary_embedding_op_test.cc

Comment thread onnxruntime/test/contrib_ops/rotary_embedding_op_test.cc

Lint

7370243

BLSharda mentioned this pull request Oct 12, 2024

Initial support of Model - Nemotron-Mini-4B-Instruct microsoft/onnxruntime-genai#958

Merged

PatriceVignola added the release:1.20.0 label Oct 15, 2024

sumitsays reviewed Oct 15, 2024

Comment thread ...runtime/core/providers/dml/DmlExecutionProvider/src/Operators/DmlOperatorRotaryEmbedding.cpp

sumitsays previously approved these changes Oct 15, 2024

Add ascii graph

66e5c44

PatriceVignola requested a review from sumitsays October 16, 2024 05:06

sumitsays reviewed Oct 16, 2024

Comment thread ...runtime/core/providers/dml/DmlExecutionProvider/src/Operators/DmlOperatorRotaryEmbedding.cpp

sumitsays approved these changes Oct 16, 2024

PatriceVignola merged commit f610605 into main Oct 16, 2024

PatriceVignola deleted the user/pavignol/support-partial-rotary-embedding branch October 16, 2024 20:28

sophies927 added the cherry-picked label Oct 22, 2024

snnn removed the release:1.20.0 label Sep 5, 2025

PatriceVignola merged 3 commits into main from user/pavignol/support-partial-rotary-embedding
