ORT 1.20.0 Release: Cherry pick round 1 (#22526)
Merged
apsonawane merged 7 commits into rel-1.20.0 on Oct 22, 2024
Conversation
- Allow specification of the iOS simulator runtime version to use.
- Pick a simulator runtime version (iphonesimulator 16.4) that is supported by the Xcode version (14.3.1) that we use.
- Disable the CoreML EP's DepthToSpace op support for CoreML versions less than 7 with DCR mode and FP16 input; it doesn't produce the correct output in this case.
- Some cleanup of iOS test infrastructure.
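For reference, the DCR-mode DepthToSpace being disabled here can be sketched in numpy following the ONNX operator spec (this is the reference semantics, not the CoreML EP's actual implementation):

```python
import numpy as np

def depth_to_space_dcr(x, blocksize):
    """Reference DCR (depth-column-row) DepthToSpace, NCHW layout, per the ONNX spec."""
    n, c, h, w = x.shape
    b = blocksize
    # Depth is split as (b, b, c // b^2); the block indices come first in DCR mode.
    tmp = x.reshape(n, b, b, c // (b * b), h, w)
    # Interleave the block indices with the spatial dims: (n, c', h, b1, w, b2).
    tmp = tmp.transpose(0, 3, 4, 1, 5, 2)
    return tmp.reshape(n, c // (b * b), h * b, w * b)

# 4 channels of a 1x1 image rearrange into one 2x2 channel.
x = np.arange(4).reshape(1, 4, 1, 1)
out = depth_to_space_dcr(x, 2)
print(out[0, 0])  # [[0 1]
                  #  [2 3]]
```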
### Description
Update the QNN default version to 2.27 in the CI pipeline.
…tization to the CPU EP (#22436)

### Description
Adds the QNN provider option `offload_graph_io_quantization` to offload graph input quantization and graph output dequantization to the CPU EP. The option is disabled by default to maintain current behavior.

### Motivation and Context
Offloading the handling of I/O quantization to the CPU EP significantly improves inference latency for many models.
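From the Python API, provider options like this are passed as a dict alongside the EP name. A minimal sketch (the `backend_path` value is a typical HTP-backend assumption, and the session-creation line is commented out so the snippet doesn't require a QNN-enabled build):

```python
# QNN EP provider options; string values, per ONNX Runtime convention.
qnn_provider_options = {
    "backend_path": "QnnHtp.dll",            # assumed HTP backend library
    "offload_graph_io_quantization": "1",    # enable the new option (default "0")
}
providers = [
    ("QNNExecutionProvider", qnn_provider_options),
    "CPUExecutionProvider",                  # fallback EP
]
# import onnxruntime as ort
# sess = ort.InferenceSession("model.onnx", providers=providers)
print(providers[0][1]["offload_graph_io_quantization"])  # 1
```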
### Description
This adds support for partial RotaryEmbedding to DML. Essentially, partial RotaryEmbedding consists of doing the rotary embedding calculation on a subregion of the input tensor, as if its head size were `rotary_embedding_dim`, while leaving the second part of the tensor (i.e. `head_size - rotary_embedding_dim`) alone. To achieve this, all we need to do is follow these steps:

1. Split the tensor into 2 parts.
2. Run the rotary embedding algorithm on the first part, just like we were doing before on the entire tensor.
3. Join the 2 parts back together.

Since we're leaving the second part intact, the RotaryEmbedding fusion will still be done within DML. Also, the concat at the end is essentially free because DML optimizes it out and directly allocates the result of RotaryEmbedding at the right place. The only overhead here is the splitting of the tensor at the beginning, which we should eventually make part of the RotaryEmbedding fusion within DML.

### Motivation and Context
This fix allows us to correctly run models that have a `partial_rotary_factor` setting on Hugging Face, including Nvidia's Nemotron: https://huggingface.co/nvidia/Nemotron-Mini-4B-Instruct
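The split/rotate/concat steps above can be sketched in numpy (an illustrative sketch only, assuming the non-interleaved half-split rotary variant; not the DML operator's actual code):

```python
import numpy as np

def partial_rotary_embedding(x, cos, sin, rotary_dim):
    """Rotate only the first `rotary_dim` channels of each head.

    x:        (batch, num_heads, seq_len, head_size)
    cos, sin: (seq_len, rotary_dim // 2), precomputed rotation tables
    """
    x_rot, x_pass = x[..., :rotary_dim], x[..., rotary_dim:]   # 1. split
    half = rotary_dim // 2
    x1, x2 = x_rot[..., :half], x_rot[..., half:]
    rotated = np.concatenate([x1 * cos - x2 * sin,             # 2. rotate first part
                              x1 * sin + x2 * cos], axis=-1)
    return np.concatenate([rotated, x_pass], axis=-1)          # 3. join back together
```

With `cos = 1`, `sin = 0` the rotation is the identity, and the trailing `head_size - rotary_dim` channels always pass through unchanged, which is what lets DML's RotaryEmbedding fusion and concat elimination still apply.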
Contributor
Do you also want to take #22508 to fix the "Big Models" pipeline?
Contributor
I think that's a good idea since I see that the big models checks have already failed a couple of times. @apsonawane, can you please add that one?
…es in 0.26.0 (#22508)

### Description
Pin huggingface_hub to 0.25.2 due to breaking changes in 0.26.0.

### Motivation and Context
We depend on `diffusers==0.28.0`, which [depends on](https://github.com/huggingface/diffusers/blob/v0.28.0-release/setup.py#L104) `huggingface_hub>=0.20.2`. There are breaking changes in the latest huggingface_hub 0.26.0 release that break our Big Models pipeline: [Release v0.26.0: Multi-tokens support, conversational VLMs and quality of life improvements · huggingface/huggingface_hub](https://github.com/huggingface/huggingface_hub/releases/tag/v0.26.0). Specifically, the breaking changes to `cached_download()` cause our pipeline to fail.
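In requirements-file form, the pin might look like the following (illustrative; the actual pipeline may apply the constraint elsewhere):

```
diffusers==0.28.0
huggingface_hub==0.25.2  # pinned: 0.26.0's breaking changes to cached_download() fail the pipeline
```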
Contributor
Please also include this change: #22516
This pull request upgrades the CMake version from v3.31.0-rc1 to v3.31.0-rc2 to include a bug fix for CUDA (https://gitlab.kitware.com/cmake/cmake/-/merge_requests/9902) from Nvidia. AB#51692
Contributor
And you may also need #22479 to get the Windows pipelines to pass. Or you may need to retry repeatedly.
### Description
The recent PR #22223 introduced 2 bugs in the implementation of CPU LayerNorm fp16:

- Possible nullptr access for the bias: `const TensorShape& bias_shape = bias->Shape();` will crash when `bias` does not exist. (Amazingly, this one seems not to be covered by any test case.)
  - Fix: guard with a pointer check.
- A race condition inside ComputeJob: `ComputeJob()` is dispatched to the threadpool and internally tries to modify `LayerNormImpl::scale_fp32_` and `LayerNormImpl::bias_fp32_`, which are `std::unique_ptr`s and are not thread-safe.
  - Fix: move the modification of `LayerNormImpl::scale_fp32_` and `LayerNormImpl::bias_fp32_` out of `ComputeJob()` and into `LayerNormImpl::ComputeWithoutContext()`. A race condition may still remain because `ConcurrentRunSupported` is set to `true` for the CPU EP, so an OrtMutex was added.

This should fix the recent flaky tests as well.
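The mutex fix follows the standard guarded-lazy-initialization pattern: concurrent compute calls must not race on the one-time fp16-to-fp32 conversion of the cached scale/bias buffers. A small Python sketch of the pattern (illustrative only; names like `LazyFp32Cache` are hypothetical, not ORT's actual code):

```python
import threading

class LazyFp32Cache:
    """Lazily up-converts a shared buffer once, safely, under a lock."""
    def __init__(self):
        self._lock = threading.Lock()   # plays the role of the OrtMutex
        self._scale_fp32 = None
        self.init_count = 0             # instrumentation for the demo

    def get_scale(self, scale_fp16):
        # Without the lock, two threads could both see None and both
        # initialize, racing on the shared pointer.
        with self._lock:
            if self._scale_fp32 is None:
                self.init_count += 1
                self._scale_fp32 = [float(v) for v in scale_fp16]
        return self._scale_fp32

cache = LazyFp32Cache()
threads = [threading.Thread(target=cache.get_scale, args=([1.0, 2.0],))
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(cache.init_count)  # 1: exactly one thread performed the conversion
```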
snnn
approved these changes
Oct 22, 2024
edgchen1
approved these changes
Oct 22, 2024
adrianlizarraga
approved these changes
Oct 22, 2024
Contributor
This PR has been included in the
This was referenced Sep 5, 2025
ORT 1.20.0 release preparation: Cherry pick round 1
Approved cherry pick comments