@qjia7
Contributor

@qjia7 qjia7 commented Apr 23, 2025

Fixed the bug in #24228 that causes incorrect results for Phi models when flash attention is disabled.

@guschmue guschmue added the ep:WebGPU ort-web webgpu provider label Apr 23, 2025
@qjia7 qjia7 marked this pull request as ready for review April 24, 2025 01:29
@qjia7 qjia7 requested review from fs-eire and guschmue April 24, 2025 01:34
@guschmue
Contributor

lgtm.

@qjia7 qjia7 merged commit 5c014e2 into main Apr 25, 2025
87 of 89 checks passed
@qjia7 qjia7 deleted the fix_bug_in_1d_dispatch branch April 25, 2025 05:22
vraspar pushed a commit that referenced this pull request Apr 28, 2025
Fixed the bug in #24228 that causes incorrect results for Phi models
when flash attention is disabled.
jywu-msft pushed a commit that referenced this pull request Apr 30, 2025
### Description

Cherry pick the following into
[rel-1.22.0](https://github.com/microsoft/onnxruntime/tree/rel-1.22.0)


- (#24487)
- (#24466)
- (#24493)
- (#24484)
- (#24494)
- (#24489)
- (#24504)
- (#24510)
- (#24456)
- (#24537)
- (#24501)
- (#24519)
- (#24513)
- (#24539)
- (#24514)
- (#24542)
- (#24585)

Not added:

Planning to cherry pick the CUDA MatMulNBits PRs once the fix for the failing
CUDA pipeline is ready
- (#24491)
- (#24509)
- (#24564)

---------

Co-authored-by: Adrian Lizarraga <adlizarraga@microsoft.com>
Co-authored-by: minfhong-quic <quic_minfhong@quicinc.com>
Co-authored-by: minfhong-quic <minfhong-quic@quicinc.com>
Co-authored-by: Justin Chu <justinchuby@users.noreply.github.com>
Co-authored-by: Prathik Rao <prathik.rao@gmail.com>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
Co-authored-by: Ankan Banerjee <ankan.ban@gmail.com>
Co-authored-by: Maximilian Müller <maximilianm@nvidia.com>
Co-authored-by: Gaurav Garg <gaugarg@nvidia.com>
Co-authored-by: iraut <iraut@nvidia.com>
Co-authored-by: Hrishikesh Manohar <hrishikeshm@nvidia.com>
Co-authored-by: Maximilian Müller <44298237+gedoensmax@users.noreply.github.com>
Co-authored-by: Scott McKay <skottmckay@gmail.com>
Co-authored-by: Jiajia Qin <jiajiaqin@microsoft.com>
Co-authored-by: kunal-vaishnavi <115581922+kunal-vaishnavi@users.noreply.github.com>
Co-authored-by: xhcao <xinghua.cao@intel.com>
jatinwadhwa921 pushed a commit to intel/onnxruntime that referenced this pull request Apr 30, 2025
### Description

Cherry pick the following into
[rel-1.22.0](https://github.com/microsoft/onnxruntime/tree/rel-1.22.0)


- (microsoft#24487)
- (microsoft#24466)
- (microsoft#24493)
- (microsoft#24484)
- (microsoft#24494)
- (microsoft#24489)
- (microsoft#24504)
- (microsoft#24510)
- (microsoft#24456)
- (microsoft#24537)
- (microsoft#24501)
- (microsoft#24519)
- (microsoft#24513)
- (microsoft#24539)
- (microsoft#24514)
- (microsoft#24542)
- (microsoft#24585)

Not added:

Planning to cherry pick the CUDA MatMulNBits PRs once the fix for the failing
CUDA pipeline is ready
- (microsoft#24491)
- (microsoft#24509)
- (microsoft#24564)

---------

Co-authored-by: vraspar <vrajang@outlook.com>
Co-authored-by: Adrian Lizarraga <adlizarraga@microsoft.com>
Co-authored-by: minfhong-quic <quic_minfhong@quicinc.com>
Co-authored-by: minfhong-quic <minfhong-quic@quicinc.com>
Co-authored-by: Justin Chu <justinchuby@users.noreply.github.com>
Co-authored-by: Prathik Rao <prathik.rao@gmail.com>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
Co-authored-by: Ankan Banerjee <ankan.ban@gmail.com>
Co-authored-by: Maximilian Müller <maximilianm@nvidia.com>
Co-authored-by: Gaurav Garg <gaugarg@nvidia.com>
Co-authored-by: iraut <iraut@nvidia.com>
Co-authored-by: Hrishikesh Manohar <hrishikeshm@nvidia.com>
Co-authored-by: Maximilian Müller <44298237+gedoensmax@users.noreply.github.com>
Co-authored-by: Scott McKay <skottmckay@gmail.com>
Co-authored-by: Jiajia Qin <jiajiaqin@microsoft.com>
Co-authored-by: kunal-vaishnavi <115581922+kunal-vaishnavi@users.noreply.github.com>
Co-authored-by: xhcao <xinghua.cao@intel.com>
ankitm3k pushed a commit to intel/onnxruntime that referenced this pull request May 12, 2025
Fixed the bug in microsoft#24228 that causes incorrect results for Phi models
when flash attention is disabled.
@snnn
Contributor

snnn commented Sep 5, 2025

This PR has been included in the rel-1.22.0 branch. Removing the release:1.22.0 label.
