Internal Dupe of #25255 - [MLAS] Optimize MlasConv using thread partition opt #26103

hariharans29 · 2025-09-20T04:31:58Z

Description

This is an internal-branch duplicate of #25255, plus minor cosmetic changes that address Copilot feedback.

Motivation and Context

Improves the performance of NCHW Conv. Both grouped convolutions and batched inputs should benefit from this change. For detailed performance numbers, see #25255.

Credit to @zoeczy and team for this improvement and code change.
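For readers unfamiliar with the idea, here is a minimal sketch of what partitioning convolution work across threads by batch and group could look like. It is not the code from this PR or from `convolve.cpp`: `PartitionWork`, `DecodeWorkIndex`, and `BatchGroup` are hypothetical names, and the batch-major layout of work items is an assumption made only for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>

// Hypothetical helper (not the MLAS API): split `total` work items as evenly
// as possible across `thread_count` workers (thread_count must be > 0) and
// return the half-open range [begin, end) owned by `thread_id`.
// The first `total % thread_count` workers each receive one extra item.
std::pair<std::size_t, std::size_t>
PartitionWork(std::size_t thread_id, std::size_t thread_count, std::size_t total) {
    const std::size_t per_thread = total / thread_count;
    const std::size_t extra = total % thread_count;
    const std::size_t begin = thread_id * per_thread + std::min(thread_id, extra);
    const std::size_t count = per_thread + (thread_id < extra ? 1 : 0);
    return {begin, begin + count};
}

// Hypothetical decoding of a flat work index into (batch, group), assuming
// the batch_count * group_count work items are laid out batch-major.
struct BatchGroup {
    std::size_t batch;
    std::size_t group;
};

BatchGroup DecodeWorkIndex(std::size_t index, std::size_t group_count) {
    return {index / group_count, index % group_count};
}
```

Under these assumptions, a worker would call `PartitionWork(thread_id, thread_count, batch_count * group_count)` and run one per-group convolution for each index in its range, so a grouped or batched input keeps every thread busy even when a single (batch, group) item is small.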


github-actions

You can commit the suggested changes from lintrunner.



Adds the following commits to the release-1.23.2 branch for ORT 1.23.2:

- [TensorRT] Fix DDS output bug during engine update
  - PR: #26272
  - commit id: 00e85dd
- Fix shape inference failure with in-memory external data
  - PR: #26263
  - commit id: d955476
- [CUDA] replace 90a-virtual by 90-virtual for forward compatible
  - PR: #26230
  - commit id: b58911f
- [QNN-EP] Fix logic flow bug
  - PR: #26148
  - commit id: b282379
- Internal Dupe of #25255 - [MLAS] Optimize MlasConv using thread partition opt
  - PR: #26103
  - commit id: 7362518
- Update qMoE spec to support block quantization
  - PR: #25641
  - commit id: 7a8ffa8
- [VitisAI] add new api to VitisAI to save graph as a string
  - PR: #25602
  - commit id: 3361d72
- [[Build] Lock torch, onnxscript and onnx-ir versions to latest]
  - PR: #26315
  - commit id: ea69c4d

---------

Co-authored-by: Hariharan Seshadri <shariharan91@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
Co-authored-by: Yateng Hong <toothache9010@gmail.com>
Co-authored-by: Changming Sun <chasun@microsoft.com>
Co-authored-by: Dmitri Smirnov <dmitrism@microsoft.com>
Co-authored-by: Tianlei Wu <tlwu@microsoft.com>
Co-authored-by: quic-calvnguy <quic_calvnguy@quicinc.com>
Co-authored-by: quic_calvnguy <quic_calvnguy@quic_inc.com>
Co-authored-by: yifei410 <31260809+yifei410@users.noreply.github.com>
Co-authored-by: yifei <y.zhou@xilinx.com>

apsonawane · 2025-10-21T23:43:02Z

Cherry-picked for 1.23.2. Removing the release tag and adding the cherry-pick tag.


a

530adc7

hariharans29 mentioned this pull request Sep 25, 2025

NEON kernels for NCHWc Convolution and Pooling #25580

Merged

hariharans29 added 6 commits September 29, 2025 17:31

Merge remote-tracking branch 'origin/main' into hari/mlas_conv_enhanc…

a88fcab

…ement

Merge remote-tracking branch 'origin/main' into hari/mlas_conv_enhanc…

be8ec77

…ement

Merge branch 'hari/mlas_conv_enhancement' of https://github.com/micro…

488820b

…soft/onnxruntime into hari/mlas_conv_enhancement

Small cosmetic fixes

7a7dfbf

Merge remote-tracking branch 'origin/main' into hari/mlas_conv_enhanc…

f9b2634

…ement

Nit

b784c3d

hariharans29 changed the title ~~[DO NOT REVIEW OR MERGE] Test PR~~ Internal Dupe of https://github.com/microsoft/onnxruntime/pull/25255/ Oct 13, 2025

hariharans29 changed the title ~~Internal Dupe of https://github.com/microsoft/onnxruntime/pull/25255/~~ Internal Dupe of #25255 - [MLAS] Optimize MlasConv using thread partition opt Oct 13, 2025

github-actions bot reviewed Oct 14, 2025

onnxruntime/test/mlas/bench/bench_sconv.cpp (outdated, resolved)

Update onnxruntime/test/mlas/bench/bench_sconv.cpp

1753781

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

edgchen1 reviewed Oct 14, 2025

onnxruntime/core/mlas/lib/convolve.cpp (outdated, resolved)

onnxruntime/core/mlas/lib/convolve.cpp (resolved)

onnxruntime/core/mlas/lib/convolve.cpp (outdated, resolved)

onnxruntime/core/mlas/lib/convolve.cpp (outdated, resolved)

hariharans29 and others added 2 commits October 13, 2025 20:20

Update onnxruntime/core/mlas/lib/convolve.cpp

a93c4f0

Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>

PR feedback

19e99b8

edgchen1 reviewed Oct 14, 2025

onnxruntime/test/mlas/bench/bench_sconv.cpp (outdated, resolved)

onnxruntime/core/mlas/lib/convolve.cpp (outdated, resolved)

onnxruntime/core/mlas/lib/convolve.cpp (outdated, resolved)

devang-ml added the release:1.23.2 label Oct 14, 2025

hariharans29 and others added 3 commits October 14, 2025 09:50

Update onnxruntime/core/mlas/lib/convolve.cpp

63d266b

Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>

Rename and file scope

ae9a6cc

Update onnxruntime/core/mlas/lib/convolve.cpp

b177afb

Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>

edgchen1 reviewed Oct 15, 2025

onnxruntime/core/mlas/lib/convolve.cpp (resolved)

Merge remote-tracking branch 'origin' into hari/mlas_conv_enhancement

4bcc5ab

edgchen1 previously approved these changes Oct 15, 2025

Add one more test

c827542

hariharans29 dismissed edgchen1’s stale review via c827542 October 16, 2025 01:56

github-actions bot reviewed Oct 16, 2025

onnxruntime/test/providers/cpu/nn/conv_op_test.cc (outdated, resolved)

Update onnxruntime/test/providers/cpu/nn/conv_op_test.cc

cdf4594

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

devang-ml approved these changes Oct 16, 2025

hariharans29 merged commit 992c598 into main Oct 16, 2025
92 checks passed

hariharans29 deleted the hari/mlas_conv_enhancement branch October 16, 2025 17:06

hariharans29 mentioned this pull request Oct 16, 2025

[Performance] Upstream MLAS backend optimization for better thread partitioning in multi-group or large batch convolutions #25152

Closed

apsonawane mentioned this pull request Oct 17, 2025

ORT 1.23.2 cherrypick 1 #26347

Closed

apsonawane mentioned this pull request Oct 20, 2025

ORT 1.23.2 cherrypick 1 #26368

Merged

apsonawane added the cherry-picked label (Cherry-picked for a cherrypicks branch) and removed the release:1.23.2 label Oct 21, 2025
