
[QNN EP] Add provider option to offload graph I/O quantization/dequantization to the CPU EP #22436

Merged
jywu-msft merged 11 commits into main from adrianl/qnn-offload-io-quant-dequant-to-cpu on Oct 16, 2024

Conversation

@adrianlizarraga
Contributor

@adrianlizarraga adrianlizarraga commented Oct 15, 2024

Description

Adds the QNN provider option `offload_graph_io_quantization`, which offloads graph input quantization and graph output dequantization to the CPU EP. The option is disabled by default to preserve current behavior.

Motivation and Context

Offloading the handling of I/O quantization to the CPU EP significantly improves inference latency for many models.
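As a rough sketch of how this option would be enabled from the Python API: QNN EP provider options are passed as a dictionary of string values, so `"1"` turns the option on. The model path and `backend_path` value below are illustrative placeholders, not values from this PR.

```python
# Provider options for the QNN EP; all values are strings.
# backend_path is an assumed Windows HTP backend filename.
qnn_options = {
    "backend_path": "QnnHtp.dll",
    "offload_graph_io_quantization": "1",  # offload graph I/O Q/DQ to the CPU EP
}

# Provider list: QNN EP first, CPU EP as fallback (and to handle the
# offloaded quantize/dequantize of graph inputs and outputs).
providers = [("QNNExecutionProvider", qnn_options), "CPUExecutionProvider"]

# With a QNN-enabled onnxruntime build, the session would be created as:
# import onnxruntime as ort
# session = ort.InferenceSession("model.onnx", providers=providers)
```

The session-creation line is commented out because it requires a QNN-enabled ONNX Runtime build and a QNN backend library on the machine.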

@adrianlizarraga adrianlizarraga added the ep:QNN (issues related to QNN execution provider) label Oct 15, 2024
@adrianlizarraga adrianlizarraga marked this pull request as ready for review October 15, 2024 16:30
Resolved (outdated) comment threads:
- include/onnxruntime/core/session/onnxruntime_c_api.h
- onnxruntime/test/providers/qnn/qnn_basic_test.cc
- onnxruntime/test/qnn_ctx_gen/command_args_parser.cc
@jywu-msft jywu-msft merged commit 84d48b6 into main Oct 16, 2024
@jywu-msft jywu-msft deleted the adrianl/qnn-offload-io-quant-dequant-to-cpu branch October 16, 2024 22:00
guschmue pushed a commit that referenced this pull request Oct 18, 2024
…tization to the CPU EP (#22436)

apsonawane pushed a commit that referenced this pull request Oct 21, 2024
…tization to the CPU EP (#22436)

@sophies927 sophies927 added the cherry-picked (Cherry-picked for a cherrypicks branch) label Oct 22, 2024
@snnn
Contributor

snnn commented Sep 5, 2025

This PR has been cherry-picked into the rel-1.20.0 branch in PR #22526. Removing the release:1.20.0 label.


Labels

- cherry-picked: Cherry-picked for a cherrypicks branch
- ep:QNN: issues related to QNN execution provider

5 participants