
[QNN EP] Add provider option to offload graph I/O quantization/dequantization to the CPU EP #22436

Merged
jywu-msft merged 11 commits into main from adrianl/qnn-offload-io-quant-dequant-to-cpu on Oct 16, 2024

Conversation

@adrianlizarraga
Contributor

@adrianlizarraga adrianlizarraga commented Oct 15, 2024

Description

Adds the QNN provider option `offload_graph_io_quantization`, which offloads graph input quantization and graph output dequantization to the CPU EP. The option is disabled by default to preserve current behavior.

Motivation and Context

Offloading the handling of I/O quantization to the CPU EP significantly improves inference latency for many models.
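As a rough sketch of how this option would be enabled from the Python API: QNN EP provider options are passed as a dictionary of string values, so `"1"` turns the option on. The model path and `backend_path` value below are illustrative placeholders, not values from this PR.

```python
# Provider options for the QNN EP; all values are strings.
# backend_path is an assumed Windows HTP backend filename.
qnn_options = {
    "backend_path": "QnnHtp.dll",
    "offload_graph_io_quantization": "1",  # offload graph I/O Q/DQ to the CPU EP
}

# Provider list: QNN EP first, CPU EP as fallback (and to handle the
# offloaded quantize/dequantize of graph inputs and outputs).
providers = [("QNNExecutionProvider", qnn_options), "CPUExecutionProvider"]

# With a QNN-enabled onnxruntime build, the session would be created as:
# import onnxruntime as ort
# session = ort.InferenceSession("model.onnx", providers=providers)
```

The session-creation line is commented out because it requires a QNN-enabled ONNX Runtime build and a QNN backend library on the machine.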

@adrianlizarraga adrianlizarraga added the ep:QNN (issues related to QNN execution provider) label Oct 15, 2024
@adrianlizarraga adrianlizarraga marked this pull request as ready for review October 15, 2024 16:30
Resolved (outdated) comment threads:
- include/onnxruntime/core/session/onnxruntime_c_api.h
- onnxruntime/test/providers/qnn/qnn_basic_test.cc
- onnxruntime/test/qnn_ctx_gen/command_args_parser.cc
@jywu-msft jywu-msft merged commit 84d48b6 into main Oct 16, 2024
@jywu-msft jywu-msft deleted the adrianl/qnn-offload-io-quant-dequant-to-cpu branch October 16, 2024 22:00
guschmue pushed a commit that referenced this pull request Oct 18, 2024
…tization to the CPU EP (#22436)

apsonawane pushed a commit that referenced this pull request Oct 21, 2024
…tization to the CPU EP (#22436)

@sophies927 sophies927 added the cherry-picked (Cherry-picked for a cherrypicks branch) label Oct 22, 2024
@snnn
Contributor

snnn commented Sep 5, 2025

This PR has been cherry-picked into the rel-1.20.0 branch in PR #22526. Removing the release:1.20.0 label.


Labels

- cherry-picked: Cherry-picked for a cherrypicks branch
- ep:QNN: issues related to QNN execution provider

5 participants