Unify Quantization APIs for add, pool and relu #26335

supriyar wants to merge 19 commits into gh/supriyar/18/base from
Conversation
Summary: Use the backend engine flag to call QNNPACK for quantized ops. Test Plan: python test/test_quantized.py TestQNNPACKOps Reviewers: Subscribers: Tasks: Tags: [ghstack-poisoned]
    "qnnpack_maxpool(): Expected padding to be 2-dimensional: got ",
    padding.size());

  Tensor input_contig = input.contiguous();
Do you need a check for the input layout here? How do you know that the input is NHWC contiguous?
I'll make the change similar to #26242 once that lands for all qnnpack ops.
It might well be me, but on Raspbian (32-bit armv7 hf) and I do get some torch.uint8 tensor when I do the same with a vanilla build on x86 (i.e. with FBGEMM).
Did you try setting torch.backends.quantized.engine = torch.qnnpack? You can set the same on the server side (x86) as well to confirm QNNPACK works.
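For anyone landing on this thread: a minimal sketch of selecting the quantized engine from Python. This assumes a PyTorch build where `torch.backends.quantized` exposes `supported_engines` and `engine` (the string form `'qnnpack'` is also accepted); which engines are listed depends on how the build was compiled.

```python
import torch

# Engines available depend on the build: 'fbgemm' is typical on x86 servers,
# 'qnnpack' on ARM/mobile builds.
print(torch.backends.quantized.supported_engines)

# Select QNNPACK explicitly, guarded so this is a no-op on builds without it.
if 'qnnpack' in torch.backends.quantized.supported_engines:
    torch.backends.quantized.engine = 'qnnpack'
```

Setting the engine on the server side lets you confirm that the QNNPACK code path produces the same results as FBGEMM before deploying to mobile.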
Indeed, this fixes the example case and seems to return something in the quantized model. Thank you! At some point,
  int64_t inW = input.size(3);
  // TODO: change it to contiguous(MemoryFormat::ChannelsLast) once a perf
  // regression of it is fixed.
  Tensor input_contig = input.permute({0, 2, 3, 1}).contiguous();
@dzhulgakov please review this. I changed it to permute the input tensor to NHWC, similar to your changes for qconv.
Yep, it should be fine
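For readers following along, here is a small Python-level sketch of what the `permute({0, 2, 3, 1}).contiguous()` call in the snippet above does, plus the `channels_last` alternative mentioned in the TODO (tensor shapes here are illustrative):

```python
import torch

x = torch.randn(2, 3, 4, 5)  # NCHW: batch, channels, height, width

# Equivalent of input.permute({0, 2, 3, 1}).contiguous() in the C++ op:
# reorder dims to NHWC and materialize a dense copy for QNNPACK.
x_nhwc = x.permute(0, 2, 3, 1).contiguous()
assert x_nhwc.shape == (2, 4, 5, 3)
assert x_nhwc.is_contiguous()

# The alternative from the TODO: keep the NCHW shape but lay the data out
# channels-last in memory, avoiding the explicit shape change.
x_cl = x.contiguous(memory_format=torch.channels_last)
assert x_cl.shape == (2, 3, 4, 5)
assert x_cl.is_contiguous(memory_format=torch.channels_last)
```

The two forms hold the same bytes in the same order; they differ only in whether the logical shape is permuted or the strides are.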
Summary: Use the backend engine flag to call QNNPACK for quantized ops. Test Plan: python test/test_quantized.py TestQNNPACKOps Reviewers: Subscribers: Tasks: Tags: [ghstack-poisoned]
dzhulgakov left a comment:
Looks good, I'll let Jerry review the details.
It would be really nice to unify the tests too, and to extend the benchmarks in benchmarks/op_benchmarks to go through both engines.
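A hedged sketch of how a test or benchmark could iterate over both engines, in the spirit of the suggestion above. The helper `run_with_each_engine` is illustrative, not a PyTorch API; it assumes `torch.backends.quantized.engine` can be reassigned as in this PR's stack.

```python
import torch

def run_with_each_engine(fn):
    """Run fn() once per supported quantized engine, restoring the
    previous engine afterwards. Illustrative helper, not a PyTorch API."""
    previous = torch.backends.quantized.engine
    results = {}
    try:
        for engine in torch.backends.quantized.supported_engines:
            if engine == 'none':
                continue  # 'none' means no quantized backend
            torch.backends.quantized.engine = engine
            results[engine] = fn()
    finally:
        torch.backends.quantized.engine = previous
    return results

# Example: quantized relu through whichever engines are compiled in.
def qrelu_sample():
    x = torch.randn(1, 3, 4, 4)
    qx = torch.quantize_per_tensor(x, scale=0.05, zero_point=64,
                                   dtype=torch.quint8)
    return torch.relu(qx)

outputs = run_with_each_engine(qrelu_sample)
```

Comparing the dequantized `outputs` across engines is one way to check that FBGEMM and QNNPACK agree on the same op.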
Summary: Use the backend engine flag to call QNNPACK for quantized ops. Test Plan: python test/test_quantized.py TestQNNPACKOps Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D17504331](https://our.internmc.facebook.com/intern/diff/D17504331) [ghstack-poisoned]
Summary: Pull Request resolved: pytorch/pytorch#26335 Use the backend engine flag to call QNNPACK for quantized ops. Test Plan: python test/test_quantized.py TestQNNPACKOps Imported from OSS Differential Revision: D17504331 fbshipit-source-id: 35cb2189067ac5cc6a7307179ef0335d1cec7b8f
This pull request has been merged in f337459.
Stack from ghstack:
Summary:
Use the backend engine flag to call QNNPACK for quantized ops.
Test Plan:
python test/test_quantized.py TestQNNPACKOps
Reviewers:
Subscribers:
Tasks:
Tags:
Differential Revision: D17504331