[pt1][quant] Add QInt32 ScalarType and qint32 data type #19816
jerryzh168 wants to merge 24 commits into master
Conversation
Differential Revision: D15094174 Differential Version: 80838655
@dzhulgakov looks like we need to define the template function at every call site?
Looks like CI is passing. @dzhulgakov @gchanan @raghuramank100 please review again.
    for (auto i = 0; i < qtensor.numel(); ++i) {
      // We need to convert the qint8 value to float to ensure the subtraction
      // subexpression returns a float
      rd[i] = (static_cast<float>(qd[i].val_) - zero_point) * scale;
why force either of these to be contiguous?
Right now I'm just assuming everything is contiguous; we can change it later if there is a need.
We're not just assuming: there is a .contiguous() call earlier for the input tensor.
OK, but we already have a bunch of code for copying and for doing type conversions. Did you check what that code does?
Do you mean TensorIterators?
Since the fbgemm implementation is contiguous-only (and we expect that one to be enabled by default), I think it's OK to force contiguous.
It seems fine to make the output contiguous (although you still need to check in the ops themselves, because unlike, say, MKLDNN, there is no guarantee you didn't do a view after getting the contiguous tensor).
But this forces the input to be contiguous too.
And I still would like to understand how the existing copy/type conversion mechanisms fit into this.
For example, we almost certainly want memory_order support for quantized tensors, right? If it's possible to use the same mechanism, we should get it "for free", but in this way we certainly won't.
I think this should be a separate discussion. Also, I don't have enough context yet on how we should handle copy/type conversion for QTensor. I do plan to implement permute and view on QTensor, though.
dzhulgakov
left a comment
Looks good modulo a few comments
    struct CAFFE2_API Quantizer : public c10::intrusive_ptr_target {
      const QScheme qscheme_;
      explicit Quantizer(QScheme qscheme) : qscheme_(qscheme) {}
      const ScalarType scalar_type_;
why do you need to keep scalar type in quantizer if it's already present in the tensor?
The quantize function in Quantizer only takes a float Tensor as an argument, so we need to have this info in Quantizer as well; the scalar type is on the output Tensor, not the input Tensor.
Summary: Pull Request resolved: pytorch/pytorch#19816. We need this for quantization for bias; adds a third ScalarType argument to `quantize_linear`. Differential Revision: D15094174 fbshipit-source-id: f19ec8f4716cf5fe0aa21b38d45af6d27c9ab377
This pull request has been merged in abb3698.
Stack:
:white_circle: #20107 [pt1][quant] Add dequantize_linear for JIT pass 💚
:white_circle: #19984 [pt1][quant] Add qint8 type (int8_t) 💚
:white_circle: #19932 [pt1][quant] Rename qint8 data type 💚
:black_circle: #19816 [pt1][quant] Add QInt32 ScalarType and qint32 data type 💚
We need this for quantization for bias: add a third ScalarType argument to `quantize_linear`.

Differential Revision: D15094174