[pt1][quant] Add quantized::fbgemm_linear_unpack operator for serialization#20721
Conversation
Differential Revision: D15314568 Differential Version: 82426298
at::Tensor operator()(at::Tensor packed_weight) {
  // Pull out the PackBMatrix instance from the owning tensor.
  auto& pack_ptr = cpp_custom_type_hack::cast<PackedFCWeight>(packed_weight);
  auto packB = pack_ptr.w.get();
Should we check that the packed weight is actually int8 here?
Do you mean that we need to add an ASSERT statement here? We can add that to check the type to make sure we have c10::qint8 instead of c10::quint8 for the packed_weight tensor.
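The distinction being discussed is between signed qint8 (range [-128, 127]) and unsigned quint8 (range [0, 255]). A minimal plain-Python sketch of such a range check, purely illustrative and not the actual ATen assertion:

```python
def assert_qint8(values):
    """Raise if any value falls outside the signed int8 (qint8) range.

    Illustrative stand-in for a dtype check; real code would inspect the
    tensor's dtype (c10::qint8 vs c10::quint8) rather than its values.
    """
    for v in values:
        if not -128 <= v <= 127:
            raise ValueError(f"value {v} is not representable as qint8")

assert_qint8([-128, 0, 127])  # ok: all values fit in signed int8
```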
Differential Revision: D15314568 Differential Version: 83024055
int8_t* weight_ptr_int8 =
    reinterpret_cast<int8_t*>(weight_origin.data<c10::qint8>());
packB->unpack(weight_ptr_int8);
Do we have to unpack the weight if we store it in int8?
There was a problem hiding this comment.
Yes. Packing changes the memory layout, so we have to recover the original layout from the packed buffers.
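A minimal sketch of why unpack is needed: packing rewrites the weight matrix into a blocked layout for fast GEMM kernels, so the packed buffer can no longer be read back row-major. The block width (2 here) and column-block ordering are illustrative only, not FBGEMM's actual PackBMatrix layout:

```python
def pack(mat, block=2):
    """Flatten a row-major matrix into column blocks of width `block`."""
    rows, cols = len(mat), len(mat[0])
    out = []
    for c0 in range(0, cols, block):
        for r in range(rows):
            out.extend(mat[r][c0:c0 + block])
    return out

def unpack(buf, rows, cols, block=2):
    """Recover the original row-major matrix from the packed buffer."""
    mat = [[0] * cols for _ in range(rows)]
    i = 0
    for c0 in range(0, cols, block):
        for r in range(rows):
            w = min(block, cols - c0)
            mat[r][c0:c0 + w] = buf[i:i + w]
            i += w
    return mat

m = [[1, 2, 3, 4], [5, 6, 7, 8]]
assert pack(m) == [1, 2, 5, 6, 3, 4, 7, 8]  # blocked, not row-major
assert unpack(pack(m), 2, 4) == m           # round-trip recovers the layout
```

Serialization needs the round-trip because the on-disk format stores the original row-major weight, not the kernel-specific packed buffer.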
// We make a strong guarantee that models using these operators will have
// the same numerics across different machines. Therefore, we do not provide
// a fallback path and rather fail loudly if we cannot run FBGEMM.
AT_ASSERTM(
nit: TORCH_INTERNAL_ASSERT
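An illustrative sketch of the "fail loudly, no fallback" policy the comment above describes: if the required backend is unavailable, raise instead of silently switching to a path with different numerics. The `fbgemm_available` flag and `linear_unpack` name are stand-ins, not real APIs:

```python
def linear_unpack(packed_weight, fbgemm_available):
    """Unpack a weight buffer, refusing to run without the FBGEMM backend."""
    if not fbgemm_available:
        # No fallback path: numerics must match across machines, so a
        # different backend could silently change model outputs.
        raise RuntimeError(
            "quantized operator requires FBGEMM; no fallback is provided")
    return packed_weight  # real code would unpack the blocked layout here

assert linear_unpack([1, 2, 3], True) == [1, 2, 3]
```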
test/test_quantized.py
).astype(np.int8)
W = torch.from_numpy(_dequantize(W_q0, W_scale, W_zp)).to(dtype=torch.float)
W_q = W.quantize_linear(scale=W_scale, zero_point=W_zp, dtype=torch.qint8)
Could you use torch.quantize_linear(...) here?
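For reference, the affine quantization round-trip the test exercises, written in plain Python rather than torch (illustrative only): q = clamp(round(x / scale) + zero_point) and x' = (q - zero_point) * scale.

```python
def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Affine-quantize a float to the signed int8 (qint8) range."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))

def dequantize(q, scale, zero_point):
    """Recover an approximate float from a quantized value."""
    return (q - zero_point) * scale

scale, zp = 0.5, 1
q = quantize(3.2, scale, zp)   # round(6.4) + 1 = 7
x = dequantize(q, scale, zp)   # (7 - 1) * 0.5 = 3.0
```

The round-trip is lossy in general (here 3.2 comes back as 3.0), which is why the test dequantizes first and then re-quantizes to get exactly representable values.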
jerryzh168
left a comment
LGTM, please address the comments before landing.
Differential Revision: D15314568 Differential Version: 84027566
…ch#97)

Summary:
Pull Request resolved: pytorch#97
Pull Request resolved: pytorch/pytorch#20721

- FBGEMM: Add an unpack function for the PackBMatrix class: unpack the pmat buffer to origin_buf (used for serialization to recover the weight matrix).
- PyTorch Quantizer: Add the quantized::fbgemm_linear_unpack operator for serialization.

Reviewed By: zafartahirov
Differential Revision: D15314568
fbshipit-source-id: 506506df13457ce1fe6c487bc3c0eae6972bc54a
Differential Revision: D15314568 Differential Version: 84127135
Differential Revision: D15314568 Differential Version: 84137538
…ch#97)

Summary:
Pull Request resolved: pytorch/FBGEMM#97
Pull Request resolved: pytorch#20721

- FBGEMM: Add an unpack function for the PackBMatrix class: unpack the pmat buffer to origin_buf (used for serialization to recover the weight matrix).
- PyTorch Quantizer: Add the quantized::fbgemm_linear_unpack operator for serialization.

Reviewed By: zafartahirov
Differential Revision: D15314568
fbshipit-source-id: 12080c8887ce31dc849d23e132ae1766ac319407
This diff broke CUDA builds:

Thanks @ezyang for pointing this out! @bddppq fixed this with #21328. As @bddppq pointed out, this diff has a PyTorch part and an FBGEMM part (https://github.com/pytorch/fbgemm), and the PyTorch part depends on a new API added in the FBGEMM part. In such cases the two parts should be split into two diffs: land the FBGEMM diff first, then land the PyTorch part together with a submodule update. I will pay attention to this next time.
Stack:
:black_circle: #20721 [pt1][quant] Add quantized::fbgemm_linear_unpack operator for serialization 💚
Pull Request resolved: pytorch/FBGEMM#97
Differential Revision: D15314568