[pt1][quant] Add quantized::fbgemm_linear_unpack operator for serialization#20721

Closed
jianyuh wants to merge 5 commits into master from export-D15314568

Conversation

@jianyuh (Member) commented May 20, 2019

Stack:
  • #20721 [pt1][quant] Add quantized::fbgemm_linear_unpack operator for serialization

Pull Request resolved: pytorch/FBGEMM#97

  • FBGEMM: Add an unpack function to the PackBMatrix class: unpack the pmat buffer into origin_buf (used during serialization to recover the weight matrix).
  • PyTorch Quantizer: Add the quantized::fbgemm_linear_unpack operator for serialization.

Differential Revision: D15314568

Differential Revision: D15314568
Differential Version: 82426298
@pytorchbot added the module: operators and oncall: quantization (Quantization support in PyTorch) labels on May 20, 2019
```cpp
at::Tensor operator()(at::Tensor packed_weight) {
  // Pull out the PackBMatrix instance from the owning tensor.
  auto& pack_ptr = cpp_custom_type_hack::cast<PackedFCWeight>(packed_weight);
  auto packB = pack_ptr.w.get();
```
Contributor:

Should we check that the packed weight is actually int8 here?

Member (Author):

Do you mean that we need to add an ASSERT statement here? We can add that to check the type to make sure we have c10::qint8 instead of c10::quint8 for the packed_weight tensor.
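To illustrate why this check matters, here is a toy sketch in plain Python (hypothetical helper name, not the PyTorch implementation): c10::qint8 is signed with range [-128, 127] while c10::quint8 is unsigned with range [0, 255], so reinterpreting a packed buffer under the wrong dtype silently changes the values.

```python
# Toy illustration (hypothetical helper, not PyTorch API): the same raw
# byte means different numbers under signed qint8 vs unsigned quint8.
def as_qint8(byte):
    """Interpret a raw byte (0..255) as a signed int8 via two's complement."""
    return byte - 256 if byte >= 128 else byte

raw = 0b10010110              # raw byte 150 in the packed buffer
assert raw == 150             # read as quint8 it is 150...
assert as_qint8(raw) == -106  # ...but read as qint8 it is -106
```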

Differential Revision: D15314568
Differential Version: 83024055
```cpp
int8_t* weight_ptr_int8 =
    reinterpret_cast<int8_t*>(weight_origin.data<c10::qint8>());

packB->unpack(weight_ptr_int8);
```
Contributor:

Do you have to unpack the weight if we store the weight in int8?

Member (Author):

Yes: packing changes the memory layout. We have to recover the original memory layout from the packed buffer.
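As a toy illustration of this point (plain Python with hypothetical helpers; FBGEMM's real PackBMatrix blocking is more involved), packing can be modeled as a permutation of a row-major buffer into column blocks, and unpacking as the inverse permutation that recovers the original buffer:

```python
# Hypothetical sketch of pack/unpack as a layout permutation
# (FBGEMM's actual blocking scheme is more complex).
def pack(mat, rows, cols, block=2):
    """Reorder a row-major matrix into column blocks of width `block`."""
    out = []
    for c0 in range(0, cols, block):
        for r in range(rows):
            for c in range(c0, min(c0 + block, cols)):
                out.append(mat[r * cols + c])
    return out

def unpack(packed, rows, cols, block=2):
    """Invert pack(): recover the original row-major buffer."""
    out = [0] * (rows * cols)
    i = 0
    for c0 in range(0, cols, block):
        for r in range(rows):
            for c in range(c0, min(c0 + block, cols)):
                out[r * cols + c] = packed[i]
                i += 1
    return out

w = list(range(12))                      # 3x4 row-major "weight"
assert pack(w, 3, 4) != w                # packed layout differs...
assert unpack(pack(w, 3, 4), 3, 4) == w  # ...and unpack recovers it
```

The int8 values themselves never change; only their order in memory does, which is why serialization needs the inverse transform.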

```cpp
// We make a strong guarantee that models using these operators will have
// the same numerics across different machines. Therefore, we do not provide
// a fallback path and rather fail loudly if we cannot run FBGEMM.
AT_ASSERTM(
```
Contributor:

nit: TORCH_INTERNAL_ASSERT

Member (Author):

Will do

```python
).astype(np.int8)

W = torch.from_numpy(_dequantize(W_q0, W_scale, W_zp)).to(dtype=torch.float)
W_q = W.quantize_linear(scale=W_scale, zero_point=W_zp, dtype=torch.qint8)
```
Contributor:

Could you use torch.quantize_linear(...) here?

Member (Author):

Will do.

@jerryzh168 (Contributor) left a comment:

LGTM, please address the comments before landing.

Differential Revision: D15314568
Differential Version: 84027566
jianyuh added a commit to jianyuh/FBGEMM that referenced this pull request Jun 3, 2019
…ch#97)

Summary:
Pull Request resolved: pytorch#97

Pull Request resolved: pytorch/pytorch#20721

- FBGEMM: Add unpack function for PackBMatrix class: Unpack pmat buffer to the origin_buf (Used for the serialization to recover weight matrix).
- PyTorch Quantizer: Add quantized::fbgemm_linear_unpack operator for serialization.

Reviewed By: zafartahirov

Differential Revision: D15314568

fbshipit-source-id: 506506df13457ce1fe6c487bc3c0eae6972bc54a
jianyuh added 2 commits June 3, 2019 14:22
Differential Revision: D15314568
Differential Version: 84127135
Differential Revision: D15314568
Differential Version: 84137538
xiaomengy pushed a commit to xiaomengy/pytorch that referenced this pull request Jun 4, 2019
…ch#97)

Summary:
Pull Request resolved: pytorch/FBGEMM#97

Pull Request resolved: pytorch#20721

- FBGEMM: Add unpack function for PackBMatrix class: Unpack pmat buffer to the origin_buf (Used for the serialization to recover weight matrix).
- PyTorch Quantizer: Add quantized::fbgemm_linear_unpack operator for serialization.

Reviewed By: zafartahirov

Differential Revision: D15314568

fbshipit-source-id: 12080c8887ce31dc849d23e132ae1766ac319407
zdevito pushed a commit to zdevito/ATen that referenced this pull request Jun 4, 2019
Summary:
Pull Request resolved: pytorch/FBGEMM#97

Pull Request resolved: pytorch/pytorch#20721

- FBGEMM: Add unpack function for PackBMatrix class: Unpack pmat buffer to the origin_buf (Used for the serialization to recover weight matrix).
- PyTorch Quantizer: Add quantized::fbgemm_linear_unpack operator for serialization.

Reviewed By: zafartahirov

Differential Revision: D15314568

fbshipit-source-id: 12080c8887ce31dc849d23e132ae1766ac319407
@ezyang (Contributor) commented Jun 4, 2019

This diff broke CUDA builds:

```
Jun 04 03:46:33 [ 51%] Building CXX object caffe2/CMakeFiles/caffe2.dir/__/aten/src/ATen/CPUType.cpp.o
Jun 04 03:46:33 /var/lib/jenkins/workspace/aten/src/ATen/native/quantized/cpu/qlinear_unpack.cpp: In member function 'at::Tensor at::native::{anonymous}::QLinearUnpackWeightInt8::operator()(at::Tensor)':
Jun 04 03:46:33 /var/lib/jenkins/workspace/aten/src/ATen/native/quantized/cpu/qlinear_unpack.cpp:37:12: error: 'class fbgemm::PackBMatrix<signed char>' has no member named 'unpack'
Jun 04 03:46:33      packB->unpack(weight_ptr_int8);
Jun 04 03:46:33             ^
```

@jianyuh (Member, Author) commented Jun 4, 2019

Thanks @ezyang for pointing this out! @bddppq fixed this with #21328. As @bddppq noted, this diff has a PyTorch part and an FBGEMM part (https://github.com/pytorch/fbgemm), and the PyTorch part depends on a new API added in the FBGEMM part. In such cases the two parts should be split into two diffs: first land the FBGEMM diff, then land the PyTorch part together with a submodule update. I will pay attention to this next time.

@ezyang deleted the export-D15314568 branch on July 19, 2019 at 15:48
pytorch-bot bot pushed a commit to pytorch/FBGEMM that referenced this pull request Feb 26, 2026
Summary:
Pull Request resolved: #97

Pull Request resolved: pytorch/pytorch#20721

- FBGEMM: Add unpack function for PackBMatrix class: Unpack pmat buffer to the origin_buf (Used for the serialization to recover weight matrix).
- PyTorch Quantizer: Add quantized::fbgemm_linear_unpack operator for serialization.

Reviewed By: zafartahirov

Differential Revision: D15314568

fbshipit-source-id: 12080c8887ce31dc849d23e132ae1766ac319407