[quant][graphmode] Fold prepacked weight into module #26579
jerryzh168 wants to merge 19 commits into gh/jerryzh168/83/base
Conversation
Summary: Remove the `linear_prepack` call and attach a module to the parent class that contains the packed weight and bias. This supports serialization of the quantized model: the packed weight and bias are not serializable, so we need to overwrite the `__getstate__` and `__setstate__` functions to be able to serialize them.

Test Plan: python test/test_jit.py

Reviewers: pt1quant
ZolotukhinM
left a comment
Overall looks good, but I have some remarks (see inline).
should work now
ZolotukhinM
left a comment
How would it work in the end-to-end workflow? QuantFusion happens in the graph-executor, and this should happen before. How would it find any quantized::linear_prepack?
QuantFusion should happen before this, so maybe we should do QuantFusion for linear and conv2d in advance?
Summary: Attach a module to the parent class that contains the packed weight and bias. This supports serialization of the quantized model: the packed weight and bias are not serializable, so we need to overwrite the `__getstate__` and `__setstate__` functions to be able to serialize them. The pass that removes the call to `linear_prepack` and replaces it with packed params will come in the next PR.

Test Plan: python test/test_jit.py

ghstack-source-id: 2391707
Pull Request resolved: #26579
ZolotukhinM
left a comment
I think this pass should be split into two parts: 1) insert pack/unpack nodes, and 2) fold packed attributes. The first part should probably be part of insert quant-dequant, since it's semantically tied to that pass and the implementation is (almost) the same. Such a separation will later help us generalize the second step into "fold whatever computations we can into attributes that are marked as constants". To make it work, I think we'll need to perform a transformation like the following.
Original:
```
%y = linear(%x, %w, %b)
```
After insert q-dq and pack/unpack:
```
%x_dq = dequant(quant(%x))
%packed = prepack(%w, %b)
SetAttr["_packed"](%packed)
%packed = GetAttr["_packed"]
%w_unpacked, %b_unpacked = unpack(%packed)
%y = linear(%x_dq, %w_unpacked, %b_unpacked)
%y_dq = dequant(quant(%y))
```
Then the folding pass (part 2 in my terminology) will look for patterns like
```
%w = GetAttr["weight"]
%b = GetAttr["bias"]
%packed = prepack(%w, %b)
SetAttr["_packed"](%packed)
```
It will precompute the attribute "_packed" and remove the prepack node from the IR (disclaimer: it will probably be more complicated than just searching for a pattern, because the get-attr might be in a different function, but we're already dealing with that in quantization). Note that later we can precompute anything that operates on constant attributes at this step, not only prepack.
And ultimately, the fusion pass would fuse
```
%x_dq = dequant(%xq)
%w_unpacked, %b_unpacked = unpack(%packed)
%y = linear(%x_dq, %w_unpacked, %b_unpacked)
%yq = quant(%y)
```
with a quantized linear that takes the packed weight and bias.
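The folding step sketched above is essentially constant propagation over one known op: find a `prepack` whose inputs are all `GetAttr`s of constant attributes, evaluate it once, and store the result as a new attribute. A toy version over a flat node list follows; the `(output, op, inputs)` tuple representation, the `fold_prepack` function, and the `"_packed"` attribute name are invented for illustration and are not the JIT IR API.

```python
def fold_prepack(nodes, attrs, prepack_fn):
    """Toy IR pass: nodes are (output, op, inputs) tuples; attrs maps
    attribute names to constant values."""
    env = {}      # values known at fold time
    folded = []
    for out, op, inputs in nodes:
        if op == "GetAttr" and inputs[0] in attrs:
            # Reading a constant attribute: remember its value.
            env[out] = attrs[inputs[0]]
            folded.append((out, op, inputs))
        elif op == "prepack" and all(i in env for i in inputs):
            # All inputs are known constants: precompute the packed
            # value, register it as the new attribute "_packed", and
            # replace the prepack node with a GetAttr of it.
            attrs["_packed"] = prepack_fn(*(env[i] for i in inputs))
            folded.append((out, "GetAttr", ["_packed"]))
        else:
            folded.append((out, op, inputs))
    return folded, attrs

nodes = [
    ("%w", "GetAttr", ["weight"]),
    ("%b", "GetAttr", ["bias"]),
    ("%packed", "prepack", ["%w", "%b"]),
    ("%y", "linear", ["%x", "%packed"]),
]
attrs = {"weight": [1.0, 2.0], "bias": 0.5}
new_nodes, new_attrs = fold_prepack(nodes, attrs, lambda w, b: (tuple(w), b))
print(new_attrs["_packed"])  # ((1.0, 2.0), 0.5)
```

After the pass, no `prepack` node remains: `%packed` is read from the precomputed `"_packed"` attribute, which is exactly the state the later fusion pattern expects.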
Sounds reasonable, but inserting pack/unpack will need to be a separate pass, since there is no generic pack/unpack (we have linear_prepack and conv2d_prepack) while quant/dequant is generic. So it's better to keep them separate.
ZolotukhinM
left a comment
Looks good! Some nits are inline.
This pull request has been merged in d91e490.
Summary: Pull Request resolved: pytorch#26579. Remove the `linear_prepack` call and attach a module to the parent class that contains the packed weight and bias. This supports serialization of the quantized model: the packed weight and bias are not serializable, so we need to overwrite the `__getstate__` and `__setstate__` functions to be able to serialize them.

Test Plan: python test/test_jit.py

Imported from OSS

Differential Revision: D17636397

fbshipit-source-id: 3b81b6faa4413e4309453fd6acec2f0be6fd2f16
Stack from ghstack:
Summary:
Remove the `linear_prepack` call and attach a module to the parent class that contains the packed weight and bias. This supports serialization of the quantized model, since the packed weight and bias are not serializable and we need to overwrite the `__getstate__` and `__setstate__` functions to be able to serialize them.
Test Plan:
python test/test_jit.py
Reviewers:
pt1quant
Differential Revision: D17636397