
[quant][graphmode] Fold prepacked weight into module #26579

Closed
jerryzh168 wants to merge 19 commits into gh/jerryzh168/83/base from gh/jerryzh168/83/head

Conversation

@jerryzh168
Contributor

@jerryzh168 jerryzh168 commented Sep 20, 2019

Stack from ghstack:

Summary:
Remove the `linear_prepack` call and attach a module to the
parent class that contains the packed weight and bias.
This supports serialization of the quantized model:
the packed weight and bias are not serializable, so we
need to override the `__getstate__` and `__setstate__`
functions to be able to serialize them.

Test Plan:
python test/test_jit.py

Reviewers:
pt1quant

Subscribers:

Tasks:

Tags:

Differential Revision: D17636397
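The `__getstate__`/`__setstate__` approach described in the summary can be sketched in plain Python. This is an illustrative analogue, not the actual PyTorch code: `_prepack` and `_unpack` stand in for the real `quantized::linear_prepack`/`quantized::linear_unpack` ops, and a closure stands in for the opaque, non-picklable packed-params object.

```python
import pickle

class PackedLinearParams:
    # Illustrative analogue of the packed-params module in this PR,
    # not the actual PyTorch implementation.
    def __init__(self, weight, bias):
        self._packed = self._prepack(weight, bias)

    @staticmethod
    def _prepack(weight, bias):
        # Stand-in for quantized::linear_prepack. A closure is not
        # picklable, mimicking the opaque packed-weight handle.
        return lambda: (weight, bias)

    def _unpack(self):
        # Stand-in for quantized::linear_unpack.
        return self._packed()

    def __getstate__(self):
        # Serialize the unpacked tensors instead of the packed object.
        weight, bias = self._unpack()
        return {"weight": weight, "bias": bias}

    def __setstate__(self, state):
        # Re-pack on load.
        self._packed = self._prepack(state["weight"], state["bias"])

m = PackedLinearParams([1.0, 2.0], 0.5)
m2 = pickle.loads(pickle.dumps(m))
print(m2._unpack())  # ([1.0, 2.0], 0.5)
```

Pickling the object directly would fail on the packed handle; overriding the state protocol is what makes the round trip work.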

@jerryzh168 jerryzh168 requested a review from apaszke as a code owner September 20, 2019 23:56
@pytorchbot pytorchbot added the oncall: jit Add this issue/PR to JIT oncall triage queue label Sep 20, 2019
jerryzh168 added a commit that referenced this pull request Sep 21, 2019
ghstack-source-id: daddc5b
Pull Request resolved: #26579

@ZolotukhinM ZolotukhinM left a comment


Overall looks good, but I have some remarks (see inline).

jerryzh168 added a commit that referenced this pull request Sep 24, 2019
ghstack-source-id: a7d9a74
Pull Request resolved: #26579
@jerryzh168 jerryzh168 added this to the 1.3 milestone Sep 24, 2019
jerryzh168 added a commit that referenced this pull request Sep 24, 2019
ghstack-source-id: 0e88950
Pull Request resolved: #26579
@pytorchbot pytorchbot added the module: pybind Related to our Python bindings / interactions with other Python libraries label Sep 25, 2019
jerryzh168 added a commit that referenced this pull request Sep 25, 2019
ghstack-source-id: c798c9f
Pull Request resolved: #26579
@jerryzh168
Contributor Author

should work now


@ZolotukhinM ZolotukhinM left a comment


How would it work in the end-to-end workflow? QuantFusion happens in the graph-executor, and this should happen before. How would it find any quantized::linear_prepack?

@jerryzh168
Contributor Author

How would it work in the end-to-end workflow? QuantFusion happens in the graph-executor, and this should happen before. How would it find any quantized::linear_prepack?

QuantFusion should happen before this; maybe we should run QuantFusion for linear and conv2d in advance?

jerryzh168 added a commit that referenced this pull request Sep 25, 2019
Summary:
Attach a module to the parent class that contains the packed
weight and bias. This supports serialization of the quantized
model: the packed weight and bias are not serializable, so we
need to override the `__getstate__` and `__setstate__`
functions to be able to serialize them.

The pass that removes the call to linear_prepack and replaces
it with packed params will come in the next PR.

Test Plan:
python test/test_jit.py


ghstack-source-id: 2391707
Pull Request resolved: #26579

@ZolotukhinM ZolotukhinM left a comment


I think this pass should be split into two parts: 1) insert pack/unpack nodes, 2) fold packed attributes. The first part should probably be part of insert quant-dequant, since it's semantically tied to that pass and the implementation is (almost) the same. Such a separation will later help us generalize the second step into "fold whatever computations we can into attributes that are marked as constants". To make it work, I think we'll need to perform a transformation like the following.

Original:

```
%y = linear(%x, %w, %b)
```

After inserting quant-dequant and pack/unpack:

```
%x_dq = dequant(quant(%x))
%packed = prepack(%w, %b)
SetAttr["_packed"](%packed)
%packed = GetAttr["_packed"]
%w_unpacked, %b_unpacked = unpack(%packed)
%y = linear(%x_dq, %w_unpacked, %b_unpacked)
%y_dq = dequant(quant(%y))
```

Then the folding pass (part 2 in my terminology) will look for patterns like

```
%w = GetAttr["weight"]
%b = GetAttr["bias"]
%packed = prepack(%w, %b)
SetAttr["_packed"](%packed)
```

It will precompute the attribute "_packed" and remove the prepack node from the IR (disclaimer: it will probably be more complicated than just searching for a pattern, because the get-attr might be in a different function, but we're already dealing with that in quantization). Note that later we can precompute anything that operates on constant attributes at this step, not only prepack.

And ultimately, the fusion pass would fuse

```
%x_dq = dequant(%xq)
%w_unpacked, %b_unpacked = unpack(%packed)
%y = linear(%x_dq, %w_unpacked, %b_unpacked)
%yq = quant(%y)
```

with a quantized linear that takes packed weight and bias.
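The folding step described in this comment can be illustrated on a toy IR, represented here as a plain list of (op, arg) tuples. This is a hypothetical sketch of the idea, not the real JIT pass or its API:

```python
def fold_prepack(nodes, attrs, prepack):
    # Look for the 4-node pattern
    #   GetAttr[w], GetAttr[b], prepack, SetAttr["_packed"]
    # precompute the packed attribute, and drop those nodes from the IR.
    out, i = [], 0
    while i < len(nodes):
        ops = [n[0] for n in nodes[i:i + 4]]
        if ops == ["GetAttr", "GetAttr", "prepack", "SetAttr"]:
            w_name, b_name = nodes[i][1], nodes[i + 1][1]
            packed_name = nodes[i + 3][1]
            # Precompute the packed value from the constant attributes.
            attrs[packed_name] = prepack(attrs[w_name], attrs[b_name])
            i += 4  # the whole matched pattern is folded away
        else:
            out.append(nodes[i])
            i += 1
    return out

ir = [("GetAttr", "weight"), ("GetAttr", "bias"),
      ("prepack", None), ("SetAttr", "_packed"),
      ("linear", None)]
attrs = {"weight": [1.0, 2.0], "bias": 0.5}
new_ir = fold_prepack(ir, attrs, lambda w, b: (tuple(w), b))
print(new_ir)            # [('linear', None)]
print(attrs["_packed"])  # ((1.0, 2.0), 0.5)
```

As the disclaimer above notes, the real pass additionally has to chase get-attrs across functions; this toy version only matches the literal pattern.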

@jerryzh168
Contributor Author

jerryzh168 commented Sep 27, 2019

Sounds reasonable, but inserting pack/unpack will need to be a separate pass, since there is no generic pack/unpack (we have linear_prepack and conv2d_prepack), while quant/dequant is generic. So it's better to keep them separate.


@ZolotukhinM ZolotukhinM left a comment


Looks good! Some nits are inline.

@facebook-github-bot
Contributor

This pull request has been merged in d91e490.

@facebook-github-bot facebook-github-bot deleted the gh/jerryzh168/83/head branch October 28, 2019 22:15
pdlive215 pushed a commit to pdlive215/pytorch that referenced this pull request Nov 27, 2019
thiagocrepaldi pushed a commit to thiagocrepaldi/pytorch that referenced this pull request Feb 4, 2020