[quant][graphmode] Fold prepacked weight into module #26579
jerryzh168 wants to merge 19 commits into gh/jerryzh168/83/base
Conversation
Summary: Remove the `linear_prepack` call and attach a module to the parent class that contains the packed weight and bias. This supports serialization of the quantized model: the packed weight and bias are not serializable, so we need to overwrite the `__getstate__` and `__setstate__` functions to be able to serialize them.

Test Plan: python test/test_jit.py

Reviewers: pt1quant
ZolotukhinM
left a comment
Overall looks good, but I have some remarks (see inline).
should work now
ZolotukhinM
left a comment
How would it work in the end-to-end workflow? QuantFusion happens in the graph-executor, and this should happen before. How would it find any quantized::linear_prepack?
QuantFusion should happen before this, so maybe we should do QuantFusion for linear and conv2d in advance?
Summary: Attach a module to the parent class that contains the packed weight and bias. This supports serialization of the quantized model: the packed weight and bias are not serializable, so we need to overwrite the `__getstate__` and `__setstate__` functions to be able to serialize them. The pass that removes the call to `linear_prepack` and replaces it with packed params will come in the next PR.

Test Plan: python test/test_jit.py

ghstack-source-id: 2391707
Pull Request resolved: #26579
ZolotukhinM
left a comment
I think this pass should be split into two parts: 1) insert pack/unpack nodes, and 2) fold packed attributes. The first part should probably be part of insert quant-dequant, since it's semantically tied to that pass and the implementation is (almost) the same. Such a separation will later help us generalize the second step into "fold whatever computations we can into attributes that are marked as constants". To make it work, I think we'll need to perform a transformation like the following.
Original:
```
%y = linear(%x, %w, %b)
```
After insert q-dq and pack/unpack:
```
%x_dq = dequant(quant(%x))
%packed = prepack(%w, %b)
SetAttr["_packed"](%packed)
%packed = GetAttr["_packed"]
%w_unpacked, %b_unpacked = unpack(%packed)
%y = linear(%x_dq, %w_unpacked, %b_unpacked)
%y_dq = dequant(quant(%y))
```
Then the folding pass (part 2 in my terminology) will look for patterns like
```
%w = GetAttr["weight"]
%b = GetAttr["bias"]
%packed = prepack(%w, %b)
SetAttr["_packed"](%packed)
```
It will precompute the attribute "_packed" and remove the prepack node from the IR (disclaimer: it will probably be more complicated than just searching for a pattern, because the get-attr might be in a different function, but we're already dealing with that in quantization). Note that later we can precompute anything that operates on constant attributes at this step, not only prepack.
And ultimately, the fusion pass would fuse
```
%x_dq = dequant(%xq)
%w_unpacked, %b_unpacked = unpack(%packed)
%y = linear(%x_dq, %w_unpacked, %b_unpacked)
%yq = quant(%y)
```
with a quantized linear that takes the packed weight and bias.
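The folding step sketched above is essentially constant propagation over one known op: find a `prepack` whose inputs are all `GetAttr`s of constant attributes, evaluate it once, and store the result as a new attribute. A toy version over a flat node list follows; the `(output, op, inputs)` tuple representation, the `fold_prepack` function, and the `"_packed"` attribute name are invented for illustration and are not the JIT IR API.

```python
def fold_prepack(nodes, attrs, prepack_fn):
    """Toy IR pass: nodes are (output, op, inputs) tuples; attrs maps
    attribute names to constant values."""
    env = {}      # values known at fold time
    folded = []
    for out, op, inputs in nodes:
        if op == "GetAttr" and inputs[0] in attrs:
            # Reading a constant attribute: remember its value.
            env[out] = attrs[inputs[0]]
            folded.append((out, op, inputs))
        elif op == "prepack" and all(i in env for i in inputs):
            # All inputs are known constants: precompute the packed
            # value, register it as the new attribute "_packed", and
            # replace the prepack node with a GetAttr of it.
            attrs["_packed"] = prepack_fn(*(env[i] for i in inputs))
            folded.append((out, "GetAttr", ["_packed"]))
        else:
            folded.append((out, op, inputs))
    return folded, attrs

nodes = [
    ("%w", "GetAttr", ["weight"]),
    ("%b", "GetAttr", ["bias"]),
    ("%packed", "prepack", ["%w", "%b"]),
    ("%y", "linear", ["%x", "%packed"]),
]
attrs = {"weight": [1.0, 2.0], "bias": 0.5}
new_nodes, new_attrs = fold_prepack(nodes, attrs, lambda w, b: (tuple(w), b))
print(new_attrs["_packed"])  # ((1.0, 2.0), 0.5)
```

After the pass, no `prepack` node remains: `%packed` is read from the precomputed `"_packed"` attribute, which is exactly the state the later fusion pattern expects.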
Sounds reasonable, but inserting pack/unpack will need to be a separate pass, since there is no generic pack/unpack (we have linear_prepack and conv2d_prepack) while quant/dequant is generic. So it's better to keep them separate.
ZolotukhinM
left a comment
Looks good! Some nits are inline.
This pull request has been merged in d91e490.
Summary: Pull Request resolved: pytorch#26579. Remove the `linear_prepack` call and attach a module to the parent class that contains the packed weight and bias. This supports serialization of the quantized model: the packed weight and bias are not serializable, so we need to overwrite the `__getstate__` and `__setstate__` functions to be able to serialize them.

Test Plan: python test/test_jit.py

Imported from OSS

Differential Revision: D17636397

fbshipit-source-id: 3b81b6faa4413e4309453fd6acec2f0be6fd2f16
Stack from ghstack:
Summary:
Remove the `linear_prepack` call and attach a module to the parent class that contains the packed weight and bias. This supports serialization of the quantized model, since the packed weight and bias are not serializable and we need to overwrite the `__getstate__` and `__setstate__` functions to be able to serialize them.
Test Plan:
python test/test_jit.py
Reviewers:
pt1quant
Differential Revision: D17636397