[quant][graphmode] Different rule for handling aten::cat #38570

jerryzh168 wants to merge 5 commits into gh/jerryzh168/320/base

Conversation
Summary: We changed the rule for quantizing `aten::cat`. Previously `aten::cat` was considered an op that should always be quantized, like `aten::conv2d`, but this is not ideal. A better rule is to quantize the output of `aten::cat` depending on whether its inputs are quantized: if they are, we quantize the output; if they are not, we do not, since `aten::cat` works on both quantized and non-quantized tensors.

Test Plan:
Reviewers:
Subscribers:
Tasks:
Tags:

[ghstack-poisoned]
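As a concrete illustration of the new rule, here is a minimal sketch (the module names are made up for this example, and the comments describe the intended behavior rather than verified output of the quantization passes):

```python
import torch
import torch.nn as nn

class CatAfterConv(nn.Module):
    """Both inputs to cat come from conv2d, which is always quantized."""
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 3, 1)
        self.conv2 = nn.Conv2d(3, 3, 1)

    def forward(self, x):
        # All inputs to cat are quantized, so under the new rule the
        # output of cat is quantized as well.
        return torch.cat([self.conv1(x), self.conv2(x)], dim=1)

class CatOnInputs(nn.Module):
    """Both inputs to cat are plain fp32 model inputs."""
    def forward(self, x, y):
        # No input to cat is quantized, so the output of cat is left
        # unquantized; aten::cat works fine on fp32 tensors.
        return torch.cat([x, y], dim=1)

# Scripted and put in eval mode, as the graph-mode passes expect.
m1 = torch.jit.script(CatAfterConv()).eval()
m2 = torch.jit.script(CatOnInputs()).eval()
```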
💊 CI failures summary (Dr. CI, as of commit 7236764): 1 failed extra GitHub check on ci.pytorch.org.
```python
    .run(m.graph)

# non quantized cat
m = torch.jit.script(NonQuantizedCat()).eval()
```
In this case, what is the expected behavior? Would you quantize all the inputs to the non-quantized cat and quantize the output too? In general, don't you quantize the input to a model?
No, we don't quantize the inputs to cat in this case; the output of cat is only quantized when all the inputs are quantized.
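To make "all the inputs" concrete, here is a hypothetical mixed case (this module is invented for illustration and does not appear in the PR):

```python
import torch
import torch.nn as nn

class MixedCat(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 3, 1)

    def forward(self, x, y):
        # self.conv(x) is quantized, but y is a plain fp32 input, so not
        # all inputs to cat are quantized; per the rule stated above, the
        # output of cat is therefore not quantized.
        return torch.cat([self.conv(x), y], dim=1)
```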
```diff
     !isObserved(v, block_observed_values)) {
-  if (auto observer_opt = getObserverFor(v)) {
+  auto observer_opt = getObserverFor(v);
+  // If the node is one of the propagate quant node, e.g.
```
Isn't this logic also true for add or conv? Why special-case concat?
This is only true for cat; for conv we'll always quantize its input and output.
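A rough Python paraphrase of the observer-insertion rule implied by the C++ snippet above (the set names and helper are illustrative only; the real logic lives in the JIT quantization passes):

```python
# Ops that are always quantized: observe their inputs and outputs
# unconditionally (conv is the example given in this thread).
ALWAYS_QUANT_OPS = {"aten::conv2d"}

# Ops that merely propagate quantization: observe the output only when
# every input is already observed/quantized.
PROPAGATE_QUANT_OPS = {"aten::cat"}

def should_observe_output(op_kind, input_observed_flags):
    if op_kind in ALWAYS_QUANT_OPS:
        return True
    if op_kind in PROPAGATE_QUANT_OPS:
        return all(input_observed_flags)
    return False
```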
We need a consistent set of rules. Consider the following two models:

```python
import torch
import torch.nn as nn

class testM(nn.Module):
    def __init__(self):
        super().__init__()
        self.c = nn.Conv2d(3, 5, 1)
        self.d = nn.Conv2d(3, 5, 1)

    def forward(self, x):
        # If there is a nn.Identity or shape, will the inputs be quantized?
        y = self.c(x)
        z = self.d(x)
        w = torch.cat((y, z))
        return w

# Second one:
class testM2(nn.Module):
    def __init__(self):
        super().__init__()
        self.c = nn.Conv2d(3, 5, 1)
        self.d = nn.Conv2d(3, 5, 1)

    def forward(self, x):
        w = torch.cat((x, x))
        y = self.c(w)
        z = self.d(w)
        return w
```

In the first case, the input will be quantized. In the second case the input will not be.

What if in the first case I have an nn.Identity prior to the conv, or a reshape? In that case do we quantize the inputs?
We don't quantize identity or reshape; we also don't accept input that's already quantized outside of the model.
In terms of user interface, the user will always provide a floating point Tensor, regardless of how the model is quantized.
Got it: the input is always in fp, and in certain cases the input is quantized (conv) and in certain cases it is not (cat).
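An annotated sketch of the behavior agreed on in this thread, applied to the two models above (illustrative dataflow, not actual pass output):

```python
# testM:  the convs always quantize their input, and their outputs y and z
# are quantized, so the output of cat is quantized too:
#
#   x (fp32) -> quant -> conv -> y (quantized) -\
#   x (fp32) -> quant -> conv -> z (quantized) --> cat -> w (quantized)
#
# testM2: x is a plain fp32 model input, so cat's output w stays fp32;
# each conv then quantizes its own input w downstream:
#
#   x (fp32) -> cat -> w (fp32) -> quant -> conv -> y (quantized)
#                      w (fp32) -> quant -> conv -> z (quantized)
```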
This pull request has been merged in 1ef77f9.
[quant][graphmode] Different rule for handling aten::cat (pytorch#38570)

Summary: Pull Request resolved: pytorch#38570. We changed the rule for quantizing `aten::cat`: the output of `aten::cat` is quantized only when its inputs are quantized, since `aten::cat` works on both quantized and non-quantized tensors.

Test Plan: Imported from OSS

Differential Revision: D21600160

fbshipit-source-id: efa957e0eaa608fffefcdfefa7f442fab45605eb
Stack from ghstack:

- #38570 [quant][graphmode] Different rule for handling `aten::cat`

Differential Revision: D21600160