Adds Q/DQ layout support for embedding quantization with IntxWeightOnlyConfig #1972
2c3b9ac to 05eec5d
@dataclass
class IntxWeightOnlyConfig(AOBaseConfig):
    """
@andrewor14 can you take a look at this and check whether there are any issues with it working well with the QAT workflow with FakeQuantizeConfig?
Strange that we have IntxWeightOnly and Int4WeightOnly
yeah I feel we should probably merge these two
  Int8DynamicActivationIntxWeightConfig(
      weight_dtype=weight_dtype,
-     granularity=granularity,
+     granularity=PerRow(),
should this be PerAxis as well?
It can't be because that's controlled by Int8DynamicActivationIntxWeightConfig, which uses PerRow until #1968 lands
  SharedEmbeddingQuantizer(
      weight_dtype=weight_dtype,
-     granularity=granularity,
+     granularity=PerRow(),
jerryzh168 left a comment
looks good overall, just need to change PerRow to PerAxis(axis=0) as we discussed in the meeting
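For readers outside that meeting, here is a minimal sketch of what "one scale per row" means for a 2-D embedding weight, under the assumption that `PerAxis(axis=0)` on such a weight yields one set of quantization parameters per row. It is plain Python written for illustration only; `per_row_scales` and `quantize_per_row` are hypothetical helpers, not torchao functions, and symmetric quantization with no zero point is an assumed simplification.

```python
# Illustrative only: plain Python, not torchao's implementation.

def per_row_scales(weight, bits=4):
    """Return one symmetric scale per row (axis 0) of a 2-D weight."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit signed values
    scales = []
    for row in weight:
        max_abs = max(abs(v) for v in row)
        # Fall back to 1.0 for an all-zero row so we never divide by zero.
        scales.append(max_abs / qmax if max_abs > 0 else 1.0)
    return scales


def quantize_per_row(weight, bits=4):
    """Quantize each row with its own scale, clamping to the signed range."""
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    scales = per_row_scales(weight, bits)
    qweight = [
        [min(qmax, max(qmin, round(v / s))) for v in row]
        for row, s in zip(weight, scales)
    ]
    return qweight, scales


# Two embedding rows with very different magnitudes.
weight = [[0.7, -0.2, 0.0], [7.0, 3.0, -1.0]]
qweight, scales = quantize_per_row(weight)
```

Because each row gets its own scale, the small-magnitude first row uses the full integer range instead of being crushed by the large values in the second row.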
…lyConfig (#1972) * up * up * up * up * up * up * up * up
def _(func, types, args, kwargs):
    if _embedding_q_dq_check(args, kwargs):
        return _embedding_q_dq_impl(args, kwargs)
why does line 299 only dequantize the weight but not actually run the embedding op?
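As background for the question above, the sketch below shows the two steps a Q/DQ-style embedding conceptually involves: an explicit dequantize of the table, followed by an ordinary float lookup. It is plain Python for illustration and makes no claim about what `_embedding_q_dq_impl` does; all function names here are hypothetical, and per-row symmetric scales are an assumption.

```python
# Illustrative only: plain Python, not the code under review.

def dequantize(qweight, scales):
    """DQ step: rebuild a float table from integer rows and per-row scales."""
    return [[q * s for q in row] for row, s in zip(qweight, scales)]


def embedding(table, indices):
    """Plain embedding lookup: select rows of the table by index."""
    return [table[i] for i in indices]


def embedding_q_dq(qweight, scales, indices):
    """Q/DQ pattern: dequantize the whole table, then run the float lookup."""
    return embedding(dequantize(qweight, scales), indices)


def embedding_lookup_then_dq(qweight, scales, indices):
    """Alternative order: select integer rows first, dequantize only those."""
    return [[q * scales[i] for q in qweight[i]] for i in indices]


qweight = [[7, -2, 0], [7, 3, -1]]
scales = [0.5, 1.0]
indices = [1, 0, 1]
out = embedding_q_dq(qweight, scales, indices)
```

Both orderings return the same values. The difference is that the Q/DQ form keeps the dequantize as a separate, visible step before the lookup, whereas the second form only touches the rows that are actually requested.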
This will be used to quantize embeddings in ET.