
Support dynamic activation quant for per-channel quantized matmul #7867

Merged
lsy323 merged 1 commit into master from lsiyuan/act-quant on Aug 20, 2024

Conversation

@lsy323 (Collaborator) commented Aug 16, 2024:

Needs #7863 to land first.

For dynamic activation quant, the quantized matmul will be:

The weight is quantized from `w: bf16[out_dim, in_dim]` to `w_int: int8[out_dim, in_dim]` and `w_scale: bf16[out_dim]`.

  1. Quantize the matmul input `x` with shape `bf16[bs, seq, in_dim]` to `x_int: int8[bs, seq, in_dim]` and `x_scale: bf16[bs, seq]`.
  2. Matmul `x_int` and `w_int` with int32 output dtype to avoid overflow: `matmul(x_int, w_int) -> matmul_out: int32[bs, seq, out_dim]`.
  3. Scale the matmul output with `w_scale` and `x_scale`: `final_out = matmul_out * w_scale * x_scale` (see the sketch below).
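
Below is a minimal eager-mode PyTorch sketch of these three steps, assuming symmetric per-token activation quantization; the helper name and the per-token scale layout (`keepdim=True` for broadcasting) are illustrative, not the PR's actual code:

```python
import torch

def dynamic_quant_matmul(x, w_int, w_scale):
    """Hypothetical sketch of steps 1-3 above (not the PR's implementation).

    x:       bf16[bs, seq, in_dim]  activation
    w_int:   int8[out_dim, in_dim]  per-channel quantized weight
    w_scale: bf16[out_dim]          per-channel weight scale
    """
    # 1. Dynamically quantize the activation per token (symmetric int8).
    x_scale = x.abs().amax(dim=-1, keepdim=True) / 127.0   # bf16[bs, seq, 1]
    x_scale = x_scale.clamp(min=1e-5)                      # guard against all-zero tokens
    x_int = torch.round(x / x_scale).clamp(-128, 127).to(torch.int8)
    # 2. Integer matmul, accumulated in int32 to avoid overflow.
    matmul_out = torch.matmul(x_int.to(torch.int32),
                              w_int.to(torch.int32).t())   # int32[bs, seq, out_dim]
    # 3. Rescale by both scales; int32 * bf16 promotes to bf16.
    return matmul_out * w_scale * x_scale                  # bf16[bs, seq, out_dim]
```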

Test
Added unit tests
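
A sketch of the kind of check such a unit test might make, reusing the hypothetical `dynamic_quant_matmul` above and comparing against the bf16 reference within an error threshold (the tolerances here are illustrative, not the PR's):

```python
import torch
import torch.nn.functional as F

bs, seq, in_dim, out_dim = 2, 16, 64, 32
x = torch.randn(bs, seq, in_dim, dtype=torch.bfloat16)
w = torch.randn(out_dim, in_dim, dtype=torch.bfloat16)

# Per-channel symmetric weight quantization, as in the description.
w_scale = w.abs().amax(dim=1) / 127.0    # bf16[out_dim]
w_int = torch.round(w / w_scale.unsqueeze(1)).clamp(-128, 127).to(torch.int8)

ref = F.linear(x, w)                     # bf16 reference matmul
out = dynamic_quant_matmul(x, w_int, w_scale)
# Illustrative threshold; the real tests pick their own tolerance.
assert torch.allclose(out.float(), ref.float(), rtol=0.1, atol=0.5)
```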

lsy323 force-pushed the lsiyuan/act-quant branch from 1783e1f to b05276a on August 16, 2024 03:44
lsy323 force-pushed the lsiyuan/act-quant branch from b05276a to ec98e8e on August 16, 2024 22:22
lsy323 requested a review from JackCaoG on August 16, 2024 22:22
lsy323 assigned miladm and lsy323 and unassigned miladm on Aug 16, 2024
lsy323 requested a review from miladm on August 16, 2024 22:35
@JackCaoG (Collaborator) commented:

Sorry, I might not have time for this one today; will try to look into it tomorrow.

Review thread on the following diff hunk:

```python
            x, w, (([-1], [-1]), ()), preferred_element_type=torch.int32)
    else:
        out = F.linear(x, w)
    out = out * scaler
```
Collaborator:
so the output dtype will be int32?

lsy323 (Author):
yes

> Matmul between `x_int` and `w_int` with int32 output dtype to avoid overflow: `matmul(x_int, w_int) -> matmul_out: int32[bs, seq, out_dim]`

lsy323 (Author):

The final output will be in bf16, since the bf16 scales multiply the int32 result.
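
A quick check of that promotion rule in plain PyTorch (shapes here are made up for illustration): multiplying an int32 tensor by bf16 scales yields bf16.

```python
import torch

matmul_out = torch.randint(-1000, 1000, (2, 4, 8), dtype=torch.int32)  # int32[bs, seq, out_dim]
w_scale = torch.rand(8, dtype=torch.bfloat16)                          # bf16[out_dim]
x_scale = torch.rand(2, 4, 1, dtype=torch.bfloat16)                    # bf16[bs, seq, 1]

final_out = matmul_out * w_scale * x_scale
print(final_out.dtype)  # torch.bfloat16
```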

@JackCaoG (Collaborator) left a review comment:

LGTM, do you need to run TPU CI on this PR?

@lsy323 (Author) commented Aug 20, 2024:

> LGTM, do you need to run TPU CI on this PR?

Right now it's not in TPU CI; the error threshold needs to be adjusted to pass on TPU. I can do that in a follow-up PR.

lsy323 merged commit 4bd2df1 into master on Aug 20, 2024
lsy323 deleted the lsiyuan/act-quant branch on August 20, 2024 17:57