
Asymmetric quantized matmul support#7626

Merged
lsy323 merged 2 commits into master from lsiyuan/asymmetric-quant on Jul 9, 2024
Conversation

lsy323 (Collaborator) commented Jul 3, 2024

This PR depends on #7605 to land first

With asymmetric quantization, the dequantized weight is `w_dq = w_int * weight_scaler - zero_point`.

Thus the matmul becomes:
`matmul_out = x @ w_int * weight_scaler - x @ zero_point.unsqueeze(0).broadcast(x.shape[-1])`

To compute the term `x @ zero_point.unsqueeze(0).broadcast(x.shape[-1])`, we use `einsum('...c, z', x, zero_point)` for per-channel quantization, and `matmul(x.sum(-1), zero_point)` for blockwise quantization.
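The decomposition can be checked numerically. Below is a minimal NumPy sketch of the per-channel case; the shapes, random data, and explicit `'bc,z->bz'` einsum subscripts are illustrative assumptions for this sketch, not the PR's actual kernel:

```python
import numpy as np

# Illustrative per-channel shapes (assumed for this sketch).
batch, in_features, out_features = 2, 8, 4
rng = np.random.default_rng(0)

x = rng.normal(size=(batch, in_features)).astype(np.float32)
w_int = rng.integers(0, 16, size=(in_features, out_features)).astype(np.float32)
weight_scaler = rng.normal(size=(out_features,)).astype(np.float32)  # per output channel
zero_point = rng.normal(size=(out_features,)).astype(np.float32)     # per output channel

# Dequantized weight: w_dq = w_int * weight_scaler - zero_point
w_dq = w_int * weight_scaler - zero_point

# Reference: matmul against the dequantized weight.
ref = x @ w_dq

# Decomposed form: matmul stays on the (integer-valued) weight, then the
# zero-point correction is computed separately. Summing x over its last axis
# and taking the outer product with zero_point is equivalent to
# x @ zero_point.unsqueeze(0).broadcast(x.shape[-1]).
out = (x @ w_int) * weight_scaler - np.einsum('bc,z->bz', x, zero_point)

assert np.allclose(ref, out, atol=1e-5)
```

The einsum sums `x` over its channel axis, so the correction term is rank-1: it costs O(batch * in_features + batch * out_features) instead of a full matmul.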

update test

update readme

fix test

add asymmetric quant op support
@lsy323 lsy323 force-pushed the lsiyuan/asymmetric-quant branch from c5e9081 to 69dc1e8 on July 9, 2024 18:20
@lsy323 lsy323 marked this pull request as ready for review July 9, 2024 18:22
@lsy323 lsy323 requested review from JackCaoG and miladm July 9, 2024 18:22
Review comment thread on docs/quantized_ops.md
@lsy323 lsy323 merged commit 289471c into master Jul 9, 2024
@lsy323 lsy323 deleted the lsiyuan/asymmetric-quant branch July 9, 2024 21:37