
Asymmetric quantized matmul support#7626

Merged
lsy323 merged 2 commits into master from lsiyuan/asymmetric-quant on Jul 9, 2024
Conversation

lsy323 (Collaborator) commented Jul 3, 2024

This PR depends on #7605 to land first

With asymmetric quantization, the dequantized weight is `w_dq = w_int * weight_scaler - zero_point`.

Thus the matmul becomes:
`matmul_out = x @ w_int * weight_scaler - x @ zero_point.unsqueeze(0).broadcast(x.shape[-1])`

To compute the term `x @ zero_point.unsqueeze(0).broadcast(x.shape[-1])`, we use `einsum('...c, z', x, zero_point)` for per-channel quantization, and `matmul(x.sum(-1), zero_point)` for blockwise quantization.
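The decomposition can be checked numerically. Below is a minimal NumPy sketch of the per-channel case; the shapes, random data, and explicit `'bc,z->bz'` einsum subscripts are illustrative assumptions for this sketch, not the PR's actual kernel:

```python
import numpy as np

# Illustrative per-channel shapes (assumed for this sketch).
batch, in_features, out_features = 2, 8, 4
rng = np.random.default_rng(0)

x = rng.normal(size=(batch, in_features)).astype(np.float32)
w_int = rng.integers(0, 16, size=(in_features, out_features)).astype(np.float32)
weight_scaler = rng.normal(size=(out_features,)).astype(np.float32)  # per output channel
zero_point = rng.normal(size=(out_features,)).astype(np.float32)     # per output channel

# Dequantized weight: w_dq = w_int * weight_scaler - zero_point
w_dq = w_int * weight_scaler - zero_point

# Reference: matmul against the dequantized weight.
ref = x @ w_dq

# Decomposed form: matmul stays on the (integer-valued) weight, then the
# zero-point correction is computed separately. Summing x over its last axis
# and taking the outer product with zero_point is equivalent to
# x @ zero_point.unsqueeze(0).broadcast(x.shape[-1]).
out = (x @ w_int) * weight_scaler - np.einsum('bc,z->bz', x, zero_point)

assert np.allclose(ref, out, atol=1e-5)
```

The einsum sums `x` over its channel axis, so the correction term is rank-1: it costs O(batch * in_features + batch * out_features) instead of a full matmul.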

update test

update readme

fix test

add asymmetric quant op support
@lsy323 lsy323 force-pushed the lsiyuan/asymmetric-quant branch from c5e9081 to 69dc1e8 on July 9, 2024 18:20
@lsy323 lsy323 marked this pull request as ready for review July 9, 2024 18:22
@lsy323 lsy323 requested review from JackCaoG and miladm July 9, 2024 18:22
Review comment thread on docs/quantized_ops.md
@lsy323 lsy323 merged commit 289471c into master Jul 9, 2024
@lsy323 lsy323 deleted the lsiyuan/asymmetric-quant branch July 9, 2024 21:37