
Add int8 per channel weight-only quantized matmul#7201

Merged
lsy323 merged 6 commits into master from lsiyuan/quant-ops
Jun 7, 2024

Conversation

Collaborator

@lsy323 lsy323 commented Jun 5, 2024

Add the first XLA quantized op: a per-channel weight-only quantized matmul.

The math is `out[bf16] = matmul(act[bf16], weight[s8]) * scaler[bf16]`, the same as what was adopted in the XLA LLaMA quantization implementation.
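The formula above can be sketched as a plain-PyTorch reference (the `[out_features, in_features]` weight layout, as in `nn.Linear`, is my assumption; the function name is hypothetical, not the actual torch_xla API):

```python
import torch

def ref_quantized_matmul(act: torch.Tensor, w_int8: torch.Tensor,
                         scaler: torch.Tensor) -> torch.Tensor:
    """Reference for out = matmul(act, weight[s8]) * scaler.

    act:    [..., in_features] float activation
    w_int8: [out_features, in_features] int8 weight (nn.Linear layout, assumed)
    scaler: [out_features] per-channel dequantization scale
    """
    # Upcast the int8 weight to the activation dtype, matmul,
    # then rescale each output channel by its scaler.
    return torch.matmul(act, w_int8.to(act.dtype).t()) * scaler
```

Because the scaler is applied per output channel after the matmul, only the weight needs to be stored in int8; activations stay in bf16/fp32.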

User experience:

  • Call the quantized op directly with an already-quantized weight in model code
  • Swap the `nn.Linear` module with the added quantized module in model code

More details about user experience can be found in the added README.
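As an illustration of the first option, a weight can be quantized offline with symmetric per-channel int8 quantization before being handed to the quantized op. This helper is a sketch for illustration, not the actual torch_xla API:

```python
import torch

def quantize_per_channel(w: torch.Tensor):
    """Symmetric int8 quantization with one scaler per output channel (row).

    Hypothetical helper for illustration only.
    """
    # One scale per row so the largest |w| in the row maps to 127.
    scaler = (w.abs().amax(dim=1) / 127.0).clamp(min=1e-8)
    w_int8 = torch.round(w / scaler.unsqueeze(1)).clamp(-127, 127).to(torch.int8)
    return w_int8, scaler
```

The per-channel (rather than per-tensor) scaler keeps channels with small weight magnitudes from losing precision to channels with large ones.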

Changes:

  • Added a custom torch op and an nn.Module for the quantized op
  • Added a user guide

Test:

  • Test that the lowered HLO does what we expect
  • Test that it works with Dynamo
  • Test numerical correctness
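A numerical-correctness check along the lines of the last bullet can be sketched in pure PyTorch (this is an illustrative reference, not the actual test in the PR; the tolerance is a loose bound I chose):

```python
import torch

torch.manual_seed(0)
act = torch.randn(4, 16)
w = torch.randn(8, 16)

# Per-channel symmetric int8 quantization of the weight.
scaler = w.abs().amax(dim=1) / 127.0
w_int8 = torch.round(w / scaler.unsqueeze(1)).to(torch.int8)

# Weight-only quantized matmul vs. the float reference.
out_q = torch.matmul(act, w_int8.to(act.dtype).t()) * scaler
out_ref = torch.matmul(act, w.t())

# int8 per-channel quantization keeps the result close to the float one.
assert torch.allclose(out_q, out_ref, atol=0.5)
```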

int4 and blockwise quantization support will be added in follow-up PRs.

@lsy323 lsy323 requested review from JackCaoG and miladm June 5, 2024 23:42
@lsy323 lsy323 marked this pull request as ready for review June 5, 2024 23:44
@lsy323 lsy323 changed the title from "Add int8 per channel quantized matmul" to "Add int8 per channel weight-only quantized matmul" Jun 5, 2024
@lsy323 lsy323 requested a review from qihqi June 6, 2024 00:08
Comment thread docs/quantized_ops.md Outdated
Comment thread docs/quantized_ops.md
Comment thread torch_xla/experimental/xla_quantized_matmul.py
Collaborator

@JackCaoG JackCaoG left a comment


mostly lgtm, minor nits

@lsy323 lsy323 requested a review from JackCaoG June 6, 2024 16:54
@lsy323 lsy323 merged commit 56ddd5d into master Jun 7, 2024
@lsy323 lsy323 deleted the lsiyuan/quant-ops branch June 7, 2024 04:57
