Support int4 weight in quantized matmul/linear #7235

Merged

lsy323 merged 24 commits into master from lsiyuan/int4-quant-ops on Jun 11, 2024

Collaborator

lsy323 commented Jun 10, 2024

int4 weights can be enabled by passing int4_weight=True to either torch.ops.xla.quantized_matmul(x, weight, weight_scaler, int4_weight=True) or XlaQuantizedLinear(..., int4_weight=True).

The matmul with int4 weights works as follows:

1. The int4 weight is stored unpacked in an int8 container (one int4 value per int8 element).
2. During HLO lowering, an xla::Literal is created for the int4 weights.
3. F.linear is applied to the activation and the int4 weight.
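The numerics of the workflow above can be illustrated with a small reference sketch in plain Python. It is not the torch_xla implementation: the helper names `quantize_per_channel` and `quantized_matmul_ref` are hypothetical, symmetric per-channel quantization is an assumption, and applying the scaler after the matmul is assumed rather than taken from this PR. It only shows int4 values held one per element and a linear op followed by rescaling.

```python
# Reference sketch in plain Python (no torch / torch_xla required).
# All names here are hypothetical and exist only for illustration.
INT4_MIN, INT4_MAX = -8, 7


def quantize_per_channel(weight):
    """Symmetric per-output-channel quantization to the int4 range.

    Returns (int_weight, scaler). int_weight holds one int4 value per
    element, i.e. unpacked, as it would sit in an int8 container.
    """
    int_weight, scaler = [], []
    for row in weight:
        max_abs = max(abs(v) for v in row)
        scale = max_abs / INT4_MAX if max_abs > 0 else 1.0
        # Round to the nearest integer and clamp into [-8, 7].
        q = [min(INT4_MAX, max(INT4_MIN, round(v / scale))) for v in row]
        int_weight.append(q)
        scaler.append(scale)
    return int_weight, scaler


def quantized_matmul_ref(x, int_weight, scaler):
    """Compute x @ int_weight^T, then rescale each output channel."""
    return [
        [
            sum(a * w for a, w in zip(x_row, w_row)) * s
            for w_row, s in zip(int_weight, scaler)
        ]
        for x_row in x
    ]


# Weight layout is (out_features, in_features), matching F.linear.
weight = [[1.75, -0.5, 0.25], [3.5, 1.0, -3.5]]
x = [[1.0, 2.0, 3.0]]

int_weight, scaler = quantize_per_channel(weight)
y = quantized_matmul_ref(x, int_weight, scaler)
```

The example weights were chosen so that they are exactly representable after quantization, which makes the quantized result match the float matmul; real weights would show a rounding error.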

The original plan was to pack int4 values into an int8 container and do a reinterpret cast, but reinterpret cast doesn't work on TPU currently.
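For context, packing two int4 values into one int8 byte could look like the sketch below. This is an illustration only, not code from this PR: the function names `pack_int4` and `unpack_int4` are hypothetical, and the nibble order (first value in the low 4 bits) is an assumption.

```python
def pack_int4(values):
    """Pack an even-length list of int4 values (-8..7) into bytes, two per byte.

    Assumed layout: the first value of each pair goes in the low nibble.
    """
    if len(values) % 2:
        raise ValueError("need an even number of int4 values")
    if any(not -8 <= v <= 7 for v in values):
        raise ValueError("value outside the int4 range [-8, 7]")
    packed = bytearray()
    for lo, hi in zip(values[0::2], values[1::2]):
        # Masking with 0xF keeps the two's-complement low 4 bits.
        packed.append((lo & 0xF) | ((hi & 0xF) << 4))
    return bytes(packed)


def unpack_int4(packed):
    """Inverse of pack_int4: recover signed int4 values from packed bytes."""
    values = []
    for byte in packed:
        for nibble in (byte & 0xF, byte >> 4):
            # Nibbles 8..15 represent the negative values -8..-1.
            values.append(nibble - 16 if nibble >= 8 else nibble)
    return values


original = [-8, 7, 0, -1]
packed = pack_int4(original)
restored = unpack_int4(packed)
```

Packing halves the storage, but reading the values back needs either this kind of explicit unpacking or a reinterpret cast on the device, which is the step the description says is unavailable on TPU.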

Test:
Added tests for quantized op and linear module.

Siyuan Liu and others added 16 commits

June 5, 2024 20:31


          add quantized layers per channel

f7c200a


          enhance tests, clean up

f48c666


          add q ops to ci

65f6fca


          add README

c042e2f


          update readme

878d7e7


          update readme

b454237


          initial commit for int4

e69627f


          add some tests

810f104


          use literal

b8ed810


          fix bad malloc

27acbbb


          add a subchannel test

7c52bf9


          add tests

9fd7caa


          add TPU numerical check

fa29ba2


          refactor

9c47f63


          format

256a261


          merge

059053b

lsy323 marked this pull request as ready for review

June 10, 2024 22:41

Siyuan Liu added 2 commits

June 10, 2024 22:45


          update docl

5c4c7f0


          rename to cast_int4

03f46f1

lsy323 force-pushed the lsiyuan/int4-quant-ops branch from 4330117 to 03f46f1 Compare

June 10, 2024 23:01

Siyuan Liu added 2 commits

June 10, 2024 23:03


          remove dup files

11be78b


          format

3a1d83f

JackCaoG self-requested a review

June 10, 2024 23:05

Siyuan Liu added 2 commits

June 10, 2024 23:06


          remove comment

62a0b17


          remove comment

5fe2f09

JackCaoG reviewed test/quantized_ops/test_quantized_matmul.py (Outdated)

JackCaoG reviewed test/quantized_ops/test_quantized_matmul.py (Outdated)

JackCaoG reviewed test/quantized_ops/test_quantized_matmul.py

JackCaoG reviewed test/quantized_ops/test_quantized_matmul.py

JackCaoG reviewed test/quantized_ops/test_quantized_matmul.py (Outdated)

JackCaoG reviewed torch_xla/csrc/ops/cast_int4.cpp (Outdated)

JackCaoG reviewed torch_xla/csrc/ops/cast_int4.cpp

JackCaoG reviewed torch_xla/experimental/xla_quantized_matmul.py (Outdated)

JackCaoG reviewed torch_xla/experimental/xla_quantized_matmul.py (Outdated)

JackCaoG reviewed torch_xla/experimental/xla_quantized_matmul.py (Outdated)


          remove unused pack unpack and test

9addde9

lsy323 requested a review from JackCaoG

June 10, 2024 23:17

Collaborator Author

lsy323 commented Jun 10, 2024

Removed the pack/unpack logic and its test, since they are no longer used.

JackCaoG approved these changes

lsy323 added the quantization label


          fix import

77c61a6

lsy323 merged commit ac371fb into master

miladm assigned miladm and lsy323 and unassigned miladm

lsy323 deleted the lsiyuan/int4-quant-ops branch

December 6, 2024 18:54
