
Add codebook (look up table based) quantization flow in torchao #1195

@jerryzh168

Similar to affine quantization, we can implement codebook (lookup table based) quantization, which is another popular type of quantization, especially at low bit widths such as 4 bits or below (used in https://github.com/Vahe1994/AQLM, https://arxiv.org/abs/2402.04396 etc.). We can start with post-training quantization and use k-means clustering to find the codebook / lookup table. See #391 for the overall structure of the torchao stack. Reference code for k-means can be found here.

After this we can also add more support for the advanced algorithms mentioned above.
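
To make the k-means step concrete, here is a minimal sketch of fitting a scalar codebook to a weight tensor with plain Lloyd's k-means. The helper name `fit_codebook_kmeans`, its parameters, and the choice of one scalar per codebook entry are assumptions for illustration only. This is not the reference code mentioned above, and AQLM-style methods use more elaborate (vector, multi-codebook) schemes.

```python
import torch


def fit_codebook_kmeans(weight: torch.Tensor, num_codes: int = 16, iters: int = 20) -> torch.Tensor:
    """Hypothetical helper: Lloyd's k-means over the scalar entries of `weight`."""
    values = weight.detach().flatten().float()
    # Deterministic init: spread the centroids evenly over the value range.
    codebook = torch.linspace(values.min().item(), values.max().item(), num_codes)
    for _ in range(iters):
        # Assignment step: index of the nearest centroid for every value.
        codes = (values[:, None] - codebook[None, :]).abs().argmin(dim=1)
        # Update step: move each centroid to the mean of its assigned values.
        sums = torch.zeros(num_codes).index_add_(0, codes, values)
        counts = torch.bincount(codes, minlength=num_codes)
        # Centroids with no assigned values are left unchanged.
        codebook = torch.where(counts > 0, sums / counts.clamp(min=1), codebook)
    return codebook


torch.manual_seed(0)
weight = torch.randn(64, 64)
codebook = fit_codebook_kmeans(weight, num_codes=16)  # 16 entries -> 4-bit codes
```

A codebook with 16 entries corresponds to 4-bit codes. The dense distance matrix used in the assignment step is fine for a sketch but would need chunking for large weights.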

API

quantize_(model, codebook_weight_only(dtype=torch.uint4))

Implementation details:

  • [PR1] Ops
    • quantize_codebook(tensor, codebook)
    • dequantize_codebook(tensor, codebook)
  • [PR2] Tensor Subclass
    • CodebookQuantizedTensor (similar to AffineQuantizedTensor)
      • clustering algorithm can be implemented in from_float function
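
As a rough illustration of the two ops listed under [PR1], the sketch below uses the function names from the list but assumes a one-dimensional codebook of scalars and nearest-neighbor assignment. The argument types, the `uint8` storage for codes, and the behavior are assumptions, not the signatures that end up in torchao.

```python
import torch


def quantize_codebook(tensor: torch.Tensor, codebook: torch.Tensor) -> torch.Tensor:
    """Sketch: replace each element with the index of its nearest codebook entry."""
    flat = tensor.detach().flatten().float()
    codes = (flat[:, None] - codebook[None, :].float()).abs().argmin(dim=1)
    # Assumption: codes are stored as uint8, which limits the codebook to 256 entries.
    return codes.to(torch.uint8).reshape(tensor.shape)


def dequantize_codebook(codes: torch.Tensor, codebook: torch.Tensor) -> torch.Tensor:
    """Sketch: look each code up in the codebook."""
    # Cast to int64 first so the codes are treated as indices, not as a mask.
    return codebook[codes.long()]


codebook = torch.tensor([-1.0, -0.25, 0.25, 1.0])  # 4 entries -> 2-bit codes
w = torch.tensor([[0.9, -0.3], [0.2, -1.2]])
codes = quantize_codebook(w, codebook)       # tensor([[3, 1], [2, 0]], dtype=torch.uint8)
w_hat = dequantize_codebook(codes, codebook)  # tensor([[1.00, -0.25], [0.25, -1.00]])
```

In a tensor subclass along the lines of [PR2], `from_float` could fit the codebook and call `quantize_codebook`, while dequantization would call `dequantize_codebook`. That split is one possible design, not a settled one.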

The details of the args etc. still need to be fleshed out, but that can be done in the PR. I'd suggest adding things gradually and gathering feedback along the way.

Code Location: add a codebook folder under https://github.com/pytorch/ao/tree/main/torchao/prototype/quantization

### Tasks
- [x] Initial support https://github.com/pytorch/ao/pull/1299/
- [ ] Add AQLM support
- [ ] Codebook quantization is currently significantly slower than other methods and needs to be sped up: https://github.com/pytorch/ao/blob/main/torchao/quantization/README.md#codebook-quantization
