Similar to affine quantization, we can implement codebook-based (lookup-table-based) quantization. It is another popular type of quantization, especially at 4 bits or below (used in https://github.com/Vahe1994/AQLM, https://arxiv.org/abs/2402.04396, etc.). We can start with post-training quantization and use k-means clustering to find the codebook / lookup table. You can check out #391 for the overall structure of the torchao stack. Reference code for k-means can be found here.
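A minimal sketch of the post-training step follows. It assumes a single scalar (per-tensor) codebook with `2**nbits` entries, fit with plain Lloyd's k-means in PyTorch. `fit_codebook` is a hypothetical helper, not torchao API, and the quantile initialization is an arbitrary choice. The methods cited above use more elaborate schemes (AQLM, for example, works with vector codebooks).

```python
import torch


def fit_codebook(weight: torch.Tensor, nbits: int = 4, iters: int = 20) -> torch.Tensor:
    """Hypothetical helper: fit a 1-D codebook of 2**nbits entries with Lloyd's k-means."""
    values = weight.detach().flatten().float()
    k = 2 ** nbits
    # Initialize centroids at evenly spaced quantiles of the weight distribution.
    codebook = torch.quantile(values, torch.linspace(0, 1, k))
    for _ in range(iters):
        # Assignment step: index of the nearest centroid for every weight.
        idx = torch.argmin((values[:, None] - codebook[None, :]).abs(), dim=1)
        # Update step: move each centroid to the mean of the weights assigned to it.
        sums = torch.zeros(k).index_add_(0, idx, values)
        counts = torch.bincount(idx, minlength=k).float()
        nonempty = counts > 0  # empty clusters keep their previous centroid
        codebook[nonempty] = sums[nonempty] / counts[nonempty]
    return codebook


torch.manual_seed(0)
weight = torch.randn(256, 256)
codebook = fit_codebook(weight, nbits=4)  # 16 centroids, addressable by 4-bit indices
```

Each Lloyd iteration can only lower (or keep) the squared reconstruction error, so a fixed, small iteration count is a reasonable starting point; `torch.quantile` has an input-size limit, so very large weights would need subsampling for the initialization.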
After that, we can add support for the more advanced algorithms mentioned above.
### API
`quantize_(model, codebook_weight_only(dtype=torch.uint4))`
### Implementation details
- [PR1] Ops
  - `quantize_codebook(tensor, codebook)`
  - `dequantize_codebook(tensor, codebook)`
- [PR2] Tensor subclass
  - `CodebookQuantizedTensor` (similar to `AffineQuantizedTensor`)
  - the clustering algorithm can be implemented in the `from_float` function
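The two PR1 ops could look roughly like the sketch below. It assumes a 1-D (scalar) codebook and stores one code per `torch.uint8` byte; packing two 4-bit codes per byte, vector codebooks, and the `CodebookQuantizedTensor` wrapper from PR2 are left out. These are illustrative functions with the names proposed above, not the torchao implementation.

```python
import torch


def quantize_codebook(tensor: torch.Tensor, codebook: torch.Tensor) -> torch.Tensor:
    """Map every element of `tensor` to the index of its nearest codebook entry."""
    # Distance from each element to each of the K entries: shape (*tensor.shape, K).
    dist = (tensor.unsqueeze(-1) - codebook).abs()
    codes = dist.argmin(dim=-1)
    # Assumes K <= 256 so a code fits in one byte (no 4-bit packing here).
    return codes.to(torch.uint8)


def dequantize_codebook(codes: torch.Tensor, codebook: torch.Tensor) -> torch.Tensor:
    """Look each code up in the codebook to get back an approximation of the tensor."""
    # Advanced indexing with an integer tensor keeps the shape of `codes`.
    return codebook[codes.long()]


codebook = torch.tensor([-1.0, -0.25, 0.25, 1.0])
w = torch.tensor([[0.9, -0.3], [0.1, -2.0]])
codes = quantize_codebook(w, codebook)  # codes: [[3, 1], [2, 0]]
w_hat = dequantize_codebook(codes, codebook)  # w_hat: [[1.0, -0.25], [0.25, -1.0]]
```

Keeping the ops as plain functions of `(tensor, codebook)` means the tensor subclass only has to store `codes` and `codebook` and call `dequantize_codebook` when a high-precision weight is needed.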
The details of the arguments still need to be fleshed out, but that can be done in the PRs. I'd suggest adding things gradually and gathering feedback along the way.
### Code location
Add a `codebook` folder under https://github.com/pytorch/ao/tree/main/torchao/prototype/quantization
### Tasks
- [x] Initial support https://github.com/pytorch/ao/pull/1299/
- [ ] Add AQLM support
- [ ] It is currently significantly slower than other methods, so we need to speed it up: https://github.com/pytorch/ao/blob/main/torchao/quantization/README.md#codebook-quantization