A use case: storing a full backtracking pointer matrix becomes affordable for Needleman-Wunsch/CTC alignment if a 2-bit data type is used (4x memory saving compared to a `uint8` representation). Currently it's possible to do this with manual bit manipulation, but it is probably not very efficient (store and load require masking and shifting, and are not fused).
Another use case: a compressed BoolTensor for binary neural networks.
Another use case: extremely low-bit quantized representations.
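To make the "masking and shifting" workaround from the first use case concrete, here is a minimal sketch in NumPy. The names `pack_2bit` and `unpack_2bit` are hypothetical helpers invented for illustration, not an existing API, and the layout (four values per byte, lowest bits first) is just one possible assumption.

```python
import numpy as np

# Bit offsets of the four 2-bit slots inside one byte (assumed layout).
_SHIFTS = np.array([0, 2, 4, 6], dtype=np.uint8)


def pack_2bit(values):
    """Pack a flat array of values in [0, 3] into uint8, four per byte."""
    values = np.asarray(values, dtype=np.uint8).ravel()
    pad = (-values.size) % 4  # zero-pad so the length is a multiple of 4
    padded = np.concatenate([values, np.zeros(pad, dtype=np.uint8)])
    groups = padded.reshape(-1, 4)
    # Shift each value into its slot, then OR the four slots together.
    return np.bitwise_or.reduce(groups << _SHIFTS, axis=1).astype(np.uint8)


def unpack_2bit(packed, count):
    """Recover the first `count` 2-bit values from a packed uint8 array."""
    packed = np.asarray(packed, dtype=np.uint8).ravel()
    # Shift each slot down to the low bits and mask off the rest.
    values = (packed[:, None] >> _SHIFTS) & 0b11
    return values.ravel()[:count].astype(np.uint8)


pointers = np.array([0, 1, 2, 3, 3, 2, 1], dtype=np.uint8)
packed = pack_2bit(pointers)             # 7 values stored in 2 bytes
restored = unpack_2bit(packed, pointers.size)
```

Every load and store here goes through a separate shift and mask pass over temporaries, which is the overhead a native or fused implementation would avoid.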
Is something like this already implemented for quantization? A simple version of this feature could provide some explicit utility functions, such as calculating the size of the holder `uint8` tensor, plus fused store and load functions (potentially explicitly batched, e.g. the actual store is delayed until some aligned number of memory lines has arrived).
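As a rough illustration of the holder-size utility, the calculation is a ceiling division. `holder_numel` is a hypothetical name, and restricting the bit width to 1, 2 or 4 (so that no value straddles a byte boundary) is an assumption of this sketch.

```python
def holder_numel(numel: int, bits: int) -> int:
    """Number of uint8 elements needed to hold `numel` values of `bits` bits."""
    if bits not in (1, 2, 4):
        # Assumption: only widths that divide 8 evenly are supported.
        raise ValueError("bits must be 1, 2 or 4")
    if numel < 0:
        raise ValueError("numel must be non-negative")
    per_byte = 8 // bits
    return (numel + per_byte - 1) // per_byte  # ceiling division


# A 1000 x 1000 backtracking matrix with 2-bit pointers:
n_bytes = holder_numel(1000 * 1000, bits=2)
```

For that matrix the holder takes a quarter of the bytes a `uint8` matrix would, which is the 4x saving mentioned above.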
In NumPy the related functionality is `np.packbits` and `np.unpackbits`; however, these are designed to work only with a 1-bit contained type. 2-bit/4-bit support would be cool as well.
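For reference, a short example of the existing 1-bit round trip in NumPy (the array contents are made up for illustration):

```python
import numpy as np

mask = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 1], dtype=np.uint8)

# 10 one-bit values are packed into 2 bytes; the last byte is zero-padded.
packed = np.packbits(mask)

# `count` drops the padding bits again when unpacking.
restored = np.unpackbits(packed, count=mask.size)
```

There is no equivalent argument for choosing a 2-bit or 4-bit element width, so wider values still need manual shifting and masking.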
On the 1-bit side, another related project is RoaringBitmap, https://github.com/RoaringBitmap/RoaringBitmap (http://roaringbitmap.org/), which provides compressed bitsets for set operations.
cc @ezyang @gchanan @zou3519 @bdhirsh @jbschlosser @anjali411 @izdeby