Group size currently defaults to 128, but any smaller group size will fail with this error:
  File "/home/andrewor/local/ao/torchao/quantization/quantize_/workflows/int4/int4_preshuffled_tensor.py", line 214, in _
    res = torch.ops.fbgemm.f8i4bf16_shuffled(
  File "/home/andrewor/.conda/envs/pytorch-3.10/lib/python3.10/site-packages/torch/_ops.py", line 1158, in __call__
    return self._op(*args, **(kwargs or {}))
RuntimeError: cutlass cannot implement
and there is no real benefit to making it larger than 128.
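
One way to surface this constraint earlier is to check the group size before the kernel is called, so the user sees a readable message instead of "cutlass cannot implement". The following is a minimal sketch only: `validate_group_size` and `MIN_GROUP_SIZE` are hypothetical names that do not exist in torchao, and the threshold of 128 is taken from the note above rather than from the kernel's documented limits.

```python
# Hypothetical guard, not part of torchao.
# Assumption: 128 is the smallest group size the kernel accepts, per the note above.
MIN_GROUP_SIZE = 128


def validate_group_size(group_size: int) -> int:
    """Return group_size unchanged, or raise if it is below the assumed minimum."""
    if group_size < MIN_GROUP_SIZE:
        raise ValueError(
            f"group_size={group_size} is not supported; "
            f"use at least {MIN_GROUP_SIZE} (the default)"
        )
    return group_size


if __name__ == "__main__":
    validate_group_size(128)  # accepted
    try:
        validate_group_size(64)
    except ValueError as err:
        print(err)
```

Larger values are allowed through here because the note only says they bring no benefit, not that they fail.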