While looking at example 55 (cutlass/examples/55_hopper_mixed_dtype_gemm/55_hopper_int4_bf16_gemm.cu), I was curious whether this modification would be legal:
From:
using MmaType = cutlass::bfloat16_t;
using QuantType = cutlass::int4b_t;
To:
using MmaType = cutlass::bfloat16_t;
using QuantType = cutlass::int2b_t;
The example's README.md says, "For 8-bit x 4-bit or 2-bit, both inputs must be K-major," which seems to imply that 2-bit types are supported. However, the comment in the source file states, "Only supports INT4 x { FP16, BF16 }." I also can't find any documentation in the library on using the int2b_t data type in a GEMM. I apologize if this question needs more detail or if I missed part of the documentation.