Add basic support for MXFP6_MOE quantization #16777
This happens because you have not installed/initialized
Thanks for your advice.
Please restore formatting to
What is the motivation for this type? Are there any models being natively distributed in MXFP6, or does it perform better than other quantizations?
Probably Blackwell support.
Currently, there are no models natively distributed in MXFP6, but I think MXFP6 may offer a good balance between model quality and performance in the future :) NVIDIA's Blackwell architecture is expected to support MXFP6, and AMD's MI355X also includes MXFP6 support. Additionally, while MXFP4 has shown promising results with quantization-aware training (QAT), some papers (e.g., Tables 2 and 3 in this paper) report that MXFP4 may not perform as well under direct quantization, which is one of the most common use cases for llama.cpp. In contrast, MXFP6 appears to be more robust in such settings.
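To make the MXFP4 vs. MXFP6 comparison concrete, here is a minimal Python sketch of microscaling block quantization as described in the OCP MX v1.0 spec: 32 elements share one power-of-two (E8M0) scale, and each element is a tiny sign-magnitude float. It assumes the FP6 variant is E2M3 and uses a simple nearest-value rounding with saturation; the actual element encoding, rounding mode and block layout used by MXFP6_MOE in this PR are not stated here, and all function names are hypothetical. It also computes storage cost per weight (element bits plus the 8-bit shared scale amortized over the block).

```python
import math

BLOCK_SIZE = 32  # OCP MX: 32 elements share one E8M0 (power-of-two) scale


def fp_grid(exp_bits, man_bits):
    """Non-negative values of a mini-float with bias 2^(e-1)-1, subnormals,
    and no Inf/NaN encodings (the FP4/FP6 element formats in OCP MX)."""
    bias = 2 ** (exp_bits - 1) - 1
    values = set()
    for e in range(2 ** exp_bits):
        for m in range(2 ** man_bits):
            if e == 0:  # subnormal: no implicit leading 1
                values.add((m / 2 ** man_bits) * 2.0 ** (1 - bias))
            else:
                values.add((1 + m / 2 ** man_bits) * 2.0 ** (e - bias))
    return sorted(values)


FP4_E2M1 = fp_grid(2, 1)  # max 6.0
FP6_E2M3 = fp_grid(2, 3)  # max 7.5 (assumed FP6 variant)


def quantize_block(xs, grid):
    """Quantize then dequantize one block; returns (shared_exp, dequantized values)."""
    emax_elem = math.floor(math.log2(grid[-1]))  # exponent of the largest element value
    amax = max(abs(x) for x in xs)
    if amax == 0.0:
        return 0, [0.0] * len(xs)
    # Shared scale: align the block's largest magnitude with the element format's top exponent.
    shared_exp = math.floor(math.log2(amax)) - emax_elem
    shared_exp = max(-127, min(127, shared_exp))  # E8M0 exponent range
    scale = 2.0 ** shared_exp
    out = []
    for x in xs:
        mag = min(abs(x) / scale, grid[-1])  # saturate to the largest representable value
        q = min(grid, key=lambda g: abs(g - mag))  # nearest grid value (ties go to the smaller one)
        out.append(math.copysign(q, x) * scale)
    return shared_exp, out


def rmse(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def bits_per_weight(elem_bits, block=BLOCK_SIZE, scale_bits=8):
    return (elem_bits * block + scale_bits) / block


if __name__ == "__main__":
    import random

    random.seed(0)
    block = [random.gauss(0.0, 1.0) for _ in range(BLOCK_SIZE)]
    for name, grid, bits in (("MXFP4 (E2M1)", FP4_E2M1, 4), ("MXFP6 (E2M3)", FP6_E2M3, 6)):
        _, deq = quantize_block(block, grid)
        print(f"{name}: {bits_per_weight(bits):.2f} bpw, RMSE = {rmse(block, deq):.4f}")
```

Because E2M1 and E2M3 share the same exponent range, both formats pick the same shared scale and every E2M1 value is also an E2M3 value, so the FP6 round-trip error on a block can never exceed the FP4 error; the price is 6.25 vs. 4.25 bits per weight. This only illustrates a single block and says nothing about end-to-end model quality, which is what the cited tables measure.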
`test-quantize-*` passed in local CI. `test-tokenizer-ggml-vocabs` reports a failure, but I don't think it's caused by this PR, as this PR does not change the GGUF parser.