Support gpt-oss mxfp4 format qat #3547

@shelterwff-byte

🔖 Feature description

Support MXFP4 (Microscaling) Format for QAT and Post-Training Quantization via torchao/Model-Optimizer.

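Currently, Axolotl users attempting to use 4-bit floating-point formats may run into hardware-specific constraints (e.g., the nvfp4 error, since NVFP4 requires Blackwell sm100). This feature request proposes adding support for MXFP4 (E2M1 elements with a shared block scale), an OCP Microscaling standard format. Native MXFP4 tensor-core support is also a Blackwell feature, but MXFP4 weights can be used on NVIDIA Hopper (H100/H800) through software kernels that upcast to a higher-precision compute type, and the format can be emulated on Ampere.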

Implementing MXFP4 QAT will allow:

  1. Higher training stability compared to INT4 or plain FP4 without block scaling.
  2. Better post-training weight compression for LLMs like gpt-oss.
  3. Alignment with NVIDIA's model-optimizer and torchao roadmaps.

✔️ Solution

Integrate torchao.quantization.quantize_ with MXFP4-specific configs, or use NVIDIA's modelopt (Model Optimizer) workflow within Axolotl's quantization CLI.

Key components:

  • Add mxfp4 as a valid option for quantization.weight_dtype in the YAML config.
  • Implement the MXFP4 fake-quantization logic in axolotl.utils.quantization during the QAT phase.
  • Ensure compatibility with torchao's MX format implementations (specifically mx_fp4).
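To make the fake-quantization component concrete, here is a minimal pure-Python sketch of an MXFP4 quantize-dequantize round trip. It assumes the OCP Microscaling convention: blocks of 32 elements, one power-of-two shared scale per block, and E2M1 element values. The names `fake_quantize_block` and `fake_quantize_mxfp4` are hypothetical and are not existing Axolotl or torchao APIs. A real QAT implementation would operate on tensors and pass gradients through the rounding step (for example with a straight-through estimator), which this sketch does not attempt.

```python
import math

# Magnitudes representable by an FP4 E2M1 element (sign is handled separately).
E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
E2M1_EMAX = 2      # exponent of the largest E2M1 power of two (4.0 = 2**2)
BLOCK_SIZE = 32    # assumed MX block size


def _round_to_e2m1(x):
    """Round x to the nearest E2M1 value, saturating at +/-6.0.

    Simplification: exact ties go to the smaller magnitude rather than
    round-to-nearest-even.
    """
    mag = min(abs(x), E2M1_GRID[-1])
    nearest = min(E2M1_GRID, key=lambda g: abs(g - mag))
    return math.copysign(nearest, x)


def fake_quantize_block(block):
    """Quantize one block to MXFP4 and immediately dequantize it."""
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return [0.0] * len(block)
    # frexp gives amax = m * 2**e with 0.5 <= m < 1, so floor(log2(amax)) == e - 1.
    _, e = math.frexp(amax)
    shared_exp = (e - 1) - E2M1_EMAX
    # Clamp to the range of an 8-bit exponent-only (E8M0) scale.
    shared_exp = max(-127, min(127, shared_exp))
    scale = 2.0 ** shared_exp
    return [_round_to_e2m1(v / scale) * scale for v in block]


def fake_quantize_mxfp4(values, block_size=BLOCK_SIZE):
    """Apply the block-wise round trip to a flat list of floats."""
    out = []
    for start in range(0, len(values), block_size):
        out.extend(fake_quantize_block(values[start:start + block_size]))
    return out
```

For example, `fake_quantize_block([12.0, 0.4])` picks a shared scale of 2.0, so 12.0 survives exactly while 0.4 rounds to 0.0. This is the kind of block-local rounding error that QAT would expose the model to during training.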

References:

❓ Alternatives

Currently, users are forced to use int4_weight_only or fp8, which either lack the dynamic range of MXFP4 or do not provide the same 4-bit memory savings.

📝 Additional Context

As LLMs like gpt-oss (120B+) grow, 4-bit quantization becomes critical for inference. MXFP4 provides a sweet spot between 8-bit accuracy and 4-bit efficiency by sharing one power-of-two scale across each block of 32 elements, as defined in the OCP Microscaling specification (NVFP4, by comparison, uses blocks of 16).
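As a rough illustration of the memory argument, the sketch below computes the effective storage cost per weight, assuming 4-bit elements plus one 8-bit shared scale per block of 32. The helper names and the 120e9 parameter count are illustrative assumptions; the figure treats every parameter as MXFP4, which real checkpoints may not do.

```python
ELEMENT_BITS = 4   # FP4 E2M1 element
SCALE_BITS = 8     # one shared scale per block
BLOCK_SIZE = 32    # assumed MX block size


def mxfp4_bits_per_element(block_size=BLOCK_SIZE):
    """Effective bits per weight once the shared scale is amortized."""
    return ELEMENT_BITS + SCALE_BITS / block_size


def weight_gigabytes(num_params, bits_per_element):
    """Weight storage in decimal gigabytes for a given bit width."""
    return num_params * bits_per_element / 8 / 1e9


bits = mxfp4_bits_per_element()              # 4.25 bits per weight
mxfp4_gb = weight_gigabytes(120e9, bits)     # hypothetical 120e9 params, all MXFP4
bf16_gb = weight_gigabytes(120e9, 16)        # same count stored in BF16
```

Under these assumptions the scale overhead is 0.25 bits per weight, so the footprint stays close to a plain 4-bit format while keeping a per-block dynamic range.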

axolotl-ai-cloud/axolotl#3333

