Support GRPO training for MiniMax-m2. @jQizhang @zpqiu

Branch: https://github.com/NVIDIA-NeMo/RL/tree/minimax-m2

Configuration examples and accuracy validation curves:

[8K sequence](https://github.com/NVIDIA-NeMo/RL/blob/minimax-m2/examples/configs/recipes/llm/grpo-minimax-m2-dapo-8n8g-automodel-ep64.yaml):

<img width="791" height="311" alt="Image" src="https://github.com/user-attachments/assets/7d925f7a-19c6-49cc-a6df-128d61bd9013" />

[16K sequence](https://github.com/NVIDIA-NeMo/RL/blob/minimax-m2/examples/configs/recipes/llm/grpo-minimax-m2-dapo-16k-8n8g-automodel-ep64cp8.yaml):

<img width="790" height="319" alt="Image" src="https://github.com/user-attachments/assets/2f1b2f88-4fd4-4b74-9157-31c5871fbbc4" />