Add GLM-5.1 support in NeMo RL
Background
NeMo RL recently added GRPO support for the new Qwen3.5 dense and MoE models and for GLM-4.7-Flash, including recipes such as grpo-qwen3.5-9b-1n8g-megatron.yaml, grpo-qwen3.5-35ba3b-2n8g-megatron-ep16.yaml, and grpo-glm47-flash-4n8g-automodel.yaml. See prior work in #2151.
GLM-5.1 is Z.AI’s newer open-weight flagship MoE model, available as zai-org/GLM-5.1. It uses the GlmMoeDsaForCausalLM / glm_moe_dsa architecture and is documented in NeMo AutoModel as part of the GLM-5 / GLM-5.1 model family.
Goal
Add first-class GLM-5.1 support in NeMo RL so users can run GRPO training with the existing RL stack, following the integration pattern used for Qwen3.5 and GLM-4.7-Flash.
Scope
- Add a GLM-5.1 GRPO recipe that uses the Megatron Core path.
- Validate tokenizer, chat template, generation config, and model loading for zai-org/GLM-5.1.
- Confirm rollout compatibility with the supported inference backend.
- Run model diagnostics for train/inference logprob consistency.
- Document any required dependency versions, known limitations, or unsupported modes.
- Add/update smoke or recipe tests where feasible.
Acceptance Criteria
- A runnable GLM-5.1 GRPO config is added under examples/configs/.
- The recipe can initialize policy, reference policy, reward flow, and rollout workers successfully.
- Logprob consistency checks pass within the NeMo RL model integration threshold, or any deviations are documented with mitigation.
- A short GRPO smoke run completes without model-loading, tokenizer, or rollout errors.
- Documentation/model support notes are updated with GLM-5.1 status and usage instructions.
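To make the logprob consistency criterion concrete, the sketch below compares per-token log-probabilities from the training and inference backends. It is an illustration under assumptions: the function name `mean_mult_prob_error`, the mask convention, and the threshold value of 1.05 are hypothetical placeholders, not values taken from NeMo RL. The metric is the mean over generated tokens of exp(|train logprob - inference logprob|), which equals 1.0 when the two backends agree exactly.

```python
import math


def mean_mult_prob_error(train_logprobs, infer_logprobs, mask):
    """Mean multiplicative probability error over unmasked tokens.

    Hypothetical helper: each argument is a flat list with one entry per
    token; `mask` is 1 for generated tokens to compare and 0 otherwise.
    """
    if not (len(train_logprobs) == len(infer_logprobs) == len(mask)):
        raise ValueError("inputs must have the same length")

    errors = [
        math.exp(abs(lp_train - lp_infer))
        for lp_train, lp_infer, keep in zip(train_logprobs, infer_logprobs, mask)
        if keep
    ]
    if not errors:
        raise ValueError("mask selects no tokens")
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Toy values for illustration only; real values come from the two backends.
    train = [-0.50, -1.20, -0.03, -2.10]
    infer = [-0.50, -1.25, -0.03, -2.00]
    mask = [0, 1, 1, 1]  # the first token belongs to the prompt

    error = mean_mult_prob_error(train, infer, mask)
    ASSUMED_THRESHOLD = 1.05  # placeholder, not a documented NeMo RL value
    status = "within" if error <= ASSUMED_THRESHOLD else "above"
    print(f"mean multiplicative error = {error:.4f} ({status} the assumed threshold)")
```

Taking the absolute difference before exponentiating keeps errors in opposite directions from cancelling out, so a value near 1.0 means the backends agree token by token rather than only on average.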
References