Skip to content

Add GLM-5.1 support in NeMo RL #2377

@anwithk

Description

@anwithk

Add GLM-5.1 support in NeMo RL

Background

NeMo RL recently added GRPO support for new Qwen3.5 dense/MoE models and GLM-4.7-Flash, including recipes such as grpo-qwen3.5-9b-1n8g-megatron.yaml, grpo-qwen3.5-35ba3b-2n8g-megatron-ep16.yaml, and grpo-glm47-flash-4n8g-automodel.yaml. See prior work in #2151.

GLM-5.1 is Z.AI’s newer open-weight flagship MoE model, available as zai-org/GLM-5.1. It uses the GlmMoeDsaForCausalLM / glm_moe_dsa architecture and is documented in NeMo AutoModel as part of the GLM-5 / GLM-5.1 model family.

Goal

Add first-class GLM-5.1 support in NeMo RL so users can run GRPO training with the existing RL stack, following the integration pattern used for Qwen3.5 and GLM-4.7-Flash.

Scope

  • Add a GLM-5.1 GRPO recipe, through the Megatron core path
  • Validate tokenizer, chat template, generation config, and model loading for zai-org/GLM-5.1.
  • Confirm rollout compatibility with the supported inference backend.
  • Run model diagnostics for train/inference logprob consistency.
  • Document any required dependency versions, known limitations, or unsupported modes.
  • Add/update smoke or recipe tests where feasible.

Acceptance Criteria

  • A runnable GLM-5.1 GRPO config is added under examples/configs/.
  • The recipe can initialize policy, reference policy, reward flow, and rollout workers successfully.
  • Logprob consistency checks pass within the NeMo RL model integration threshold, or any deviations are documented with mitigation.
  • Short GRPO smoke run completes without model-loading, tokenizer, or rollout errors.
  • Documentation/model support notes are updated with GLM-5.1 status and usage instructions.

References

Metadata

Metadata

Assignees

Labels

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions