[model][example] feat: Add GLM-4.7 and GLM-4.7-Flash support#2983
Conversation
GLM-4.7 (Glm4MoeForCausalLM) uses the same architecture as GLM-4.5, so the existing GLM45Bridge handles it out of the box. GLM-4.7-Flash (Glm4MoeLiteForCausalLM) is a new architecture combining DeepSeek V3-style Multi-Latent Attention (MLA) with GLM MoE routing. Add GLM47FlashBridge with MLAModelProvider, reusing the DeepSeek common MLA weight mappings since the safetensors use identical parameter naming. Verified on 8-GPU node: - GLM-4.7-Flash roundtrip (EP=8): 9,701 weights matched, 0 failures - GLM-4.7 bridge resolution: correctly maps to GLM45Bridge Also adds THUDM and zai-org to SAFE_REPOS. Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> Made-with: Cursor
|
/ok to test 3f6237f |
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> # Conflicts: # 3rdparty/Megatron-LM # src/megatron/bridge/models/__init__.py
Document HF IDs, architecture, hardware requirements, and slurm workflow for the GLM-4.7 and GLM-4.7-Flash example scripts. Aligns with the README structure used by the glm5 / minimax_m2 / bailing example folders. Verified GLM-4.7-Flash inference (TP=1, EP=8, single node on cw H100s): generation is coherent on the "What is artificial intelligence?" prompt. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
| @@ -0,0 +1,121 @@ | |||
| # Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved. | |||
There was a problem hiding this comment.
Nit: Copyright year should be 2026 per project guidelines (CLAUDE.md: "Use the current year (2026) in generated content").
| # Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved. | |
| # Copyright (c) 2026, NVIDIA CORPORATION. All rights reserved. |
There was a problem hiding this comment.
Current PR head 7276a05 has the 2026 copyright header in src/megatron/bridge/models/glm/glm47_flash_bridge.py; confirmed during this cleanup.
|
|
||
| first_k = getattr(hf_config, "first_k_dense_replace", 1) | ||
| provider.moe_layer_freq = [0] * first_k + [1] * (hf_config.num_hidden_layers - first_k) | ||
|
|
||
| return provider | ||
|
|
||
| def build_conversion_tasks(self, hf_pretrained, megatron_model): | ||
| """Override to store config before mapping_registry is called.""" |
There was a problem hiding this comment.
Bug: The build_conversion_tasks override to stash self._hf_config is a deprecated pattern (see skills/adding-model-support/SKILL.md). The base class already sets self.hf_config automatically in build_conversion_tasks (line 1538 of model_bridge.py) before calling mapping_registry(), so this override is unnecessary.
mapping_registry() below reads getattr(self, "_hf_config", None) but should use self.hf_config instead. As-is, if the base class flow ever changes or if mapping_registry() is called without going through build_conversion_tasks first, the private _hf_config won't be set.
Suggested fix — delete the build_conversion_tasks override entirely, and in mapping_registry() change to:
def mapping_registry(self) -> MegatronMappingRegistry:
hf_config = getattr(self, "hf_config", None)
...This matches the pattern used by GLM45Bridge.mapping_registry() (which reads self.hf_config directly at line 196 of glm45_bridge.py).
There was a problem hiding this comment.
Current PR head 7276a05 uses self.hf_config in mapping_registry() and has no build_conversion_tasks override; confirmed during this cleanup.
|
Review of GLM-4.7 / GLM-4.7-Flash Support - Issues Found: (1) Bug in glm47_flash_bridge.py lines 101-108 - The build_conversion_tasks override to stash self._hf_config is deprecated. The base class sets self.hf_config automatically before mapping_registry() is called. Remove the override and use self.hf_config in mapping_registry(). See inline comment. (2) Copyright year in glm47_flash_bridge.py line 1 should be 2026 not 2025. (3) Missing unit tests for GLM47FlashBridge - see test_glm45_bridge.py for the pattern. Suggested test cases: No perf tests impacted. |
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
|
/ok to test 2df92af |
|
/ok to test 2df92af |
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
|
Fixed the GLM-4.7-Flash unit failure in Root cause: the new test collected every Fix: flatten dict-valued Validation on Result: 3 passed, 33 warnings in 1.23s. |
|
/ok to test d210be5 |
| "nvidia", | ||
| "openai", | ||
| "Qwen", | ||
| "THUDM", |
There was a problem hiding this comment.
z.ai old name I think , but can remove
There was a problem hiding this comment.
Addressed in 7276a05: removed the stale THUDM safe-repo alias and kept zai-org.
There was a problem hiding this comment.
does this mean there is no glm45 (non flash) bridge..?
There was a problem hiding this comment.
Addressed in 7276a05: added a module docstring note that this file only registers glm4_moe_lite / Flash, while full GLM-4.7 uses the existing GLM45Bridge / glm4_moe path.
yaoyu-33
left a comment
There was a problem hiding this comment.
Review-only pass focused on the GLM-4.7 example scripts. The bridge file looks consistent with the earlier review fixes; the main remaining issue is that the example scripts are still acting like a configurable validation harness. For this PR, I recommend simplifying to one known-good path per script and letting users edit the command when they need different parallelism.
| examples/conversion/hf_megatron_roundtrip_multi_gpu.py \ | ||
| --hf-model-id "$GLM47_FLASH_HF" --ep 8 | ||
|
|
||
| # Multi-GPU round-trip with TP=2 EP=4 |
There was a problem hiding this comment.
This script is meant to be a short example, so I would avoid running multiple parallelism layouts by default. Please keep one known-good Flash round-trip path here, e.g. --nproc_per_node=8 ... --tp 1 --pp 1 --ep 8, and drop the second TP=2 EP=4 run unless there is a specific validation requirement for this PR. Users can edit the command if they want to try alternate TP/EP layouts.
There was a problem hiding this comment.
Current PR head 7276a05 keeps one Flash round-trip path with nproc_per_node=8, tp=1, pp=1, ep=8, and no TP=2 / EP=4 run.
| GLM47_FLASH_HF="${GLM47_FLASH_HF:-zai-org/GLM-4.7-Flash}" | ||
| PROMPT="${PROMPT:-What is artificial intelligence?}" | ||
| MAX_NEW_TOKENS="${MAX_NEW_TOKENS:-100}" | ||
| TP="${TP:-1}" |
There was a problem hiding this comment.
For this example, please inline the fixed parallelism values instead of exposing TP, PP, EP, and NPROC_PER_NODE as bash variables. Keeping PROMPT / MAX_NEW_TOKENS configurable is reasonable, but the intended validation path should be obvious in the command, e.g. --nproc_per_node=8 and --tp 1 --pp 1 --ep 8.
There was a problem hiding this comment.
Current PR head 7276a05 inlines nproc_per_node=8 with tp=1, pp=1, ep=8; only the prompt and token count remain configurable.
| MODEL_NAME="${MODEL_NAME:-GLM-4.7}" | ||
| HF_MODEL_ID="${HF_MODEL_ID:-zai-org/$MODEL_NAME}" | ||
|
|
||
| # Parallelism configs: "TP,PP,EP" per entry. |
There was a problem hiding this comment.
This sweep makes the Slurm example much more complex than it needs to be. Please remove PARALLELISM_CONFIGS_STR and the loop below, and run a single known-good GLM-4.7 conversion path directly, e.g. --tp 1 --pp 1 --ep 32. The README can mention that other layouts are possible, but this script should show the one recommended path.
There was a problem hiding this comment.
Current PR head 7276a05 removes the parallelism sweep and runs one Slurm conversion path with tp=1, pp=1, ep=32.
| HF_MODEL_ID="${HF_MODEL_ID:-zai-org/$MODEL_NAME}" | ||
| PROMPT="${PROMPT:-What is artificial intelligence?}" | ||
| MAX_NEW_TOKENS="${MAX_NEW_TOKENS:-100}" | ||
| TP="${TP:-1}" |
There was a problem hiding this comment.
Same simplification request here: please do not expose fixed parallelism as env vars in the example. Inline --tp 1 --pp 1 --ep 32 in the command and simplify the status echo accordingly. Environment-specific vars like CONTAINER_IMAGE, CONTAINER_MOUNTS, WORKDIR, HF_HOME, and optional prompt settings are fine to keep.
There was a problem hiding this comment.
Current PR head 7276a05 inlines tp=1, pp=1, ep=32; the remaining variables are environment and prompt settings.
|
|
||
| ### GLM-4.7 (multi-node via Slurm) | ||
|
|
||
| [slurm_conversion.sh](slurm_conversion.sh) sweeps multiple parallelism configs (`TP,PP,EP`) to verify round-trip conversion. Each config runs sequentially. |
There was a problem hiding this comment.
Please update the README to match the script simplification: avoid documenting a default TP/PP/EP sweep or parallelism env vars as part of the public example flow. The docs should describe the fixed recommended paths (EP=8 for Flash, EP=32 for full GLM-4.7) and leave custom parallelism as something users can edit in the command.
There was a problem hiding this comment.
Current PR head 7276a05 documents the fixed recommended paths, EP=8 for Flash and EP=32 for full GLM-4.7, with custom layouts left as direct script edits.
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
| | Variable | Description | | ||
| |---|---| | ||
| | `CONTAINER_IMAGE` | Path to Singularity / SquashFS container image | | ||
| | `CONTAINER_MOUNTS` | Bind mounts (must include the Bridge checkout as `/opt/Megatron-Bridge`, or set `WORKDIR` to the mounted path) | |
There was a problem hiding this comment.
this is not needed if people are using correct nemo container, we mount for debugging
There was a problem hiding this comment.
Addressed in 7276a05: rephrased CONTAINER_MOUNTS as optional for data, caches, or a local checkout when debugging, and documented WORKDIR separately.
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
|
/ok to test 7276a05 |
…NeMo#2983) Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
Summary
GLM47FlashBridgeforGlm4MoeLiteForCausalLM(GLM-4.7-Flash), a new architecture combining DeepSeek V3-style Multi-Latent Attention (MLA) with GLM MoE routingGlm4MoeForCausalLM) is already supported by the existingGLM45Bridge— no code changes neededTHUDMandzai-orgtoSAFE_REPOSDetails
GLM-4.7-Flash (
zai-org/GLM-4.7-Flash):MLAModelProviderwith DeepSeek common weight mappings (safetensors use identical MLA parameter naming)e_score_correction_biasGLM-4.7 (
zai-org/GLM-4.7):glm4_moearchitecture as GLM-4.5GLM45BridgeTest plan
GLM45Bridgewithnum_layers=92, hidden=5120, experts=160Made with Cursor