[model][example] feat: Add GLM-4.7 and GLM-4.7-Flash support by yaoyu-33 · Pull Request #2983 · NVIDIA-NeMo/Megatron-Bridge

yaoyu-33 · 2026-03-25T16:17:04Z

Summary

Add GLM47FlashBridge for Glm4MoeLiteForCausalLM (GLM-4.7-Flash), a new architecture combining DeepSeek V3-style Multi-Latent Attention (MLA) with GLM MoE routing
GLM-4.7 (Glm4MoeForCausalLM) is already supported by the existing GLM45Bridge — no code changes needed
Add THUDM and zai-org to SAFE_REPOS
Add example scripts for both models (conversion, inference, Slurm multi-node)

Details

GLM-4.7-Flash (zai-org/GLM-4.7-Flash):

~30B params, 64 routed experts (top-4), MLA attention, MTP
Uses MLAModelProvider with DeepSeek common weight mappings (safetensors use identical MLA parameter naming)
Sigmoid routing with e_score_correction_bias

GLM-4.7 (zai-org/GLM-4.7):

~358B params, 160 routed experts (top-8), same glm4_moe architecture as GLM-4.5
Works out of the box with GLM45Bridge

Test plan

GLM-4.7-Flash roundtrip (EP=8, 8 GPUs): 9,701 weights matched, 0 failures
GLM-4.7 bridge resolution: correctly maps to GLM45Bridge with num_layers=92, hidden=5120, experts=160
CI unit tests
CI functional tests

Made with Cursor

GLM-4.7 (Glm4MoeForCausalLM) uses the same architecture as GLM-4.5, so the existing GLM45Bridge handles it out of the box. GLM-4.7-Flash (Glm4MoeLiteForCausalLM) is a new architecture combining DeepSeek V3-style Multi-Latent Attention (MLA) with GLM MoE routing. Add GLM47FlashBridge with MLAModelProvider, reusing the DeepSeek common MLA weight mappings since the safetensors use identical parameter naming. Verified on 8-GPU node: - GLM-4.7-Flash roundtrip (EP=8): 9,701 weights matched, 0 failures - GLM-4.7 bridge resolution: correctly maps to GLM45Bridge Also adds THUDM and zai-org to SAFE_REPOS. Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> Made-with: Cursor

copy-pr-bot · 2026-03-25T16:17:08Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

yaoyu-33 · 2026-03-25T16:17:10Z

/ok to test 3f6237f

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> # Conflicts: # 3rdparty/Megatron-LM # src/megatron/bridge/models/__init__.py

Document HF IDs, architecture, hardware requirements, and slurm workflow for the GLM-4.7 and GLM-4.7-Flash example scripts. Aligns with the README structure used by the glm5 / minimax_m2 / bailing example folders. Verified GLM-4.7-Flash inference (TP=1, EP=8, single node on cw H100s): generation is coherent on the "What is artificial intelligence?" prompt. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

claude · 2026-05-14T23:26:44Z

@@ -0,0 +1,121 @@
+# Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.


Nit: Copyright year should be 2026 per project guidelines (CLAUDE.md: "Use the current year (2026) in generated content").

Suggested change

# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.

# Copyright (c) 2026, NVIDIA CORPORATION. All rights reserved.

Current PR head 7276a05 has the 2026 copyright header in src/megatron/bridge/models/glm/glm47_flash_bridge.py; confirmed during this cleanup.

claude · 2026-05-14T23:26:50Z

+
+        first_k = getattr(hf_config, "first_k_dense_replace", 1)
+        provider.moe_layer_freq = [0] * first_k + [1] * (hf_config.num_hidden_layers - first_k)
+
+        return provider
+
+    def build_conversion_tasks(self, hf_pretrained, megatron_model):
+        """Override to store config before mapping_registry is called."""


Bug: The build_conversion_tasks override to stash self._hf_config is a deprecated pattern (see skills/adding-model-support/SKILL.md). The base class already sets self.hf_config automatically in build_conversion_tasks (line 1538 of model_bridge.py) before calling mapping_registry(), so this override is unnecessary.

mapping_registry() below reads getattr(self, "_hf_config", None) but should use self.hf_config instead. As-is, if the base class flow ever changes or if mapping_registry() is called without going through build_conversion_tasks first, the private _hf_config won't be set.

Suggested fix — delete the build_conversion_tasks override entirely, and in mapping_registry() change to:

def mapping_registry(self) -> MegatronMappingRegistry: hf_config = getattr(self, "hf_config", None) ...

This matches the pattern used by GLM45Bridge.mapping_registry() (which reads self.hf_config directly at line 196 of glm45_bridge.py).

Current PR head 7276a05 uses self.hf_config in mapping_registry() and has no build_conversion_tasks override; confirmed during this cleanup.

claude · 2026-05-14T23:32:58Z

Review of GLM-4.7 / GLM-4.7-Flash Support - Issues Found: (1) Bug in glm47_flash_bridge.py lines 101-108 - The build_conversion_tasks override to stash self._hf_config is deprecated. The base class sets self.hf_config automatically before mapping_registry() is called. Remove the override and use self.hf_config in mapping_registry(). See inline comment. (2) Copyright year in glm47_flash_bridge.py line 1 should be 2026 not 2025. (3) Missing unit tests for GLM47FlashBridge - see test_glm45_bridge.py for the pattern. Suggested test cases: No perf tests impacted.

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

yaoyu-33 · 2026-05-14T23:37:20Z

/ok to test 2df92af

yaoyu-33 · 2026-05-15T00:20:28Z

/ok to test 2df92af

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

yaoyu-33 · 2026-05-15T20:54:01Z

Fixed the GLM-4.7-Flash unit failure in d210be578592aa26282db8783dad39d6342685d2.

Root cause: the new test collected every mapping.hf_param into a set, but composite mappings such as GatedMLPMapping expose hf_param as a dict of HF component params (gate, up). The registry behavior is valid; the test was assuming all HF params were hashable strings.

Fix: flatten dict-valued hf_param entries into their values before checking the expected HF mapping names.

Validation on cw at pushed HEAD d210be57:
uv run --active --no-sync python -m pytest tests/unit_tests/models/glm/test_glm47_flash_bridge.py -v

Result: 3 passed, 33 warnings in 1.23s.
Log: /lustre/fs1/portfolios/coreai/projects/coreai_dlalgo_llm/users/yuya/Megatron-Bridge-cc2/logs/pr2983-glm47-flash-pytest-20260515.log

yaoyu-33 · 2026-05-15T20:54:06Z

/ok to test d210be5

cuichenx · 2026-05-15T22:52:19Z

    "nvidia",
    "openai",
    "Qwen",
+    "THUDM",


what is thudm?

z.ai old name I think , but can remove

Addressed in 7276a05: removed the stale THUDM safe-repo alias and kept zai-org.

cuichenx · 2026-05-15T22:52:53Z

does this mean there is no glm45 (non flash) bridge..?

Addressed in 7276a05: added a module docstring note that this file only registers glm4_moe_lite / Flash, while full GLM-4.7 uses the existing GLM45Bridge / glm4_moe path.

yaoyu-33

Review-only pass focused on the GLM-4.7 example scripts. The bridge file looks consistent with the earlier review fixes; the main remaining issue is that the example scripts are still acting like a configurable validation harness. For this PR, I recommend simplifying to one known-good path per script and letting users edit the command when they need different parallelism.

yaoyu-33 · 2026-05-15T22:57:21Z

+    examples/conversion/hf_megatron_roundtrip_multi_gpu.py \
+    --hf-model-id "$GLM47_FLASH_HF" --ep 8
+
+# Multi-GPU round-trip with TP=2 EP=4


This script is meant to be a short example, so I would avoid running multiple parallelism layouts by default. Please keep one known-good Flash round-trip path here, e.g. --nproc_per_node=8 ... --tp 1 --pp 1 --ep 8, and drop the second TP=2 EP=4 run unless there is a specific validation requirement for this PR. Users can edit the command if they want to try alternate TP/EP layouts.

Current PR head 7276a05 keeps one Flash round-trip path with nproc_per_node=8, tp=1, pp=1, ep=8, and no TP=2 / EP=4 run.

yaoyu-33 · 2026-05-15T22:57:21Z

+GLM47_FLASH_HF="${GLM47_FLASH_HF:-zai-org/GLM-4.7-Flash}"
+PROMPT="${PROMPT:-What is artificial intelligence?}"
+MAX_NEW_TOKENS="${MAX_NEW_TOKENS:-100}"
+TP="${TP:-1}"


For this example, please inline the fixed parallelism values instead of exposing TP, PP, EP, and NPROC_PER_NODE as bash variables. Keeping PROMPT / MAX_NEW_TOKENS configurable is reasonable, but the intended validation path should be obvious in the command, e.g. --nproc_per_node=8 and --tp 1 --pp 1 --ep 8.

Current PR head 7276a05 inlines nproc_per_node=8 with tp=1, pp=1, ep=8; only the prompt and token count remain configurable.

yaoyu-33 · 2026-05-15T22:57:21Z

+MODEL_NAME="${MODEL_NAME:-GLM-4.7}"
+HF_MODEL_ID="${HF_MODEL_ID:-zai-org/$MODEL_NAME}"
+
+# Parallelism configs: "TP,PP,EP" per entry.


This sweep makes the Slurm example much more complex than it needs to be. Please remove PARALLELISM_CONFIGS_STR and the loop below, and run a single known-good GLM-4.7 conversion path directly, e.g. --tp 1 --pp 1 --ep 32. The README can mention that other layouts are possible, but this script should show the one recommended path.

Current PR head 7276a05 removes the parallelism sweep and runs one Slurm conversion path with tp=1, pp=1, ep=32.

yaoyu-33 · 2026-05-15T22:57:21Z

+HF_MODEL_ID="${HF_MODEL_ID:-zai-org/$MODEL_NAME}"
+PROMPT="${PROMPT:-What is artificial intelligence?}"
+MAX_NEW_TOKENS="${MAX_NEW_TOKENS:-100}"
+TP="${TP:-1}"


Same simplification request here: please do not expose fixed parallelism as env vars in the example. Inline --tp 1 --pp 1 --ep 32 in the command and simplify the status echo accordingly. Environment-specific vars like CONTAINER_IMAGE, CONTAINER_MOUNTS, WORKDIR, HF_HOME, and optional prompt settings are fine to keep.

Current PR head 7276a05 inlines tp=1, pp=1, ep=32; the remaining variables are environment and prompt settings.

yaoyu-33 · 2026-05-15T22:57:21Z

+
+### GLM-4.7 (multi-node via Slurm)
+
+[slurm_conversion.sh](slurm_conversion.sh) sweeps multiple parallelism configs (`TP,PP,EP`) to verify round-trip conversion. Each config runs sequentially.


Please update the README to match the script simplification: avoid documenting a default TP/PP/EP sweep or parallelism env vars as part of the public example flow. The docs should describe the fixed recommended paths (EP=8 for Flash, EP=32 for full GLM-4.7) and leave custom parallelism as something users can edit in the command.

Current PR head 7276a05 documents the fixed recommended paths, EP=8 for Flash and EP=32 for full GLM-4.7, with custom layouts left as direct script edits.

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

yaoyu-33 · 2026-05-15T23:14:04Z

+| Variable | Description |
+|---|---|
+| `CONTAINER_IMAGE` | Path to Singularity / SquashFS container image |
+| `CONTAINER_MOUNTS` | Bind mounts (must include the Bridge checkout as `/opt/Megatron-Bridge`, or set `WORKDIR` to the mounted path) |


this is not needed if people are using correct nemo container, we mount for debugging

Addressed in 7276a05: rephrased CONTAINER_MOUNTS as optional for data, caches, or a local checkout when debugging, and documented WORKDIR separately.

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

yaoyu-33 · 2026-05-15T23:23:02Z

/ok to test 7276a05

…NeMo#2983) Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>

copy-pr-bot Bot temporarily deployed to test March 25, 2026 16:18 Inactive

copy-pr-bot Bot temporarily deployed to nemo-ci March 25, 2026 16:38 Inactive

copy-pr-bot Bot had a problem deploying to nemo-ci March 25, 2026 16:50 Failure

copy-pr-bot Bot temporarily deployed to public March 25, 2026 16:53 Inactive

yaoyu-33 added area:model Model implementations and HF bridge logic feature New capabilities, enhancements, or enablement work labels Mar 25, 2026

yaoyu-33 marked this pull request as draft April 6, 2026 23:12

cuichenx mentioned this pull request May 8, 2026

[NeMo FW 26.06 Release] MBridge v0.5.0 Roadmap #3754

Open

yaoyu-33 and others added 3 commits May 11, 2026 10:45

Merge remote-tracking branch 'origin/main' into yuya/add-glm47-support

256318b

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> # Conflicts: # 3rdparty/Megatron-LM # src/megatron/bridge/models/__init__.py

[model][docs] docs: finalize GLM-4.7 examples

b9d236b

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

yaoyu-33 marked this pull request as ready for review May 14, 2026 23:24

claude Bot reviewed May 14, 2026

View reviewed changes

Merge branch 'main' into yuya/add-glm47-support

33d7fdd

[model] test: add GLM-4.7 Flash bridge coverage

2df92af

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

copy-pr-bot Bot temporarily deployed to public May 14, 2026 23:38 Inactive

copy-pr-bot Bot temporarily deployed to test May 14, 2026 23:38 Inactive

copy-pr-bot Bot temporarily deployed to public May 14, 2026 23:45 Inactive

copy-pr-bot Bot temporarily deployed to public May 14, 2026 23:46 Inactive

copy-pr-bot Bot temporarily deployed to public May 15, 2026 00:00 Inactive

yaoyu-33 added the needs-review PR is ready for code review and waiting on a reviewer label May 15, 2026

[model] test: handle GLM-4.7 Flash composite mappings

d210be5

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

copy-pr-bot Bot temporarily deployed to public May 15, 2026 20:54 Inactive

copy-pr-bot Bot temporarily deployed to test May 15, 2026 20:54 Inactive

copy-pr-bot Bot temporarily deployed to public May 15, 2026 21:03 Inactive

copy-pr-bot Bot temporarily deployed to public May 15, 2026 21:04 Inactive

copy-pr-bot Bot temporarily deployed to public May 15, 2026 21:20 Inactive

cuichenx reviewed May 15, 2026

View reviewed changes

yaoyu-33 commented May 15, 2026

View reviewed changes

[model][example] fix: simplify GLM-4.7 examples

60018ce

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

yaoyu-33 commented May 15, 2026

View reviewed changes

[model][example] fix: address GLM-4.7 review comments

7276a05

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

copy-pr-bot Bot temporarily deployed to public May 15, 2026 23:23 Inactive

copy-pr-bot Bot temporarily deployed to test May 15, 2026 23:23 Inactive

copy-pr-bot Bot temporarily deployed to public May 15, 2026 23:31 Inactive

copy-pr-bot Bot temporarily deployed to public May 15, 2026 23:32 Inactive

copy-pr-bot Bot temporarily deployed to public May 15, 2026 23:45 Inactive

cuichenx approved these changes May 18, 2026

View reviewed changes

yaoyu-33 merged commit 544955c into main May 18, 2026
97 checks passed

yaoyu-33 deleted the yuya/add-glm47-support branch May 18, 2026 22:01

		@@ -0,0 +1,121 @@
		# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.

	# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
	# Copyright (c) 2026, NVIDIA CORPORATION. All rights reserved.


		### GLM-4.7 (multi-node via Slurm)

		[slurm_conversion.sh](slurm_conversion.sh) sweeps multiple parallelism configs (`TP,PP,EP`) to verify round-trip conversion. Each config runs sequentially.

Conversation

yaoyu-33 commented Mar 25, 2026

Summary

Details

Test plan

Uh oh!

copy-pr-bot Bot commented Mar 25, 2026

Uh oh!

yaoyu-33 commented Mar 25, 2026

Uh oh!

claude Bot May 14, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

claude Bot May 14, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

claude Bot commented May 14, 2026

Uh oh!

yaoyu-33 commented May 14, 2026

Uh oh!

yaoyu-33 commented May 15, 2026

Uh oh!

yaoyu-33 commented May 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

yaoyu-33 commented May 15, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

yaoyu-33 left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

yaoyu-33 commented May 15, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

yaoyu-33 commented May 15, 2026 •

edited

Loading