Skip to content

[model][example] feat: Add GLM-4.7 and GLM-4.7-Flash support#2983

Merged
yaoyu-33 merged 9 commits into
mainfrom
yuya/add-glm47-support
May 18, 2026
Merged

[model][example] feat: Add GLM-4.7 and GLM-4.7-Flash support#2983
yaoyu-33 merged 9 commits into
mainfrom
yuya/add-glm47-support

Conversation

@yaoyu-33

Copy link
Copy Markdown
Contributor

Summary

  • Add GLM47FlashBridge for Glm4MoeLiteForCausalLM (GLM-4.7-Flash), a new architecture combining DeepSeek V3-style Multi-Latent Attention (MLA) with GLM MoE routing
  • GLM-4.7 (Glm4MoeForCausalLM) is already supported by the existing GLM45Bridge — no code changes needed
  • Add THUDM and zai-org to SAFE_REPOS
  • Add example scripts for both models (conversion, inference, Slurm multi-node)

Details

GLM-4.7-Flash (zai-org/GLM-4.7-Flash):

  • ~30B params, 64 routed experts (top-4), MLA attention, MTP
  • Uses MLAModelProvider with DeepSeek common weight mappings (safetensors use identical MLA parameter naming)
  • Sigmoid routing with e_score_correction_bias

GLM-4.7 (zai-org/GLM-4.7):

  • ~358B params, 160 routed experts (top-8), same glm4_moe architecture as GLM-4.5
  • Works out of the box with GLM45Bridge

Test plan

  • GLM-4.7-Flash roundtrip (EP=8, 8 GPUs): 9,701 weights matched, 0 failures
  • GLM-4.7 bridge resolution: correctly maps to GLM45Bridge with num_layers=92, hidden=5120, experts=160
  • CI unit tests
  • CI functional tests

Made with Cursor

GLM-4.7 (Glm4MoeForCausalLM) uses the same architecture as GLM-4.5,
so the existing GLM45Bridge handles it out of the box.

GLM-4.7-Flash (Glm4MoeLiteForCausalLM) is a new architecture combining
DeepSeek V3-style Multi-Latent Attention (MLA) with GLM MoE routing.
Add GLM47FlashBridge with MLAModelProvider, reusing the DeepSeek common
MLA weight mappings since the safetensors use identical parameter naming.

Verified on 8-GPU node:
- GLM-4.7-Flash roundtrip (EP=8): 9,701 weights matched, 0 failures
- GLM-4.7 bridge resolution: correctly maps to GLM45Bridge

Also adds THUDM and zai-org to SAFE_REPOS.

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Made-with: Cursor
@copy-pr-bot

copy-pr-bot Bot commented Mar 25, 2026

Copy link
Copy Markdown

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@yaoyu-33

Copy link
Copy Markdown
Contributor Author

/ok to test 3f6237f

@yaoyu-33 yaoyu-33 added area:model Model implementations and HF bridge logic feature New capabilities, enhancements, or enablement work labels Mar 25, 2026
@yaoyu-33 yaoyu-33 marked this pull request as draft April 6, 2026 23:12
yaoyu-33 and others added 3 commits May 11, 2026 10:45
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

# Conflicts:
#	3rdparty/Megatron-LM
#	src/megatron/bridge/models/__init__.py
Document HF IDs, architecture, hardware requirements, and slurm workflow
for the GLM-4.7 and GLM-4.7-Flash example scripts. Aligns with the
README structure used by the glm5 / minimax_m2 / bailing example folders.

Verified GLM-4.7-Flash inference (TP=1, EP=8, single node on cw H100s):
generation is coherent on the "What is artificial intelligence?" prompt.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
@yaoyu-33 yaoyu-33 marked this pull request as ready for review May 14, 2026 23:24
@@ -0,0 +1,121 @@
# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Copyright year should be 2026 per project guidelines (CLAUDE.md: "Use the current year (2026) in generated content").

Suggested change
# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
# Copyright (c) 2026, NVIDIA CORPORATION. All rights reserved.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Current PR head 7276a05 has the 2026 copyright header in src/megatron/bridge/models/glm/glm47_flash_bridge.py; confirmed during this cleanup.

Comment on lines +101 to +108

first_k = getattr(hf_config, "first_k_dense_replace", 1)
provider.moe_layer_freq = [0] * first_k + [1] * (hf_config.num_hidden_layers - first_k)

return provider

def build_conversion_tasks(self, hf_pretrained, megatron_model):
"""Override to store config before mapping_registry is called."""

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: The build_conversion_tasks override to stash self._hf_config is a deprecated pattern (see skills/adding-model-support/SKILL.md). The base class already sets self.hf_config automatically in build_conversion_tasks (line 1538 of model_bridge.py) before calling mapping_registry(), so this override is unnecessary.

mapping_registry() below reads getattr(self, "_hf_config", None) but should use self.hf_config instead. As-is, if the base class flow ever changes or if mapping_registry() is called without going through build_conversion_tasks first, the private _hf_config won't be set.

Suggested fix — delete the build_conversion_tasks override entirely, and in mapping_registry() change to:

def mapping_registry(self) -> MegatronMappingRegistry:
    hf_config = getattr(self, "hf_config", None)
    ...

This matches the pattern used by GLM45Bridge.mapping_registry() (which reads self.hf_config directly at line 196 of glm45_bridge.py).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Current PR head 7276a05 uses self.hf_config in mapping_registry() and has no build_conversion_tasks override; confirmed during this cleanup.

@claude

claude Bot commented May 14, 2026

Copy link
Copy Markdown
Contributor

Review of GLM-4.7 / GLM-4.7-Flash Support - Issues Found: (1) Bug in glm47_flash_bridge.py lines 101-108 - The build_conversion_tasks override to stash self._hf_config is deprecated. The base class sets self.hf_config automatically before mapping_registry() is called. Remove the override and use self.hf_config in mapping_registry(). See inline comment. (2) Copyright year in glm47_flash_bridge.py line 1 should be 2026 not 2025. (3) Missing unit tests for GLM47FlashBridge - see test_glm45_bridge.py for the pattern. Suggested test cases: No perf tests impacted.

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
@yaoyu-33

Copy link
Copy Markdown
Contributor Author

/ok to test 2df92af

@yaoyu-33

Copy link
Copy Markdown
Contributor Author

/ok to test 2df92af

@yaoyu-33 yaoyu-33 added the needs-review PR is ready for code review and waiting on a reviewer label May 15, 2026
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
@yaoyu-33

yaoyu-33 commented May 15, 2026

Copy link
Copy Markdown
Contributor Author

Fixed the GLM-4.7-Flash unit failure in d210be578592aa26282db8783dad39d6342685d2.

Root cause: the new test collected every mapping.hf_param into a set, but composite mappings such as GatedMLPMapping expose hf_param as a dict of HF component params (gate, up). The registry behavior is valid; the test was assuming all HF params were hashable strings.

Fix: flatten dict-valued hf_param entries into their values before checking the expected HF mapping names.

Validation on cw at pushed HEAD d210be57:
uv run --active --no-sync python -m pytest tests/unit_tests/models/glm/test_glm47_flash_bridge.py -v

Result: 3 passed, 33 warnings in 1.23s.
Log: /lustre/fs1/portfolios/coreai/projects/coreai_dlalgo_llm/users/yuya/Megatron-Bridge-cc2/logs/pr2983-glm47-flash-pytest-20260515.log

@yaoyu-33

Copy link
Copy Markdown
Contributor Author

/ok to test d210be5

"nvidia",
"openai",
"Qwen",
"THUDM",

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is thudm?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

z.ai old name I think , but can remove

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 7276a05: removed the stale THUDM safe-repo alias and kept zai-org.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this mean there is no glm45 (non flash) bridge..?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 7276a05: added a module docstring note that this file only registers glm4_moe_lite / Flash, while full GLM-4.7 uses the existing GLM45Bridge / glm4_moe path.

@yaoyu-33 yaoyu-33 left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review-only pass focused on the GLM-4.7 example scripts. The bridge file looks consistent with the earlier review fixes; the main remaining issue is that the example scripts are still acting like a configurable validation harness. For this PR, I recommend simplifying to one known-good path per script and letting users edit the command when they need different parallelism.

Comment thread examples/models/glm47/conversion.sh Outdated
examples/conversion/hf_megatron_roundtrip_multi_gpu.py \
--hf-model-id "$GLM47_FLASH_HF" --ep 8

# Multi-GPU round-trip with TP=2 EP=4

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This script is meant to be a short example, so I would avoid running multiple parallelism layouts by default. Please keep one known-good Flash round-trip path here, e.g. --nproc_per_node=8 ... --tp 1 --pp 1 --ep 8, and drop the second TP=2 EP=4 run unless there is a specific validation requirement for this PR. Users can edit the command if they want to try alternate TP/EP layouts.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Current PR head 7276a05 keeps one Flash round-trip path with nproc_per_node=8, tp=1, pp=1, ep=8, and no TP=2 / EP=4 run.

Comment thread examples/models/glm47/inference.sh Outdated
GLM47_FLASH_HF="${GLM47_FLASH_HF:-zai-org/GLM-4.7-Flash}"
PROMPT="${PROMPT:-What is artificial intelligence?}"
MAX_NEW_TOKENS="${MAX_NEW_TOKENS:-100}"
TP="${TP:-1}"

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this example, please inline the fixed parallelism values instead of exposing TP, PP, EP, and NPROC_PER_NODE as bash variables. Keeping PROMPT / MAX_NEW_TOKENS configurable is reasonable, but the intended validation path should be obvious in the command, e.g. --nproc_per_node=8 and --tp 1 --pp 1 --ep 8.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Current PR head 7276a05 inlines nproc_per_node=8 with tp=1, pp=1, ep=8; only the prompt and token count remain configurable.

MODEL_NAME="${MODEL_NAME:-GLM-4.7}"
HF_MODEL_ID="${HF_MODEL_ID:-zai-org/$MODEL_NAME}"

# Parallelism configs: "TP,PP,EP" per entry.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sweep makes the Slurm example much more complex than it needs to be. Please remove PARALLELISM_CONFIGS_STR and the loop below, and run a single known-good GLM-4.7 conversion path directly, e.g. --tp 1 --pp 1 --ep 32. The README can mention that other layouts are possible, but this script should show the one recommended path.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Current PR head 7276a05 removes the parallelism sweep and runs one Slurm conversion path with tp=1, pp=1, ep=32.

HF_MODEL_ID="${HF_MODEL_ID:-zai-org/$MODEL_NAME}"
PROMPT="${PROMPT:-What is artificial intelligence?}"
MAX_NEW_TOKENS="${MAX_NEW_TOKENS:-100}"
TP="${TP:-1}"

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same simplification request here: please do not expose fixed parallelism as env vars in the example. Inline --tp 1 --pp 1 --ep 32 in the command and simplify the status echo accordingly. Environment-specific vars like CONTAINER_IMAGE, CONTAINER_MOUNTS, WORKDIR, HF_HOME, and optional prompt settings are fine to keep.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Current PR head 7276a05 inlines tp=1, pp=1, ep=32; the remaining variables are environment and prompt settings.

Comment thread examples/models/glm47/README.md Outdated

### GLM-4.7 (multi-node via Slurm)

[slurm_conversion.sh](slurm_conversion.sh) sweeps multiple parallelism configs (`TP,PP,EP`) to verify round-trip conversion. Each config runs sequentially.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update the README to match the script simplification: avoid documenting a default TP/PP/EP sweep or parallelism env vars as part of the public example flow. The docs should describe the fixed recommended paths (EP=8 for Flash, EP=32 for full GLM-4.7) and leave custom parallelism as something users can edit in the command.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Current PR head 7276a05 documents the fixed recommended paths, EP=8 for Flash and EP=32 for full GLM-4.7, with custom layouts left as direct script edits.

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Comment thread examples/models/glm47/README.md Outdated
| Variable | Description |
|---|---|
| `CONTAINER_IMAGE` | Path to Singularity / SquashFS container image |
| `CONTAINER_MOUNTS` | Bind mounts (must include the Bridge checkout as `/opt/Megatron-Bridge`, or set `WORKDIR` to the mounted path) |

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not needed if people are using correct nemo container, we mount for debugging

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 7276a05: rephrased CONTAINER_MOUNTS as optional for data, caches, or a local checkout when debugging, and documented WORKDIR separately.

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
@yaoyu-33

Copy link
Copy Markdown
Contributor Author

/ok to test 7276a05

@yaoyu-33 yaoyu-33 merged commit 544955c into main May 18, 2026
97 checks passed
@yaoyu-33 yaoyu-33 deleted the yuya/add-glm47-support branch May 18, 2026 22:01
vasunvidia pushed a commit to vasunvidia/Megatron-Bridge that referenced this pull request Jun 10, 2026
…NeMo#2983)

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:model Model implementations and HF bridge logic feature New capabilities, enhancements, or enablement work needs-review PR is ready for code review and waiting on a reviewer

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants