[model] feat: Add stepfun-ai/Step-3.5-Flash bridge #3525

Merged
yaoyu-33 merged 16 commits into NVIDIA-NeMo:main from shifangx:shifang/step-3.5-flash on May 22, 2026
Conversation

@shifangx shifangx commented Apr 26, 2026

…Bridge support

What does this PR do?

Add support for Step-3.5-Flash, a 196B sparse MoE model (~11B active parameters) with the custom step3p5 architecture.

Architecture features:

  • 45 layers: 3 dense (0-2) + 42 MoE (3-44)
  • 288 routed experts + 1 shared expert, top-8 sigmoid routing with expert bias
  • Zero-centered RMSNorm + per-head QK RMSNorm
  • Per-head attention gate (g_proj)
  • Per-layer RoPE theta (48-element array)
  • Alternating SWA (Sliding Window Attention)
  • Specific attention config for SWA, such as num_attention_heads, rotary_base and rotary_percent.
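
As a rough illustration of the layer layout and routing described above, here is a minimal pure-Python sketch. The helper names are hypothetical and not taken from the PR. It also assumes that the expert bias only affects which experts are selected, and that gate weights come from the unbiased sigmoid scores renormalized over the chosen experts; the PR description does not spell this out.

```python
import math

NUM_LAYERS = 45        # from the PR description
NUM_DENSE_LAYERS = 3   # layers 0-2 are dense, layers 3-44 are MoE


def is_moe_layer(layer_idx: int) -> bool:
    """Return True for the MoE layers (3-44 in the PR description)."""
    if not 0 <= layer_idx < NUM_LAYERS:
        raise ValueError(f"layer index out of range: {layer_idx}")
    return layer_idx >= NUM_DENSE_LAYERS


def sigmoid_topk_route(logits, expert_bias, top_k):
    """Pick top_k experts from sigmoid scores and return (index, weight) pairs.

    Assumption (not confirmed by the PR): the bias only influences which
    experts are selected; the weights use the unbiased scores, renormalized
    so that they sum to 1 over the selected experts.
    """
    scores = [1.0 / (1.0 + math.exp(-x)) for x in logits]
    biased = [s + b for s, b in zip(scores, expert_bias)]
    chosen = sorted(range(len(scores)), key=lambda i: biased[i], reverse=True)[:top_k]
    total = sum(scores[i] for i in chosen)
    return [(i, scores[i] / total) for i in chosen]
```

With 288 routed experts, `top_k` would be 8; the shared expert is applied to every token and does not take part in this selection.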

This PR depends on NVIDIA/Megatron-LM#4841 and NVIDIA/Megatron-LM#4473.

Changelog

  • Add specific line by line info of high level changes in this PR.

GitHub Actions CI

See the CI section in the Contributing doc for how to trigger the CI. An NVIDIA developer will need to approve and trigger the CI for external contributors.

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

If you haven't finished some of the above items you can still open "Draft" PR.

Additional Information

  • Related to # (issue)

copy-pr-bot Bot commented Apr 26, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@shifangx shifangx changed the title [model] feat: Add stepfun-ai/Step-3.5-Flash bridge [draft] [model] feat: Add stepfun-ai/Step-3.5-Flash bridge Apr 26, 2026
@shifangx shifangx force-pushed the shifang/step-3.5-flash branch 3 times, most recently from ee5cce6 to 690edcd Compare April 26, 2026 15:18
@shifangx shifangx changed the title [draft] [model] feat: Add stepfun-ai/Step-3.5-Flash bridge [model] feat: Add stepfun-ai/Step-3.5-Flash bridge Apr 27, 2026
@yaoyu-33 yaoyu-33 added area:model Model implementations and HF bridge logic feature New capabilities, enhancements, or enablement work high-complexity Harder to merge: prone to conflicts and needs additional test coverage needs-review PR is ready for code review and waiting on a reviewer labels Apr 27, 2026
Comment thread src/megatron/bridge/models/step/__init__.py Outdated
@yaoyu-33

/claude review

Comment thread .gitmodules
Comment thread .gitmodules Outdated
Comment thread src/megatron/bridge/models/step/modeling_step3p5.py Outdated
Comment thread src/megatron/bridge/models/step/modeling_step3p5.py Outdated
Comment thread src/megatron/bridge/models/step/modeling_step3p5.py Outdated
Comment thread src/megatron/bridge/models/step/modeling_step3p5.py Outdated
Comment thread src/megatron/bridge/models/step/modeling_step3p5.py Outdated
Comment thread src/megatron/bridge/models/step/modeling_step3p5.py Outdated
Comment thread src/megatron/bridge/models/step/step3p5_bridge.py Outdated
Comment thread src/megatron/bridge/models/step/step3p5_bridge.py Outdated
Comment thread src/megatron/bridge/recipes/step/__init__.py Outdated
Comment thread src/megatron/bridge/recipes/stepfun/step35.py
Comment thread src/megatron/bridge/models/step/configuration_step3p5.py Outdated
@yaoyu-33

Naming consistency review

The repo's established pattern for "X.Y"-versioned models drops the dot in filenames/symbols and uses an underscore in HF model_type strings. This PR uses p as a decimal separator everywhere, which matches no other model in tree:

| Model | Filenames / classes | HF model_type |
| --- | --- | --- |
| Qwen 2.5 | qwen25_vl_bridge.py, Qwen25... | "qwen2_5" / "qwen2_5_vl" |
| Qwen 3.5 | qwen35_vl_bridge.py, Qwen35... | "qwen3_5" / "qwen3_5_moe" |
| GLM 4.5v | glm_45v_bridge.py, modeling_glm_45v.py | |
| Kimi K2.5 | kimi_k25_vl/, modeling_kimi_k25_vl.py | |
| Bailing MoE v2 | bailing_moe_v2, BailingMoeV2... | "bailing_moe_v2" |
| Step 3.5 (this PR) | step3p5_bridge.py, Step3p5... | "step3p5" |

Suggested renames

  • step3p5_bridge.py → step35_bridge.py
  • configuration_step3p5.py → configuration_step35.py
  • modeling_step3p5.py → modeling_step35.py
  • recipes/step/step3p5.py → recipes/step/step35.py
  • Classes: Step3p5Bridge, Step3p5Config, Step3p5ForCausalLM, Step3p5Model, Step3p5RotaryEmbedding, Step3p5CausalLMOutputWithPast → drop the p
  • Recipe fn: step3p5_196b_a11b_pretrain_config → step35_196b_a11b_pretrain_config
  • Bridge registration model_type="step3p5" → "step3_5" (matching the qwen3_5 style)

Caveat on the HF model_type string: if "step3p5" is what stepfun-ai uses on HuggingFace Hub (the value baked into their published config.json), it has to stay as-is for from_pretrained to dispatch — in that case only the file/class names should change, and it'd be worth a docstring note explaining why the string differs from the rest of the repo.
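
The naming convention above can be sketched as two small helpers. Both function names are hypothetical and exist only to make the rule concrete: one derives the dot-free file stem and the underscore-style model_type from a version string, the other drops a `p` decimal separator from an existing identifier.

```python
import re

# Matches a "p" used as a decimal separator between two digits, e.g. "3p5".
P_SEPARATOR = re.compile(r"(\d)p(\d)")


def version_tokens(name: str, version: str) -> dict:
    """Derive repo-style tokens from a model name and an "X.Y" version."""
    major, minor = version.split(".")
    return {
        "file_stem": f"{name}{major}{minor}",    # dot dropped, e.g. qwen35
        "model_type": f"{name}{major}_{minor}",  # underscore, e.g. qwen3_5
    }


def drop_p_separator(identifier: str) -> str:
    """Rename Step3p5Bridge-style identifiers to the Step35Bridge style."""
    return P_SEPARATOR.sub(r"\1\2", identifier)
```

Per the caveat above, the derived `model_type` is only a suggestion: if the published config.json uses "step3p5", that string has to be kept as-is.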

Header / boilerplate

  • configuration_step3p5.py has no copyright header — file starts at from typing import …. Per CLAUDE.md, NVIDIA headers are required on new non-test files.
  • modeling_step3p5.py carries # Copyright 2025 The LLAMA4 and HuggingFace Inc. team; it looks like a Llama4 template was used. Compare with modeling_bailing_moe_v2.py / modeling_ministral3.py, which start with the standard NVIDIA + upstream-attribution header.
  • Both files use year 2025; please bump to 2026.
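
A header check along these lines could catch both problems before review. This is only a sketch: the exact header wording matched by the regular expression is an assumption, and `header_problems` is a hypothetical helper, not an existing repo tool.

```python
import re

# Assumed header shape: "# Copyright (c) <year>, NVIDIA CORPORATION. ..."
HEADER_RE = re.compile(r"Copyright \(c\) (\d{4}), NVIDIA CORPORATION")


def header_problems(source: str, expected_year: int = 2026, scan_lines: int = 5):
    """Return a list of header problems found in the first few lines of a file."""
    head = "\n".join(source.splitlines()[:scan_lines])
    match = HEADER_RE.search(head)
    if match is None:
        return ["missing NVIDIA copyright header"]
    if int(match.group(1)) != expected_year:
        return [f"header year is {match.group(1)}, expected {expected_year}"]
    return []
```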

Submodule bump

The PR also bumps 3rdparty/Megatron-LM and .gitmodules (commits 49bff34d, 690edcd9, 39828dd8). If that bump isn't strictly required by the Step bridge code, please split it into a separate PR so this one is purely the model addition.

Layout

models/step/{__init__.py, <name>_bridge.py, configuration_*.py, modeling_*.py} matches the precedent set by bailing/ and ministral3/. No <name>_provider.py needed since the bridge subclasses GPTModelProvider directly. ✓

@claude claude Bot left a comment

Review Summary

This PR has several issues that need to be addressed before it's ready for merge. The main concerns:

Blockers

  1. .gitmodules points to a personal fork — The submodule URL was changed from https://github.com/NVIDIA/Megatron-LM.git to git@github.com:shifangx/Megatron-LM.git. This will break CI and all other contributors. Must be reverted.

  2. self.attention_dropout undefined: Step3p5Attention references self.attention_dropout during training but never initializes it. This is an AttributeError at runtime.

  3. get_input_embeddings breaks HF API — The override in Step3p5Model takes input_ids and returns embeddings output instead of the embedding module. This breaks tie_weights(), get_input_embeddings() calls from Step3p5ForCausalLM, etc.

  4. Missing bridge parameter mappings — The mapping_registry is missing mappings for shared experts, dense-layer MLPs (layers 0–2), attention gate (g_proj), and router bias. Weight conversion will silently drop or fail on these parameters.
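
Blocker 1 is the kind of issue a small pre-merge check can flag. The sketch below parses .gitmodules text with the standard-library `configparser` and reports any submodule whose URL does not start with the expected upstream prefix. The function name and the allowed prefix are illustrative assumptions, and the section name in the usage example is assumed from the submodule path.

```python
import configparser

# Assumed policy: submodules must come from the upstream NVIDIA organization.
ALLOWED_PREFIX = "https://github.com/NVIDIA/"


def unexpected_submodule_urls(gitmodules_text: str, allowed_prefix: str = ALLOWED_PREFIX) -> dict:
    """Map each submodule section to its URL when the URL is not allowed."""
    parser = configparser.ConfigParser(interpolation=None)
    # Strip the leading indentation git writes so every line parses as a key.
    parser.read_string("\n".join(line.strip() for line in gitmodules_text.splitlines()))
    bad = {}
    for section in parser.sections():
        url = parser.get(section, "url", fallback="")
        if not url.startswith(allowed_prefix):
            bad[section] = url
    return bad


if __name__ == "__main__":
    text = (
        '[submodule "3rdparty/Megatron-LM"]\n'
        "\tpath = 3rdparty/Megatron-LM\n"
        "\turl = git@github.com:shifangx/Megatron-LM.git\n"
    )
    print(unexpected_submodule_urls(text))
```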

Cleanup

  • Multiple copy-paste artifacts from Llama4/Qwen3 (copyright header, docstring examples, comments)
  • Leftover # breakpoint() debug comments
  • share_expert_dims / share_expert_dim naming inconsistency in config

Missing

  • No tests. A new model bridge should have at least unit tests for parameter mapping round-trips and config conversion.
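
To make the requested mapping test concrete, here is a minimal sketch of a name-mapping round trip. It does not use the actual bridge or its mapping_registry: the parameter names in `NAME_PAIRS` are hypothetical placeholders (only `g_proj` appears in this PR), and `*` stands for a layer index.

```python
import re

# Hypothetical (HF name, Megatron name) pairs; "*" stands for a layer index.
NAME_PAIRS = [
    ("model.layers.*.self_attn.g_proj.weight", "decoder.layers.*.self_attention.g_proj.weight"),
    ("model.layers.*.mlp.gate.bias", "decoder.layers.*.mlp.router.expert_bias"),
]


def _translate(name, pairs):
    """Rewrite name using the first matching (source, destination) pattern."""
    for src, dst in pairs:
        pattern = "^" + re.escape(src).replace(r"\*", r"(\d+)") + "$"
        match = re.match(pattern, name)
        if match:
            return dst.replace("*", match.group(1))
    return None


def hf_to_megatron(name):
    return _translate(name, NAME_PAIRS)


def megatron_to_hf(name):
    return _translate(name, [(dst, src) for src, dst in NAME_PAIRS])


def unmapped(hf_names):
    """Return the HF parameter names that no pattern covers."""
    return [n for n in hf_names if hf_to_megatron(n) is None]
```

A test in this style would assert that `unmapped(...)` is empty for every parameter name in a toy checkpoint, which is how silently dropped weights such as shared experts or router bias would surface.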

Comment thread src/megatron/bridge/models/step/configuration_step3p5.py Outdated

@yaoyu-33 yaoyu-33 left a comment

Inline notes for the naming-consistency review (top-level summary already posted above).

Comment thread src/megatron/bridge/models/step/configuration_step3p5.py Outdated
Comment thread src/megatron/bridge/models/step/modeling_step3p5.py Outdated
Comment thread src/megatron/bridge/models/step/step3p5_bridge.py Outdated
Comment thread src/megatron/bridge/models/step/step3p5_bridge.py Outdated
Comment thread src/megatron/bridge/models/step/step3p5_bridge.py Outdated
Comment thread .gitmodules
Comment thread src/megatron/bridge/models/step/configuration_step3p5.py Outdated
@yaoyu-33

Addressed the unresolved review feedback in 5e5c2c7681d3cf91ef5dd0c3b0c68ea4d56e0a11:

  • Moved the model, recipe, example, and test paths to stepfun.
  • Removed the redundant megatron_to_hf_config() override.
  • Routed the Stepfun HF config fields through CONFIG_MAPPING / super().provider_bridge() and kept only Stepfun-specific normalization in provider_bridge().
  • Added functional conversion coverage for TP/PP/EP toy Step-3.5 checkpoints.
  • Added a 1-node / 8-GPU inference example for Step-3.5-Flash.

Sanitized internal validation evidence:

  • Unit coverage for Stepfun model/recipe changes: 46 passed.
  • Functional conversion coverage for Stepfun TP/PP/EP cases: 3 passed.
  • Real stepfun-ai/Step-3.5-Flash smoke run: local public-model snapshot, 1 node / 8 GPUs, in-memory HF -> Megatron conversion, EP=8, greedy generation completed.
    • Prompt: Write one concise sentence about Megatron Bridge.
    • Generated: <|begin▁of▁sentence|>Write one concise sentence about Megatron Bridge. Write one concise sentence

Posted /ok to test 5e5c2c7681d3cf91ef5dd0c3b0c68ea4d56e0a11 separately.

@yaoyu-33 yaoyu-33 added the needs-more-tests Requires additional L0 and L1 test coverage before merge label May 22, 2026
@yaoyu-33 yaoyu-33 removed the help wanted Extra attention is needed label May 22, 2026
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
@yaoyu-33

/ok to test 32d94b8

yaoyu-33 previously approved these changes May 22, 2026
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
@yaoyu-33

/ok to test 9ab6119

@yaoyu-33 yaoyu-33 enabled auto-merge (squash) May 22, 2026 01:06
@yaoyu-33 yaoyu-33 merged commit e4aabf3 into NVIDIA-NeMo:main May 22, 2026
133 checks passed
@shifangx shifangx deleted the shifang/step-3.5-flash branch May 25, 2026 10:47
vasunvidia pushed a commit to vasunvidia/Megatron-Bridge that referenced this pull request Jun 10, 2026
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Co-authored-by: yaoyu-33 <yaoyu.094@gmail.com>
Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
Labels

area:model Model implementations and HF bridge logic feature New capabilities, enhancements, or enablement work needs-more-tests Requires additional L0 and L1 test coverage before merge needs-review PR is ready for code review and waiting on a reviewer

2 participants