[model] feat: Add stepfun-ai/Step-3.5-Flash bridge#3525
Conversation
ee5cce6 to
690edcd
Compare
|
/claude review |
Naming consistency reviewThe repo's established pattern for "X.Y"-versioned models drops the dot in filenames/symbols and uses an underscore in HF
Suggested renames
Caveat on the HF Header / boilerplate
Submodule bumpThe PR also bumps Layout
|
There was a problem hiding this comment.
Review Summary
This PR has several issues that need to be addressed before it's ready for merge. The main concerns:
Blockers
-
.gitmodulespoints to a personal fork — The submodule URL was changed fromhttps://github.com/NVIDIA/Megatron-LM.gittogit@github.com:shifangx/Megatron-LM.git. This will break CI and all other contributors. Must be reverted. -
self.attention_dropoutundefined —Step3p5Attentionreferencesself.attention_dropoutduring training but never initializes it. This is anAttributeErrorat runtime. -
get_input_embeddingsbreaks HF API — The override inStep3p5Modeltakesinput_idsand returns embeddings output instead of the embedding module. This breakstie_weights(),get_input_embeddings()calls fromStep3p5ForCausalLM, etc. -
Missing bridge parameter mappings — The
mapping_registryis missing mappings for shared experts, dense-layer MLPs (layers 0–2), attention gate (g_proj), and router bias. Weight conversion will silently drop or fail on these parameters.
Cleanup
- Multiple copy-paste artifacts from Llama4/Qwen3 (copyright header, docstring examples, comments)
- Leftover
# breakpoint()debug comments share_expert_dims/share_expert_dimnaming inconsistency in config
Missing
- No tests. A new model bridge should have at least unit tests for parameter mapping round-trips and config conversion.
yaoyu-33
left a comment
There was a problem hiding this comment.
Inline notes for the naming-consistency review (top-level summary already posted above).
|
Addressed the unresolved review feedback in
Sanitized internal validation evidence:
Posted |
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
|
/ok to test 32d94b8 |
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
|
/ok to test 9ab6119 |
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> Co-authored-by: yaoyu-33 <yaoyu.094@gmail.com> Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
…Bridge support
What does this PR do ?
Add support for Step-3.5-Flash, a 196B sparse MoE model (~11B active parameters) with the custom step3p5 architecture.
Architecture features:
This pr depends on NVIDIA/Megatron-LM#4841 and NVIDIA/Megatron-LM#4473
Changelog
GitHub Actions CI
See the CI sectionin the Contributing doc for how to trigger the CI. A Nvidia developer will need to approve and trigger the CI for external contributors.
Before your PR is "Ready for review"
Pre checks:
If you haven't finished some of the above items you can still open "Draft" PR.
Additional Information