Introduce GDN to Mamba by Phlip79 · Pull Request #3535 · NVIDIA/Megatron-LM

Phlip79 · 2026-02-23T06:20:11Z

What does this PR do ?

Adds the Gated Delta Net (GDN) feature to MambaModel. GDN was added to GPTModel in #1989.

Adds unit tests. The end-to-end model was also tested using this Megatron-Bridge PR: NVIDIA-NeMo/Megatron-Bridge#2520.

wandb: Baseline (w/ GPTModel) vs MambaModel

Contribution process

flowchart LR
    A[Pre-checks] --> B[PR Tests]
    subgraph Code Review/Approval
        C1[Expert Review] --> C2[Final Review]
    end
    B --> C1
    C2 --> D[Merge]

Pre-checks

I want this PR in a versioned release and have added the appropriate Milestone (e.g., Core 0.8)
I have added relevant unit tests
I have added relevant functional tests
I have added proper typing to my code (see Typing guidelines)
I have added relevant documentation
I have run the autoformatter.sh on my PR

Code review

The following process is enforced via the CODEOWNERS file for changes into megatron/core. For changes outside of megatron/core, it is up to the PR author whether or not to tag the Final Reviewer team.

For MRs into `main` branch

Feel free to message or mention @mcore-oncall to help accelerate your merge into main. The less complex your PR is, the faster it will be approved and merged!

(Step 1): Add PR label `Expert Review`

(Step 2): Collect the expert reviewers' reviews

Attach the Expert Review label when your PR is ready for review.
GitHub auto-assigns expert reviewers based on your changes. They will get notified and pick up your PR soon.

⚠️ Only proceed to the next step once all reviewers have approved, merge conflicts are resolved, and the CI is passing.
Final Review might get declined if these requirements are not fulfilled.

(Step 3): Final Review

Add Final Review label
GitHub auto-assigns final reviewers based on your changes. They will get notified and pick up your PR soon.

(Optional Step 4): Cherry-pick into release branch

If this PR also needs to be merged into core_r* release branches, after this PR has been merged, select Cherry-pick to open a new PR into the release branch.

For MRs into `dev` branch

The proposed review process for `dev` branch is under active discussion.

MRs are mergeable after one approval by either eharper@nvidia.com or zijiey@nvidia.com.

Merging your PR

Any member of core-adlr and core-nemo will be able to merge your PR.

copy-pr-bot · 2026-02-23T06:20:15Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Phlip79 · 2026-02-23T17:32:56Z

/ok to test acc49e7

Phlip79 · 2026-02-25T02:40:45Z

/ok to test 2fb3d8d

Phlip79 · 2026-02-25T02:46:44Z

/ok to test a2cfc0e

yuzhongw-nvidia

LGTM. Thanks.

duncanriach · 2026-02-26T19:33:34Z

Looking good.

Please merge after 3377, which is very close to being merged.

…lip/mamba-gdn

The merge auto-resolved mamba_block.py but kept the old layer_number=i+1 style in the GDN case. Update to use layer_number (which includes pp_layer_offset), add_layer_offset=False, and pp_layer_offset to match the ATTENTION case.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Phlip79 · 2026-02-27T20:24:52Z

/ok to test 6b9a8f7

duncanriach

These are the things that stand out from the review of your latest changes. I get the sense that other things will need to be added to fully support GDN, such as the calculation of FLOPs in training.py.

Just realized I had some comments that were pending from yesterday that I forgot to submit. Submitted with this review cycle.

duncanriach · 2026-02-27T20:54:40Z

+                        layer_number=layer_number,
+                        pg_collection=pg_collection,
+                        is_mtp_layer=is_mtp_layer,
+                        add_layer_offset=False,


Looking at the current GDN implementation in main, it doesn't seem to add its own pp_layer_offset like the attention layers do. I wonder if this is because it doesn't currently support pipeline parallel via the GPTModel. If it's not adding its own pp_layer_offset, then it doesn't need to be told not to when the HLModel passes a layer_number containing the pp_layer_offset it already calculated.

pp_layer_offset is not needed for GDN (for now).

This code should be moved up under mamba.

Sounds like add_layer_offset is not needed, because the GDN layer does not currently add its own pp_layer_offset. So perhaps add_layer_offset should not be included here.

I believe we should be setting add_layer_offset=False so that the TransformerLayer does not calculate additional layers (see this code).
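To make the double-offset concern in this thread concrete, here is a minimal toy sketch. `ToyLayer` and `build_stage` are hypothetical names invented for illustration and are not Megatron-LM APIs. The sketch assumes the block has already folded `pp_layer_offset` into `layer_number`, as described above, and only shows why a layer that adds the offset again would mis-number itself.

```python
from dataclasses import dataclass


@dataclass
class ToyLayer:
    """Hypothetical stand-in for a layer that may add the pipeline offset itself."""

    layer_number: int            # 1-based number passed in by the caller
    pp_layer_offset: int = 0     # number of layers held by earlier pipeline stages
    add_layer_offset: bool = True

    def global_layer_number(self) -> int:
        # If the caller already folded the offset into layer_number,
        # add_layer_offset must be False or the offset is counted twice.
        if self.add_layer_offset:
            return self.layer_number + self.pp_layer_offset
        return self.layer_number


def build_stage(num_local_layers: int, pp_layer_offset: int, add_layer_offset: bool):
    """Number the layers of one pipeline stage; the block pre-applies the offset."""
    numbers = []
    for i in range(num_local_layers):
        layer_number = i + 1 + pp_layer_offset  # offset applied once, by the block
        layer = ToyLayer(layer_number, pp_layer_offset, add_layer_offset)
        numbers.append(layer.global_layer_number())
    return numbers
```

With two local layers and an offset of 4, `build_stage(2, 4, add_layer_offset=False)` gives `[5, 6]`, whereas leaving the flag on gives `[9, 10]`. A layer that never adds its own offset behaves like the `False` case regardless of the flag, which is the point raised about the current GDN layer.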

Phlip79 · 2026-03-18T18:03:11Z

/ok to test bd9317e

jaredcasper · 2026-03-19T19:53:39Z

Can we add a functional test or two for GDN? Or do we need more to do an end-to-end run?

svcnvidia-nemo-ci · 2026-03-20T22:49:50Z

🔄 Merge queue validation started!

You can track the progress here: https://github.com/NVIDIA/Megatron-LM/actions/runs/23365696949

Phlip79 · 2026-03-20T23:38:53Z

/ok to test 5a5e206

svcnvidia-nemo-ci · 2026-03-22T23:05:58Z

🔄 Merge queue validation started!

You can track the progress here: https://github.com/NVIDIA/Megatron-LM/actions/runs/23414703144

Phlip79 · 2026-03-22T23:48:05Z

/ok to test cb665a7

svcnvidia-nemo-ci · 2026-03-23T05:36:59Z

🔄 Merge queue validation started!

You can track the progress here: https://github.com/NVIDIA/Megatron-LM/actions/runs/23423090252

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Introduce GDN to Mamba

acc49e7

svcnvidia-nemo-ci added this to the Core 0.16 milestone Feb 23, 2026

Phlip79 mentioned this pull request Feb 25, 2026

Switch Qwen3-Next to use MambaModel NVIDIA-NeMo/Megatron-Bridge#2520

Draft

5 tasks

Fix linting

2fb3d8d

Fix import order

a2cfc0e

copy-pr-bot Bot temporarily deployed to test February 25, 2026 02:47 Inactive

Phlip79 marked this pull request as ready for review February 25, 2026 02:49

Phlip79 requested review from a team as code owners February 25, 2026 02:49

svcnvidia-nemo-ci requested a review from a team February 25, 2026 02:49

Phlip79 requested review from duncanriach, janEbert and yuzhongw-nvidia and removed request for a team February 25, 2026 02:49

Phlip79 added the Expert Review [deprecated] Apply this label to indicate that your PR is ready for expert review. label Feb 26, 2026

yuzhongw-nvidia approved these changes Feb 26, 2026

janEbert approved these changes Feb 26, 2026

Comment thread megatron/core/ssm/mamba_hybrid_layer_allocation.py

Phlip79 and others added 2 commits February 27, 2026 20:21

Merge branch 'main' of https://github.com/NVIDIA/Megatron-LM into phi…

4057f9c

…lip/mamba-gdn

copy-pr-bot Bot temporarily deployed to test February 27, 2026 20:25 Inactive

duncanriach reviewed Feb 27, 2026

Comment thread megatron/core/ssm/mamba_hybrid_layer_allocation.py

maanug-nv approved these changes Mar 17, 2026

Phlip79 requested a review from a team March 17, 2026 21:54

copy-pr-bot Bot temporarily deployed to test March 18, 2026 19:04 Inactive

jaredcasper approved these changes Mar 19, 2026

svcnvidia-nemo-ci added the Approved All necessary approvals have been made label Mar 19, 2026

Phlip79 enabled auto-merge March 20, 2026 22:49

Phlip79 added this pull request to the merge queue Mar 20, 2026

github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Mar 20, 2026

Merge branch 'main' into philip/mamba-gdn

5a5e206

Phlip79 enabled auto-merge March 20, 2026 23:38

copy-pr-bot Bot temporarily deployed to test March 20, 2026 23:39 Inactive

Phlip79 disabled auto-merge March 21, 2026 00:19

Phlip79 added this pull request to the merge queue Mar 22, 2026

github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Mar 22, 2026

Merge branch 'main' into philip/mamba-gdn

cb665a7

Phlip79 enabled auto-merge March 22, 2026 23:48

copy-pr-bot Bot temporarily deployed to test March 22, 2026 23:48 Inactive

Phlip79 added this pull request to the merge queue Mar 23, 2026

Merged via the queue into NVIDIA:main with commit 8f7fbe7 Mar 23, 2026
99 of 104 checks passed

Phlip79 deleted the philip/mamba-gdn branch March 23, 2026 06:05

yangbofun pushed a commit to xlm-research/Megatron-LM that referenced this pull request May 22, 2026

Introduce GDN to Mamba (NVIDIA#3535)

94c2846

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

sbhavani mentioned this pull request May 26, 2026

[ROADMAP][2026 Q2] Megatron Core Roadmap #4997

Open

Inline review comments by: duncanriach (Feb 27, 2026), yuzhongw-nvidia (Mar 2, 2026), duncanriach (Mar 2, 2026), Phlip79 (Mar 10, 2026).

9 participants
