Skip to content

Introduce GDN to Mamba#3535

Merged
Phlip79 merged 25 commits into
NVIDIA:mainfrom
Phlip79:philip/mamba-gdn
Mar 23, 2026
Merged

Introduce GDN to Mamba#3535
Phlip79 merged 25 commits into
NVIDIA:mainfrom
Phlip79:philip/mamba-gdn

Conversation

@Phlip79

@Phlip79 Phlip79 commented Feb 23, 2026

Copy link
Copy Markdown
Member

What does this PR do ?

Adding Gated Delta Net (GDN) feature to MambaModel. GDN was added to GPTModel in #1989.

Adding unit tests. Also, tested end-to-end model using this Megatron-Bridge PR: NVIDIA-NeMo/Megatron-Bridge#2520.

wandb: Baseline (w/ GPTModel) vs MambaModel

Contribution process

flowchart LR
    A[Pre-checks] --> B[PR Tests]
    subgraph Code Review/Approval
        C1[Expert Review] --> C2[Final Review]
    end
    B --> C1
    C2 --> D[Merge]
Loading

Pre-checks

  • I want this PR in a versioned release and have added the appropriate Milestone (e.g., Core 0.8)
  • I have added relevant unit tests
  • I have added relevant functional tests
  • I have added proper typing to my code Typing guidelines
  • I have added relevant documentation
  • I have run the autoformatter.sh on my PR

Code review

The following process is enforced via the CODEOWNERS file for changes into megatron/core. For changes outside of megatron/core, it is up to the PR author whether or not to tag the Final Reviewer team.

For MRs into `main` branch

Feel free to message or comment the @mcore-oncall to help accelerate your merge into main. The less complex your PR is, the faster it will be approved and merged!

(Step 1): Add PR label Expert Review

(Step 2): Collect the expert reviewers reviews

  1. Attach the Expert Review label when your PR is ready for review.
  2. GitHub auto-assigns expert reviewers based on your changes. They will get notified and pick up your PR soon.

⚠️ Only proceed to the next step once all reviewers have approved, merge-conflict are resolved and the CI is passing.
Final Review might get declined if these requirements are not fulfilled.

(Step 3): Final Review

  1. Add Final Review label
  2. GitHub auto-assigns final reviewers based on your changes. They will get notified and pick up your PR soon.

(Optional Step 4): Cherry-pick into release branch

If this PR also needs to be merged into core_r* release branches, after this PR has been merged, select Cherry-pick to open a new PR into the release branch.

For MRs into `dev` branch The proposed review process for `dev` branch is under active discussion.

MRs are mergable after one approval by either eharper@nvidia.com or zijiey@nvidia.com.

Merging your PR

Any member of core-adlr and core-nemo will be able to merge your PR.

@copy-pr-bot

copy-pr-bot Bot commented Feb 23, 2026

Copy link
Copy Markdown

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@Phlip79

Phlip79 commented Feb 23, 2026

Copy link
Copy Markdown
Member Author

/ok to test acc49e7

@Phlip79

Phlip79 commented Feb 25, 2026

Copy link
Copy Markdown
Member Author

/ok to test 2fb3d8d

@Phlip79

Phlip79 commented Feb 25, 2026

Copy link
Copy Markdown
Member Author

/ok to test a2cfc0e

@Phlip79 Phlip79 marked this pull request as ready for review February 25, 2026 02:49
@Phlip79 Phlip79 requested review from a team as code owners February 25, 2026 02:49
@svcnvidia-nemo-ci svcnvidia-nemo-ci requested a review from a team February 25, 2026 02:49
@Phlip79 Phlip79 requested review from duncanriach, janEbert and yuzhongw-nvidia and removed request for a team February 25, 2026 02:49
@Phlip79 Phlip79 added the Expert Review [deprecated] Apply this label to indicate that your PR is ready for expert review. label Feb 26, 2026

@yuzhongw-nvidia yuzhongw-nvidia left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks.

Comment thread megatron/core/ssm/mamba_hybrid_layer_allocation.py
@duncanriach

duncanriach commented Feb 26, 2026

Copy link
Copy Markdown
Contributor

Looking good.

Please merge after 3377, which is very close to being merged.

Phlip79 and others added 2 commits February 27, 2026 20:21
The merge auto-resolved mamba_block.py but kept the old
layer_number=i+1 style in the GDN case. Update to use layer_number
(which includes pp_layer_offset), add_layer_offset=False, and
pp_layer_offset to match the ATTENTION case.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@Phlip79

Phlip79 commented Feb 27, 2026

Copy link
Copy Markdown
Member Author

/ok to test 6b9a8f7

@duncanriach duncanriach left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are the things that stand-out from the review of your latest changes. I get the sense that there are other things that will need to be added to fully support GDN, such as the calculation of flops in training.py

Just realized I had some comments that were pending from yesterday that I forgot to submit. Submitted with this review cycle.

Comment thread megatron/core/ssm/mamba_hybrid_layer_allocation.py Outdated
Comment thread megatron/core/ssm/mamba_block.py Outdated
layer_number=layer_number,
pg_collection=pg_collection,
is_mtp_layer=is_mtp_layer,
add_layer_offset=False,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the current GDN implementation in main, it doesn't seem to add its own pp_layer_offset like the attention layers do. I wonder if this is because it doesn't currently support pipeline parallel via the GPTModel. If it's not adding its own pp_layer_offset, then it doesn't need to be told not to when the HLModel passes a layer_number containing the pp_layer_offset it already calculated.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pp_layer_offset is not needed for GDN (for now).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code should be moved up under mamba.

Sounds like add_layer_offset is not needed, because the GDN layer does not currently add its own pp_layer_offset. So perhaps add_layer_offset should not be included here.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we should be setting add_layer_offset=False so that the TransformerLayer does not calculate additional layers (see this code).

Comment thread megatron/core/ssm/mamba_hybrid_layer_allocation.py
@Phlip79 Phlip79 requested a review from a team March 17, 2026 21:54
@Phlip79

Phlip79 commented Mar 18, 2026

Copy link
Copy Markdown
Member Author

/ok to test bd9317e

@svcnvidia-nemo-ci svcnvidia-nemo-ci added the Approved All necessary approvals have been made label Mar 19, 2026
@jaredcasper

Copy link
Copy Markdown
Contributor

Can we add a functional test or two for GDN? Or do we need more to do an end-to-end run?

@Phlip79 Phlip79 enabled auto-merge March 20, 2026 22:49
@Phlip79 Phlip79 added this pull request to the merge queue Mar 20, 2026
@svcnvidia-nemo-ci

Copy link
Copy Markdown

🔄 Merge queue validation started!

You can track the progress here: https://github.com/NVIDIA/Megatron-LM/actions/runs/23365696949

@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Mar 20, 2026
@Phlip79

Phlip79 commented Mar 20, 2026

Copy link
Copy Markdown
Member Author

/ok to test 5a5e206

@Phlip79 Phlip79 enabled auto-merge March 20, 2026 23:38
@Phlip79 Phlip79 disabled auto-merge March 21, 2026 00:19
@Phlip79 Phlip79 added this pull request to the merge queue Mar 22, 2026
@svcnvidia-nemo-ci

Copy link
Copy Markdown

🔄 Merge queue validation started!

You can track the progress here: https://github.com/NVIDIA/Megatron-LM/actions/runs/23414703144

@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Mar 22, 2026
@Phlip79

Phlip79 commented Mar 22, 2026

Copy link
Copy Markdown
Member Author

/ok to test cb665a7

@Phlip79 Phlip79 enabled auto-merge March 22, 2026 23:48
@Phlip79 Phlip79 added this pull request to the merge queue Mar 23, 2026
@svcnvidia-nemo-ci

Copy link
Copy Markdown

🔄 Merge queue validation started!

You can track the progress here: https://github.com/NVIDIA/Megatron-LM/actions/runs/23423090252

Merged via the queue into NVIDIA:main with commit 8f7fbe7 Mar 23, 2026
99 of 104 checks passed
@Phlip79 Phlip79 deleted the philip/mamba-gdn branch March 23, 2026 06:05
yangbofun pushed a commit to xlm-research/Megatron-LM that referenced this pull request May 22, 2026
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Approved All necessary approvals have been made complexity: medium

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants