
add permute fusion into hybrid ep #4089

Merged
Phlip79 merged 11 commits into NVIDIA:main from Autumn1998:tongliu_hybridep_permute_fusion_main
Apr 28, 2026


@Autumn1998

Contributor

What does this PR do ?

This PR introduces a new feature: fusing permute/unpermute with dispatch/combine into a single kernel.
This feature is provided by HybridEP.
Related PR on dev: #4073
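To make the idea concrete, here is a minimal single-process Python sketch of what fusing permute with dispatch (and combine with unpermute) saves. It is not HybridEP's kernel or API: the function names are hypothetical, routing is assumed to be top-1, ranks are simulated as Python lists, and "communication" is just writing into a per-rank buffer. The unfused path sorts tokens by expert and then splits the sorted buffer by rank (two passes); the fused path computes each token's final slot from per-expert prefix offsets and writes it there directly in one pass, remembering the slot so the reverse path can gather results straight back into the original token order.

```python
from typing import Any, List, Tuple

Slot = Tuple[int, int]  # (destination rank, position inside that rank's receive buffer)


def permute_then_dispatch(tokens: List[Any], expert_ids: List[int],
                          experts_per_rank: int, num_ranks: int) -> List[List[Any]]:
    """Unfused reference: permute (group by expert), then dispatch (split by rank)."""
    # Pass 1 (permute): stable sort of token indices by destination expert.
    order = sorted(range(len(tokens)), key=lambda i: expert_ids[i])
    # Pass 2 (dispatch): walk the permuted order and append each token to its rank's buffer.
    per_rank: List[List[Any]] = [[] for _ in range(num_ranks)]
    for i in order:
        per_rank[expert_ids[i] // experts_per_rank].append(tokens[i])
    return per_rank


def fused_permute_dispatch(tokens: List[Any], expert_ids: List[int],
                           experts_per_rank: int, num_ranks: int
                           ) -> Tuple[List[List[Any]], List[Slot]]:
    """Fused sketch: each token is written straight to its final slot in one pass."""
    num_experts = experts_per_rank * num_ranks
    counts = [0] * num_experts
    for e in expert_ids:
        counts[e] += 1
    # Exclusive prefix sum of per-expert counts within each rank gives every
    # expert's starting slot inside its rank's receive buffer.
    next_slot = [0] * num_experts
    rank_sizes = [0] * num_ranks
    for e in range(num_experts):
        r = e // experts_per_rank
        next_slot[e] = rank_sizes[r]
        rank_sizes[r] += counts[e]
    buffers: List[List[Any]] = [[None] * rank_sizes[r] for r in range(num_ranks)]
    src_slot: List[Slot] = []
    for tok, e in zip(tokens, expert_ids):
        r = e // experts_per_rank
        slot = next_slot[e]
        next_slot[e] += 1  # visiting tokens in order keeps the layout stable
        buffers[r][slot] = tok
        src_slot.append((r, slot))  # kept for the reverse (combine + unpermute) path
    return buffers, src_slot


def fused_combine_unpermute(expert_out: List[List[Any]], src_slot: List[Slot]) -> List[Any]:
    """Reverse path: each original token position gathers its result directly.
    With top-k > 1 routing this would instead be a weighted sum over k slots."""
    return [expert_out[r][slot] for r, slot in src_slot]


if __name__ == "__main__":
    tokens = ["t0", "t1", "t2", "t3", "t4"]
    expert_ids = [3, 0, 2, 0, 1]  # 4 experts, 2 per rank, 2 ranks
    fused, slots = fused_permute_dispatch(tokens, expert_ids, 2, 2)
    print(fused)  # [['t1', 't3', 't4'], ['t2', 't0']]
    assert fused == permute_then_dispatch(tokens, expert_ids, 2, 2)
    expert_out = [[t.upper() for t in buf] for buf in fused]  # stand-in for the expert MLPs
    print(fused_combine_unpermute(expert_out, slots))  # ['T0', 'T1', 'T2', 'T3', 'T4']
```

Both paths produce the same per-rank layout; the fused version simply avoids materializing the intermediate permuted buffer, which is the kind of extra memory pass a fused kernel removes.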

⚠️ For major changes (either in lines of code or in its impact), please make sure to first share a design doc with the team. If you're unsure what's the best way to do so, contact the @mcore-oncall.

Contribution process

Pre-checks

  • I have added relevant unit tests
  • I have added relevant functional tests
  • I have added proper typing to my code Typing guidelines
  • I have added relevant documentation
  • I have run the autoformatter.sh on my PR

Code review

Feel free to message or comment the @mcore-oncall to help accelerate your merge into main. The less complex your PR is, the faster it will be approved and merged!

All PRs start as draft. If you open a non-draft PR, it will be automatically converted to draft.

Step 1: Mark PR as "Ready for Review"

  1. When your PR is ready, click Ready for Review.
  2. An oncall reviewer is auto-assigned and expert reviewers are notified based on your changes.
    • Some PRs may jump straight to step 2. This is determined by .github/CODEOWNERS.

⚠️ Only mark as ready once merge-conflicts are resolved and the CI is passing.
Final Review might get declined if these requirements are not fulfilled.

Step 2: Final Review

For PRs that change megatron/core, once all expert reviewers have approved, the Final Review label is applied automatically and final reviewers are assigned.

For PRs outside megatron/core, this step is skipped.

Step 3: Approved

Once all required reviewers have approved, the Approved label is applied automatically.

Merge

Any member of mcore-engineers will be able to merge your PR.

For MRs into `dev` branch

The proposed review process for `dev` branch is under active discussion.

MRs are mergeable after one approval by either eharper@nvidia.com or zijiey@nvidia.com.

@Autumn1998 Autumn1998 requested review from a team as code owners April 1, 2026 05:17
@copy-pr-bot

copy-pr-bot Bot commented Apr 1, 2026


This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@svcnvidia-nemo-ci svcnvidia-nemo-ci marked this pull request as draft April 1, 2026 05:17
@github-actions

github-actions Bot commented Apr 1, 2026

Contributor

This PR has been automatically converted to draft because all PRs must start as drafts.

When you are ready for review, click Ready for Review to begin the review process. This will:

  1. Add the oncall reviewer (optional reviewer)
  2. Add required review teams based on your changes

See the contribution guide for more details.

@Autumn1998 Autumn1998 marked this pull request as ready for review April 1, 2026 05:18
@svcnvidia-nemo-ci svcnvidia-nemo-ci requested a review from a team April 1, 2026 05:18
@erhoo82 erhoo82 added the dev2main: mbridge, 26.04, and Expert Review [deprecated] labels Apr 2, 2026
@Phlip79 Phlip79 removed the Expert Review [deprecated] label Apr 3, 2026

@Victarry Victarry left a comment

Contributor


LGTM. Already reviewed in #4073

@svcnvidia-nemo-ci svcnvidia-nemo-ci added the Final Review label Apr 7, 2026

Review thread on `moe_hybridep_num_sms` (diff context: the old `int = 16` field with its docstring, followed by the new `Optional[int] = None` field):
moe_hybridep_num_sms: int = 16
"""Number of SMs to use for HybridEP. In pure NVL scenarios,
16 SMs can generally achieve good bandwidth."""
moe_hybridep_num_sms: Optional[int] = None

Contributor


nit: could we add the defaults used in the docstring here?

Contributor Author


I'd prefer not to, as the default number of SMs used on the HybridEP side may change.
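The trade-off in this thread (an `Optional[int] = None` default instead of a hard-coded `16`) follows a common "None means defer to the library" pattern, sketched below. Everything here is hypothetical: `MoEConfigSketch`, `resolve_num_sms`, and `PLACEHOLDER_HYBRIDEP_DEFAULT_NUM_SMS` are illustrative names, not Megatron-LM or HybridEP APIs, and the placeholder value only mirrors the old Megatron-side default of 16 from the diff; the real default is chosen on the HybridEP side.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative placeholder only: mirrors the old Megatron-side default (16) shown in
# the diff. The real default is owned by HybridEP and may change between releases.
PLACEHOLDER_HYBRIDEP_DEFAULT_NUM_SMS = 16


@dataclass
class MoEConfigSketch:
    moe_hybridep_num_sms: Optional[int] = None
    """Number of SMs to use for HybridEP. None defers to HybridEP's own default."""


def resolve_num_sms(cfg: MoEConfigSketch,
                    library_default: int = PLACEHOLDER_HYBRIDEP_DEFAULT_NUM_SMS) -> int:
    """Return the user's explicit SM count, or the library default when unset."""
    if cfg.moe_hybridep_num_sms is None:
        return library_default
    if cfg.moe_hybridep_num_sms <= 0:
        raise ValueError("moe_hybridep_num_sms must be a positive integer")
    return cfg.moe_hybridep_num_sms


if __name__ == "__main__":
    print(resolve_num_sms(MoEConfigSketch()))  # 16 (placeholder library default)
    print(resolve_num_sms(MoEConfigSketch(moe_hybridep_num_sms=24)))  # 24
```

Because the config stores `None` rather than a number, its docstring never has to restate a default that can drift, which matches the author's reason for not documenting one.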

@svcnvidia-nemo-ci svcnvidia-nemo-ci added the Approved label and removed the Final Review label Apr 7, 2026
@yaox12 yaox12 enabled auto-merge April 8, 2026 02:12
@yaox12

yaox12 commented Apr 8, 2026

Member

/ok to test 1f0733c

@svcnvidia-nemo-ci svcnvidia-nemo-ci added this to the Core 0.16 milestone Apr 8, 2026
@Autumn1998 Autumn1998 requested a review from a team as a code owner April 16, 2026 06:10
@svcnvidia-nemo-ci svcnvidia-nemo-ci removed the Approved label Apr 16, 2026
@Autumn1998

Contributor Author

/ok to test e28bb0e

@Autumn1998

Contributor Author

/ok to test e28bb0e

@Autumn1998

Contributor Author

/ok to test d31d205

@Autumn1998

Contributor Author

/ok to test d31d205

@Autumn1998

Contributor Author

/ok to test d31d205

@Autumn1998

Contributor Author

/ok to test d31d205

@Autumn1998

Contributor Author

/ok to test 022d94b

@Autumn1998

Contributor Author

/ok to test 022d94b

@Autumn1998

Contributor Author

/ok to test 022d94b

@Victarry Victarry enabled auto-merge April 24, 2026 06:27
@Victarry

Contributor

/ok to test 139778d

@yaox12 yaox12 requested a review from Phlip79 April 27, 2026 02:46
@Phlip79 Phlip79 disabled auto-merge April 28, 2026 00:46
@Phlip79 Phlip79 added this pull request to the merge queue Apr 28, 2026
@svcnvidia-nemo-ci svcnvidia-nemo-ci added the Approved label Apr 28, 2026
@svcnvidia-nemo-ci


🔄 Merge queue validation started!

You can track the progress here: https://github.com/NVIDIA/Megatron-LM/actions/runs/25027627614

Merged via the queue into NVIDIA:main with commit 8c5cf05 Apr 28, 2026
68 of 69 checks passed
yangbofun pushed a commit to xlm-research/Megatron-LM that referenced this pull request May 22, 2026
Co-authored-by: root <root@eos0047.eos.clusters.nvidia.com>
Co-authored-by: root <root@eos0260.eos.clusters.nvidia.com>
Co-authored-by: Dennis(Zhenhuan) Liu <denliu@nvidia.com>

Labels

  • 26.04
  • 26.04.01
  • Approved (All necessary approvals have been made)
  • complexity: low
  • core_r0.17.0 (Auto-cherrypick to release branch. Apply before merge; cherrypick happens after merge.)
  • dev2main: mbridge (dev to main: this PR is needed in main for mbridge)
