
[Main] Partial CUDA Graph support for EP Overlap #2184

Merged: ericharper merged 20 commits into NVIDIA:main from Wohox:pingtian/support_cuda_graph_for_ep_overlap_main on Jan 16, 2026

Wohox (Contributor) commented Nov 10, 2025

Based on #1920

What does this PR do?

EP Overlap adds extra CPU overhead and may cause GPU bubbles during execution; partial CUDA graph reduces CPU pressure within the selected scope. This PR adds partial CUDA graph support for EP Overlap. The supported scopes are attn, moe_router, and moe_preprocess (moe and mlp are not supported).

Usage
To enable this feature, refer to the following example:

--overlap-moe-expert-parallel-comm \
--delay-wgrad-compute \
--cuda-graph-scope attn moe_router moe_preprocess \

(--delay-wgrad-compute is optional)
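As a sketch of how the flags above fit together, the hypothetical helper below (not part of Megatron-LM or this PR; the function name and its first argument are assumptions) assembles them and rejects any `--cuda-graph-scope` value outside the three scopes this PR lists as supported with EP Overlap. It only builds and prints the arguments; the rest of the training command is left out.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Megatron-LM: build the EP-overlap
# partial-CUDA-graph flags described in this PR.
# Usage: build_ep_overlap_graph_args <delay_wgrad: 0|1> <scope>...
build_ep_overlap_graph_args() {
  local delay_wgrad="$1"; shift
  local scope
  local -a args=(--overlap-moe-expert-parallel-comm)

  if [[ $# -eq 0 ]]; then
    echo "error: at least one --cuda-graph-scope value is required" >&2
    return 1
  fi

  # Only attn, moe_router and moe_preprocess are listed as supported with
  # EP overlap; moe and mlp (and anything else) are rejected here.
  for scope in "$@"; do
    case "$scope" in
      attn|moe_router|moe_preprocess) ;;
      *)
        echo "error: scope '$scope' is not supported with EP overlap" >&2
        return 1 ;;
    esac
  done

  # --delay-wgrad-compute is optional.
  if [[ "$delay_wgrad" == "1" ]]; then
    args+=(--delay-wgrad-compute)
  fi
  args+=(--cuda-graph-scope "$@")

  # Print the flags on one line, space-separated.
  printf '%s\n' "${args[*]}"
}

# Example:
#   build_ep_overlap_graph_args 1 attn moe_router moe_preprocess
```

Validating before launch fails fast on an unsupported scope instead of partway through a job; the printed string would still need to be appended to a full training command.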

Correctness
The loss values are bitwise aligned between runs with partial CUDA graph enabled and disabled.
(Screenshot attached, taken 2025-11-07 at 11:55:01.)
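One way to reproduce a check like this is to compare the loss values logged by the two runs. The sketch below is hypothetical (not part of this PR). It assumes each iteration line in the training log contains `lm loss: <value> |`, and the function names are illustrative. Comparing printed values only confirms agreement at the logged precision, so it is a necessary rather than a sufficient test of bitwise alignment.

```shell
#!/usr/bin/env bash
# Hypothetical check, not part of this PR. Assumes log lines contain
# "lm loss: <value> |" once per logged iteration.

# Print one loss value per logged iteration, in order.
extract_losses() {
  grep -o 'lm loss: [^ |]*' "$1" | awk '{print $3}'
}

# Succeed only if both logs contain loss values and every printed
# value matches character for character.
losses_bitwise_equal() {
  local a b
  a="$(extract_losses "$1")"
  b="$(extract_losses "$2")"
  [[ -n "$a" && "$a" == "$b" ]]
}

# Example:
#   losses_bitwise_equal graph_off.log graph_on.log && echo "aligned"
```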

This PR targets the main branch; the corresponding PR for the dev branch is #2168.

Dependencies
Enabling CUDA graph with --delay-wgrad-compute relies on a Transformer Engine (TE) PR:

Contribution process

flowchart LR
    A[Pre-checks] --> B[PR Tests]
    subgraph Code Review/Approval
        C1[Expert Review] --> C2[Final Review]
    end
    B --> C1
    C2 --> D[Merge]

Pre-checks

  • I want this PR in a versioned release and have added the appropriate Milestone (e.g., Core 0.8)
  • I have added relevant unit tests
  • I have added relevant functional tests
  • I have added proper typing to my code (see Typing guidelines)
  • I have added relevant documentation
  • I have run the autoformatter.sh on my PR

Code review

The following process is enforced via the CODEOWNERS file for changes into megatron/core. For changes outside of megatron/core, it is up to the PR author whether or not to tag the Final Reviewer team.

For MRs into the `main` branch

(Step 1): Add PR label Expert Review

(Step 2): Collect the expert reviewers' reviews

  1. Attach the Expert Review label when your PR is ready for review.
  2. GitHub auto-assigns expert reviewers based on your changes. They will get notified and pick up your PR soon.

⚠️ Only proceed to the next step once all reviewers have approved, merge conflicts are resolved, and CI is passing.
Final Review might get declined if these requirements are not fulfilled.

(Step 3): Final Review

  1. Add Final Review label
  2. GitHub auto-assigns final reviewers based on your changes. They will get notified and pick up your PR soon.

(Optional Step 4): Cherry-pick into release branch

If this PR also needs to be merged into core_r* release branches, after this PR has been merged, select Cherry-pick to open a new PR into the release branch.

For MRs into the `dev` branch

The proposed review process for the `dev` branch is under active discussion.

MRs are mergeable after one approval by either eharper@nvidia.com or zijiey@nvidia.com.

Merging your PR

Any member of core-adlr and core-nemo will be able to merge your PR.

Wohox requested review from a team as code owners on November 10, 2025 08:57
copy-pr-bot (Bot) commented Nov 10, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Wohox changed the title from "Pingtian/support cuda graph for ep overlap main" to "[Draft][Main] Partial CUDA Graph support for EP Overlap" on Nov 10, 2025
Wohox requested review from a team as code owners on November 13, 2025 09:07
Wohox changed the title from "[Draft][Main] Partial CUDA Graph support for EP Overlap" to "[Main] Partial CUDA Graph support for EP Overlap" on Nov 13, 2025
kvareddy (Contributor) commented:

@fanshiqing @jiemingz can you please take a look at this MR?

jiemingz self-assigned this on Nov 13, 2025
jiemingz (Contributor) commented Nov 13, 2025

It looks like this includes changes from #1920. Is that supposed to be merged first, with this PR rebased afterward?

Wohox added the module: moe and Expert Review labels on Nov 17, 2025
Wohox (Contributor, Author) commented Nov 17, 2025

> It looks like this includes changes from #1920. Is that supposed to be merged first, with this PR rebased afterward?

@jiemingz Yes, but rebase can happen later, since this MR requires #1920.

Wohox (Contributor, Author) commented Dec 2, 2025

/ok to test 125fa43

Comment thread megatron/core/transformer/transformer_layer.py
Comment thread megatron/core/transformer/transformer_layer.py Outdated
Comment thread megatron/core/transformer/module.py Outdated

lhb8125 (Contributor) left a comment:

LGTM, thanks!

jiemingz (Contributor) left a comment:

LGTM!

Wohox (Contributor, Author) commented Jan 15, 2026

/ok to test 9eababb

Wohox (Contributor, Author) commented Jan 15, 2026

@ericharper Could you help review this PR? It still needs approval from the NeMo and GPT groups. Thanks~

ericharper enabled auto-merge on January 15, 2026 18:22
Phlip79 (Member) commented Jan 15, 2026

/ok to test ecbaff4

Phlip79 (Member) commented Jan 15, 2026

/ok to test c3219e6

Phlip79 (Member) commented Jan 15, 2026

/ok to test e01fcab

Wohox (Contributor, Author) commented Jan 16, 2026

/ok to test e180d4d


Labels

complexity: high · dev2main: mbridge · Expert Review · Final Review · module: moe


9 participants