Support multimodule pipelining in 1F1B schedule#3129

Merged
shifangx merged 24 commits into NVIDIA:main from yashaswikarnati:yash/1f1b_changes on Mar 18, 2026

Conversation

@yashaswikarnati

@yashaswikarnati yashaswikarnati commented Jan 28, 2026

Contributor

Summary

Adds support for multi-module pipeline parallelism (encoder + LLM) in the 1F1B schedule.

Changes:

  • Add MultiModuleProcessGroupCollection for managing process groups across modules
  • Support dict-based tensor format {module_name: tensor} in forward/backward
  • Handle 2D/3D tensor conversion for P2P and bridge communication
  • Add backward_step_multimodule to handle backward for multimodule cases
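
A minimal sketch of the dict-based format and the 2D/3D conversion listed above. It works on shape tuples instead of real tensors so that it runs without a GPU stack. The helper names (`ensure_3d_shape`, `restore_shape`, `to_module_dict`, `prepare_for_send`) are hypothetical, and the choice to append a trailing singleton axis to 2D shapes is an assumption for illustration, not a statement of what the PR does.

```python
from typing import Dict, Tuple, Union

Shape = Tuple[int, ...]
ShapeOrDict = Union[Shape, Dict[str, Shape]]


def ensure_3d_shape(shape: Shape) -> Shape:
    """Return a 3D shape; a 2D shape gains a trailing singleton axis (assumed)."""
    if len(shape) == 2:
        return shape + (1,)
    if len(shape) == 3:
        return shape
    raise ValueError(f"expected a 2D or 3D shape, got {shape}")


def restore_shape(shape: Shape, original_ndim: int) -> Shape:
    """Undo ensure_3d_shape for shapes that were 2D before communication."""
    if original_ndim == 2 and len(shape) == 3 and shape[-1] == 1:
        return shape[:-1]
    return shape


def to_module_dict(value: ShapeOrDict, default_module: str) -> Dict[str, Shape]:
    """Normalize a bare value or a {module_name: value} dict to dict form."""
    if isinstance(value, dict):
        return dict(value)
    return {default_module: value}


def prepare_for_send(value: ShapeOrDict, default_module: str) -> Dict[str, Shape]:
    """Apply the 3D conversion to every module's entry."""
    per_module = to_module_dict(value, default_module)
    return {name: ensure_3d_shape(shape) for name, shape in per_module.items()}


# Hypothetical module names: an encoder output that is 2D and an LLM input that is 3D.
print(prepare_for_send({"encoder": (4, 32), "llm": (8, 2, 64)}, default_module="llm"))
```

Normalizing to a dict at the boundary lets the single-module and multi-module paths share the same per-module loop.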

⚠️ For major changes (either in lines of code or in its impact), please make sure to first share a design doc with the team. If you're unsure what's the best way to do so, contact the @mcore-oncall.

Contribution process

flowchart LR
    A[Pre-checks] --> B[PR Tests]
    subgraph Code Review/Approval
        C1[Expert Review] --> C2[Final Review]
    end
    B --> C1
    C2 --> D[Merge]

Pre-checks

  • I want this PR in a versioned release and have added the appropriate Milestone (e.g., Core 0.8)
  • I have added relevant unit tests
  • I have added relevant functional tests
  • I have added proper typing to my code (see Typing guidelines)
  • I have added relevant documentation
  • I have run the autoformatter.sh on my PR

Code review

The following process is enforced via the CODEOWNERS file for changes into megatron/core. For changes outside of megatron/core, it is up to the PR author whether or not to tag the Final Reviewer team.

For MRs into `main` branch

Feel free to message or tag @mcore-oncall to help accelerate your merge into main. The less complex your PR is, the faster it will be approved and merged!

(Step 1): Add PR label Expert Review

(Step 2): Collect the expert reviewers' reviews

  1. Attach the Expert Review label when your PR is ready for review.
  2. GitHub auto-assigns expert reviewers based on your changes. They will get notified and pick up your PR soon.

⚠️ Only proceed to the next step once all reviewers have approved, merge conflicts are resolved, and the CI is passing.
Final Review might get declined if these requirements are not fulfilled.

(Step 3): Final Review

  1. Add Final Review label
  2. GitHub auto-assigns final reviewers based on your changes. They will get notified and pick up your PR soon.

(Optional Step 4): Cherry-pick into release branch

If this PR also needs to be merged into core_r* release branches, after this PR has been merged, select Cherry-pick to open a new PR into the release branch.

For MRs into `dev` branch

The proposed review process for `dev` branch is under active discussion.

MRs are mergeable after one approval by either eharper@nvidia.com or zijiey@nvidia.com.

Merging your PR

Any member of core-adlr and core-nemo will be able to merge your PR.

- Rename ProcessGroupCollectionWrapper to MultiModuleProcessGroupCollection
- Rename language_model field to language_model_module_name for clarity
- Add language_model_module_name param to backward_step_multimodule
- Use functools.partial to bind param, keeping signature consistent
- Add type hints to _ensure_3d_tensor and _restore_tensor_shape
- Move is_multimodule check earlier for validation and backward selection
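
A rough sketch of the `functools.partial` idea from the commit message above: bind `language_model_module_name` once, so the multimodule backward function can be called exactly like the single-module one. The two-argument signatures, the return values, and `select_backward` are simplified stand-ins invented for illustration, not the actual signatures in `schedules.py`.

```python
from functools import partial
from typing import Callable, Optional


def backward_step(input_tensor, output_tensor_grad):
    """Stand-in for the single-module backward step."""
    return {"mode": "single", "input": input_tensor, "grad": output_tensor_grad}


def backward_step_multimodule(input_tensor, output_tensor_grad, *, language_model_module_name):
    """Stand-in for the multimodule backward step; needs one extra keyword."""
    return {
        "mode": "multi",
        "language_model": language_model_module_name,
        "input": input_tensor,
        "grad": output_tensor_grad,
    }


def select_backward(is_multimodule: bool, language_model_module_name: Optional[str] = None) -> Callable:
    """Decide once, up front, which backward function the schedule will call."""
    if not is_multimodule:
        return backward_step
    if language_model_module_name is None:
        raise ValueError("multimodule backward needs language_model_module_name")
    # partial binds the extra keyword, so callers keep the two-argument call.
    return partial(backward_step_multimodule, language_model_module_name=language_model_module_name)


backward_fn = select_backward(is_multimodule=True, language_model_module_name="llm")
print(backward_fn({"llm": "x"}, {"llm": "dx"}))
```

Doing the selection before the schedule loop also gives one early place to reject an invalid configuration.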
@yashaswikarnati yashaswikarnati requested review from a team as code owners January 28, 2026 22:53
@copy-pr-bot

copy-pr-bot Bot commented Jan 28, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@ko3n1g ko3n1g requested a review from a team January 28, 2026 22:54
@dimapihtar dimapihtar added the complexity: high and Expert Review [deprecated] labels Jan 29, 2026
@dimapihtar

Contributor

/ok to test 2d7c176

Comment thread megatron/core/pipeline_parallel/bridge_communicator.py
Comment thread megatron/core/pipeline_parallel/multimodule_communicator.py
Comment thread megatron/core/pipeline_parallel/schedules.py
@dimapihtar dimapihtar requested a review from erhoo82 February 4, 2026 15:10
@svcnvidia-nemo-ci svcnvidia-nemo-ci added the Final Review label Mar 13, 2026
@shifangx

Contributor

/ok to test 6542743

yashaswikarnati and others added 3 commits March 13, 2026 17:42
Replace num_warmup_microbatches property on P2PCommunicator and
MultiModulePipelineCommunicator with total_stages and current_stage
properties. Compute num_warmup_microbatches in schedules.py instead.
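
For illustration, the warmup count can be derived from the two properties with the usual non-interleaved 1F1B rule: a stage runs one warmup forward pass for every stage downstream of it, capped by the number of microbatches. This is a sketch under that assumption; the exact expression in `schedules.py`, and how `total_stages` is counted across modules, may differ. The function names are hypothetical.

```python
def num_warmup_microbatches(total_stages: int, current_stage: int, num_microbatches: int) -> int:
    """Warmup forwards for one stage under the standard 1F1B rule (assumed)."""
    if not 0 <= current_stage < total_stages:
        raise ValueError(f"current_stage {current_stage} is outside [0, {total_stages})")
    return min(total_stages - current_stage - 1, num_microbatches)


def phase_counts(total_stages: int, current_stage: int, num_microbatches: int):
    """Return (warmup forwards, steady 1F1B pairs, cooldown backwards)."""
    warmup = num_warmup_microbatches(total_stages, current_stage, num_microbatches)
    steady = num_microbatches - warmup
    return warmup, steady, warmup


# Four stages, eight microbatches: earlier stages warm up longer.
for stage in range(4):
    print(stage, phase_counts(total_stages=4, current_stage=stage, num_microbatches=8))
```

Keeping the communicators limited to `total_stages` and `current_stage` leaves the scheduling arithmetic in one place.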

Addresses review feedback from jaredcasper on PR NVIDIA#3129.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
# Conflicts:
#	megatron/core/pipeline_parallel/schedules.py
… tests

- Add HyperCommGrid.destroy() and BridgeCommunicator.destroy_broadcast_pgs()
  to clean up PGs created during tests
- Add expt_dp grid dimension and cache embedding PGs to prevent
  creation of undestroyed PGs in DDP init and add_embedding_groups
- Reuse pg_collection across finalize_model_grads calls instead of
  rebuilding from scratch each iteration
- Add teardown_method to bridge/communicator/schedules test classes
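
The cleanup pattern described above can be sketched without a distributed backend: cache groups by their rank tuple so repeated requests reuse one object, and destroy everything in `teardown_method`. `FakeGroup`, `GroupCache`, and `ExampleSuite` are hypothetical stand-ins; the sketch does not reproduce `HyperCommGrid.destroy()` or `BridgeCommunicator.destroy_broadcast_pgs()`.

```python
from typing import Dict, Iterable, Tuple


class FakeGroup:
    """Stand-in for a process group; only tracks whether it was destroyed."""

    def __init__(self, ranks: Tuple[int, ...]):
        self.ranks = ranks
        self.destroyed = False

    def destroy(self) -> None:
        self.destroyed = True


class GroupCache:
    """Create each group once per rank tuple and destroy them all together."""

    def __init__(self) -> None:
        self._groups: Dict[Tuple[int, ...], FakeGroup] = {}

    def get_or_create(self, ranks: Iterable[int]) -> FakeGroup:
        key = tuple(ranks)
        if key not in self._groups:
            self._groups[key] = FakeGroup(key)
        return self._groups[key]

    def destroy_all(self) -> None:
        for group in self._groups.values():
            group.destroy()
        self._groups.clear()

    def __len__(self) -> int:
        return len(self._groups)


class ExampleSuite:
    """pytest-style hooks: build the cache per test, clean it up afterwards."""

    def setup_method(self, method=None) -> None:
        self.cache = GroupCache()

    def teardown_method(self, method=None) -> None:
        self.cache.destroy_all()
```

Reusing the cached object across iterations is the same idea as reusing `pg_collection` across `finalize_model_grads` calls: nothing new is created, so nothing new needs to be destroyed.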
@yashaswikarnati

Contributor Author

/ok to test edc8159

embd, pos_embd, pp, dp_cp are already validated in finalize_model_grads.
Only tp and cp are directly used in the schedule functions.
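
A small sketch of the narrowed check this commit describes: the schedule validates only the groups it uses directly and leaves the rest to `finalize_model_grads`. `validate_schedule_groups` and the `SimpleNamespace` stand-in for a process group collection are hypothetical.

```python
from types import SimpleNamespace
from typing import Sequence


def validate_schedule_groups(pg_collection, required: Sequence[str] = ("tp", "cp")) -> None:
    """Raise if a group the schedule uses directly is missing or None."""
    missing = [name for name in required if getattr(pg_collection, name, None) is None]
    if missing:
        raise ValueError(f"pg_collection is missing required groups: {missing}")


# Stand-in collection: only tp and cp are populated.
groups = SimpleNamespace(tp=object(), cp=object())
validate_schedule_groups(groups)  # passes; embd, pos_embd, pp, dp_cp are not checked here
```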
@yashaswikarnati

Contributor Author

/ok to test 92b65d1

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
@yashaswikarnati

Contributor Author

/ok to test 78ee58c

@svcnvidia-nemo-ci svcnvidia-nemo-ci added the Approved label and removed the Final Review label Mar 17, 2026
@shifangx shifangx added this pull request to the merge queue Mar 18, 2026
@svcnvidia-nemo-ci

🔄 Merge queue validation started!

You can track the progress here: https://github.com/NVIDIA/Megatron-LM/actions/runs/23229957578

Merged via the queue into NVIDIA:main with commit 0ca9b63 Mar 18, 2026
55 of 57 checks passed
ilml added a commit to ilml/Megatron-LM that referenced this pull request Mar 20, 2026
…edule (NVIDIA#3129)

New files:
  - tests/unit_tests/pipeline_parallel/test_multimodule_schedules.py
ilml added a commit to ilml/Megatron-LM that referenced this pull request Mar 20, 2026
These test files import from existing modules that are modified in Phase 2:
- test_rmsnorm_residual_fusion.py: imports TEFusedResidualRMSNorm (added in NVIDIA#3384)
- test_mup.py: imports get_mup_config_overrides (added in NVIDIA#3058)
- test_multimodule_schedules.py: imports MultiModuleProcessGroupCollection (added in NVIDIA#3129)

They will be re-added in Phase 2 when the corresponding code changes land.

Made-with: Cursor
copy-pr-bot Bot pushed a commit that referenced this pull request Mar 24, 2026
Resolve merge conflicts in 8 files after syncing fork with upstream
(including merged #3129). Take main's improvements for multimodule
communicator, p2p_communication, process_groups_config, and schedule
tests. Keep both non-colocated PP changes and main's new features.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
yangbofun pushed a commit to xlm-research/Megatron-LM that referenced this pull request May 22, 2026
Signed-off-by: ykarnati <ykarnati@nvidia.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: Shifang Xu <shifangx@nvidia.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Labels

  • Approved: All necessary approvals have been made
  • complexity: high
  • Expert Review: [deprecated] Apply this label to indicate that your PR is ready for expert review.
  • help wanted: Extra attention is needed
  • needs-follow-up: Issue needs follow-up

10 participants