Add Dynamic Context Parallelism support (port from dev) by ilml · Pull Request #5252 · NVIDIA/Megatron-LM

ilml · 2026-06-10T00:40:40Z

Summary

Port the dynamic context parallelism feature (--dynamic-context-parallel) from the dev branch to main. The feature works on dev but is missing/incomplete on main; this PR brings main to parity.

Covers the following dev PRs:

[Dev] Add E2E support for THD format (#2924): THD E2E sequence-packing framework
[Dev] Fix for rope when enabling THD + Dynamic-CP; and use the naming Dynamic-CP. (#3405): THD+DCP rope fix, hybrid→dynamic rename
[Dev] feat: Dynamic CP (part 2) (#2000): Dynamic CP part 2
Minor improvements for Dynamic-cp (#4226): resolve_cp_group, MTP token-weighted loss logging, GDN/MLA enablement
varlendataset for thd e2e and benchmark (#4832): VarlenDataset and dataloader contract fix

plus the dev-side wrap_data_iterator and arg-rename fixes.

Core feature files (data_schedule.py, data_schedule_utils.py, varlen_dataset.py) are byte-identical to dev.
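
For readers unfamiliar with the THD format referenced in the PR list above: it is a packed layout in which variable-length sequences are concatenated along the token dimension and their boundaries are recorded as cumulative sequence lengths. The following is a minimal, framework-free sketch of that idea only. It is not code from data_schedule.py or varlen_dataset.py, and the helper names pack_thd and unpack_thd are hypothetical.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple


def pack_thd(sequences: Sequence[Sequence[int]]) -> Tuple[List[int], List[int]]:
    """Concatenate variable-length sequences and record their boundaries.

    Hypothetical helper for illustration; not part of Megatron-LM.
    Returns (tokens, cu_seqlens) where cu_seqlens[i] is the offset at
    which sequence i starts and cu_seqlens[-1] is the total token count.
    """
    tokens = [tok for seq in sequences for tok in seq]
    cu_seqlens = [0] + list(accumulate(len(seq) for seq in sequences))
    return tokens, cu_seqlens


def unpack_thd(tokens: Sequence[int], cu_seqlens: Sequence[int]) -> List[List[int]]:
    """Recover the original sequences from the packed representation."""
    return [
        list(tokens[start:end])
        for start, end in zip(cu_seqlens[:-1], cu_seqlens[1:])
    ]


if __name__ == "__main__":
    packed, offsets = pack_thd([[1, 2, 3], [4], [5, 6]])
    print(packed, offsets)
    print(unpack_thd(packed, offsets))
```

Because no padding is stored, the amount of work per microbatch varies with the packed sequence lengths, which is presumably why the PR resolves a cp_group per microbatch.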

Main-side adaptations

pretrain_gpt.py keeps main's consolidated get_batch/forward_step and adds a sequence-packing branch on top.
gated_delta_net.py threads the per-microbatch resolved cp_group through main's fused head-perm all-to-all path.
multi_token_prediction.py re-implements the loss-sum/token-count tracker inside main's metrics API, keeping acceptance-rate logging.
The deprecated --hybrid-context-parallel flag maps to --dynamic-context-parallel via ModelParallelConfig.__post_init__.
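
The deprecated-flag mapping in the last item can be illustrated with a small stand-alone dataclass. This is a sketch under assumptions, not the actual ModelParallelConfig code: the class name ParallelConfigSketch is hypothetical, the two field names are guessed from the flag names, and the warning text is invented for illustration.

```python
import warnings
from dataclasses import dataclass


@dataclass
class ParallelConfigSketch:
    """Hypothetical stand-in for a config class; not Megatron-LM's API."""

    # Assumed field names, derived from the CLI flags named in the PR.
    dynamic_context_parallel: bool = False
    hybrid_context_parallel: bool = False  # deprecated alias

    def __post_init__(self) -> None:
        # If the old option is set, warn and forward it to the new one so
        # downstream code only ever has to check the new field.
        if self.hybrid_context_parallel:
            warnings.warn(
                "hybrid_context_parallel is deprecated; "
                "use dynamic_context_parallel instead.",
                DeprecationWarning,
                stacklevel=2,
            )
            self.dynamic_context_parallel = True


if __name__ == "__main__":
    cfg = ParallelConfigSketch(hybrid_context_parallel=True)
    print(cfg.dynamic_context_parallel)
```

Handling the alias in __post_init__ keeps old launch scripts working while the rest of the code reads a single option.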

Testing

All changed Python files pass syntax/compile checks. Unit and functional tests have not been run locally; this PR relies on CI to run them.
Adds functional test case gpt3_mcore_te_tp2_pp1_cp4_dcp and unit tests (test_sequence_packing.py, test_varlen_dataset.py, test_get_batch.py, etc.) ported from dev.

🤖 Generated with Claude Code

Port the dynamic CP (--dynamic-context-parallel) feature from the dev branch to main, covering dev PRs NVIDIA#2924 (THD E2E sequence-packing framework), NVIDIA#3405 (THD+DCP rope fix, hybrid->dynamic rename), NVIDIA#2000 (Dynamic CP part 2), NVIDIA#4226 (resolve_cp_group, MTP token-weighted loss logging, GDN/MLA enablement), NVIDIA#4832 (VarlenDataset and dataloader contract fix), plus the dev-side wrap_data_iterator and arg-rename fixes.

Main-side adaptations:
- pretrain_gpt.py keeps main's consolidated get_batch/forward_step and adds a sequence-packing branch on top.
- gated_delta_net.py threads the per-microbatch resolved cp_group through main's fused head-perm all-to-all path.
- multi_token_prediction.py re-implements the loss-sum/token-count tracker inside main's metrics API, keeping acceptance-rate logging.
- The deprecated --hybrid-context-parallel flag maps to --dynamic-context-parallel via ModelParallelConfig.__post_init__.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

copy-pr-bot · 2026-06-10T00:40:44Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

ilml added the Run functional tests label Jun 10, 2026

Victarry mentioned this pull request Jun 10, 2026 in [ROADMAP][2026 Q2] Megatron Core MoE Roadmap #4815 (open, 71 tasks)

ilml closed this Jun 11, 2026

Add Dynamic Context Parallelism support (port from dev) #5252
ilml wants to merge 1 commit into NVIDIA:main from ilml:dcp/sync-dynamic-cp-to-main