Add Dynamic Context Parallelism support (port from dev) #5252
Closed
ilml wants to merge 1 commit into
Port the dynamic CP (--dynamic-context-parallel) feature from the dev branch to main, covering dev PRs NVIDIA#2924 (THD E2E sequence-packing framework), NVIDIA#3405 (THD+DCP rope fix, hybrid->dynamic rename), NVIDIA#2000 (Dynamic CP part 2), NVIDIA#4226 (resolve_cp_group, MTP token-weighted loss logging, GDN/MLA enablement), NVIDIA#4832 (VarlenDataset and dataloader contract fix), plus the dev-side wrap_data_iterator and arg-rename fixes.

Main-side adaptations:
- pretrain_gpt.py keeps main's consolidated get_batch/forward_step and adds a sequence-packing branch on top.
- gated_delta_net.py threads the per-microbatch resolved cp_group through main's fused head-perm all-to-all path.
- multi_token_prediction.py re-implements the loss-sum/token-count tracker inside main's metrics API, keeping acceptance-rate logging.
- The deprecated --hybrid-context-parallel flag maps to --dynamic-context-parallel via ModelParallelConfig.__post_init__.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
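For reviewers unfamiliar with the pattern, the deprecated-flag mapping in the last bullet can be pictured roughly as follows. This is a minimal sketch, not the code in this PR: the class `ParallelConfigSketch` and both field names are hypothetical stand-ins, and the real `ModelParallelConfig` may handle the alias differently (for example, without a warning).

```python
import warnings
from dataclasses import dataclass


@dataclass
class ParallelConfigSketch:
    """Hypothetical stand-in for a parallelism config (not the real ModelParallelConfig)."""

    # New spelling of the option (assumed field name).
    dynamic_context_parallel: bool = False
    # Deprecated spelling, kept only so older launch scripts keep working (assumed field name).
    hybrid_context_parallel: bool = False

    def __post_init__(self):
        if self.hybrid_context_parallel:
            warnings.warn(
                "hybrid_context_parallel is deprecated; use dynamic_context_parallel",
                DeprecationWarning,
                stacklevel=2,
            )
            # Forward the old flag to the new one so downstream code checks a single field.
            self.dynamic_context_parallel = True


if __name__ == "__main__":
    cfg = ParallelConfigSketch(hybrid_context_parallel=True)
    print(cfg.dynamic_context_parallel)
```

Doing the mapping in `__post_init__` means every construction path (CLI parsing, tests, programmatic configs) sees the same normalized value without each caller having to know about the old name.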
Summary
Port the dynamic context parallelism feature (--dynamic-context-parallel) from the dev branch to main. The feature works on dev but is missing/incomplete on main; this PR brings main to parity.

Covers the following dev PRs:
- NVIDIA#2924: THD E2E sequence-packing framework
- NVIDIA#3405: THD+DCP rope fix, hybrid->dynamic rename
- NVIDIA#2000: Dynamic CP part 2
- NVIDIA#4226: resolve_cp_group, MTP token-weighted loss logging, GDN/MLA enablement
- NVIDIA#4832: VarlenDataset and dataloader contract fix

plus the dev-side wrap_data_iterator and arg-rename fixes.

Core feature files (data_schedule.py, data_schedule_utils.py, varlen_dataset.py) are byte-identical to dev.

Main-side adaptations
- pretrain_gpt.py keeps main's consolidated get_batch/forward_step and adds a sequence-packing branch on top.
- gated_delta_net.py threads the per-microbatch resolved cp_group through main's fused head-perm all-to-all path.
- multi_token_prediction.py re-implements the loss-sum/token-count tracker inside main's metrics API, keeping acceptance-rate logging.
- The deprecated --hybrid-context-parallel flag maps to --dynamic-context-parallel via ModelParallelConfig.__post_init__.

Testing
gpt3_mcore_te_tp2_pp1_cp4_dcp and unit tests (test_sequence_packing.py, test_varlen_dataset.py, test_get_batch.py, etc.) ported from dev.

🤖 Generated with Claude Code
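As background on the loss-sum/token-count tracker mentioned under the main-side adaptations: token-weighted logging presumably matters here because packed microbatches can carry different numbers of valid tokens, so averaging per-microbatch means would over-weight the short ones. The sketch below is an illustration of the idea only; `TokenWeightedLossTracker` and its methods are hypothetical names and do not reflect the metrics API used in multi_token_prediction.py.

```python
from dataclasses import dataclass


@dataclass
class TokenWeightedLossTracker:
    """Hypothetical tracker: keeps a running loss sum and a running valid-token count."""

    loss_sum: float = 0.0
    num_tokens: int = 0

    def update(self, per_token_losses, loss_mask):
        # Accumulate only the positions the mask marks as valid (non-padding).
        for loss, keep in zip(per_token_losses, loss_mask):
            if keep:
                self.loss_sum += loss
                self.num_tokens += 1

    def average(self):
        # Token-weighted mean over everything seen so far; 0.0 if nothing was counted.
        return self.loss_sum / self.num_tokens if self.num_tokens else 0.0


if __name__ == "__main__":
    tracker = TokenWeightedLossTracker()
    tracker.update([1.0], [1])                      # microbatch with 1 valid token
    tracker.update([3.0, 3.0, 3.0, 9.0], [1, 1, 1, 0])  # 3 valid tokens, 1 masked
    print(tracker.average())  # token-weighted: (1 + 9) / 4
```

In this toy case the two per-microbatch means are 1.0 and 3.0, so a plain mean of means would report 2.0, while the token-weighted value is 2.5.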