Feature: Support of Manifold Hyper Connection (mHC) #3430
Closed
jingqiny-99 wants to merge 10 commits into
a1d8ca5 to 1e66788
10 tasks
…onnection(mHC). (NVIDIA#2943) Co-authored-by: Jingqin Yang <jingqiny@login-eos01.eos.clusters.nvidia.com> Co-authored-by: root <root@eos0478.eos.clusters.nvidia.com> Co-authored-by: Dennis Liu <denliu@nvidia.com>
e7e1a13 to 62ea06f
Author
|
/claude review |
This was referenced May 1, 2026
Connor-XY
added a commit
to Connor-XY/Megatron-LM
that referenced
this pull request
May 4, 2026
… manifold hyper connection

Adds the core Manifold Hyper Connection (mHC) module and the supporting transformer-block / transformer-layer / config / recompute changes.

This is the first split of NVIDIA#3430 (mirror of upstream NVIDIA#2943 + kernel-fusion NVIDIA#3828), covering only files owned by core-adlr / core-nemo / transformer / cuda-graphs reviewers. GPT model wiring, pipeline-parallel support, fused mHC cuTile kernels, and the functional-test recipe are deferred to follow-up split PRs.

Original work by @jingqiny-99 in NVIDIA#3430 (upstream NVIDIA#2943, basic pytorch impl only; kernel fusion NVIDIA#3828 deferred).

Co-authored-by: jingqiny-99 <jingqiny@nvidia.com>
Co-authored-by: Dennis Liu <denliu@nvidia.com>
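The `mhc_sinkhorn_iterations` config field introduced by this change suggests that mHC applies a Sinkhorn-style normalization to some of its mixing weights; the commit message itself does not spell out the details. As an illustration only, the following pure-Python sketch shows Sinkhorn-Knopp normalization, which alternately rescales the rows and columns of a positive matrix so that it approaches a doubly stochastic one. The function name, the exponentiation step, the 4x4 logits, and the iteration count are all assumptions made for this sketch, not code from the PR.

```python
import math


def sinkhorn_knopp(logits, iterations):
    """Push a square matrix of logits toward a doubly stochastic matrix.

    Hypothetical helper for illustration; not the implementation in this PR.
    """
    # Exponentiate so that every entry is strictly positive.
    matrix = [[math.exp(value) for value in row] for row in logits]
    for _ in range(iterations):
        # Rescale each row so that it sums to 1.
        normalized_rows = []
        for row in matrix:
            total = sum(row)
            normalized_rows.append([value / total for value in row])
        # Rescale each column so that it sums to 1.
        column_totals = [sum(column) for column in zip(*normalized_rows)]
        matrix = [
            [value / total for value, total in zip(row, column_totals)]
            for row in normalized_rows
        ]
    return matrix


# Assumed example: a 4x4 mixing matrix and 50 iterations (placeholder values).
example_logits = [
    [0.0, 1.0, -1.0, 0.5],
    [0.3, -0.2, 0.8, 0.0],
    [-0.5, 0.4, 0.1, 1.2],
    [1.0, 0.0, -0.3, 0.2],
]
mixing = sinkhorn_knopp(example_logits, iterations=50)
row_sums = [sum(row) for row in mixing]
column_sums = [sum(column) for column in zip(*mixing)]
```

After enough iterations both `row_sums` and `column_sums` are close to 1. Fewer iterations cost less but leave the matrix further from doubly stochastic, which is presumably why the count is exposed as a config field.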
Connor-XY
added a commit
to Connor-XY/Megatron-LM
that referenced
this pull request
May 4, 2026
CI run #25129636936 (job tests/unit_tests/models/**/*.py) failed with "Mamba MoE config drift detected!" because PR NVIDIA#1 adds five new TransformerConfig fields (enable_hyper_connections, mhc_init_gating_factor, mhc_recompute_layer_num, mhc_sinkhorn_iterations, num_residual_streams) but this PR previously deferred the corresponding test_mamba_moe_model.py GOLDEN_CONFIG update (the file has since been renamed to test_hybrid_moe_model.py on main).

This is a generic config-drift sentinel (any PR that adds TransformerConfig fields must update this golden snapshot), so it belongs in this split, not in a deferred follow-up.

Add the five keys with the same default values used in the original PR NVIDIA#3430 / upstream NVIDIA#2943.
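The config-drift sentinel described in this commit message can be pictured as a comparison between a config's current fields and a stored golden snapshot. The sketch below is a minimal, self-contained illustration under assumptions: `ToyTransformerConfig`, `config_drift`, and every default value are hypothetical stand-ins, and only the five field names come from the commit message. The real test in Megatron-LM may be structured differently.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ToyTransformerConfig:
    """Hypothetical stand-in for TransformerConfig; every default is a placeholder."""

    num_layers: int = 2
    hidden_size: int = 64
    # The five fields named in the commit message (defaults are NOT the real ones).
    enable_hyper_connections: bool = False
    num_residual_streams: int = 1
    mhc_sinkhorn_iterations: int = 1
    mhc_init_gating_factor: float = 0.0
    mhc_recompute_layer_num: Optional[int] = None


def config_drift(config, golden):
    """Report keys added to, removed from, or changed against the golden snapshot."""
    actual = asdict(config)
    return {
        "added": sorted(actual.keys() - golden.keys()),
        "removed": sorted(golden.keys() - actual.keys()),
        "changed": sorted(
            key for key in actual.keys() & golden.keys() if actual[key] != golden[key]
        ),
    }


# A snapshot taken before the new fields existed reports them as drift.
stale_golden = {"num_layers": 2, "hidden_size": 64}
drift_before = config_drift(ToyTransformerConfig(), stale_golden)

# Refreshing the snapshot with the new keys clears the drift.
fresh_golden = asdict(ToyTransformerConfig())
drift_after = config_drift(ToyTransformerConfig(), fresh_golden)
```

With the stale snapshot, `drift_before["added"]` lists the five new field names; once the snapshot is refreshed, all three lists are empty. That is the update this commit makes to the golden config.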
Connor-XY
added a commit
to Connor-XY/Megatron-LM
that referenced
this pull request
May 4, 2026
… forward

The Codecov patch-coverage gate on NVIDIA#4531 fails (68.25% < 80% target) because two new code paths in this split lack any test:

* fused_bias_dropout._get_checkpointed_bda: only reachable via the mHC layer's recompute path, which the unit-test lane doesn't exercise.
* transformer_block._build_mhc_recompute_layer_plan, _finalize_mhc_recompute_layer, and the input_expand / output_contract calls in TransformerBlock.forward: only reachable from a full block forward with mHC selective recompute enabled.

Neither original test exists in PR NVIDIA#3430:

* No test in the upstream PR references get_bias_dropout_add or _get_checkpointed_bda.
* The block-level mHC forward test in NVIDIA#3430 (TestPPForwardWithMHC) is multi-GPU-only (torchrun --nproc-per-node=2+) and so wouldn't run in the codecov-feeding unit-test lane even if lifted.

Add direct unit coverage:

tests/unit_tests/fusions/test_bias_dropout_fusion.py
  TestBiasDropoutAddMhcRecompute
    test_checkpointed_bda_forward_backward (parametrized fused/with_bias)
    test_checkpointed_bda_chained_managers
    test_get_bda_without_manager_unchanged
tests/unit_tests/transformer/test_mhc_block_manager.py
  TestTransformerBlockMHCRecompute
    test_recompute_plan_no_layer_num
    test_recompute_plan_with_layer_num
    test_recompute_plan_disabled
    test_block_forward_input_expand_output_contract

Combined, these directly hit ~80 of the previously-missed lines in fused_bias_dropout.py (~14 of 19 missed) and transformer_block.py (~12 of 16 missed), which should bring patch coverage above the 80% threshold.
Connor-XY added nine more commits to Connor-XY/Megatron-LM that referenced this pull request: three each on May 6, May 7, and May 11, 2026, repeating the three commit messages above verbatim.
Member
|
Closing this PR in favor of #4531. |
Connor-XY added six more commits to Connor-XY/Megatron-LM that referenced this pull request: three each on May 14 and May 18, 2026, repeating the same three commit messages verbatim.
This was referenced May 22, 2026
Connor-XY added three more commits to Connor-XY/Megatron-LM that referenced this pull request on Jun 12, 2026, repeating the same three commit messages verbatim.
Connor-XY
added a commit
to Connor-XY/Megatron-LM
that referenced
this pull request
Jun 12, 2026
Second split of NVIDIA#3430 (mirror of upstream NVIDIA#2943). Wires the mHC core module added in Split 1 (NVIDIA#4531) into the GPT layer specs and the training initialization path:

- megatron/core/models/gpt/gpt_layer_specs.py: add enable_hyper_connection kwarg; build mHC-aware ModuleSpec selecting HyperConnectionTransformerLayer
- megatron/core/models/gpt/experimental_attention_variant_module_specs.py: forward the kwarg through experimental specs
- gpt_builders.py: thread enable_hyper_connection from config to spec
- megatron/training/initialize.py: enable NVTX range profiling early
- tests/unit_tests/models/test_gpt_layer_specs.py: coverage for the enable_hyper_connection branch
- tests/unit_tests/test_fp8_param.py: minor mHC-related test wiring

Depends on NVIDIA#4531; this PR is mergeable only after Split 1 lands.

Reviewer groups touched: core-adlr, core-nemo, gpt, training-adlr, training-nemo.

Subsequent splits (still pending Split 1's merge):

- Split 3: pipeline-parallel support
- Split 4: fused mHC cuTile kernels
- Split 5: functional-test recipe

Original work by @jingqiny-99 in NVIDIA#3430 (upstream NVIDIA#2943).

Co-authored-by: jingqiny-99 <jingqiny@nvidia.com>
Co-authored-by: Dennis Liu <denliu@nvidia.com>
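The wiring described in this commit message (a config flag threaded through a builder into a spec that selects the layer class) can be sketched roughly as follows. This is a toy illustration under assumptions: the classes below are empty stand-ins that only borrow the names `ModuleSpec`, `TransformerLayer`, and `HyperConnectionTransformerLayer`, and `ToyConfig`, `build_toy_layer_spec`, and `build_toy_spec_from_config` are hypothetical helpers, not functions from Megatron-LM.

```python
from dataclasses import dataclass


class TransformerLayer:
    """Toy stand-in for the regular transformer layer class."""


class HyperConnectionTransformerLayer(TransformerLayer):
    """Toy stand-in for the mHC-aware layer class named in the commit message."""


@dataclass(frozen=True)
class ModuleSpec:
    """Toy stand-in that records which class to instantiate for a layer."""

    module: type


@dataclass
class ToyConfig:
    """Hypothetical config carrying the flag that is threaded to the spec."""

    enable_hyper_connections: bool = False


def build_toy_layer_spec(enable_hyper_connection: bool = False) -> ModuleSpec:
    """Select the layer class from the kwarg (hypothetical helper)."""
    if enable_hyper_connection:
        return ModuleSpec(module=HyperConnectionTransformerLayer)
    return ModuleSpec(module=TransformerLayer)


def build_toy_spec_from_config(config: ToyConfig) -> ModuleSpec:
    """Thread the config flag through to the spec builder."""
    return build_toy_layer_spec(enable_hyper_connection=config.enable_hyper_connections)


default_spec = build_toy_spec_from_config(ToyConfig())
mhc_spec = build_toy_spec_from_config(ToyConfig(enable_hyper_connections=True))
```

Keeping the choice behind a keyword argument that defaults to `False` leaves existing callers on the regular layer class unless they opt in.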
What does this PR do?
Mirror PR of #2943.
Contribution process
flowchart LR
    A[Pre-checks] --> B[PR Tests]
    subgraph Code Review/Approval
        C1[Expert Review] --> C2[Final Review]
    end
    B --> C1
    C2 --> D[Merge]
Pre-checks
… Core 0.8)
Code review
The following process is enforced via the CODEOWNERS file for changes into megatron/core. For changes outside of megatron/core, it is up to the PR author whether or not to tag the Final Reviewer team.
For MRs into `main` branch
Feel free to message or comment the @mcore-oncall to help accelerate your merge into main. The less complex your PR is, the faster it will be approved and merged!
(Step 1): Add PR label Expert Review
(Step 2): Collect the expert reviewers' reviews
Add the Expert Review label when your PR is ready for review.
Final Review might get declined if these requirements are not fulfilled.
(Step 3): Final Review
Add the Final Review label.
(Optional Step 4): Cherry-pick into release branch
If this PR also needs to be merged into core_r* release branches, after this PR has been merged, select Cherry-pick to open a new PR into the release branch.
For MRs into `dev` branch
The proposed review process for `dev` branch is under active discussion.
MRs are mergeable after one approval by either eharper@nvidia.com or zijiey@nvidia.com.
Merging your PR
Any member of core-adlr and core-nemo will be able to merge your PR.