docs - Update user manual with new MoE features and Megatron FSDP by onel · Pull Request #2529 · NVIDIA-NeMo/Megatron-Bridge

onel · 2026-02-25T18:48:16Z

Changes:

docs/parallelisms.md - Adding DeepEP/HybridEP optimizations, token dropping, and advanced MoE features
docs/training/megatron-fsdp.md - New comprehensive guide for Megatron FSDP
docs/training/checkpointing.md - Updating with fsdp_dtensor checkpoint format information

Fixes #1722

copy-pr-bot · 2026-02-25T18:48:19Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai · 2026-02-25T18:51:43Z

📝 Walkthrough

The documentation updates add a comprehensive Megatron FSDP guide and expand the MoE optimization content with the DeepEP and HybridEP dispatchers. The checkpoint format documentation now clarifies compatibility across parallelization strategies and FSDP variants.

Changes

| Cohort / File(s) | Summary |
|---|---|
| MoE Optimization Features `docs/parallelisms.md` | Renamed and expanded "DeepEP Optimization" section to "DeepEP and HybridEP Optimizations". Introduces two high-performance MoE token dispatchers (DeepEP and HybridEP) with architecture-specific availability. Adds configuration examples using `apply_flex_dispatcher_backend` with values "deepep" and "hybridep". Includes new GPTModelProvider parameters (`moe_expert_capacity_factor`, `moe_router_topk`, `moe_token_dispatcher_type`, `moe_router_load_balancing_type`, `moe_ffn_hidden_size`). Documents Token Dropping for Load Balancing with capacity-factor semantics and related requirements. |
| Checkpoint Format Documentation `docs/training/checkpointing.md` | Expands checkpoint format coverage with `torch_dist`, `zarr`, and `fsdp_dtensor` formats. Introduces "Available Formats" section detailing characteristics and applicability. Adds "Format Selection" code example for DDP/TP/PP and Megatron FSDP scenarios. Includes "Format Compatibility" matrix across DDP, Distributed Optimizer, Megatron FSDP, Torch FSDP2, and Async Save. Clarifies `fsdp_dtensor` requirement for Megatron FSDP. Adds "Performance Optimizations" section and "Related Documentation" subsection. |
| Megatron FSDP Guide `docs/training/megatron-fsdp.md` | New comprehensive documentation file covering Megatron FSDP concepts, configuration, compatibility, automatic adjustments, complete configuration examples, migration from DDP, Torch FSDP2 alternatives, performance considerations, troubleshooting, and references. |
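
The capacity-factor semantics mentioned in the `docs/parallelisms.md` summary can be illustrated with a small, self-contained sketch. This is not the Megatron-Core implementation: it assumes the common definition in which each expert accepts at most `ceil(tokens * top_k / experts * capacity_factor)` token assignments, and it drops overflow in arrival order. The function names `expert_capacity` and `drop_overflow` are hypothetical.

```python
import math
from collections import defaultdict


def expert_capacity(num_tokens: int, top_k: int, num_experts: int,
                    capacity_factor: float) -> int:
    """Maximum number of token assignments one expert accepts (assumed formula)."""
    return math.ceil(num_tokens * top_k / num_experts * capacity_factor)


def drop_overflow(assignments: list[tuple[int, int]],
                  capacity: int) -> list[tuple[int, int]]:
    """Keep (token_id, expert_id) pairs until each expert reaches capacity.

    Pairs are kept in arrival order; a real router may instead rank
    assignments by router probability before dropping.
    """
    load = defaultdict(int)
    kept = []
    for token_id, expert_id in assignments:
        if load[expert_id] < capacity:
            load[expert_id] += 1
            kept.append((token_id, expert_id))
    return kept


# 8 tokens, top-2 routing, 4 experts, capacity factor 1.0
capacity = expert_capacity(num_tokens=8, top_k=2, num_experts=4, capacity_factor=1.0)
# Six tokens all routed to expert 0: only `capacity` of them are kept.
kept = drop_overflow([(t, 0) for t in range(6)], capacity)
```

A capacity factor above 1.0 leaves headroom for imbalanced routing, while a factor at or below 1.0 forces drops as soon as any expert is oversubscribed.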

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Suggested labels

docs-only

Suggested reviewers

chenopis
ko3n1g
ananthsub

🚥 Pre-merge checks | ✅ 6

✅ Passed checks (6 passed)

| Check name | Status | Explanation |
|---|---|---|
| Linked Issues check | ✅ Passed | All PR objectives align with issue `#1722`: documentation updates for MoE features (DeepEP, HybridEP, token dropping) and Megatron FSDP are present in parallelisms.md, checkpointing.md, and the new megatron-fsdp.md file. |
| Out of Scope Changes check | ✅ Passed | All three modified files contain documentation updates directly related to the linked issue requirements: MoE features and Megatron FSDP documentation, with no unrelated changes detected. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Test Results For Major Changes | ✅ Passed | This PR contains only documentation-only changes updating markdown files in docs/ directory with no source code modifications, test results are not required. |
| Title check | ✅ Passed | The title accurately summarizes the main changes: documentation updates covering both new MoE features (DeepEP, HybridEP, token dropping) and Megatron FSDP configuration, which are the primary focus across all three modified documentation files. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

coderabbitai

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/parallelisms.md`:
- Around line 303-304: Update the inaccurate inline comment above
apply_flex_dispatcher_backend(model_config,
moe_flex_dispatcher_backend="deepep") to list all supported DeepEP
targets—include B200 and B300 in addition to Ampere/Hopper (e.g., "Apply DeepEP
optimization (Ampere, Hopper, B200, B300)") so the comment matches the GPU
Architecture Requirements section.

In `@docs/training/megatron-fsdp.md`:
- Around line 218-221: The doc line about Torch FSDP2 is ambiguous about
checkpoint formats; update the bullet "- Does not require `fsdp_dtensor`
checkpoint format" to explicitly state that Torch FSDP2 uses the `torch_dist`
checkpoint format (e.g., "- Uses `torch_dist` checkpoint format, not
`fsdp_dtensor`"), and add a short parenthetical or sentence pointing readers to
the relevant checkpointing.md section for details.
- Line 1: Add the new docs/training/megatron-fsdp.md to the table of contents by
editing docs/index.md: locate the "Training and Customization" toctree section
and add an entry for training/megatron-fsdp (use the same relative path style as
other entries in that section). Ensure the new line matches the existing
indentation and ordering convention used in docs/index.md so the file is
included in the generated docs.
- Around line 186-191: The fenced code block showing Python config settings
(dist_config.use_megatron_fsdp and ddp_config.use_megatron_fsdp) is missing a
language tag; update that markdown block to use a Python fence by adding the
language specifier (```python) so the two lines dist_config.use_megatron_fsdp =
True and ddp_config.use_megatron_fsdp = True get proper syntax highlighting and
resolve the MD040 warning.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b058b66 and 172db21.

📒 Files selected for processing (3)

docs/parallelisms.md
docs/training/checkpointing.md
docs/training/megatron-fsdp.md

coderabbitai · 2026-02-25T18:51:46Z

+# Apply DeepEP optimization (for Ampere/Hopper GPUs)
+apply_flex_dispatcher_backend(model_config, moe_flex_dispatcher_backend="deepep")


⚠️ Potential issue | 🟡 Minor

Inaccurate GPU comment in complete MoE example.

The inline comment # Apply DeepEP optimization (for Ampere/Hopper GPUs) omits B200 and B300, which are explicitly listed in the GPU Architecture Requirements section at lines 228–230 as supported DeepEP targets.

✏️ Proposed fix

-# Apply DeepEP optimization (for Ampere/Hopper GPUs)
+# Apply DeepEP optimization (for Ampere, Hopper, B200, B300 GPUs)
 apply_flex_dispatcher_backend(model_config, moe_flex_dispatcher_backend="deepep")

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

-# Apply DeepEP optimization (for Ampere/Hopper GPUs)
+# Apply DeepEP optimization (for Ampere, Hopper, B200, B300 GPUs)
 apply_flex_dispatcher_backend(model_config, moe_flex_dispatcher_backend="deepep")

coderabbitai · 2026-02-25T18:51:46Z

@@ -0,0 +1,277 @@
+# Megatron FSDP


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

rg -n 'megatron.*fsdp|megatron-fsdp' docs/index.md

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 53

🏁 Script executed:

# Check if docs/index.md exists and view its structure
ls -la docs/index.md 2>&1 && echo "---" && cat -n docs/index.md

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 2370

Add docs/training/megatron-fsdp.md to the "Training and Customization" section in docs/index.md.

The new documentation file must be registered in the table of contents. According to coding guidelines, when a new markdown document is added under docs/**/*.md, docs/index.md must be updated with the document in the appropriate section. Add the entry to the "Training and Customization" toctree section.
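
The registration requirement above can also be checked mechanically. The following is a minimal sketch, assuming that every document is referenced in `docs/index.md` by its extension-less relative path (for example `training/megatron-fsdp`). The helper name `unlisted_docs` is hypothetical, and the plain substring match mirrors the `rg` query above rather than parsing toctree blocks, so short paths can produce false negatives.

```python
from pathlib import Path


def unlisted_docs(docs_root: Path, index_name: str = "index.md") -> list[str]:
    """Return docs (relative, extension-less paths) that the index never mentions."""
    index_path = docs_root / index_name
    index_text = index_path.read_text(encoding="utf-8")
    missing = []
    for path in sorted(docs_root.rglob("*.md")):
        if path == index_path:
            continue  # the index does not need to list itself
        rel = path.relative_to(docs_root).with_suffix("").as_posix()
        # Substring match only: this does not verify the entry sits in a toctree.
        if rel not in index_text:
            missing.append(rel)
    return missing
```

Calling `unlisted_docs(Path("docs"))` from the repository root would return an empty list once `training/megatron-fsdp` has been added to the index.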

coderabbitai · 2026-02-25T18:51:46Z

+
+1. **Enable FSDP** in both `dist` and `ddp` configurations:
+   ```python
+   dist_config.use_megatron_fsdp = True
+   ddp_config.use_megatron_fsdp = True
+   ```


⚠️ Potential issue | 🟡 Minor

Fenced code block is missing a language specifier.

The code block at the start of the migration steps (lines 188–191) contains Python code but has no language tag, which triggers a markdownlint MD040 warning and degrades syntax highlighting.

✏️ Proposed fix

-```
+```python
 dist_config.use_megatron_fsdp = True
 ddp_config.use_megatron_fsdp = True

🧰 Tools

🪛 markdownlint-cli2 (0.21.0)

[warning] 186-186: Fenced code blocks should have a language specified

(MD040, fenced-code-language)
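
The MD040 rule reported above can be approximated in a few lines of Python. This is a simplified sketch, not markdownlint's implementation: it only recognizes backtick fences, treats every fence line as alternately opening and closing a block, and ignores tilde fences and nested fences of different lengths. The function name `fences_without_language` is hypothetical.

```python
FENCE = "`" * 3  # built programmatically so this example can sit inside a fence


def fences_without_language(markdown: str) -> list[int]:
    """Return 1-based line numbers of opening code fences with no language tag."""
    flagged = []
    in_block = False
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        stripped = line.strip()
        if not stripped.startswith(FENCE):
            continue
        if in_block:
            in_block = False  # this fence closes the current block
            continue
        in_block = True  # this fence opens a new block
        info = stripped.lstrip("`").strip()
        if not info:
            flagged.append(lineno)
    return flagged


sample = "\n".join([
    "1. Enable FSDP:",
    f"   {FENCE}",
    "   dist_config.use_megatron_fsdp = True",
    f"   {FENCE}",
    f"{FENCE}python",
    "ddp_config.use_megatron_fsdp = True",
    FENCE,
])
flagged = fences_without_language(sample)
```

Here only the first block is flagged, because its opening fence carries no language tag while the second one is tagged `python`.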

coderabbitai · 2026-02-25T18:51:46Z

+**Limitations of Torch FSDP2**:
+- Not currently compatible with Pipeline Parallelism
+- Still in experimental stage with potential bugs
+- Does not require `fsdp_dtensor` checkpoint format


⚠️ Potential issue | 🟡 Minor

Torch FSDP2 checkpoint format guidance is incomplete.

The section states Torch FSDP2 "Does not require fsdp_dtensor checkpoint format" but does not tell the reader which format to use instead. Given the compatibility matrix in checkpointing.md shows torch_dist as the supported format for Torch FSDP2, adding that here avoids ambiguity.
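
The format guidance discussed here can be summarized as a small decision helper. This is a sketch under stated assumptions, not part of Megatron-Bridge: the function name `select_ckpt_format` and its flags are hypothetical, and treating the two FSDP variants as mutually exclusive is an assumption. The format names themselves (`fsdp_dtensor` for Megatron FSDP, `torch_dist` as the default and for Torch FSDP2) come from this review.

```python
def select_ckpt_format(use_megatron_fsdp: bool = False,
                       use_torch_fsdp2: bool = False) -> str:
    """Pick a checkpoint format name from the FSDP flags (hypothetical helper)."""
    if use_megatron_fsdp and use_torch_fsdp2:
        # Assumption: the two FSDP implementations are not combined.
        raise ValueError("Choose either Megatron FSDP or Torch FSDP2, not both")
    if use_megatron_fsdp:
        return "fsdp_dtensor"  # required for Megatron FSDP per the PR summary
    # Default format; also the one the review names for Torch FSDP2.
    return "torch_dist"


megatron_format = select_ckpt_format(use_megatron_fsdp=True)
torch_fsdp2_format = select_ckpt_format(use_torch_fsdp2=True)
```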

✏️ Proposed fix

-- Does not require `fsdp_dtensor` checkpoint format
+- Does not require `fsdp_dtensor` checkpoint format; use the default `torch_dist` format instead

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

 **Limitations of Torch FSDP2**:
 - Not currently compatible with Pipeline Parallelism
 - Still in experimental stage with potential bugs
-- Does not require `fsdp_dtensor` checkpoint format
+- Does not require `fsdp_dtensor` checkpoint format; use the default `torch_dist` format instead

Signed-off-by: Andrei Onel <onel@users.noreply.github.com>

yaoyu-33

Docs-only change, LGTM.

yaoyu-33 · 2026-03-02T21:28:31Z

/ok to test 75838d5

asolergi-nv · 2026-03-13T09:39:21Z

+**`zarr`**
+- Zarr-based checkpoint format
+- Alternative to `torch_dist` for certain use cases
+- Compatible with distributed parallelism strategies


zarr backend was removed in NVIDIA/Megatron-LM#2944

Signed-off-by: Andrei Onel <onel@users.noreply.github.com>
Co-authored-by: askmanu[bot] <192355599+askmanu[bot]@users.noreply.github.com>

askmanu Bot added 3 commits February 25, 2026 17:54

Update MoE section with DeepEP, HybridEP, and token dropping features

cc829ed

Add Megatron FSDP documentation

834ed16

Update checkpointing docs with fsdp_dtensor format information

172db21

github-actions Bot added the community-request label Feb 25, 2026

onel changed the title ~~[Documentation] Update user manual with new MoE features and Megatron FSDP~~ docs - Update user manual with new MoE features and Megatron FSDP Feb 25, 2026

coderabbitai Bot reviewed Feb 25, 2026

askmanu Bot and others added 2 commits February 25, 2026 18:58

Add megatron-fsdp.md to Training and Customization toctree

9938524

Update index.md with new table of contents

75838d5

Signed-off-by: Andrei Onel <onel@users.noreply.github.com>

yaoyu-33 added the docs-only label Mar 2, 2026

yaoyu-33 approved these changes Mar 2, 2026

yaoyu-33 merged commit af416ec into NVIDIA-NeMo:main Mar 12, 2026
24 checks passed

asolergi-nv reviewed Mar 13, 2026

cuichenx mentioned this pull request May 8, 2026

[NeMo FW 26.06 Release] MBridge v0.5.0 Roadmap #3754

Open
