docs - Update user manual with new MoE features and Megatron FSDP #2529
📝 Walkthrough
Documentation updates introduce a comprehensive Megatron FSDP guide and expand the MoE optimization content with the DeepEP/HybridEP dispatchers. The enhanced checkpoint format documentation clarifies compatibility across parallelization strategies and FSDP variants.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes
🚥 Pre-merge checks: ✅ 6 passed
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/parallelisms.md`:
- Around line 303-304: Update the inaccurate inline comment above
apply_flex_dispatcher_backend(model_config,
moe_flex_dispatcher_backend="deepep") to list all supported DeepEP
targets—include B200 and B300 in addition to Ampere/Hopper (e.g., "Apply DeepEP
optimization (Ampere, Hopper, B200, B300)") so the comment matches the GPU
Architecture Requirements section.
In `@docs/training/megatron-fsdp.md`:
- Around line 218-221: The doc line about Torch FSDP2 is ambiguous about
checkpoint formats; update the bullet "- Does not require `fsdp_dtensor`
checkpoint format" to explicitly state that Torch FSDP2 uses the `torch_dist`
checkpoint format (e.g., "- Uses `torch_dist` checkpoint format, not
`fsdp_dtensor`"), and add a short parenthetical or sentence pointing readers to
the relevant checkpointing.md section for details.
- Line 1: Add the new docs/training/megatron-fsdp.md to the table of contents by
editing docs/index.md: locate the "Training and Customization" toctree section
and add an entry for training/megatron-fsdp (use the same relative path style as
other entries in that section). Ensure the new line matches the existing
indentation and ordering convention used in docs/index.md so the file is
included in the generated docs.
- Around line 186-191: The fenced code block showing Python config settings
(dist_config.use_megatron_fsdp and ddp_config.use_megatron_fsdp) is missing a
language tag; update that markdown block to use a Python fence by adding the
language specifier (```python) so the two lines dist_config.use_megatron_fsdp =
True and ddp_config.use_megatron_fsdp = True get proper syntax highlighting and
resolve the MD040 warning.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
docs/parallelisms.md
docs/training/checkpointing.md
docs/training/megatron-fsdp.md
| # Apply DeepEP optimization (for Ampere/Hopper GPUs) | ||
| apply_flex_dispatcher_backend(model_config, moe_flex_dispatcher_backend="deepep") |
Inaccurate GPU comment in complete MoE example.
The inline comment # Apply DeepEP optimization (for Ampere/Hopper GPUs) omits B200 and B300, which are explicitly listed in the GPU Architecture Requirements section at lines 228–230 as supported DeepEP targets.
✏️ Proposed fix
-# Apply DeepEP optimization (for Ampere/Hopper GPUs)
+# Apply DeepEP optimization (for Ampere, Hopper, B200, B300 GPUs)
apply_flex_dispatcher_backend(model_config, moe_flex_dispatcher_backend="deepep")
| @@ -0,0 +1,277 @@ | |||
| # Megatron FSDP | |||
🧩 Analysis chain
🏁 Script executed:
rg -n 'megatron.*fsdp|megatron-fsdp' docs/index.md
Repository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 53
🏁 Script executed:
# Check if docs/index.md exists and view its structure
ls -la docs/index.md 2>&1 && echo "---" && cat -n docs/index.md
Repository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 2370
Add docs/training/megatron-fsdp.md to the "Training and Customization" section in docs/index.md.
The new documentation file must be registered in the table of contents. According to coding guidelines, when a new markdown document is added under docs/**/*.md, docs/index.md must be updated with the document in the appropriate section. Add the entry to the "Training and Customization" toctree section.
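The registration check above can also be scripted. The following is a minimal sketch, assuming `docs/index.md` uses MyST-style `{toctree}` fenced blocks with one plain document path per line; the helper names `toctree_entries` and `is_registered` are hypothetical and are not part of the repository.

```python
import re

# Built at runtime so this example never contains a literal fence marker.
FENCE = "`" * 3


def toctree_entries(index_text: str) -> list[str]:
    """Return document paths listed inside MyST {toctree} blocks (assumed layout)."""
    entries = []
    in_toctree = False
    for line in index_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(FENCE + "{toctree}"):
            in_toctree = True
            continue
        if in_toctree and stripped.startswith(FENCE):
            in_toctree = False
            continue
        # Skip blank lines and directive options such as ":caption:".
        if in_toctree and stripped and not stripped.startswith(":"):
            # Drop an optional ".md" suffix so both path styles compare equal.
            entries.append(re.sub(r"\.md$", "", stripped))
    return entries


def is_registered(index_text: str, doc_path: str) -> bool:
    """True if doc_path (with or without '.md') appears in any toctree block."""
    return re.sub(r"\.md$", "", doc_path) in toctree_entries(index_text)
```

Reading `docs/index.md` and calling `is_registered(text, "training/megatron-fsdp.md")` would then report whether the new page is listed. Entries written in the `Title <path>` form are not handled by this sketch.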
| 1. **Enable FSDP** in both `dist` and `ddp` configurations: | ||
| ```python | ||
| dist_config.use_megatron_fsdp = True | ||
| ddp_config.use_megatron_fsdp = True | ||
| ``` |
Fenced code block is missing a language specifier.
The code block at the start of the migration steps (lines 188–191) contains Python code but has no language tag, which triggers a markdownlint MD040 warning and degrades syntax highlighting.
✏️ Proposed fix
-```
+```python
dist_config.use_megatron_fsdp = True
ddp_config.use_megatron_fsdp = True
🧰 Tools
🪛 markdownlint-cli2 (0.21.0)
[warning] 186-186: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
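For readers who want to reproduce this kind of finding without markdownlint, here is a rough sketch of the idea behind the MD040 rule. It is a simplification that assumes only three-backtick fences (no tildes, no longer fences, no nesting) and is not the markdownlint implementation; the function name `fences_missing_language` is hypothetical.

```python
def fences_missing_language(markdown_text: str) -> list[int]:
    """Return 1-based line numbers of opening fences that have no language tag."""
    fence = "`" * 3  # built at runtime to avoid a literal fence in this example
    missing = []
    inside_block = False
    for number, line in enumerate(markdown_text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped.startswith(fence):
            continue
        if inside_block:
            # This fence closes the current block.
            inside_block = False
        else:
            # This fence opens a block; a bare fence carries no language tag.
            inside_block = True
            if stripped == fence:
                missing.append(number)
    return missing
```

Running it over the text of `docs/training/megatron-fsdp.md` would list the lines where a language specifier such as `python` still needs to be added.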
| **Limitations of Torch FSDP2**: | ||
| - Not currently compatible with Pipeline Parallelism | ||
| - Still in experimental stage with potential bugs | ||
| - Does not require `fsdp_dtensor` checkpoint format |
Torch FSDP2 checkpoint format guidance is incomplete.
The section states Torch FSDP2 "Does not require fsdp_dtensor checkpoint format" but does not tell the reader which format to use instead. Given the compatibility matrix in checkpointing.md shows torch_dist as the supported format for Torch FSDP2, adding that here avoids ambiguity.
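To make the intended guidance concrete, the pairing between FSDP variant and checkpoint format could be written down as a small lookup. This is an illustrative sketch only: `FsdpBackend`, `CHECKPOINT_FORMAT`, and `expected_checkpoint_format` are hypothetical names, not Megatron Bridge APIs, and the `fsdp_dtensor` entry for Megatron FSDP is inferred from the wording of the bullet rather than confirmed in this thread.

```python
from enum import Enum


class FsdpBackend(Enum):
    """Hypothetical labels for the two FSDP variants discussed in the docs."""

    MEGATRON_FSDP = "megatron_fsdp"
    TORCH_FSDP2 = "torch_fsdp2"


# Assumed pairing: torch_dist for Torch FSDP2 comes from the compatibility
# matrix cited above; fsdp_dtensor for Megatron FSDP is an inference.
CHECKPOINT_FORMAT = {
    FsdpBackend.MEGATRON_FSDP: "fsdp_dtensor",
    FsdpBackend.TORCH_FSDP2: "torch_dist",
}


def expected_checkpoint_format(backend: FsdpBackend) -> str:
    """Return the checkpoint format name assumed for the given FSDP variant."""
    return CHECKPOINT_FORMAT[backend]
```

Stating the mapping explicitly in the documentation, as the proposed fix does, avoids leaving the reader to infer the format by elimination.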
✏️ Proposed fix
-- Does not require `fsdp_dtensor` checkpoint format
+- Does not require `fsdp_dtensor` checkpoint format; use the default `torch_dist` format instead
Signed-off-by: Andrei Onel <onel@users.noreply.github.com>
yaoyu-33 left a comment:
Docs-only change, LGTM.
/ok to test 75838d5
| **`zarr`** | ||
| - Zarr-based checkpoint format | ||
| - Alternative to `torch_dist` for certain use cases | ||
| - Compatible with distributed parallelism strategies |
zarr backend was removed in NVIDIA/Megatron-LM#2944
Changes:
Fixes #1722