Create model-support.md for NeMoRL#1705
Added documentation for model support and acceleration recipes. Signed-off-by: Wenwen Gao <94138584+snowmanwwg@users.noreply.github.com>
📝 Walkthrough
Added a documentation file describing Hugging Face model support in NeMo, covering LLMs and VLMs, supported model sizes, acceleration optimization via NeMo Megatron-bridge, and lists of compatible models.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Pre-merge checks: ✅ Passed checks (4 passed)
Actionable comments posted: 1
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
docs/about/model-support.md
🧰 Additional context used
📓 Path-based instructions (2)
docs/**/*.md
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Update docs/index.md when a new markdown doc is added under docs/**/*.md or a markdown file is renamed, ensuring the document appears in the most appropriate section
Files:
docs/about/model-support.md
!(**/tests/**|**/test_*.py|**/test_*.sh)
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Add the NVIDIA copyright header to all Python files and shell scripts (excluding tests). The header should include the current year
Files:
docs/about/model-support.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: build-container / main
- GitHub Check: Lint check
- GitHub Check: Post submodule check comment / Comment on PR
- GitHub Check: Post automodel integration comment / Comment on PR
🔇 Additional comments (1)
docs/about/model-support.md (1)
1-28: Verify that docs/index.md has been updated to reference this new documentation file.
Per the coding guidelines, when a new markdown doc is added under docs/**/*.md, the docs/index.md file must be updated to include a reference in the most appropriate section. Please confirm that docs/index.md has been updated with an entry for this model-support.md documentation.
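The check requested above could be automated along these lines. This is only a minimal sketch: the helper names `is_referenced` and `unreferenced_docs` are hypothetical, and it assumes that docs/index.md refers to each page by its path relative to docs/ without the .md suffix, which the repository's actual index format may not match.

```python
def is_referenced(index_text: str, doc_path: str) -> bool:
    """Return True if the index text mentions the doc (hypothetical helper).

    Assumes index entries are paths relative to docs/ without the .md suffix.
    A plain substring match is used, so a longer path sharing the same prefix
    would also count as a match.
    """
    relative = doc_path.removeprefix("docs/").removesuffix(".md")
    return relative in index_text


def unreferenced_docs(index_text: str, doc_paths: list[str]) -> list[str]:
    """List the docs that the index does not mention, in input order."""
    return [path for path in doc_paths if not is_referenced(index_text, path)]


if __name__ == "__main__":
    sample_index = "about/overview\nabout/model-support\n"
    print(unreferenced_docs(sample_index, ["docs/about/model-support.md"]))
```

In practice the index text would be read from docs/index.md and the doc paths collected from the files added in the PR.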
| ## Broad coverage for 🤗Hugging Face models via [NeMo AutoModel](https://github.com/NVIDIA-NeMo/Automodel) |
| NeMo-RL support 🤗Hugging Face models from the following classes |
Fix grammatical error: "support" should be "supports".
Line 5 is missing the verb conjugation.
-NeMo-RL support 🤗Hugging Face models from the following classes
+NeMo-RL supports 🤗Hugging Face models from the following classes
📝 Committable suggestion
| NeMo-RL support 🤗Hugging Face models from the following classes | |
| NeMo-RL supports 🤗Hugging Face models from the following classes |
🤖 Prompt for AI Agents
In docs/about/model-support.md around line 5, the sentence "NeMo-RL support
🤗Hugging Face models from the following classes" uses incorrect verb
conjugation; change "support" to "supports" so the sentence reads "NeMo-RL
supports 🤗Hugging Face models from the following classes."
Closing in favor of #1799, which puts all the changes together.
- train.py: remove the obsolete use_cache/activation-checkpointing incompatibility note. Automodel NVIDIA-NeMo#1705 (pinned 6de0c361) keeps use_cache=True for KV-sharing models under activation checkpointing, so the E4B VLM recipe's activation_checkpointing: true is safe.
- dtensor_policy_worker.py (v1): remove the Gemma4 mm_token_type_ids injection. The v1 DTensor worker is being deprecated; all shipped Gemma4 recipes use _v2: true, which threads use_cache/mm_token_type_ids correctly.
- setup.py: drop the Nemotron-H projection-dtype patch. A module forward-hook cannot reach the fused Mamba kernel's internal out_proj F.linear, so it cannot make nemotron-h LoRA train; the proper fix is the Automodel r0.5.0 restore-dtype change (tracked as a separate migration).
- recipes: migrate enable_deepep: true -> experts: gmm + dispatcher: deepep for the gemma4/qwen3.5 automodel recipes (enable_deepep is deprecated in Automodel BackendConfig; behavior-preserving). Verified: 26B-A4B trains 20 steps, gen_kl 0.0009, gates pass.
- tests: harden the E4B VLM gate with median(token_mult_prob_error) < 1.05 (observed 1.011 in CI); add a reward-ordering invariant to the reward-model env test; add hermetic unit tests for _needs_kv_cache_for_shared_layers and the Gemma4 mm_token_type_ids injection.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Shuang Yu <shuangy@nvidia.com>
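For readers unfamiliar with the hardened gate in the tests item, the condition median(token_mult_prob_error) < 1.05 can be sketched as below. This is an illustration rather than the test code from the commit: the function name `passes_prob_error_gate` is hypothetical, and it assumes the per-step token_mult_prob_error values have already been collected into a list of floats.

```python
from statistics import median


def passes_prob_error_gate(token_mult_prob_errors, threshold=1.05):
    """Return True when the median error is strictly below the threshold.

    The threshold default of 1.05 is the value quoted in the commit message;
    how the errors are gathered from a training run is assumed, not shown.
    """
    if not token_mult_prob_errors:
        raise ValueError("need at least one token_mult_prob_error value")
    return median(token_mult_prob_errors) < threshold


if __name__ == "__main__":
    # Illustrative values only, not measurements from any run.
    print(passes_prob_error_gate([1.009, 1.011, 1.02]))
```

Gating on the median rather than the maximum keeps a single noisy step from failing the check, which is one plausible reason for the choice.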
Added documentation for model support and acceleration recipes.