[model, recipe, examples] feat: add Nemotron-3 Nano Omni support #3760
Adds end-to-end support for Nemotron-3 Nano Omni (30B-A3B MoE multimodal: MoE Mamba/attention hybrid LM + RADIO vision tower + Parakeet sound encoder), targeting HF architecture NemotronH_Nano_Omni_Reasoning_V3:

- Bridge + provider + sound encoder under src/megatron/bridge/models/nemotron_omni/
- Recipe (CORD-V2 SFT/PEFT, VALOR32K-AVQA SFT/PEFT) under src/megatron/bridge/recipes/nemotron_omni/
- Forward step under src/megatron/bridge/training/nemotron_omni_step.py
- Energon task encoder for chat-ML samples with raw-waveform/mel audio
- VLM dataset glue: nemotron_omni_collate_fn, valor32k_avqa maker, audiohandler decoder, packing toggle on EnergonProvider
- Examples under examples/models/vlm/nemotron_3_omni/: README, conversion script, single- and multi-modality inference, slurm SFT/LoRA scripts, data-prep scripts, evaluation scripts

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Chen Cui <chcui@nvidia.com>
… for video inference

Two small clarifications in the Nemotron-3 Nano Omni example README, based on a fresh end-to-end verification run:

- Checkpoint Conversion → Export: call out that --trust-remote-code is required for the export step, not just import. The exporter loads the HF config, which references the custom modeling module shipped with NemotronH_Nano_Omni_Reasoning_V3.
- Inference: add a callout that the video modes (rows 2 and 4) need `decord` installed, since it is not pulled in by any pyproject extra.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Chen Cui <chcui@nvidia.com>
Light Code Review
Critical
Minor
Missing test coverage

This PR adds a new model family (bridge, provider, recipe, task encoder, collate, forward step) with no unit or functional tests. Per the adding-model-support guidelines, the following are expected:

Suggested test cases

No perf tests impacted.
Code Review: Nemotron-3 Nano Omni

Critical: Debug code left in production
import os as _os
if _os.environ.get("NOMNI_DEBUG_TILES") == "1":
    print(f"[DEBUG step] num_image_tiles=...")

This is an inline Bare
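A common replacement for an environment-gated `print` is the standard `logging` module, which lets operators switch the output on through log-level configuration instead of an ad hoc environment variable. The following is a minimal sketch of that pattern; the function name `log_tile_counts` is hypothetical and not taken from the PR.

```python
import logging

# Module-level logger; verbosity is controlled by logging configuration,
# not by a custom environment variable.
logger = logging.getLogger(__name__)


def log_tile_counts(num_image_tiles):
    """Emit tile counts at DEBUG level (hypothetical helper, for illustration)."""
    # %-style arguments are only formatted when DEBUG is enabled for this logger.
    logger.debug("num_image_tiles=%s", num_image_tiles)
```

With this shape, the message is silent under the default WARNING level and appears once the logger (or the root logger) is set to `logging.DEBUG`.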
| Test | Type | What it covers |
|---|---|---|
| `test_nemotron_omni_bridge.py::TestProviderBridge::test_provider_type` | unit | `provider_bridge()` returns `NemotronOmniModelProvider` |
| `test_nemotron_omni_bridge.py::TestProviderBridge::test_moe_fields` | unit | MoE config fields (`num_moe_experts`, `moe_router_topk`, shared expert) mapped correctly |
| `test_nemotron_omni_bridge.py::TestProviderBridge::test_sound_fields` | unit | Sound encoder config (`sound_config`, `freeze_sound_model`) propagated to provider |
| `test_nemotron_omni_bridge.py::TestProviderBridge::test_tie_word_embeddings_from_top_level` | unit | `share_embeddings_and_output_weights` read from top-level HF config, not `text_config` |
| `test_nemotron_omni_bridge.py::TestMappingRegistry::test_has_sound_encoder_mappings` | unit | `mapping_registry()` includes sound encoder `ReplicatedMapping` entries |
| `test_nemotron_omni_bridge.py::TestMappingRegistry::test_has_temporal_embedder_mappings` | unit | Temporal video embedder weights are mapped |
| `test_nemotron_omni_provider.py::TestFreeze::test_freeze_sound_model` | unit | `freeze(freeze_sound_model=True)` sets `requires_grad=False` on sound params |
| `test_nemotron_omni_conversion.py::TestRoundtrip::test_tp1_pp1` | functional (GPU) | Toy model roundtrip HF→Megatron→HF at TP=1,PP=1 |
| No perf tests impacted | — | No performance configs were added or modified |
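The roundtrip row in the table boils down to one property: converting HF names to Megatron names and back must reproduce the original state. The sketch below illustrates that property on plain dictionaries. It does not use the bridge's real mapping registry, and the prefixes in `PREFIX_MAP` are hypothetical placeholders chosen only for illustration.

```python
# Hypothetical HF-prefix -> Megatron-prefix pairs; NOT the bridge's real mappings.
PREFIX_MAP = {
    "sound_encoder.": "sound_model.",
    "language_model.": "decoder.",
}


def rename(state, prefix_map):
    """Return a copy of `state` with the first matching key prefix rewritten."""
    out = {}
    for name, value in state.items():
        for src, dst in prefix_map.items():
            if name.startswith(src):
                name = dst + name[len(src):]
                break
        out[name] = value
    return out


def roundtrip(hf_state):
    """Map HF names to Megatron names, then back with the inverted prefix map."""
    megatron_state = rename(hf_state, PREFIX_MAP)
    inverse = {dst: src for src, dst in PREFIX_MAP.items()}
    return rename(megatron_state, inverse)
```

A real test would build a toy model and compare tensors after conversion; the assertion would still have the same shape, namely that the roundtripped state equals the input state.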
🤖 Generated with Claude Code
/ok to test 77739ca

/ok to test dcae350
Signed-off-by: Chen Cui <chcui@nvidia.com>
/ok to test 4dfe1ad
Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Chen Cui <chcui@nvidia.com>

# Conflicts:
#	examples/models/nemotron/nemotron_3_omni/README.md
#	examples/models/qwen/nemotron_3_omni/conversion.sh
#	examples/models/qwen/nemotron_3_omni/cord_v2_inference.py
#	examples/models/qwen/nemotron_3_omni/hf_to_megatron_generate_nemotron_omni.py
#	examples/models/qwen/nemotron_3_omni/inference.sh
#	examples/models/qwen/nemotron_3_omni/slurm_peft_cord_v2.sh
#	examples/models/qwen/nemotron_3_omni/slurm_peft_valor32k_avqa.sh
#	examples/models/qwen/nemotron_3_omni/slurm_sft_cord_v2.sh
#	examples/models/qwen/nemotron_3_omni/slurm_sft_valor32k_avqa.sh
#	examples/models/qwen/nemotron_3_omni/valor32k_avqa_inference.py
/ok to test 060d951
Summary

- Adds end-to-end support for Nemotron-3 Nano Omni, targeting HF architecture `NemotronH_Nano_Omni_Reasoning_V3`.
- Bridge, provider, and sound encoder under `src/megatron/bridge/models/nemotron_omni/`, recipe under `src/megatron/bridge/recipes/nemotron_omni/`, forward step at `src/megatron/bridge/training/nemotron_omni_step.py`, Energon task encoder for chat-ML samples with raw-waveform / mel audio, and supporting glue (collate fn, valor32k_avqa maker, audiohandler decoder, packing toggle on EnergonProvider).
- `examples/models/vlm/nemotron_3_omni/` directory with conversion script, single- / multi-modality inference, slurm SFT/LoRA scripts, data-prep scripts, and evaluation scripts.

Test plan

Locally verified end-to-end against `nvidia/Nemotron-3-Nano-Omni-30B-A3B-Reasoning-BF16` on an 8 × H100 80GB node:

- … (`--not-strict`, 4 expected-missing tensors regenerated from config): ✅
- … `decord`)

Notes:

- `[ssm]` and `[audio]` extras (`mamba-ssm`, `causal-conv1d`, `librosa`) are required at install time; `decord` is needed for video sampling.
- … `freeze_language_model=True` for single-node runs.

🤖 Generated with Claude Code
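Because the notes list several packages that must be present before the examples will run, a quick pre-flight check can save a failed job. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical, and the import names (`mamba_ssm`, `causal_conv1d`) are assumed to be the usual module names for the distributions named above.

```python
from importlib.util import find_spec

# Distribution name -> import module name (module names are assumptions).
OPTIONAL_MODULES = {
    "mamba-ssm": "mamba_ssm",
    "causal-conv1d": "causal_conv1d",
    "librosa": "librosa",
    "decord": "decord",
}


def missing_packages(modules=OPTIONAL_MODULES):
    """Return the distribution names whose import module cannot be located."""
    # find_spec returns None for a top-level module that is not installed.
    return [pkg for pkg, mod in modules.items() if find_spec(mod) is None]
```

Calling `missing_packages()` before launching conversion or inference returns an empty list when everything is importable, and otherwise names what still needs to be installed.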