
feat: Support Nemotron-nano-v3 Omni AutoModel Path #2362

Open
yuekaizhang wants to merge 64 commits into
NVIDIA-NeMo:main from
yuekaizhang:nemotron

Conversation

@yuekaizhang

Contributor

This PR follows the nano-v3-omni mbridge training branch to add AutoModel backend support for Nemotron-Nano-Omni.

yuekaizhang and others added 28 commits April 29, 2026 02:55
Signed-off-by: Yuekai Zhang <yuekaiz@cw-dfw-cs-001-vscode-02.cm.cluster>
Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com>
Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com>
Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com>
Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com>
Add .claude/settings.local.json, .codex, and .humanize/ to
.gitignore as these are local tool configuration/cache files
that should not be tracked.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com>
- Add MMPRTinyDataset class with HF download fallback and local cache support
- Add format_mmpr_tiny_dataset for OpenAI-API message conversion
- Port verl_geo3k reward function from old Megatron-Bridge implementation
- Register mmpr-tiny in DATASET_REGISTRY and vlm_hf_data_processor
- Register verl_geo3k reward in VLMVerifyWorker
- Add mathruler dependency to pyproject.toml for answer grading
- Create debug-friendly YAML config with step_400 checkpoint
- Create launch script with uv sync pre-step
- Create CoT prompt file with \boxed{} instruction

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com>
- Remove self.preprocessor from MMPRTinyDataset to prevent double
  formatting (vlm_hf_data_processor already handles format dispatch)
- Add pylatexenc to pyproject.toml as transitive dependency of mathruler
  (mathruler does not declare it in its own metadata)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com>
- Fix prompt file brace escaping: \boxed{} -> \boxed{{}} so
  str.format() has exactly one replacement placeholder (fixes AC-2/AC-5)
- Add explicit split parameter validation to MMPRTinyDataset with
  ValueError for unsupported splits (fixes AC-1 negative test)
- Regenerate uv.lock with mathruler and pylatexenc entries (fixes AC-6)
- Add unit tests for dataset formatting, split validation, prompt
  file format compatibility, and verl_geo3k_reward (14 tests, all pass)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com>
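
The brace-escaping fix in the commit above follows from how `str.format()` reads braces: `{}` is a replacement field, while `{{}}` renders as a literal `{}`. A minimal sketch, assuming hypothetical template text (the actual prompt file is not reproduced in this PR) and a hypothetical `count_placeholders` helper:

```python
from string import Formatter

# Hypothetical prompt templates; the real prompt file contents are an assumption.
UNESCAPED = "Answer the question: {}\nPut the final answer in \\boxed{}."
ESCAPED = "Answer the question: {}\nPut the final answer in \\boxed{{}}."


def count_placeholders(template: str) -> int:
    """Count the replacement fields that str.format() would try to fill."""
    return sum(
        1 for _, field, _, _ in Formatter().parse(template) if field is not None
    )


# Unescaped: \boxed{} is parsed as a second field, so one argument is not enough.
assert count_placeholders(UNESCAPED) == 2

# Escaped: exactly one field remains, and \boxed{} survives as literal text.
assert count_placeholders(ESCAPED) == 1
prompt = ESCAPED.format("What is 2 + 2?")
assert prompt.endswith("\\boxed{}.")
```

Calling `UNESCAPED.format(question)` with a single argument raises `IndexError`, which is the failure mode the escaping avoids.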
Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com>
Add 3 processor smoke tests using a stub NemotronNanoVLV2Processor:
- test_processor_produces_valid_datum_spec: verifies DatumSpec fields
- test_prompted_text_contains_boxed_literal: verifies \boxed{} survives
- test_placeholder_conversion_for_nemotron_processor: verifies <image>
  placeholder and question text in vllm_content

Uses a tiny 1x1 PNG fixture for image resolution. All 17 MMPR tests
pass (11 dataset + 6 reward).

Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com>
- Stub now captures the exact text arg passed to __call__
- Assert exact equality: vllm_content == "<image>\n" + prompted_question
- Assert exactly one <image> token in output (no duplicates)
- Negative assertion: raw dataset string "<image>\nQuestion" not in output
- Assert captured __call__ text matches expected tokenizer input
- All tests run with -p no:testmon --override-ini='addopts='

Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com>
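
The assertions listed in the commit above could look roughly like the following. This is only a sketch: `StubProcessor`, `build_vllm_content`, and the prompt template are hypothetical stand-ins, not the PR's actual test code or processor API.

```python
class StubProcessor:
    """Hypothetical stand-in for a multimodal processor; records its text input."""

    def __init__(self):
        self.captured_text = None

    def __call__(self, text, images=None, **kwargs):
        # Capture the exact text argument so the test can inspect it later.
        self.captured_text = text
        return {"input_ids": [[0]]}


def build_vllm_content(raw_question: str, prompt_template: str) -> str:
    """Hypothetical helper: drop the dataset placeholder, apply the prompt,
    then prepend a single <image> token."""
    question = raw_question.replace("<image>", "").strip()
    return "<image>\n" + prompt_template.format(question)


def run_check():
    raw_question = "<image>\nWhat is x?"
    template = "Think step by step. {}"  # assumed template, one placeholder
    prompted_question = template.format("What is x?")

    stub = StubProcessor()
    vllm_content = build_vllm_content(raw_question, template)
    stub(text=vllm_content, images=["img.png"])

    assert vllm_content == "<image>\n" + prompted_question  # exact equality
    assert vllm_content.count("<image>") == 1               # no duplicate token
    assert raw_question not in vllm_content                 # raw string not leaked
    assert stub.captured_text == vllm_content               # processor input matches
    return vllm_content, stub.captured_text


run_check()
```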
Clean up partial state (stale images_dir, parquet, temp dir) before
re-downloading when the ready marker is absent. Prevents shutil.move
from nesting images/images when images_dir already exists from an
interrupted prior attempt.

Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com>
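
The nesting problem described in the commit above comes from `shutil.move`: when the destination is an existing directory, the source is moved inside it rather than replacing it. A minimal sketch of the cleanup-before-download idea, assuming hypothetical file names (`.ready`, `images`, `data.parquet`, `tmp_download`) and hypothetical helper functions:

```python
import shutil
import tempfile
from pathlib import Path


def prepare_cache(cache_dir: Path,
                  stale_names=("images", "data.parquet", "tmp_download")) -> bool:
    """Return True if the cache is ready; otherwise remove partial state."""
    if (cache_dir / ".ready").exists():
        return True
    for name in stale_names:
        target = cache_dir / name
        if target.is_dir():
            shutil.rmtree(target)
        elif target.exists():
            target.unlink()
    return False


def finish_download(cache_dir: Path, downloaded_images: Path) -> None:
    """Move the staged images into place and write the ready marker."""
    # The destination no longer exists, so this renames instead of nesting.
    shutil.move(str(downloaded_images), str(cache_dir / "images"))
    (cache_dir / ".ready").touch()


def demo() -> list[str]:
    with tempfile.TemporaryDirectory() as root:
        cache = Path(root)
        # Simulate an interrupted earlier attempt: images/ exists, no marker.
        (cache / "images").mkdir()
        (cache / "images" / "stale.png").touch()

        if not prepare_cache(cache):
            # Stand-in for the real download step.
            staged = cache / "tmp_download" / "images"
            staged.mkdir(parents=True)
            (staged / "new.png").touch()
            finish_download(cache, staged)
        return sorted(p.relative_to(cache).as_posix() for p in cache.rglob("*"))


files = demo()
assert "images/new.png" in files
assert "images/images" not in files
```

Without the cleanup step, the stale `images` directory would still exist at move time and the new files would end up under `images/images`.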
MMPR-Tiny has 4,192 rows with multiple images (up to 11). The previous
code truncated to images[0], losing visual context for multi-image
questions.

- _load_mmpr_tiny_from_cache: keep all image paths instead of [imgs[0]]
- format_mmpr_tiny_dataset: split question on <image> and <image_N>
  placeholders, interleave image content items with text segments
- Add tests for multi-image and numbered-placeholder formatting

Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com>
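
The placeholder splitting described in the commit above can be sketched as follows. The content-item keys (`type`, `image`, `text`), the `interleave` name, and the handling of images without a placeholder are assumptions for illustration, not the PR's actual `format_mmpr_tiny_dataset` implementation.

```python
import re

# Matches "<image>" as well as numbered forms such as "<image_1>".
IMAGE_PLACEHOLDER = re.compile(r"<image(?:_\d+)?>")


def interleave(question: str, image_paths: list[str]) -> list[dict]:
    """Split the question on image placeholders and interleave image items."""
    segments = IMAGE_PLACEHOLDER.split(question)
    images = iter(image_paths)
    content = []
    for index, segment in enumerate(segments):
        # Every boundary between two segments corresponds to one placeholder.
        if index > 0:
            path = next(images, None)
            if path is not None:
                content.append({"type": "image", "image": path})
        text = segment.strip()
        if text:
            content.append({"type": "text", "text": text})
    # Assumption: images with no matching placeholder are appended at the end.
    for path in images:
        content.append({"type": "image", "image": path})
    return content


items = interleave(
    "<image_1>\nCompare with <image_2>\nWhich is larger?",
    ["a.png", "b.png"],
)
assert [item["type"] for item in items] == ["image", "text", "image", "text"]
```

Keeping every path in `image_paths` instead of only the first is what preserves the visual context for multi-image rows.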
Signed-off-by: root <zhangyuekai@foxmail.com>
Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com>
Signed-off-by: root <zhangyuekai@foxmail.com>
Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com>
Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com>
Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com>
Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com>
Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com>
…nting

Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com>
…sound_projection to resume training

Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com>
Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com>
Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com>
Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com>
Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com>
Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com>
Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com>
Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com>
@yuekaizhang yuekaizhang requested review from a team as code owners April 29, 2026 10:25
@github-actions

✅ Submodule Fast-Forward Check Results

Check based on commit: 52a8808 (PR #2362 from nemotron)

✅ Submodules that are properly updated:

Automodel: ✅ PR branch is ahead of main branch (fast-forward)

All submodule changes look good! ✨

Resolved conflicts:
- pyproject.toml: kept transformers==5.5.0 (gemma support) over origin/main's
  5.3.0, force-overriding past vLLM 0.20.0's !=5.5.0 constraint; took
  origin/main's mlflow>=3.12.0.
- tests/unit/test_recipes_and_test_suites.py: kept nightly GPU-hours ceiling
  at 1410 (merged suite dry-runs to 1409 GPU hours; 1360 would fail).
- uv.lock: regenerated from merged pyproject.toml (transformers 5.3.0 -> 5.5.0).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@yuekaizhang

Contributor Author

/ok to test dfaeb37

@github-actions

github-actions Bot commented Jun 1, 2026

✅ Submodule Fast-Forward Check Results

Check based on commit: dfaeb37 (PR #2362 from nemotron)

✅ Submodules that are properly updated:

Automodel: ✅ PR branch is ahead of main branch (fast-forward)

All submodule changes look good! ✨

…lues

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: root <zhangyuekai@foxmail.com>
@github-actions

github-actions Bot commented Jun 1, 2026

✅ Submodule Fast-Forward Check Results

Check based on commit: ef24ee2 (PR #2362 from nemotron)

✅ Submodules that are properly updated:

Automodel: ✅ PR branch is ahead of main branch (fast-forward)

All submodule changes look good! ✨

@yuekaizhang

Contributor Author

/ok to test ef24ee2

yuekaizhang and others added 2 commits June 1, 2026 22:06
This reverts commit 5de2521.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: root <zhangyuekai@foxmail.com>
@yuekaizhang

Contributor Author

/ok to test 96c3ffe

@github-actions

github-actions Bot commented Jun 2, 2026

✅ Submodule Fast-Forward Check Results

Check based on commit: 0fa4b2d (PR #2362 from nemotron)

✅ Submodules that are properly updated:

Automodel: ✅ PR branch is ahead of main branch (fast-forward)

All submodule changes look good! ✨

@github-actions

github-actions Bot commented Jun 2, 2026

✅ Submodule Fast-Forward Check Results

Check based on commit: 96c3ffe (PR #2362 from nemotron)

✅ Submodules that are properly updated:

Automodel: ✅ PR branch is ahead of main branch (fast-forward)

All submodule changes look good! ✨


Labels

CI:L1 (Run doctests, unit tests, and functional tests)
Documentation (Improvements or additions to documentation)


4 participants