feat: Support Nemotron-nano-v3 Omni AutoModel Path #2362
Open
yuekaizhang wants to merge 64 commits into
Signed-off-by: Yuekai Zhang <yuekaiz@cw-dfw-cs-001-vscode-02.cm.cluster> Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com>
Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com>
Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com>
Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com>
Add .claude/settings.local.json, .codex, and .humanize/ to .gitignore, as these are local tool configuration/cache files that should not be tracked.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com>
- Add MMPRTinyDataset class with HF download fallback and local cache support
- Add format_mmpr_tiny_dataset for OpenAI-API message conversion
- Port verl_geo3k reward function from old Megatron-Bridge implementation
- Register mmpr-tiny in DATASET_REGISTRY and vlm_hf_data_processor
- Register verl_geo3k reward in VLMVerifyWorker
- Add mathruler dependency to pyproject.toml for answer grading
- Create debug-friendly YAML config with step_400 checkpoint
- Create launch script with uv sync pre-step
- Create CoT prompt file with \boxed{} instruction
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com>
- Remove self.preprocessor from MMPRTinyDataset to prevent double formatting (vlm_hf_data_processor already handles format dispatch)
- Add pylatexenc to pyproject.toml as transitive dependency of mathruler (mathruler does not declare it in its own metadata)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com>
- Fix prompt file brace escaping: \boxed{} -> \boxed{{}} so
str.format() has exactly one replacement placeholder (fixes AC-2/AC-5)
- Add explicit split parameter validation to MMPRTinyDataset with
ValueError for unsupported splits (fixes AC-1 negative test)
- Regenerate uv.lock with mathruler and pylatexenc entries (fixes AC-6)
- Add unit tests for dataset formatting, split validation, prompt
file format compatibility, and verl_geo3k_reward (14 tests, all pass)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com>
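The brace-escaping fix above can be illustrated with a minimal sketch. The template strings and helper names below are hypothetical (the actual prompt file is not shown in this conversation); the sketch only assumes the prompt is rendered with `str.format()` and one positional argument.

```python
from string import Formatter

# Hypothetical template text; the real prompt file's wording is an assumption.
BROKEN_TEMPLATE = "Put the final answer in \\boxed{}.\n\n{}"
FIXED_TEMPLATE = "Put the final answer in \\boxed{{}}.\n\n{}"


def count_placeholders(template: str) -> int:
    """Count the replacement fields that str.format() would try to fill."""
    return sum(
        1 for _, field, _, _ in Formatter().parse(template) if field is not None
    )


def render_prompt(template: str, question: str) -> str:
    """Fill the template's positional placeholder with the question text."""
    return template.format(question)
```

With the unescaped template, `str.format()` sees two replacement fields but receives one argument and raises `IndexError`; doubling the braces leaves a literal `\boxed{}` in the rendered prompt and exactly one placeholder.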
Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com>
Add 3 processor smoke tests using a stub NemotronNanoVLV2Processor:
- test_processor_produces_valid_datum_spec: verifies DatumSpec fields
- test_prompted_text_contains_boxed_literal: verifies \boxed{} survives
- test_placeholder_conversion_for_nemotron_processor: verifies <image>
placeholder and question text in vllm_content
Uses a tiny 1x1 PNG fixture for image resolution. All 17 MMPR tests
pass (11 dataset + 6 reward).
Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com>
- Stub now captures the exact text arg passed to __call__
- Assert exact equality: vllm_content == "<image>\n" + prompted_question
- Assert exactly one <image> token in output (no duplicates)
- Negative assertion: raw dataset string "<image>\nQuestion" not in output
- Assert captured __call__ text matches expected tokenizer input
- All tests run with -p no:testmon --override-ini='addopts='
Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com>
Clean up partial state (stale images_dir, parquet, temp dir) before re-downloading when the ready marker is absent. Prevents shutil.move from nesting images/images when images_dir already exists from an interrupted prior attempt. Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com>
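A rough sketch of the cleanup-before-redownload pattern described in this commit, under stated assumptions: the file names (`images`, `data.parquet`, `_tmp`, `.ready`), the `ensure_cache` helper, and the `download` callback are all hypothetical and are not taken from the PR's code.

```python
import shutil
from pathlib import Path
from typing import Callable


def ensure_cache(cache_dir: Path, download: Callable[[Path], None]) -> Path:
    """Populate cache_dir once; redo the work from scratch if it was interrupted."""
    images_dir = cache_dir / "images"    # hypothetical layout
    parquet = cache_dir / "data.parquet"
    tmp_dir = cache_dir / "_tmp"
    ready = cache_dir / ".ready"         # written last, so it marks a complete cache

    if ready.exists():
        return cache_dir

    # No ready marker: anything on disk is partial state from a prior attempt.
    shutil.rmtree(images_dir, ignore_errors=True)
    shutil.rmtree(tmp_dir, ignore_errors=True)
    parquet.unlink(missing_ok=True)

    tmp_dir.mkdir(parents=True)
    download(tmp_dir)  # assumed to create tmp_dir/images and tmp_dir/data.parquet

    # images_dir no longer exists, so move renames the directory rather than
    # placing it inside an existing one (which would produce images/images).
    shutil.move(str(tmp_dir / "images"), str(images_dir))
    shutil.move(str(tmp_dir / "data.parquet"), str(parquet))
    shutil.rmtree(tmp_dir, ignore_errors=True)
    ready.touch()
    return cache_dir
```

The nesting issue comes from documented `shutil.move` behavior: when the destination is an existing directory, the source is moved inside it, so removing the stale destination first keeps the layout flat.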
MMPR-Tiny has 4,192 rows with multiple images (up to 11). The previous code truncated to images[0], losing visual context for multi-image questions.
- _load_mmpr_tiny_from_cache: keep all image paths instead of [imgs[0]]
- format_mmpr_tiny_dataset: split question on <image> and <image_N> placeholders, interleave image content items with text segments
- Add tests for multi-image and numbered-placeholder formatting
Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com>
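The placeholder splitting described above might look roughly like the following. This is a sketch, not the PR's `format_mmpr_tiny_dataset`: the function name, the content-item dictionary shape, and the handling of leftover images are assumptions, and the number in `<image_N>` is ignored in favor of consuming images in order.

```python
import re
from typing import Any, Dict, List

# Matches both the bare <image> token and numbered forms such as <image_2>.
_IMAGE_TOKEN = re.compile(r"<image(?:_\d+)?>")


def interleave_images(question: str, image_paths: List[str]) -> List[Dict[str, Any]]:
    """Build a content list that alternates image items and text segments."""
    segments = _IMAGE_TOKEN.split(question)
    images = iter(image_paths)
    content: List[Dict[str, Any]] = []

    for index, segment in enumerate(segments):
        # Every boundary between two segments corresponds to one placeholder.
        if index > 0:
            path = next(images, None)
            if path is not None:
                content.append({"type": "image", "image": path})
        text = segment.strip()
        if text:
            content.append({"type": "text", "text": text})

    # Assumption: images without a matching placeholder are appended at the
    # end so that no visual context is dropped.
    content.extend({"type": "image", "image": path} for path in images)
    return content
```

For example, `interleave_images("<image_1>\nCompare with <image_2> please", ["a.png", "b.png"])` yields an image item for `a.png`, the text `Compare with`, an image item for `b.png`, and the text `please`, in that order.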
Signed-off-by: root <zhangyuekai@foxmail.com> Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com>
Signed-off-by: root <zhangyuekai@foxmail.com> Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com>
Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com>
Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com>
Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com>
Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com>
…nting Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com>
…sound_projection to resume training Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com>
Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com>
Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com>
Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com>
Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com>
Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com>
Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com>
Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com>
Resolved conflicts:
- pyproject.toml: kept transformers==5.5.0 (gemma support) over origin/main's 5.3.0, force-overriding past vLLM 0.20.0's !=5.5.0 constraint; took origin/main's mlflow>=3.12.0.
- tests/unit/test_recipes_and_test_suites.py: kept nightly GPU-hours ceiling at 1410 (merged suite dry-runs to 1409 GPU hours; 1360 would fail).
- uv.lock: regenerated from merged pyproject.toml (transformers 5.3.0 -> 5.5.0).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
/ok to test dfaeb37
…lues Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: root <zhangyuekai@foxmail.com>
/ok to test ef24ee2
This reverts commit 5de2521. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: root <zhangyuekai@foxmail.com>
/ok to test 96c3ffe
This PR follows the nano-v3-omni mbridge training branch and adds AutoModel backend support for Nemotron-Nano-Omni.