Update training script and test configurations for MIMO LLaVA #3293

Merged: liding-nv merged 1 commit into mimo/phase5-checkpointing-rebuild from kamran/mimo_llava_fixes on Apr 13, 2026

@kamran-nvidia (Contributor)

  • Reduced training iterations from 2000 to 1000 in run_hetero_llava.sh
  • Renamed the wandb experiment from "mimo-llava-e2e-test" to "mimo-llava-hetero-e2e-test"
  • Removed unused parallelism configurations in run_hetero_llava_parallelism_tests_unfrozen_llm.sh
  • Introduced a CLIPViTNoCLS class in test_mimo_training_llava.py to drop the CLS token
  • Updated the encoder sequence length to 576 in test_mimo_training_llava.py
  • Modified argument parsing for the freeze options to use a custom boolean parser
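
The last three items can be illustrated with a minimal sketch. It is not the code from this PR: the helper names (`str2bool`, `num_patch_tokens`, `drop_cls`), the flag names, and the defaults are hypothetical. It also assumes a ViT with 14-pixel patches on 336x336 inputs, which is one configuration that yields 576 patch tokens once the CLS token is dropped. A custom boolean parser is commonly needed because argparse's `type=bool` treats any non-empty string, including "False", as true.

```python
import argparse


def str2bool(value: str) -> bool:
    """Parse textual booleans (hypothetical helper, not the PR's actual parser)."""
    lowered = value.strip().lower()
    if lowered in {"true", "t", "yes", "y", "1"}:
        return True
    if lowered in {"false", "f", "no", "n", "0"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean value, got {value!r}")


def num_patch_tokens(image_size: int, patch_size: int) -> int:
    """Count the patch tokens a ViT emits, excluding the CLS token."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size
    return per_side * per_side


def drop_cls(tokens: list) -> list:
    """Drop the leading CLS token from a single token sequence."""
    return tokens[1:]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # Flag names and defaults are assumptions made for illustration only.
    parser.add_argument("--freeze-llm", type=str2bool, default=True)
    parser.add_argument("--freeze-encoder", type=str2bool, default=True)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--freeze-llm", "false"])
    print(args.freeze_llm, args.freeze_encoder)
    # Assumed geometry: 336 / 14 = 24 patches per side.
    print(num_patch_tokens(336, 14))
    # 577 tokens with CLS become 576 without it.
    print(len(drop_cls(list(range(577)))))
```

With `type=bool`, passing `--freeze-llm false` would still produce `True`, which is the usual motivation for a parser like this.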


Signed-off-by: Kamran Jafari <kjafarisadeg@nvidia.com>
copy-pr-bot Bot commented Apr 13, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@kamran-nvidia kamran-nvidia marked this pull request as ready for review April 13, 2026 14:14
@kamran-nvidia kamran-nvidia requested a review from liding-nv April 13, 2026 14:15
@liding-nv liding-nv merged commit d1a37ee into mimo/phase5-checkpointing-rebuild Apr 13, 2026
2 checks passed
@liding-nv liding-nv deleted the kamran/mimo_llava_fixes branch April 13, 2026 14:52