Add video processing utils to vision_utils #279

Merged: danielhanchen merged 1 commit into unslothai:main from mmathew23:video-inference-feature on Sep 30, 2025

@mmathew23 (Collaborator)

This is a cleaned-up and merged version of #240.

This adds video processing utilities for VLM finetuning.

It's based on the qwen-vl-utils package from the Qwen2.5-VL repo: https://github.com/QwenLM/Qwen2.5-VL/tree/main/qwen-vl-utils
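
For readers unfamiliar with video preprocessing for a VLM, one common step is choosing which frames of a clip to pass to the model. The sketch below is purely illustrative: `sample_frame_indices` is a hypothetical helper, not a function from this PR or from qwen-vl-utils, and it assumes simple uniform sampling across the clip.

```python
def sample_frame_indices(total_frames: int, num_samples: int) -> list[int]:
    """Pick up to num_samples frame indices spread evenly across a clip.

    Hypothetical helper for illustration only; not part of this PR's API.
    """
    if total_frames <= 0:
        raise ValueError("total_frames must be positive")
    if num_samples <= 0:
        raise ValueError("num_samples must be positive")

    # Never ask for more frames than the clip contains.
    num_samples = min(num_samples, total_frames)
    if num_samples == 1:
        return [0]

    # Evenly spaced positions from the first frame to the last frame.
    step = (total_frames - 1) / (num_samples - 1)
    return [round(i * step) for i in range(num_samples)]


if __name__ == "__main__":
    print(sample_frame_indices(10, 4))
```

The returned indices would then be used to decode only those frames with whatever video reader is available, which keeps the number of visual tokens bounded for long clips.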

I tested all the vision notebooks to confirm everything works. A video finetuning notebook is on the way.

Gemma3: https://colab.research.google.com/drive/1gzVgFvFou6dTE9UvMZR5DEiQTzqc3A4S?usp=sharing
Llama Vision: https://colab.research.google.com/drive/1E2xCsbh7-raFYMtkbFOwgvcfIyn64zLL?usp=sharing
Pixtral: https://colab.research.google.com/drive/1meeeZnE-IlV9IRuEUFqgbUNFpwxtrN5g?usp=sharing
Qwen2.5-VL*: https://colab.research.google.com/drive/1RrRUnHBLMPSWfzHoc853iij4O4Yly7fr?usp=sharing

*Qwen2.5-VL was tested with transformers 4.56.1.

Co-authored-by: autinn <au-yeung@uni.minerva.edu>
Co-authored-by: Neenu Antony <Neenu.antony@sjsu.edu>
Co-authored-by: Suchith Gali <sgali@ucmerced.edu>
@madhav1ag

@mmathew23 Thanks for the video support code. May I know when we can expect video finetuning notebooks for Qwen2.5-VL?