refactor: bug fixes and refactor for vlm #4661

Merged: zhyncs merged 2 commits into sgl-project:main from mickqian:vlm-refactor on Mar 23, 2025

Conversation

@mickqian (Collaborator) commented Mar 22, 2025

Motivation

  1. Add a general VLM embedding routine
  2. Improve the architecture of sglang's image processors
  3. Fix the bug in registering a customized image processor with transformers
  4. Make the VLM API more consistent

Modifications

Checklist

Comment thread python/sglang/srt/managers/mm_utils.py Outdated
@mickqian (Collaborator, Author) commented Mar 22, 2025

After this modification, sglang's results on the MMMU benchmark go from:

| Model | Accuracy | Time elapsed (s) |
|---|---|---|
| Qwen2.5-VL-7B-Instruct | 0.462 | 229.1 |
| gemma-3-4b-it | 0.409 | 226.4 |

to:

| Model | Accuracy | Time elapsed (s) |
|---|---|---|
| Qwen2.5-VL-7B-Instruct | 0.477 | 230.2 |
| gemma-3-4b-it | 0.409 | 202.3 |
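
For readers who want the before/after comparison as numbers, here is a minimal sketch (not part of the PR) that computes the per-model deltas from the figures quoted in this comment. The names `before`, `after`, and `summarize` are made up for illustration and do not come from sglang.

```python
# Figures copied from the two result sets in this comment:
# model -> (accuracy, time elapsed in seconds).
before = {
    "Qwen2.5-VL-7B-Instruct": (0.462, 229.1),
    "gemma-3-4b-it": (0.409, 226.4),
}
after = {
    "Qwen2.5-VL-7B-Instruct": (0.477, 230.2),
    "gemma-3-4b-it": (0.409, 202.3),
}


def summarize(before, after):
    """Return per-model accuracy and time deltas (hypothetical helper)."""
    rows = {}
    for model, (acc0, t0) in before.items():
        acc1, t1 = after[model]
        rows[model] = {
            "accuracy_delta": round(acc1 - acc0, 3),
            "time_delta_s": round(t1 - t0, 1),
            "time_change_pct": round((t1 - t0) / t0 * 100, 1),
        }
    return rows


deltas = summarize(before, after)
for model, row in deltas.items():
    print(model, row)
```

Under these figures, Qwen2.5-VL-7B-Instruct gains 0.015 accuracy at roughly the same runtime, while gemma-3-4b-it keeps the same accuracy and runs about 24 seconds faster.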

@mickqian (Collaborator, Author) commented Mar 22, 2025

This should fix #4607, #4159, and more similar issues than I can count.

@mickqian mentioned this pull request on Mar 22, 2025 (4 tasks)

@zhaochenyang20 (Collaborator) commented:

@mickqian I've rerun the CI. Thanks! minicpm-o is high priority. qvq

@mickqian changed the title from "refactor: bug fixes and refactor of vlm" to "refactor: bug fixes and refactor for vlm" on Mar 22, 2025
@mickqian force-pushed the vlm-refactor branch 6 times, most recently from d64a33b to f644731, on March 22, 2025 14:45

@zhaochenyang20 (Collaborator) commented:

@mickqian This is not ready for review yet, right? Mick, once it is ready, please ping Yi and me.

@mickqian force-pushed the vlm-refactor branch 4 times, most recently from 61f6c15 to 0594b06, on March 22, 2025 16:49

@mickqian (Collaborator, Author) commented:

@yizhang2077 @zhaochenyang20 Ready now. All VLM tests passed. Thanks.

@mickqian force-pushed the vlm-refactor branch 2 times, most recently from e12c107 to ae6735f, on March 22, 2025 17:16

@yizhang2077 (Collaborator) left a comment

I've added some basic comments here and will continue reviewing.

Comment thread test/srt/run_suite.py Outdated
Comment thread python/sglang/srt/configs/utils.py Outdated
Comment thread python/sglang/srt/models/gemma3_mm.py Outdated
Comment thread python/sglang/srt/models/minicpmv.py Outdated
Comment thread python/sglang/srt/models/qwen2_5_vl.py Outdated
Comment thread python/sglang/srt/conversation.py Outdated
Comment thread test/srt/test_vlm_accuracy.py Outdated
Comment thread python/sglang/srt/managers/schedule_batch.py Outdated

@yizhang2077 (Collaborator) commented Mar 23, 2025

LGTM, great refactoring! Once CI passes, this PR can be merged. cc @zhaochenyang20

@yizhang2077 mentioned this pull request on Mar 23, 2025 (6 tasks)

@zhaochenyang20 (Collaborator) commented:

@mickqian @yizhang2077 great!

@zhyncs merged commit 11577ce into sgl-project:main on Mar 23, 2025