model: support gemma-3-it #4424
Conversation
Force-pushed from 8363561 to 669e44b
I think the base image processor doesn't need to change?
Force-pushed from f266005 to 70ae46c
This comment was marked as resolved.
Force-pushed from 062911b to 7936216
@mickqian @yizhang2077 Is this ready to merge? Approved?
I added an issue to keep track of current VLM models' performance on the MMMU benchmark. We can update benchmark results in #4456. @mickqian @zhaochenyang20
Why do we need to add a regex here?
Force-pushed from e6ac032 to 7b94f77
Co-authored-by: Yuhao Chen <yxckeis8@gmail.com>
@zhaochenyang20 This PR can be merged.
@zhaochenyang20 This is ready. Many thanks!
@mickqian @yizhang2077 Thanks. I will tell Lianmin!
@Ying1123 Hey Ying, this can be merged. It's high priority.
@zhaochenyang20 @mickqian Hello, I've been running the Gemma3-27b-it model with the original weights on an H100 GPU, in vllm 0.8.0 and in sglang installed from source. For the same long text-only query, the outputs of the two engines differ significantly. In vllm, generation proceeds normally, but in sglang, long generations degrade into garbage output and continue indefinitely. This happens with long queries and with queries containing code. Could someone else test this as well? Also, a question: is prefix caching supported in sglang for multimodal models, particularly Gemma 3?
Supported. Could you give us your reproducible scripts? We will fix this ASAP.
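For anyone who wants to reproduce this, a minimal sketch of such a script is below. It assumes sglang's OpenAI-compatible endpoint; the host, port, model path, and prompt are placeholders to be adjusted for your deployment, not values taken from this thread.

```python
import json
import urllib.request

# Hypothetical local endpoint; adjust host/port to match your sglang launch command.
URL = "http://localhost:30000/v1/chat/completions"

payload = {
    "model": "google/gemma-3-27b-it",   # placeholder model path
    "messages": [{"role": "user", "content": ""}],
    "max_tokens": 2048,
    "temperature": 0.0,  # greedy decoding, so runs are comparable across engines
}

def send(long_prompt: str) -> str:
    """POST a long text-only prompt and return the generated text."""
    payload["messages"][0]["content"] = long_prompt
    req = urllib.request.Request(
        URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```

Running the same `send(...)` call against both engines with an identical long prompt and temperature 0 should make any divergence or non-terminating output easy to compare.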
@zhaochenyang20 You'll see that the generation doesn't stop, and something like this will begin:
cc @mickqian
A fix is on the way.
Gemma3's generation speed is surprisingly slow compared to other 3B/4B models like Qwen2.5-3B. Is the current Gemma3 implementation correct?
Great!
Motivation
Support gemma3-it.
FYI, gemma3-1b-it is an LLM; the gemma3-pt series are not chat models.

Modifications
Checklist