Checklist
Describe the bug
🧾 Description:
When deploying `qwen2.5-vl-72b-awq` using SGLang, image inputs (via `image_url`) are not correctly handled. The same prompt works as expected in vLLM, where the model successfully describes the image.
Reproduction
✅ Reproduction Steps:
✅ SGLang Launch Command:
```shell
python -m sglang.launch_server \
  --model-path qwen-vl-72b \
  --port 30000 \
  --trust-remote-code \
  --host 0.0.0.0 \
  --mem-fraction-static 0.8 \
  --tp 4 \
  --tool-call-parser qwen25
```
✅ OpenAI-Compatible API Call (cURL):
```shell
curl -X POST "http://0.0.0.0:30000/v1/chat/completions" \
  -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5-vl",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "describe this picture"
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"
            }
          }
        ]
      }
    ],
    "top_p": 0.8
  }'
```
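For reference, the same request body can be built and sent from Python. This is a sketch mirroring the cURL call above; the endpoint assumes the local SGLang server started with the launch command shown earlier, and the `requests` call is commented out since it needs a running server:

```python
import json

# Same request body as the cURL call above.
payload = {
    "model": "qwen2.5-vl",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "describe this picture"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"
                    },
                },
            ],
        }
    ],
    "top_p": 0.8,
}

body = json.dumps(payload)

# To actually issue the request (requires the server to be up):
# import requests
# resp = requests.post(
#     "http://0.0.0.0:30000/v1/chat/completions",
#     headers={"Content-Type": "application/json"},
#     data=body,
# )
# print(resp.json()["choices"][0]["message"]["content"])
```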
🧾 SGLang Response:
```json
{
  "id": "803d3c01743b4429b61c0a83d60eda5b",
  "object": "chat.completion",
  "created": 1742528000,
  "model": "qwen2.5-vl",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "I'm sorry, but I cannot see any picture attached to your message. Could you please provide more information or upload the picture again? I'll do my best to describe it for you."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 22,
    "completion_tokens": 41,
    "total_tokens": 63
  }
}
```
✅ Comparison with vLLM:
With the exact same model and cURL request, a vLLM deployment successfully describes the image. This confirms the issue lies not with the prompt or the model, but with how SGLang handles `image_url` content parts in the message payload.
📌 Expected Behavior:
SGLang should support OpenAI-compatible image inputs by correctly parsing `messages[].content[].image_url.url` and feeding the image into the model's visual encoder.
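For illustration, the expected parsing can be sketched as below. `extract_image_urls` is a hypothetical helper, not SGLang's actual code path; it only shows what "correctly parsing" the OpenAI-style message payload means:

```python
def extract_image_urls(messages):
    """Collect every image_url from OpenAI-style chat messages.

    `content` may be a plain string (text-only messages) or a list of
    typed parts; only the list form can carry image_url entries.
    """
    urls = []
    for msg in messages:
        content = msg.get("content")
        if not isinstance(content, list):
            continue  # plain-string content has no image parts
        for part in content:
            if part.get("type") == "image_url":
                urls.append(part["image_url"]["url"])
    return urls


messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "describe this picture"},
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"
                },
            },
        ],
    }
]

print(extract_image_urls(messages))
# → ['https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg']
```

A server that parses the payload this way should find one image in the request above; the response in this report suggests the image part is dropped instead.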
Environment
🧪 Environment:
- Model: `qwen2.5-vl-72b-awq`
- Deployment: SGLang 0.4.4.post1
- API Protocol: OpenAI-compatible Chat Completions API
- vLLM Behavior: ✅ Working as expected