[Bug] Qwen3-VL-235B-A22B-Thinking does not parse the image data #10906

@Xu-Wenqing

Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
  • 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose Otherwise, it will be closed.
  • 5. Please use English, otherwise it will be closed.

Describe the bug

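Qwen3-VL-235B-A22B-Thinking does not parse the image data. When an image is sent as a base64 data URL in a chat completion request, the model responds as if no image had been provided.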

Reproduction

import base64

import requests
from openai import OpenAI

# Set the API key and base URL of the OpenAI-compatible server under test.
openai_api_key = ""
openai_api_base = ""

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base + "/v1",
)

models = client.models.list()
model = models.data[0].id
print(model)


def encode_base64_content_from_url(content_url: str) -> str:
    """Encode a content retrieved from a remote url to base64 format."""

    with requests.get(content_url) as response:
        response.raise_for_status()
        result = base64.b64encode(response.content).decode("utf-8")

    return result


def main():

    image_url = "https://qianwen-res.oss-cn-beijing.aliyuncs.com/QVQ/demo.png"

    stream = True

    image_base64 = encode_base64_content_from_url(image_url)

    chat_completion_from_base64 = client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        # Prompt in English: "Answer in Chinese: what number should go in the box in the image?"
                        "text": "使用中文回答,图中方框处应该是数字多少?",
                    },
                    {
                        "type": "image_url",
                        # Note: the source URL ends in .png, but this data URL declares image/jpeg.
                        "image_url": {"url": f"data:image/jpeg;base64,{image_base64}"},
                    }
                ],
            }
        ],
        model=model,
        max_completion_tokens=2048,
        stream=stream,
    )

    if stream:
        for chunk in chat_completion_from_base64:
            if hasattr(chunk.choices[0].delta, "reasoning_content"):
                reasoning_content = chunk.choices[0].delta.reasoning_content
                if reasoning_content:
                    print(reasoning_content, end="", flush=True)
            if hasattr(chunk.choices[0].delta, "content"):
                content = chunk.choices[0].delta.content
                if content:
                    print(content, end="", flush=True)
    else:
        result = chat_completion_from_base64.choices[0].message.content
        print(result)


if __name__ == "__main__":
    main()
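
The reproduction hard-codes `image/jpeg` in the data URL although the image URL ends in `.png`. It is not confirmed that this mismatch has anything to do with the bug, but it is easy to rule out. The following is a minimal sketch that derives the MIME type from the URL; `build_image_data_url` and its `default_mime` parameter are hypothetical names introduced here, not part of the original script or of any library.

```python
import base64
import mimetypes


def build_image_data_url(
    image_bytes: bytes,
    source_url: str,
    default_mime: str = "application/octet-stream",
) -> str:
    """Build a base64 data URL whose MIME type matches the source URL's extension.

    Hypothetical helper: the fallback MIME type is an assumption, used when the
    extension is missing or does not map to an image type.
    """
    mime, _ = mimetypes.guess_type(source_url)
    if mime is None or not mime.startswith("image/"):
        mime = default_mime
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return f"data:{mime};base64,{encoded}"
```

In the script above, the result of `build_image_data_url(response.content, image_url)` could be passed as the `url` value of the `image_url` part in place of the hard-coded f-string.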

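Output (translated from Chinese):

Qwen3-VL-235B-A22B-Thinking
Hmm, the user is asking what number should go in the box in the image, but the problem is that no image was provided. This is a bit tricky, because without seeing the image I can't tell what the specific question is.

The user may have run into difficulty with a math or logic problem and is anxious to find the answer. They may not have realized they needed to provide the image, or assumed there was an image earlier in the conversation. In that case they may be a little anxious, since they're stuck on the problem and can't move on.

Given that the user specifically asked for an answer in Chinese, they are probably a Chinese speaker, possibly a student or someone preparing for an exam. The problem could be a number sequence, geometry, or logical reasoning question; these types often require looking at a figure.

I need to tell them clearly that the image is the key missing piece, but in a gentle tone so they don't get more anxious. I can give a few examples of common problem types as hints, so that if they can describe the problem type I may still be able to help analyze it.

Also, I should remind them to pay attention to details when describing it: for example, how the numbers are arranged, where the box is, the pattern of the surrounding numbers, and so on. But the first priority now is to get them to supply the image or a detailed description; otherwise there's really nothing to go on.

Finally, add an emoji to soften the tone; the user is probably feeling frustrated right now.
</think>

Since you did not provide the specific image, I cannot directly determine the number in the box. **Please supply the image or describe the problem in detail** (for example: how the numbers are arranged, where the box is, the known surrounding numbers, etc.), and I will analyze and solve it for you.

---

### Common problem types for reference (to be judged against the actual problem):
1. **Number sequence pattern problems**  
   - Example: `2, 4, 8, □, 32` → the box should be **16** (geometric sequence, common ratio ×2).
2. **Figure logic problems**  
   - Example: sum of the numbers at the triangle's vertices = the number at the bottom → the relationship must be calculated.
3. **3×3 grid / matrix problems**  
   - Example: the sums/products of each row/column are equal → the operation rule must be verified.
4. **Base conversion problems**  
   - Example: binary to decimal (e.g. `101` = 5).

---

### Please provide the following information:
1. **Problem type** (sequence, figure, 3×3 grid, etc.);
2. **Known numbers and their positions** (for example: first row: 3, □, 7);
3. **Whether there is any hint about the operation rule** (e.g. "each row sums to 15").

Once you add this, I will **immediately give the answer and the reasoning**! 🔍  
(For example: if the description is "3×3 grid, first row: 2, □, 4; second row: 6, 8, 10; each row has the same sum", then the box = 12)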
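
The printed output contains a bare `</think>` marker, which suggests that the reasoning and the final answer reached the client in the same text stream rather than in separate `reasoning_content` and `content` fields. When inspecting such output, a client-side split can make the final answer easier to compare across runs. This is only a sketch: `split_thinking` is a hypothetical helper, and it assumes the closing tag appears at most once and that an opening `<think>` tag may or may not be present.

```python
def split_thinking(text: str, end_tag: str = "</think>") -> tuple[str, str]:
    """Split raw model text into (reasoning, answer) at the closing think tag.

    Hypothetical helper. If the tag is absent, the whole text is treated as
    the answer and the reasoning is returned as an empty string.
    """
    reasoning, sep, answer = text.partition(end_tag)
    if not sep:
        return "", text.strip()
    reasoning = reasoning.strip()
    # Drop an opening tag if the server happened to include one.
    if reasoning.startswith("<think>"):
        reasoning = reasoning[len("<think>"):].strip()
    return reasoning, answer.strip()
```

Accumulating the streamed chunks into one string and passing it to `split_thinking` would yield the reasoning and the answer as two separate strings.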

Environment

Latest version
