fix(core): add multimodal support for qwen3.7-plus by pomelo-nwu · Pull Request #4803 · QwenLM/qwen-code

pomelo-nwu · 2026-06-05T10:00:51Z

Problem

qwen3.7-plus supports multimodal input (image + video), but the current modality detection logic treats it as text-only.

Under the Model Studio naming convention, Plus models are multimodal and Max models are text-only. The defaultModalities() function had no explicit pattern for qwen3.7-plus, so the model name fell through to the catch-all /^qwen/ → {} (text-only).

Closes #4802

Changes

1. `modalityDefaults.ts` — modality pattern

Added [/^qwen3\.7-plus/, { image: true, video: true }] to MODALITY_PATTERNS, placed before the catch-all /^qwen/.
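The ordering requirement can be illustrated with a minimal sketch. It assumes that MODALITY_PATTERNS is an ordered list of [RegExp, modalities] pairs and that defaultModalities() returns the first match. The `InputModalities` type and the two-entry list are simplifications for illustration, not the actual contents of modalityDefaults.ts.

```typescript
// Hypothetical, trimmed-down shape; the real file has more entries and may
// normalize the model name before matching.
type InputModalities = { image?: boolean; video?: boolean };

const MODALITY_PATTERNS: Array<[RegExp, InputModalities]> = [
  // Specific pattern first: Plus models accept image and video input.
  [/^qwen3\.7-plus/, { image: true, video: true }],
  // Catch-all last: any other qwen model is treated as text-only.
  [/^qwen/, {}],
];

// Return the modalities of the first pattern that matches the model name.
function defaultModalities(model: string): InputModalities {
  for (const [pattern, modalities] of MODALITY_PATTERNS) {
    if (pattern.test(model)) {
      return { ...modalities };
    }
  }
  return {};
}

console.log(defaultModalities('qwen3.7-plus'));
console.log(defaultModalities('qwen3.7-max'));
```

If the two entries were swapped in this sketch, /^qwen/ would match qwen3.7-plus first and the specific pattern would never be reached, which is the same outcome as the missing pattern described in the Problem section.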

2. `dashscope.ts` — vision model detection

Added qwen3.6-plus and qwen3.7-plus to VISION_MODEL_PREFIX_PATTERNS so the DashScope provider correctly sets vl_high_resolution_images: true for these models. (qwen3.6-plus was also missing — added for consistency.)
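A rough sketch of how a prefix list like this might gate the flag follows. Only VISION_MODEL_PREFIX_PATTERNS and vl_high_resolution_images come from this PR. The helper names `isVisionModel` and `buildVisionExtras`, the lowercase normalization, and the exact list contents are assumptions for illustration, not the provider's actual code.

```typescript
// Assumed list contents; the real array in dashscope.ts may contain more entries.
const VISION_MODEL_PREFIX_PATTERNS: string[] = [
  'qwen3-vl-plus',
  'qwen3.5-plus',
  'qwen3.6-plus', // added in this PR
  'qwen3.7-plus', // added in this PR
];

// Hypothetical helper: true when the model name starts with a known vision prefix.
function isVisionModel(model: string | undefined): boolean {
  if (!model) {
    return false;
  }
  const normalized = model.toLowerCase();
  return VISION_MODEL_PREFIX_PATTERNS.some((prefix) =>
    normalized.startsWith(prefix),
  );
}

// Hypothetical helper: extra request fields that apply to vision models only.
function buildVisionExtras(model: string): { vl_high_resolution_images?: boolean } {
  return isVisionModel(model) ? { vl_high_resolution_images: true } : {};
}

console.log(buildVisionExtras('qwen3.7-plus'));
console.log(buildVisionExtras('qwen3.7-max'));
```

Because the check is prefix-based in this sketch, dated or suffixed variants of a listed model name would also be treated as vision models.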

3. Tests

qwen3.7-plus → { image: true, video: true } (multimodal)
qwen3.7-max → {} (text-only, already correct)

How to verify

Set model to qwen3.7-plus via custom config or token plan
Send an image in the prompt
Confirm the image is sent as inline multimodal data, not downgraded to a text placeholder
Set model to qwen3.7-max and confirm it remains text-only
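The difference between "inline multimodal data" and "text placeholder" in the steps above can be sketched as follows. The `Part` type, the `applyModalities` helper, and the placeholder wording are all hypothetical and are not taken from the qwen-code source; the sketch only shows the kind of downgrade the check is meant to catch.

```typescript
// Hypothetical types for illustration; not the actual qwen-code definitions.
type InputModalities = { image?: boolean; video?: boolean };
type Part =
  | { kind: 'text'; text: string }
  | { kind: 'image'; mimeType: string; data: string };

// Keep image parts inline when the model accepts images; otherwise replace
// each one with a text placeholder (the downgrade described in step 3).
function applyModalities(parts: Part[], modalities: InputModalities): Part[] {
  return parts.map((part): Part => {
    if (part.kind === 'image' && !modalities.image) {
      return { kind: 'text', text: `[image omitted: ${part.mimeType}]` };
    }
    return part;
  });
}

const prompt: Part[] = [
  { kind: 'text', text: 'What is in this picture?' },
  { kind: 'image', mimeType: 'image/png', data: '<base64>' },
];

// With Plus modalities the image part survives; with text-only it does not.
const asPlus = applyModalities(prompt, { image: true, video: true });
const asMax = applyModalities(prompt, {});

console.log(asPlus.map((part) => part.kind));
console.log(asMax.map((part) => part.kind));
```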

qwen3.7-plus supports image+video input (Plus = multimodal), but defaultModalities() had no pattern for it, falling through to the /^qwen/ catch-all which returns text-only.

Changes:
- Add qwen3.7-plus pattern to MODALITY_PATTERNS (image + video)
- Add qwen3.6-plus and qwen3.7-plus to DashScope VISION_MODEL_PREFIX_PATTERNS
- Add tests for qwen3.7-plus (multimodal) and qwen3.7-max (text-only)

Closes #4802

github-actions · 2026-06-05T10:02:05Z

📋 Review Summary

This PR adds multimodal support (image + video) for the qwen3.7-plus model by updating modality detection patterns and vision model prefixes. The implementation is minimal, focused, and follows existing patterns correctly. The changes address the issue where qwen3.7-plus was incorrectly treated as text-only due to falling through to the catch-all /^qwen/ pattern.

🔍 General Feedback

The PR correctly identifies and fixes the root cause: missing pattern for qwen3.7-plus in modality detection
Changes follow established conventions in the codebase (pattern ordering, comment structure, test format)
The fix is appropriately minimal - only adding what's necessary without over-engineering
Good inclusion of qwen3.6-plus in the vision model patterns for consistency (it was also missing)
Test coverage validates both the positive case (qwen3.7-plus → multimodal) and the contrast case (qwen3.7-max → text-only)

🎯 Specific Feedback

🔵 Low

File: modalityDefaults.ts:43 - Consider updating the comment to explicitly mention the naming convention that "Plus models are multimodal, Max models are text-only" as stated in the PR description. This would make the pattern self-documenting for future maintainers:
```
// Qwen Plus models: image + video support (Max models are text-only per Model Studio naming convention)
```
File: dashscope.ts:360-363 - The VISION_MODEL_PREFIX_PATTERNS array now has inline comments for some entries but not all. For consistency, consider adding a comment to qwen3-vl-plus explaining it has built-in vision capabilities (similar to the qwen3.5-plus comment):
```
'qwen3-vl-plus', // qwen3-vl-plus (vision-language model)
```

✅ Highlights

Excellent problem identification with clear explanation of the Model Studio naming convention (Plus = multimodal, Max = text-only)
Test cases are well-designed to prevent regression and document expected behavior
The ordering of patterns is correct (specific qwen3.7-plus before catch-all /^qwen/)
Proper use of regex escaping for the dot in qwen3\.7-plus
Good consistency fix by adding qwen3.6-plus alongside qwen3.7-plus in vision model detection

qwen-code-ci-bot · 2026-06-05T10:03:15Z

Thanks for the PR @pomelo-nwu! 👋

Template: The PR description has the key information (problem, changes, how to verify) but doesn't follow the PR template headings. Missing: ## What this PR does / ## Why it's needed, ### Evidence (Before & After), ### Tested on table, ## Risk & Scope, and the Chinese translation <details> block. Not blocking for a small fix like this, but please follow the template in future PRs.

Direction: Clear bug fix — qwen3.7-plus is a real model that supports multimodal input, and users hitting the text-only fallback would have images silently downgraded. This is squarely within core mission. Linked issue #4802 is well-written. ✅

Approach: The scope is minimal and correct — 3 files, +16/-1, following existing patterns exactly. Adding qwen3.6-plus to VISION_MODEL_PREFIX_PATTERNS alongside qwen3.7-plus is a good consistency catch (it was already in MODALITY_PATTERNS but missing from the vision prefix list). No simpler path exists — this is already the minimal fix.

Moving on to code review. 🔍

— Qwen Code · qwen3.7-max

wenshao

No review findings. Downgraded from Approve to Comment: CI failing (Test ×3, Lint, triage, Post Coverage Comment). — qwen3.7-max via Qwen Code /review

wenshao · 2026-06-05T12:28:45Z

Local Verification Report

Branch: fix/qwen37-plus-multimodal → main
Environment: macOS Darwin 25.4.0, Node.js local

TypeScript Compilation (`tsc --noEmit`)

| Package | PR Branch | main (latest) | Status |
| --- | --- | --- | --- |
| `packages/core` | 1 error | 0 errors | ⚠️ Stale branch (see note below) |

The single error is src/tools/skill.ts(465,30): error TS2322 — a file NOT touched by this PR. It originates from the PR's base commit being behind current main. Latest main compiles cleanly. A rebase will resolve this; CI should pass after rebase.

Unit Tests (`vitest`)

| Test File | PR Branch | main | Status |
| --- | --- | --- | --- |
| `modalityDefaults.test.ts` | 34 passed | 32 passed | ✅ All pass (+2 new tests) |
| `dashscope.test.ts` | 56 passed | — | ✅ All pass |

New tests cover:

qwen3.7-plus returns { image: true, video: true } from defaultModalities()
qwen3.7-max remains text-only (ensures pattern specificity)

Code Review

Changes are minimal and well-targeted:

modalityDefaults.ts — Added regex pattern /^qwen3\.7-plus/ → { image: true, video: true } (line 46)
dashscope.ts — Added 'qwen3.7-plus' to VISION_MODEL_PREFIX_PATTERNS for vl_high_resolution_images support (line 364)
modalityDefaults.test.ts — Added 2 test cases for qwen3.7-plus and qwen3.7-max

Pattern ordering is correct — specific qwen3.7-plus match comes before the catch-all qwen → {} pattern.

Verdict

✅ Ready to merge — No regressions. All tests pass. Recommend rebasing onto latest main to clear the stale skill.ts TSC error before merging.

yiliang114 · 2026-06-05T16:45:30Z

@qwen-code /triage

qwen-code-ci-bot · 2026-06-05T16:52:24Z

@/tmp/stage-2.md

qwen-code-ci-bot · 2026-06-05T16:52:28Z

@/tmp/stage-3.md

qwen-code-ci-bot

LGTM, looks ready to ship. ✅

yiliang114 · 2026-06-05T17:17:56Z

Both @/tmp/stage-2.md and @/tmp/stage-3.md above are a triage bot bug — --prompt mode skipped skill framework loading, so the bot pasted the staging file path instead of the comment body. Fix is in #4787; the bot will repost real Stage 2 / Stage 3 reviews here once that merges.

qwen-code-ci-bot added the labels category/core, scope/model-switching, type/bug, and welcome-pr on Jun 5, 2026

wenshao reviewed Jun 5, 2026

wenshao approved these changes Jun 5, 2026

qwen-code-ci-bot approved these changes Jun 5, 2026

yiliang114 mentioned this pull request Jun 5, 2026

ci(triage): Fix Qwen triage workflow prompt #4787

Merged

github-actions Bot mentioned this pull request Jun 6, 2026

📊 AI CLI Tools Community Daily Report 2026-06-06 litang9/big_model_radar#21

Open

tanzhenxin merged commit 1d9984f into main Jun 8, 2026
12 of 27 checks passed

This was referenced Jun 8, 2026

📊 AI CLI Tools Community Daily Report 2026-06-08 jasonalang/big_model_radar#56

Open

📊 AI CLI Tools Community Daily Report 2026-06-08 litang9/big_model_radar#31

Open
