fix(core): add multimodal support for qwen3.7-plus#4803

Merged: tanzhenxin merged 1 commit into main from fix/qwen37-plus-multimodal on Jun 8, 2026

@pomelo-nwu

Collaborator

Problem

qwen3.7-plus supports multimodal input (image + video), but the current modality detection logic treats it as text-only.

Under the Model Studio naming convention, Plus models are multimodal and Max models are text-only. The defaultModalities() function had no explicit pattern for qwen3.7-plus, so it fell through to the catch-all /^qwen/ → {} (text-only).

Closes #4802

Changes

1. modalityDefaults.ts — modality pattern

Added [/^qwen3\.7-plus/, { image: true, video: true }] to MODALITY_PATTERNS, placed before the catch-all /^qwen/.
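
A minimal sketch of how an ordered pattern table of this kind can resolve modalities. The names MODALITY_PATTERNS and defaultModalities come from this PR; the InputModalities type, the first-match-wins loop, and the reduced two-entry table are assumptions for illustration, not the actual contents of modalityDefaults.ts.

```typescript
// Hypothetical shape; the real type in modalityDefaults.ts may differ.
type InputModalities = { image?: boolean; video?: boolean };

// Ordered table: the first matching pattern wins, so specific entries
// must come before the catch-all. Only two entries are shown here.
const MODALITY_PATTERNS: Array<[RegExp, InputModalities]> = [
  [/^qwen3\.7-plus/, { image: true, video: true }], // entry added by this PR
  [/^qwen/, {}], // catch-all: text-only
];

function defaultModalities(model: string): InputModalities {
  for (const [pattern, modalities] of MODALITY_PATTERNS) {
    if (pattern.test(model)) {
      return { ...modalities }; // copy so callers cannot mutate the table
    }
  }
  return {}; // in this sketch, unknown models are treated as text-only
}
```

With this table, defaultModalities("qwen3.7-plus") resolves to the image + video entry, while defaultModalities("qwen3.7-max") still falls through to the catch-all and stays text-only.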

2. dashscope.ts — vision model detection

Added qwen3.6-plus and qwen3.7-plus to VISION_MODEL_PREFIX_PATTERNS so the DashScope provider correctly sets vl_high_resolution_images: true for these models. (qwen3.6-plus was also missing — added for consistency.)
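
The prefix check described above might look roughly like the following. VISION_MODEL_PREFIX_PATTERNS and vl_high_resolution_images are named in this PR; the isVisionModel helper, the ChatRequest type, the buildRequest function, and the exact list contents are hypothetical and only illustrate the idea.

```typescript
// Assumed contents: only entries mentioned in this PR and its review are listed.
const VISION_MODEL_PREFIX_PATTERNS: string[] = [
  "qwen3-vl-plus",
  "qwen3.5-plus",
  "qwen3.6-plus", // added by this PR
  "qwen3.7-plus", // added by this PR
];

// Hypothetical helper: true when the model name starts with a known vision prefix.
function isVisionModel(model: string): boolean {
  const name = model.toLowerCase();
  return VISION_MODEL_PREFIX_PATTERNS.some((prefix) => name.startsWith(prefix));
}

// Hypothetical request shape; the real DashScope provider carries many more fields.
interface ChatRequest {
  model: string;
  vl_high_resolution_images?: boolean;
}

function buildRequest(model: string): ChatRequest {
  const request: ChatRequest = { model };
  if (isVisionModel(model)) {
    request.vl_high_resolution_images = true; // only set for vision-capable models
  }
  return request;
}
```

In this sketch, a request for qwen3.7-plus or qwen3.6-plus carries the flag, and a request for qwen3.7-max omits it entirely.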

3. Tests

  • qwen3.7-plus → { image: true, video: true } (multimodal)
  • qwen3.7-max → {} (text-only, already correct)

How to verify

  1. Set model to qwen3.7-plus via custom config or token plan
  2. Send an image in the prompt
  3. Confirm the image is sent as inline multimodal data, not downgraded to a text placeholder
  4. Set model to qwen3.7-max and confirm it remains text-only

qwen3.7-plus supports image+video input (Plus = multimodal), but
defaultModalities() had no pattern for it, falling through to the
/^qwen/ catch-all which returns text-only.

Changes:
- Add qwen3.7-plus pattern to MODALITY_PATTERNS (image + video)
- Add qwen3.6-plus and qwen3.7-plus to DashScope VISION_MODEL_PREFIX_PATTERNS
- Add tests for qwen3.7-plus (multimodal) and qwen3.7-max (text-only)

Closes #4802
@github-actions

github-actions Bot commented Jun 5, 2026

Contributor

📋 Review Summary

This PR adds multimodal support (image + video) for the qwen3.7-plus model by updating modality detection patterns and vision model prefixes. The implementation is minimal, focused, and follows existing patterns correctly. The changes address the issue where qwen3.7-plus was incorrectly treated as text-only due to falling through to the catch-all /^qwen/ pattern.

🔍 General Feedback

  • The PR correctly identifies and fixes the root cause: missing pattern for qwen3.7-plus in modality detection
  • Changes follow established conventions in the codebase (pattern ordering, comment structure, test format)
  • The fix is appropriately minimal - only adding what's necessary without over-engineering
  • Good inclusion of qwen3.6-plus in the vision model patterns for consistency (it was also missing)
  • Test coverage validates both the positive case (qwen3.7-plus → multimodal) and the contrast case (qwen3.7-max → text-only)

🎯 Specific Feedback

🔵 Low

  • File: modalityDefaults.ts:43 - Consider updating the comment to explicitly mention the naming convention that "Plus models are multimodal, Max models are text-only" as stated in the PR description. This would make the pattern self-documenting for future maintainers:

    // Qwen Plus models: image + video support (Max models are text-only per Model Studio naming convention)
    
  • File: dashscope.ts:360-363 - The VISION_MODEL_PREFIX_PATTERNS array now has inline comments for some entries but not all. For consistency, consider adding a comment to qwen3-vl-plus explaining it has built-in vision capabilities (similar to the qwen3.5-plus comment):

    'qwen3-vl-plus', // qwen3-vl-plus (vision-language model)

✅ Highlights

  • Excellent problem identification with clear explanation of the Model Studio naming convention (Plus = multimodal, Max = text-only)
  • Test cases are well-designed to prevent regression and document expected behavior
  • The ordering of patterns is correct (specific qwen3.7-plus before catch-all /^qwen/)
  • Proper use of regex escaping for the dot in qwen3\.7-plus
  • Good consistency fix by adding qwen3.6-plus alongside qwen3.7-plus in vision model detection
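
A small illustration of why the escaped dot matters. The pattern is the one from this PR; every model name other than qwen3.7-plus is made up for the comparison.

```typescript
const escaped = /^qwen3\.7-plus/; // "\." matches only a literal dot
const unescaped = /^qwen3.7-plus/; // "." matches any single character

// Hypothetical names, used only to contrast the two patterns.
const names = ["qwen3.7-plus", "qwen3.7-plus-latest", "qwen3x7-plus", "qwen37-plus"];

const escapedMatches = names.filter((name) => escaped.test(name));
const unescapedMatches = names.filter((name) => unescaped.test(name));

console.log(escapedMatches); // names accepted by the escaped pattern
console.log(unescapedMatches); // the unescaped pattern additionally accepts "qwen3x7-plus"
```

Because the pattern is anchored only at the start, it also acts as a prefix match, so suffixed variants of the model name resolve to the same entry.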

@qwen-code-ci-bot added the category/core, scope/model-switching, type/bug, and welcome-pr labels on Jun 5, 2026
@qwen-code-ci-bot

Collaborator

Thanks for the PR @pomelo-nwu! 👋

Template: The PR description has the key information (problem, changes, how to verify) but doesn't follow the PR template headings. Missing: ## What this PR does / ## Why it's needed, ### Evidence (Before & After), ### Tested on table, ## Risk & Scope, and the Chinese translation <details> block. Not blocking for a small fix like this, but please follow the template in future PRs.

Direction: Clear bug fix — qwen3.7-plus is a real model that supports multimodal input, and users hitting the text-only fallback would have images silently downgraded. This is squarely within core mission. Linked issue #4802 is well-written. ✅

Approach: The scope is minimal and correct — 3 files, +16/-1, following existing patterns exactly. Adding qwen3.6-plus to VISION_MODEL_PREFIX_PATTERNS alongside qwen3.7-plus is a good consistency catch (it was already in MODALITY_PATTERNS but missing from the vision prefix list). No simpler path exists — this is already the minimal fix.

Moving on to code review. 🔍

Qwen Code · qwen3.7-max

@wenshao wenshao left a comment

Collaborator

No review findings. Downgraded from Approve to Comment: CI failing (Test ×3, Lint, triage, Post Coverage Comment). — qwen3.7-max via Qwen Code /review

@wenshao

wenshao commented Jun 5, 2026

Collaborator

Local Verification Report

Branch: fix/qwen37-plus-multimodal → main
Environment: macOS Darwin 25.4.0, Node.js local

TypeScript Compilation (tsc --noEmit)

| Package | PR Branch | main (latest) | Status |
| --- | --- | --- | --- |
| packages/core | 1 error | 0 errors | ⚠️ Stale branch (see note below) |

The single error is src/tools/skill.ts(465,30): error TS2322 — a file NOT touched by this PR. It originates from the PR's base commit being behind current main. Latest main compiles cleanly. A rebase will resolve this; CI should pass after rebase.

Unit Tests (vitest)

| Test File | PR Branch | main | Status |
| --- | --- | --- | --- |
| modalityDefaults.test.ts | 34 passed | 32 passed | ✅ All pass (+2 new tests) |
| dashscope.test.ts | 56 passed | | ✅ All pass |

New tests cover:

  • qwen3.7-plus returns { image: true, video: true } from defaultModalities()
  • qwen3.7-max remains text-only (ensures pattern specificity)

Code Review

Changes are minimal and well-targeted:

  1. modalityDefaults.ts — Added regex pattern /^qwen3\.7-plus/ → { image: true, video: true } (line 46)
  2. dashscope.ts — Added 'qwen3.7-plus' to VISION_MODEL_PREFIX_PATTERNS for vl_high_resolution_images support (line 364)
  3. modalityDefaults.test.ts — Added 2 test cases for qwen3.7-plus and qwen3.7-max

Pattern ordering is correct — specific qwen3.7-plus match comes before the catch-all qwen → {} pattern.
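
To make the ordering point concrete, here is a sketch that resolves the same model against two tables that differ only in entry order. The resolve helper and both tables are hypothetical; they assume first-match-wins semantics, which is what the ordering remark implies.

```typescript
type Modalities = { image?: boolean; video?: boolean };
type PatternTable = ReadonlyArray<readonly [RegExp, Modalities]>;

// Hypothetical resolver: returns the modalities of the first matching entry.
function resolve(table: PatternTable, model: string): Modalities {
  const hit = table.find(([pattern]) => pattern.test(model));
  return hit ? hit[1] : {};
}

// Order used by the PR: the specific entry precedes the catch-all.
const specificFirst: PatternTable = [
  [/^qwen3\.7-plus/, { image: true, video: true }],
  [/^qwen/, {}],
];

// Reversed order: the catch-all shadows the specific entry.
const catchAllFirst: PatternTable = [
  [/^qwen/, {}],
  [/^qwen3\.7-plus/, { image: true, video: true }],
];

const correct = resolve(specificFirst, "qwen3.7-plus"); // image + video
const shadowed = resolve(catchAllFirst, "qwen3.7-plus"); // text-only
```

With the reversed table, the qwen3.7-plus entry is never reached, which reproduces the text-only behavior this PR set out to fix.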

Verdict

Ready to merge — No regressions. All tests pass. Recommend rebasing onto latest main to clear the stale skill.ts TSC error before merging.

@yiliang114

Collaborator

@qwen-code /triage

@qwen-code-ci-bot

Collaborator

@/tmp/stage-2.md

@qwen-code-ci-bot

Collaborator

@/tmp/stage-3.md

@qwen-code-ci-bot qwen-code-ci-bot left a comment

Collaborator

LGTM, looks ready to ship. ✅

@yiliang114

Collaborator

Both @/tmp/stage-2.md and @/tmp/stage-3.md above are a triage bot bug — --prompt mode skipped skill framework loading, so the bot pasted the staging file path instead of the comment body. Fix is in #4787; the bot will repost real Stage 2 / Stage 3 reviews here once that merges.

Labels

category/core, scope/model-switching, type/bug, welcome-pr

Development

Successfully merging this pull request may close these issues.

fix: qwen3.7-plus should support multimodal (image/video) input

5 participants