[Bugfix] Fix Qwen3.5 Marlin TP failure for GDN in_proj_ba#36199

Open
AjAnubolu wants to merge 2 commits into vllm-project:main from AjAnubolu:fix/qwen35-marlin-tp-35924

@AjAnubolu

Summary

Closes #35924

Split the GDN in_proj_ba linear into separate in_proj_b and in_proj_a
so each column dimension meets Marlin's MIN_THREAD_N=64 constraint at TP>=4.

The in_proj_ba linear layer has output dim = 2 * num_v_heads, which
can fall below GPTQ_MARLIN_MIN_THREAD_N (64) when sharded across TP
ranks. Use disable_tp and quant_config=None for this layer so it stays
replicated and unquantized, then manually slice b/a for the local TP rank.
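
The width arithmetic behind this can be sketched in a few lines of plain Python. This is only an illustration, not the code in the PR: the helper names `sharded_width` and `local_head_range` and the head count of 64 are hypothetical, and the check mirrors the "per-rank width below 64" condition as described here rather than the exact Marlin validation.

```python
# Minimal sketch of the per-rank width problem, under assumed values.
MIN_THREAD_N = 64  # Marlin minimum, as cited in this PR


def sharded_width(num_heads: int, tp_size: int) -> int:
    """Per-rank output width of a 2 * num_heads projection sharded column-wise."""
    return (2 * num_heads) // tp_size


def local_head_range(num_heads: int, tp_size: int, tp_rank: int) -> range:
    """Heads a rank keeps when it slices a replicated output by hand."""
    heads_per_rank = num_heads // tp_size
    start = tp_rank * heads_per_rank
    return range(start, start + heads_per_rank)


NUM_HEADS = 64  # hypothetical head count, chosen only for illustration

for tp_size in (1, 2, 4, 8):
    width = sharded_width(NUM_HEADS, tp_size)
    status = "ok" if width >= MIN_THREAD_N else "below MIN_THREAD_N"
    print(f"TP={tp_size}: per-rank width {width} ({status})")
```

With the assumed head count, the sharded width first drops below 64 at TP=4, which matches the failure range named in the linked issue. Keeping the layer replicated avoids the width check entirely, at the cost of each rank computing the full projection before slicing out its own heads.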

Fixes vllm-project#35924

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: AjAnubolu <anuboluajay@gmail.com>
@AjAnubolu AjAnubolu requested a review from sighingnow as a code owner March 6, 2026 02:39
@mergify mergify Bot added the qwen (Related to Qwen models) and bug (Something isn't working) labels Mar 6, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request addresses a Tensor Parallelism failure for Qwen3.5 with Marlin quantization. The fix involves disabling tensor parallelism for the in_proj_ba layer in the Gated Delta Network, which is too small to be sharded correctly with Marlin's constraints. Instead, the layer is replicated, and its output is manually sliced per TP rank. The changes in qwen3_5.py and qwen3_next.py are consistent with this approach. However, I've found a critical issue in qwen3_next.py where a reshape operation is incorrect for sequence lengths greater than 1, which will likely cause a runtime error during prefill.

Comment on lines +553 to +554
b = b.reshape(b.size(0), self.num_v_heads)
a = a.reshape(a.size(0), self.num_v_heads)

critical

The reshape operation for b and a appears to be incorrect for sequence lengths (sq) greater than 1. The tensor b has a shape of (bs, sq, num_k_heads, num_v_heads // num_k_heads), but it's being reshaped to (bs, self.num_v_heads). This will raise a runtime error during prefill when sq > 1 because the number of elements won't match.

The reshape should probably flatten the batch and sequence dimensions (bs and sq) to get a tensor with shape (num_tokens, num_v_heads).

Suggested change
- b = b.reshape(b.size(0), self.num_v_heads)
- a = a.reshape(a.size(0), self.num_v_heads)
+ b = b.reshape(-1, self.num_v_heads)
+ a = a.reshape(-1, self.num_v_heads)
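
The element-count argument in this comment can be checked without a GPU. The sketch below is an illustration only: `resolve_reshape` is a hypothetical pure-Python stand-in for the usual reshape rule (element counts must match, and a single -1 is inferred), and the sizes bs=2, sq=5, num_k_heads=4, num_v_heads=8 are made up for the example.

```python
from math import prod


def resolve_reshape(shape, target):
    """Return the concrete target shape, or raise ValueError if the
    element counts cannot match. At most one -1 entry is inferred."""
    total = prod(shape)
    if target.count(-1) > 1:
        raise ValueError("only one dimension can be inferred")
    known = prod(d for d in target if d != -1)
    if -1 in target:
        if known == 0 or total % known != 0:
            raise ValueError(f"cannot reshape {shape} into {target}")
        return tuple(total // known if d == -1 else d for d in target)
    if known != total:
        raise ValueError(f"cannot reshape {shape} into {target}")
    return tuple(target)


# Hypothetical sizes for illustration only.
bs, sq, num_k_heads, num_v_heads = 2, 5, 4, 8
b_shape = (bs, sq, num_k_heads, num_v_heads // num_k_heads)

# Original form: keeps only the first dimension, so it needs sq == 1.
try:
    resolve_reshape(b_shape, (b_shape[0], num_v_heads))
except ValueError as err:
    print("original reshape fails:", err)

# Suggested form: flattens batch and sequence into one token dimension.
print("suggested reshape gives:", resolve_reshape(b_shape, (-1, num_v_heads)))
```

With sq set to 1 the original form resolves without error, which is why a problem of this kind would only show up during prefill.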

Signed-off-by: AjAnubolu <anuboluajay@gmail.com>
@mergify

mergify Bot commented Mar 10, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @AjAnubolu.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify Bot added the needs-rebase label Mar 10, 2026
@mergify mergify Bot removed the needs-rebase label Apr 23, 2026
@mergify mergify Bot added the needs-rebase label Apr 23, 2026
@mergify mergify Bot removed the needs-rebase label May 23, 2026
@mergify mergify Bot added the needs-rebase label May 23, 2026
Labels

bug (Something isn't working), needs-rebase, qwen (Related to Qwen models)

Development

Successfully merging this pull request may close these issues.

[Bug] Qwen3.5 GatedDeltaNet in_proj_ba fails Marlin MIN_THREAD_N=64 at TP>=4

1 participant