Fix Qwen3.5 attention wrapper to accept new mlx-vlm kwargs by BardiaKoopah · Pull Request #721 · unslothai/unsloth-zoo

BardiaKoopah · 2026-06-04T23:10:14Z

mlx-vlm 0.6.1 added position_embeddings and target_verify kwargs to Qwen3_5Attention.call. The patched wrapper rejected them with a TypeError on training start.

Add **kwargs to the signature and forward to the original call. Using **kwargs instead of named params for durability as mlx-vlm frequently adds new kwargs to this interface.

Fixes #6002 (first crash)

mlx-vlm 0.6.1 added position_embeddings and target_verify kwargs to Qwen3_5Attention.__call__. The patched wrapper rejected them with a TypeError on training start. Add **kwargs to the signature and forward to the original call. Using **kwargs instead of named params for durability as mlx-vlm frequently adds new kwargs to this interface. Fixes #6002 (first crash)

gemini-code-assist

Code Review

This pull request updates the _fix_qwen35_attention_cache function in unsloth_zoo/mlx/loader.py to accept and forward arbitrary keyword arguments (**kwargs) in the patched attention call. This ensures compatibility with any additional arguments that might be passed to the attention mechanism. There are no review comments, and I have no further feedback to provide.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

danielhanchen · 2026-06-10T11:34:07Z

Validated this on mlx 0.31.2 + mlx-vlm 0.6.2 (CPU backend) with a tiny random-weight qwen3_5 model:

Reproduced the exact TypeError from [Bug] Qwen3.5 LoRA SFT in Studio (MLX): patched_attn_call rejects position_embeddings (+ second VJP error after fix) unsloth#6002 with the current wrapper, and confirmed the unpatched attention dereferences cache.offset when cache=None (the original bug this patch function exists for).
With this patch, both the training forward and value_and_grad run, and the synthesized position_ids path is bit-identical to passing explicit position_ids through the unpatched attention (max abs diff 0).
Qwen3_5Attention.__call__ has the identical signature across mlx-vlm 0.6.1, 0.6.2 and current main, and the decoder layer always passes position_embeddings and target_verify, so **kwargs forwarding is the right durable fix here.
One subtlety checked: mlx-vlm only supplies position_embeddings together with non-None position_ids (both are computed and passed down together at the model level), so the cache=None synthesis never fires in that case, and the attention body ignores position_ids for rotary when position_embeddings is present. No extra guard is needed.

Two notes:

Scope: this fixes only the first crash in [Bug] Qwen3.5 LoRA SFT in Studio (MLX): patched_attn_call rejects position_embeddings (+ second VJP error after fix) unsloth#6002. The remaining VJP crash is covered by fix(mlx): Qwen3.5/3.6 VLM training — pass through new mlx-vlm attention kwargs, patch non-differentiable Metal kernels #738, which contains the same signature change and can rebase once this lands.
The branch needs a sync with main: ff12f344 reworded the comment line inside this function, so there is a trivial comment-only conflict to resolve.

…on kwargs, patch non-differentiable Metal kernels (#738) * fix(mlx): unblock Qwen3.5/3.6 MLX training crashing on new mlx-vlm kwargs and non-differentiable kernels Three fixes for LoRA training of mlx-vlm qwen3_5-family models, which currently crashes before the first step: 1. loader.py _fix_qwen35_attention_cache: the patched attention __call__ had a fixed signature and rejected the position_embeddings / target_verify kwargs added by newer mlx-vlm releases (TypeError: unexpected keyword argument 'position_embeddings'). Forward unknown kwargs through, and skip synthesizing position_ids when position_embeddings is already provided. 2. gated_delta_vjp.py patch_gated_delta_vlm: mlx_vlm.models.qwen3_5 ships its own gated_delta_update that calls mlx_lm's gated_delta_kernel directly, so patch_gated_delta() (mlx_lm-only) never intercepts the VLM path and training dies with ValueError: [Primitive::vjp] Not implemented for CustomKernel. Apply the same state=None -> gated_delta_ops_efficient routing to the mlx_vlm module, in both namespaces that hold a reference (the defining module and language.py's from-import). 3. loader.py _disable_fused_mrope: MRoPERotaryEmbedding.apply_rotary takes a fused Metal kernel path whenever Metal is available, with no gradient support and no training gate; same VJP crash. Flip fused_apply off on the model so apply_rotary uses its differentiable cos/sin fallback. The remaining custom kernels in mlx_vlm qwen3_5 (ragged decode SDPA, target-verify GEMV/QMV) are unreachable during training: they require a left-padded KV cache or target_verify=True (gdn_sink), neither of which occurs in the trainer's forward. Fixes the second crash in unslothai/unsloth#6002; item 1 overlaps PR #721. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * Drop redundant attention wrapper hunk and tighten comments for PR #738 * Handle legacy mlx-vlm gated_delta import layout for PR #738 --------- Co-authored-by: Claude Fable 5 <noreply@anthropic.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com>

A newer mlx-lm adding parameters (precedent: the qwen3_5 attention kwargs that needed unslothai#721) would previously TypeError inside the patched wrapper. Silently dropping them could change training semantics, so unknown kwargs now delegate to the original implementation with a one-time warning naming the new arguments — old mlx-lm is unchanged, newer mlx-lm stays correct until the patch is updated.

BardiaKoopah mentioned this pull request Jun 4, 2026

[Bug] Qwen3.5 LoRA SFT in Studio (MLX): patched_attn_call rejects position_embeddings (+ second VJP error after fix) unslothai/unsloth#6002

Closed

gemini-code-assist Bot reviewed Jun 4, 2026

View reviewed changes

benrey mentioned this pull request Jun 9, 2026

fix(mlx): Qwen3.5/3.6 VLM training — pass through new mlx-vlm attention kwargs, patch non-differentiable Metal kernels #738

Merged

Merge main into fix/qwen35-attn-kwargs to resolve comment-only conflict

979bf1d

danielhanchen merged commit 42f11b6 into unslothai:main Jun 10, 2026

This was referenced Jun 11, 2026

[staging] Verify issue 6002 fixes on main @e0f7e60d shimmyshimmer/unsloth-zoo-staging-2#31

Open

MLX Update Training #684

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix Qwen3.5 attention wrapper to accept new mlx-vlm kwargs#721

Fix Qwen3.5 attention wrapper to accept new mlx-vlm kwargs#721
danielhanchen merged 2 commits into
unslothai:mainfrom
BardiaKoopah:fix/qwen35-attn-kwargs

BardiaKoopah commented Jun 4, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

danielhanchen commented Jun 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

BardiaKoopah commented Jun 4, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

danielhanchen commented Jun 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants