Skip to content

Fix Qwen3.5 attention wrapper to accept new mlx-vlm kwargs#721

Merged
danielhanchen merged 2 commits into
unslothai:mainfrom
BardiaKoopah:fix/qwen35-attn-kwargs
Jun 10, 2026
Merged

Fix Qwen3.5 attention wrapper to accept new mlx-vlm kwargs#721
danielhanchen merged 2 commits into
unslothai:mainfrom
BardiaKoopah:fix/qwen35-attn-kwargs

Conversation

@BardiaKoopah

Copy link
Copy Markdown
Contributor

mlx-vlm 0.6.1 added position_embeddings and target_verify kwargs to Qwen3_5Attention.call. The patched wrapper rejected them with a TypeError on training start.

Add **kwargs to the signature and forward to the original call. Using **kwargs instead of named params for durability as mlx-vlm frequently adds new kwargs to this interface.

Fixes #6002 (first crash)

mlx-vlm 0.6.1 added position_embeddings and target_verify kwargs to
Qwen3_5Attention.__call__. The patched wrapper rejected them with a
TypeError on training start.

Add **kwargs to the signature and forward to the original call.
Using **kwargs instead of named params for durability as mlx-vlm
frequently adds new kwargs to this interface.

Fixes #6002 (first crash)

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request updates the _fix_qwen35_attention_cache function in unsloth_zoo/mlx/loader.py to accept and forward arbitrary keyword arguments (**kwargs) in the patched attention call. This ensures compatibility with any additional arguments that might be passed to the attention mechanism. There are no review comments, and I have no further feedback to provide.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

@danielhanchen

Copy link
Copy Markdown
Member

Validated this on mlx 0.31.2 + mlx-vlm 0.6.2 (CPU backend) with a tiny random-weight qwen3_5 model:

  • Reproduced the exact TypeError from [Bug] Qwen3.5 LoRA SFT in Studio (MLX): patched_attn_call rejects position_embeddings (+ second VJP error after fix) unsloth#6002 with the current wrapper, and confirmed the unpatched attention dereferences cache.offset when cache=None (the original bug this patch function exists for).
  • With this patch, both the training forward and value_and_grad run, and the synthesized position_ids path is bit-identical to passing explicit position_ids through the unpatched attention (max abs diff 0).
  • Qwen3_5Attention.__call__ has the identical signature across mlx-vlm 0.6.1, 0.6.2 and current main, and the decoder layer always passes position_embeddings and target_verify, so **kwargs forwarding is the right durable fix here.
  • One subtlety checked: mlx-vlm only supplies position_embeddings together with non-None position_ids (both are computed and passed down together at the model level), so the cache=None synthesis never fires in that case, and the attention body ignores position_ids for rotary when position_embeddings is present. No extra guard is needed.

Two notes:

  1. Scope: this fixes only the first crash in [Bug] Qwen3.5 LoRA SFT in Studio (MLX): patched_attn_call rejects position_embeddings (+ second VJP error after fix) unsloth#6002. The remaining VJP crash is covered by fix(mlx): Qwen3.5/3.6 VLM training — pass through new mlx-vlm attention kwargs, patch non-differentiable Metal kernels #738, which contains the same signature change and can rebase once this lands.
  2. The branch needs a sync with main: ff12f344 reworded the comment line inside this function, so there is a trivial comment-only conflict to resolve.

@danielhanchen danielhanchen merged commit 42f11b6 into unslothai:main Jun 10, 2026
danielhanchen added a commit that referenced this pull request Jun 11, 2026
…on kwargs, patch non-differentiable Metal kernels (#738)

* fix(mlx): unblock Qwen3.5/3.6 MLX training crashing on new mlx-vlm kwargs and non-differentiable kernels

Three fixes for LoRA training of mlx-vlm qwen3_5-family models, which
currently crashes before the first step:

1. loader.py _fix_qwen35_attention_cache: the patched attention __call__
   had a fixed signature and rejected the position_embeddings /
   target_verify kwargs added by newer mlx-vlm releases
   (TypeError: unexpected keyword argument 'position_embeddings').
   Forward unknown kwargs through, and skip synthesizing position_ids
   when position_embeddings is already provided.

2. gated_delta_vjp.py patch_gated_delta_vlm: mlx_vlm.models.qwen3_5 ships
   its own gated_delta_update that calls mlx_lm's gated_delta_kernel
   directly, so patch_gated_delta() (mlx_lm-only) never intercepts the
   VLM path and training dies with
   ValueError: [Primitive::vjp] Not implemented for CustomKernel.
   Apply the same state=None -> gated_delta_ops_efficient routing to the
   mlx_vlm module, in both namespaces that hold a reference (the defining
   module and language.py's from-import).

3. loader.py _disable_fused_mrope: MRoPERotaryEmbedding.apply_rotary
   takes a fused Metal kernel path whenever Metal is available, with no
   gradient support and no training gate; same VJP crash. Flip
   fused_apply off on the model so apply_rotary uses its differentiable
   cos/sin fallback.

The remaining custom kernels in mlx_vlm qwen3_5 (ragged decode SDPA,
target-verify GEMV/QMV) are unreachable during training: they require a
left-padded KV cache or target_verify=True (gdn_sink), neither of which
occurs in the trainer's forward.

Fixes the second crash in unslothai/unsloth#6002; item 1 overlaps PR #721.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* Drop redundant attention wrapper hunk and tighten comments for PR #738

* Handle legacy mlx-vlm gated_delta import layout for PR #738

---------

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Lyxot added a commit to Lyxot/unsloth-zoo that referenced this pull request Jun 12, 2026
A newer mlx-lm adding parameters (precedent: the qwen3_5 attention
kwargs that needed unslothai#721) would previously TypeError inside the patched
wrapper. Silently dropping them could change training semantics, so
unknown kwargs now delegate to the original implementation with a
one-time warning naming the new arguments — old mlx-lm is unchanged,
newer mlx-lm stays correct until the patch is updated.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants