fix(mlx): gate gemma4 kv sharing shim#722
Conversation
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR refines the Gemma4 KV-sharing training shim by detecting when the underlying mlx-vlm backbone already supports shared K/V natively, and skipping the monkey-patch in that case.
Changes:
- Added
_gemma4_has_native_shared_kv()feature-detection helper for native shared K/V support. - Updated
_fix_gemma4_kv_sharing()to early-return when native support is detected. - Clarified docstring to describe legacy-only behavior.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| try: | ||
| call_params = inspect.signature(backbone.__class__.__call__).parameters | ||
| except (TypeError, ValueError): | ||
| return False | ||
|
|
||
| return "shared_kv_sink" in call_params |
There was a problem hiding this comment.
Code Review
This pull request introduces a check (_gemma4_has_native_shared_kv) to detect if mlx-vlm natively supports Gemma4 shared K/V training, skipping the legacy monkey-patch when native support is present. The review feedback suggests making the signature inspection more robust by inspecting the backbone instance directly instead of backbone.__class__.__call__ and catching AttributeError to prevent potential crashes.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| try: | ||
| call_params = inspect.signature(backbone.__class__.__call__).parameters | ||
| except (TypeError, ValueError): | ||
| return False |
There was a problem hiding this comment.
To make the signature inspection more robust and defensive, consider inspecting the backbone instance directly rather than backbone.__class__.__call__. This correctly handles any instance-level monkey-patches or overrides. Additionally, we should catch AttributeError in the except block to prevent potential crashes if backbone or its callable attributes are missing or mocked.
| try: | |
| call_params = inspect.signature(backbone.__class__.__call__).parameters | |
| except (TypeError, ValueError): | |
| return False | |
| try: | |
| call_params = inspect.signature(backbone).parameters | |
| except (TypeError, ValueError, AttributeError): | |
| return False |
…-gate # Conflicts: # unsloth_zoo/mlx/loader.py
|
Validated this before it merged; posting the record. The gate is a capability check rather than a version heuristic, which is the right design: |
This PR fixes a stale MLX Gemma4 training workaround around KV sharing.
Bug:
mlx-vlm < 0.5.0needed Unsloth’s_TrainingKVStoreshim because Gemma4 shared-KV layers still routed through cache objects during training.mlx-vlm >= 0.5.0added native Gemma4shared_kvthreading, so applying the old shim can interfere with the upstream path.Root cause:
_fix_gemma4_kv_sharing()for Gemma4 VLM loads and LoRA setup.mlx-vlm, but stale once upstream introduced native shared-KV coordination inmlx-vlmAdd KV cache quantization for continuous batching Blaizzy/mlx-vlm#1030 /v0.5.0.Fix:
_TrainingKVStoreshim when nativeshared_kvsupport is present.mlx-vlmversions such as0.4.4.