Skip to content

fix(mlx): gate gemma4 kv sharing shim#722

Merged
danielhanchen merged 2 commits into
unslothai:mainfrom
Lyxot:fix/gemma4-kv-sharing-gate
Jun 10, 2026
Merged

fix(mlx): gate gemma4 kv sharing shim#722
danielhanchen merged 2 commits into
unslothai:mainfrom
Lyxot:fix/gemma4-kv-sharing-gate

Conversation

@Lyxot

@Lyxot Lyxot commented Jun 5, 2026

Copy link
Copy Markdown
Contributor

This PR fixes a stale MLX Gemma4 training workaround around KV sharing.

Bug:

  • mlx-vlm < 0.5.0 needed Unsloth’s _TrainingKVStore shim because Gemma4 shared-KV layers still routed through cache objects during training.
  • mlx-vlm >= 0.5.0 added native Gemma4 shared_kv threading, so applying the old shim can interfere with the upstream path.

Root cause:

Fix:

  • Detect native Gemma4 shared-KV support from the loaded backbone.
  • Skip the legacy _TrainingKVStore shim when native shared_kv support is present.
  • Keep the shim active for older mlx-vlm versions such as 0.4.4.

Copilot AI review requested due to automatic review settings June 5, 2026 02:42

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR refines the Gemma4 KV-sharing training shim by detecting when the underlying mlx-vlm backbone already supports shared K/V natively, and skipping the monkey-patch in that case.

Changes:

  • Added _gemma4_has_native_shared_kv() feature-detection helper for native shared K/V support.
  • Updated _fix_gemma4_kv_sharing() to early-return when native support is detected.
  • Clarified docstring to describe legacy-only behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread unsloth_zoo/mlx/loader.py
Comment on lines +451 to +456
try:
call_params = inspect.signature(backbone.__class__.__call__).parameters
except (TypeError, ValueError):
return False

return "shared_kv_sink" in call_params

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a check (_gemma4_has_native_shared_kv) to detect if mlx-vlm natively supports Gemma4 shared K/V training, skipping the legacy monkey-patch when native support is present. The review feedback suggests making the signature inspection more robust by inspecting the backbone instance directly instead of backbone.__class__.__call__ and catching AttributeError to prevent potential crashes.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread unsloth_zoo/mlx/loader.py
Comment on lines +451 to +454
try:
call_params = inspect.signature(backbone.__class__.__call__).parameters
except (TypeError, ValueError):
return False

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

To make the signature inspection more robust and defensive, consider inspecting the backbone instance directly rather than backbone.__class__.__call__. This correctly handles any instance-level monkey-patches or overrides. Additionally, we should catch AttributeError in the except block to prevent potential crashes if backbone or its callable attributes are missing or mocked.

Suggested change
try:
call_params = inspect.signature(backbone.__class__.__call__).parameters
except (TypeError, ValueError):
return False
try:
call_params = inspect.signature(backbone).parameters
except (TypeError, ValueError, AttributeError):
return False

…-gate

# Conflicts:
#	unsloth_zoo/mlx/loader.py
@danielhanchen danielhanchen merged commit f219991 into unslothai:main Jun 10, 2026
11 of 12 checks passed
@danielhanchen

Copy link
Copy Markdown
Member

Validated this before it merged; posting the record. The gate is a capability check rather than a version heuristic, which is the right design: _gemma4_has_native_shared_kv keys off structural signals (backbone.previous_kvs with length matching layers, plus shared_kv_sink in the backbone call signature), and tracing mlx-vlm 0.6.2's gemma4 confirms the native path threads shared K/V independently of cache, so the legacy _TrainingKVStore shim is redundant exactly when the gate says so. Neither failure direction reproduces: it does not skip the shim where still needed, and does not apply it where native handling exists. Absent-attribute tolerance is clean (older mlx-vlm, missing attrs, and signature introspection failures all early-return), and _kv_sharing_patched keeps it idempotent. 16/16 local gate tests pass against the real mlx-vlm 0.6.2 Gemma4TextModel plus stubs for the legacy branches, and the Apple Silicon install plus MLX import smoke passed on this branch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants