Skip to content

Fix MLX save and GGUF export parity issues#697

Merged
danielhanchen merged 19 commits into
unslothai:mainfrom
Lyxot:fix/mlx-save-gguf-export-parity
Jun 10, 2026
Merged

Fix MLX save and GGUF export parity issues#697
danielhanchen merged 19 commits into
unslothai:mainfrom
Lyxot:fix/mlx-save-gguf-export-parity

Conversation

@Lyxot

@Lyxot Lyxot commented May 26, 2026

Copy link
Copy Markdown
Contributor

This PR fixes several issues around MLX merged saves and GGUF export parity.

Bug list:

  1. VLM merged/full saves were missing VLM config fidelity
    Root cause: MLX merged saves used the text-only mlx_lm.utils.save_config path for VLM checkpoints, so VLM-specific config structure, normalization, and fields were not preserved in config.json.
    Commit: 615aff8

  2. QLoRA merged 16-bit saves were not fully dequantized
    Root cause: The merged 16-bit path treated LoRA fusion with fuse(dequantize=True) as sufficient dequantization, but quantized non-LoRA modules could remain packed.
    Commit: 55fa173

  3. VLM GGUF mmproj output was metadata-only / missing real projector tensors
    Root cause: The temporary GGUF staging directory contained mlx-vlm sanitized tensor names/layouts, while llama.cpp MMPROJ conversion expects HF/llama.cpp projector names/layouts; in-place rewriting also risked corrupting file-backed tensors.
    Commit: 6fcaa19

  4. MLX GGUF export options like first_conversion were dropped by the bound save method
    Root cause: The monkey-patched MLX save_pretrained_gguf wrapper accepted CUDA-compatible **kwargs but discarded them when calling the utility implementation, so caller-specified GGUF options like first_conversion never reached export.
    Commit: 0029c54

  5. VLM GGUF exports could leave same-name vision tensors in MLX layout, e.g. Gemma3 patch embeddings stayed OHWI instead of OIHW
    Root cause: VLM sanitizer inversion only considered renamed candidates; for models whose sanitizer leaves a tensor name unchanged but changes layout, replay could accept the unchanged name/tensor and stop before applying the needed layout transform.
    Commit: 97c980f

  6. VLM GGUF sanitizer replay failed for models whose sanitizers need real submodules, e.g. GLM-OCR
    Root cause: Sanitizer replay used a fake proxy object with only config/args, but model sanitizers such as GLM-OCR call submodule sanitizers through real attributes like self.vision_tower, so replay failed and no tensor rewrite happened.
    Commit: c7002e7

  7. VLM GGUF export could fail with modern package-layout llama.cpp converters due missing conversion/ sibling package
    Root cause: Modern llama.cpp conversion scripts are package entrypoints that import sibling conversion.* modules; writing/running the patched script from a detached cache directory broke that package layout, and the MLX path could also use a different llama.cpp tree than the shared patcher root.
    Commit: 4546225

  8. MLX VLM loading could degrade to tokenizer-only processors, so GGUF staging omitted image processor metadata needed by mmproj conversion
    Root cause: GLM-OCR custom processor construction could fail through Transformers AutoImageProcessor without torchvision, and mlx-vlm fell back to tokenizer-only metadata.
    Commit: 398cb9c

  9. VLM GGUF staging could omit image processor sidecars like preprocessor_config.json, breaking mmproj conversion
    Root cause: MLX merged saves trusted tokenizer/model saves to emit every non-weight sidecar needed by downstream converters, but degraded or tokenizer-only processors can omit source files such as preprocessor_config.json, processor_config.json, video_preprocessor_config.json, tokenizer model files, or custom processing files.
    Commit: df5deb9

  10. GLM-OCR GGUF exports could advertise a NextN/MTP layer that the MLX model did not export, making llama.cpp load fail on missing blk.16.* tensors
    Root cause: Source config metadata could advertise speculative/NextN/MTP layers through fields such as num_nextn_predict_layers, nextn_predict_layers, or mtp_num_hidden_layers, but the loaded/exported MLX language model did not contain those extra blocks.
    Commit: d6d5d39

  11. Raw mlx-vlm model saves could miss config.json because model.config dataclass configs were not extracted
    Root cause: _get_model_config only handled _config dictionaries and model.args, while raw mlx-vlm models can expose config as a dataclass-like model.config.
    Commit: 962dec8

Validation:

  • Ran python -m pytest tests/test_mlx_save_export_regressions.py -q.
  • Result: 23 passed.

Copilot AI review requested due to automatic review settings May 26, 2026 16:07
@chatgpt-codex-connector

Copy link
Copy Markdown

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces comprehensive regression tests and fixes for MLX save and GGUF export parity, including VLM processor repairs, tied embedding materialization, and VLM weight sanitization. The review feedback suggests improving robustness by safely checking if thinker_config is a dictionary before accessing its attributes to prevent potential AttributeErrors, and expanding the saved weights check to support single model.safetensors files in addition to sharded indexes.

Comment thread unsloth_zoo/mlx/utils.py
Comment thread unsloth_zoo/mlx/utils.py Outdated
Comment thread unsloth_zoo/mlx/utils.py

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR improves MLX model save/export parity (especially for VLMs) to better match the CUDA path, including GGUF export behavior and preserved sidecar metadata, and adds regression tests for these scenarios.

Changes:

  • Enhance config extraction/saving for MLX-LM vs MLX-VLM, including dataclass handling and backend-aware save_config.
  • Add export-time fixes for GGUF/VLM parity (tensor rewrites, tied lm_head materialization, and llama.cpp path anchoring).
  • Repair degraded MLX-VLM processors by rebuilding image processor + processor class from sidecar configs; add a dedicated regression test suite.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

File Description
unsloth_zoo/mlx/utils.py Adds backend-aware config saving, tied-embedding materialization, VLM GGUF rewrite pipeline, sidecar copying, and GGUF export adjustments.
unsloth_zoo/mlx/loader.py Repairs degraded VLM processors from sidecar configs and filters supported kwargs for GGUF save/push bindings.
unsloth_zoo/llama_cpp.py Ensures patched converter scripts are written in the correct directory for package layouts.
tests/test_mlx_save_export_regressions.py Adds regression tests covering the new MLX save/export parity behaviors without heavy model downloads.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread unsloth_zoo/mlx/utils.py Outdated
Comment thread unsloth_zoo/mlx/utils.py Outdated
Comment thread unsloth_zoo/mlx/utils.py Outdated
Comment thread unsloth_zoo/mlx/utils.py
Comment thread unsloth_zoo/mlx/loader.py
Lyxot added 14 commits May 27, 2026 15:26
Prefer HF vision aliases when inverting mlx-vlm sanitizers, while keeping same-name tensor rewrites as a fallback. This preserves Gemma3 patch embedding layout fixes and avoids stopping early on Qwen-family MLX vision_tower names.
Use loaded VLM model instances when replaying mlx-vlm sanitizers for GGUF tensor rewrites, while keeping the config-derived class pipeline as a fallback. This covers models whose top-level sanitizer delegates through submodules, such as GLM-OCR.
@Lyxot Lyxot force-pushed the fix/mlx-save-gguf-export-parity branch from 444b88b to 0336fc3 Compare May 27, 2026 07:47
@Lyxot Lyxot requested a review from Copilot May 27, 2026 08:34

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

Comment thread unsloth_zoo/mlx/utils.py
Comment thread unsloth_zoo/mlx/utils.py
Comment thread unsloth_zoo/mlx/utils.py
Comment thread unsloth_zoo/mlx/utils.py
Lyxot added 2 commits May 27, 2026 16:48
This reverts commit d8cdf89.

Also removes the related regression coverage and later helper hardening now that bug-4 is out of scope.
@danielhanchen

Copy link
Copy Markdown
Member

Triple-confirmed this PR end-to-end:

1. Local Linux, four Python versions

47 tests pass on each of 3.10 / 3.11 / 3.12 / 3.13: the 23 regressions from tests/test_mlx_save_export_regressions.py plus 24 additional edge-case probes I wrote to cover backward-compat negatives and the Copilot review comments (_copy_source_sidecars with non-directory src, _has_vision_config with malformed thinker_config, _read_json_file with binary garbage and PermissionError, _rewrite_mlx_vlm_tensor_for_gguf empty pipeline, _save_mlx_config text-only routing, _is_vlm_model on primitives, etc.). All green on every interpreter.

2. Cross-OS shim path

tests/test_mlx_save_export_regressions.py + tests/test_mlx_torch_shim_smoke.py green on macos-14 (Apple Silicon), ubuntu-latest, and windows-latest CI runners. The shim path catches Linux / Windows pathlib regressions that real Apple Silicon hardware would not surface.

3. Real Apple Silicon mlx wheels plus Studio boot

Drove bash install.sh --local --no-torch, force-reinstalled this PR's unsloth-zoo head into the Studio venv, installed real mlx 0.x / mlx-lm / mlx-vlm 0.5.0 wheels, and ran 18 contract probes against the live Apple Silicon kernels (one per fix). All 18 PASS. Then booted unsloth studio (API-only) with the PR's unsloth-zoo overlaid: /api/health returned {\"status\":\"healthy\", ...} in 23 seconds. No Studio boot regression.

Pre-PR vs post-PR diff

Copied this PR's test file onto unslothai/unsloth-zoo:main and re-ran it: 22 failed, 1 passed. Same suite on the PR head: 23 passed. The single pre-PR passer is test_lora_push_uses_lora_adapter_hub_path, a defensive guard the author added to confirm the unchanged LoRA-only push path stays intact -- it correctly passes on both sides. Every other test detects a real, observable failure on main. Sample failure on main: TypeError: push_to_hub_gguf() got an unexpected keyword argument 'first_conversion', which is exactly the bug fix #4 addresses.

Coverage per fix

# Commit Confirmed
1 615aff8e VLM config save uses mlx_vlm.utils.save_config; quantization_config preserved. Text-only still routes through mlx_lm.utils.save_config (backward-compat negative)
2 55fa1737 QLoRA merged 16-bit fully dequantizes non-LoRA quantized modules
3 6fcaa19a VLM GGUF mmproj tensor names/layouts normalized
4 0029c545 save_pretrained_gguf / push_to_hub_gguf accept and forward first_conversion (was the smoking gun on main)
5 97c980fc Same-name layout transforms applied; _mlx_arrays_match value-checks real rank-2 mlx arrays
6 c7002e78 _MlxVlmSanitizeProxy constructable; replay uses real submodules
7 4546225d Patched converter co-located with conversion/ sibling
8 398cb9ca _repair_degraded_vlm_processor rebuilds from sidecars; _read_json_file returns {} for missing / binary; propagates unexpected errors
9 df5deb90 _copy_source_sidecars copies non-weight sidecars, skips weights, handles non-directory src cleanly
10 d6d5d39b NextN / MTP metadata stripped when MLX language model doesn't export those layers
11 962dec85 _get_model_config extracts dataclass-shaped model.config

Notes for the maintainer

  • The branch is currently based on 606d2b2d, which is one commit behind main's bea8b483 tool mask support (#688). A two-way git diff main..HEAD renders unsloth_zoo/rl_replacements.py as removed; a real three-way merge preserves those additions because this branch never touches the file. Worth a rebase before merge to make the diff easier to read.
  • Unrelated to this PR: tests/test_mlx_finetune_last_n_layers.py::test_get_peft_model_passes_finetune_last_n_layers_through fails identically on main and on this branch with AttributeError: 'FakeModel' object has no attribute 'trainable_parameters'. Separate cleanup.
  • Also unrelated to this PR: .github/workflows/mlx-ci.yml on main still imports the pre-subpackage flat paths (unsloth_zoo.mlx_loader etc.). It will fail the moment this PR's subpackage layout becomes the default. Should be refreshed to unsloth_zoo.mlx.{loader,compile,utils,trainer,cce}.
  • Copilot-flagged minor concerns the PR has not yet addressed (none of these block merge): _mlx_arrays_match value-blind on ranks 3 and 6 plus; os.environ mutation in _download_convert_hf_to_gguf_cached is process-global / not thread-safe; isinstance(value, mx.array) -- works fine on the shim and on real mlx 0.x, just flagging for future-proofing.

LGTM.

…nslothai#697

- unsloth_zoo/mlx/loader.py: _read_json_file returned lists/strings/None for
  sidecars holding valid but non-object JSON, which crashed
  _repair_degraded_vlm_processor with AttributeError on .get(). Coerce
  non-dict payloads to {} to match the documented contract.
- tests/test_mlx_save_export_edge_cases.py: 71 tests covering branches the
  regression suite does not reach: rank-3/5D/1D tensor matching, the
  _MlxVlmSanitizeProxy two-arg sanitize path, the real shard rewrite loop
  with safetensors files plus index.json remapping, duplicate-name guards,
  NextN decrement and skip branches, sidecar filesystem edges (unicode,
  symlinks, permissions, src equals dst), six-thread concurrent GGUF
  exports over the patcher env lock, a user-set
  UNSLOTH_LLAMA_CPP_SCRIPTS_DIR override, and monolith vs package
  converter placement.
- tests/mlx_simulation/mlx_stub.py: save_safetensors now normalizes tensors
  to contiguous; real MLX has no contiguity constraint, so the shim should
  not reject the rewrite loop's transposed views.

All 273 MLX tests pass on Linux via the simulation shim across Python
3.10-3.13 and transformers 4.51.3, 4.57.6 and 5.5.0. The only failure is
the pre-existing FakeModel one in test_mlx_finetune_last_n_layers.py,
tracked in unslothai#700.
@danielhanchen

Copy link
Copy Markdown
Member

Pushed 5a6ee091 onto this branch with one small fix and a larger edge case test suite I wrote while stress testing the PR.

Bug fix: _read_json_file returned non-dict values

A sidecar holding valid JSON that is not an object ([], "text", null, 3.14) was returned as is, and _repair_degraded_vlm_processor then crashed on processor_config.get("processor_class") with AttributeError: 'list' object has no attribute 'get'. So a malformed processor_config.json could break VLM loading inside the very function meant to repair degraded processors. The fix coerces non-dict payloads to {}, matching the documented contract and the existing handling for missing or unparseable files.

New tests: tests/test_mlx_save_export_edge_cases.py (71 tests)

Named with the test_mlx_ prefix so the MLX CI glob picks it up. It covers the branches the 23 regression tests do not reach:

  • _mlx_arrays_match on rank 1/3/6 tensors, scalars, identity shortcut, backend errors, NaN. This also settles the review concern about value-blind matching on rank 3 plus: current code value checks all ranks.
  • _rewrite_mlx_vlm_tensor_for_gguf: 5D layout transforms, the 1D floating tensor - 1 candidate, multi-pipeline inputs, raising sanitizers, model. prefixed alias families.
  • _MlxVlmSanitizeProxy via a two-argument sanitize(self, weights) class method (previously zero coverage on that path).
  • _prepare_vlm_gguf_export_directory end to end with real safetensors shards: rewrite plus atomic replace, index.json weight_map remapping, duplicate-name RuntimeError, missing index, unicode shard names, and all NextN decrement/skip branches including language_config and thinker_config.text_config.
  • _copy_source_sidecars: src equals dst, symlinks, suffixless and uppercase-suffix files, unicode names, spaces in paths, permission errors.
  • _is_vlm_model, _get_model_config, _save_mlx_config real bodies (explicit flags, dataclass and to_dict fallbacks, quantization key precedence).
  • save_pretrained_gguf: default first_conversion derivation through the bf16 intermediate plus quantize step and intermediate cleanup; six threads exporting concurrently to prove the patcher env lock serializes and restores UNSLOTH_LLAMA_CPP_SCRIPTS_DIR, including a preexisting user override.
  • Converter placement contract: package layout patched script lands beside conversion/, monolith layout stays in LLAMA_CPP_DEFAULT_DIR (backward-compat guard for the llama_cpp.py change).

One shim adjustment was needed: mx.save_safetensors in tests/mlx_simulation now normalizes tensors to contiguous, since real MLX has no contiguity constraint but safetensors.torch rejects non-contiguous views. Without it the real shard rewrite loop was untestable on the shim.

Validation

  • This branch: pytest tests/test_mlx_*.py gives 273 passed, 1 skipped. The single failure is the preexisting test_mlx_finetune_last_n_layers.py FakeModel issue, identical on main, fix already open in Fix mlx-ci.yml torch install gap + finetune-last-n-layers FakeModel stubs #700.
  • Same result across isolated venvs on Python 3.10, 3.11, 3.12 and 3.13 with CPU torch 2.12, and transformers 4.51.3, 4.57.6 and 5.5.0.
  • Comment-only polish was verified by AST comparison (docstrings stripped) against the tested tree, so the published code is exactly what ran.

No behavior changes outside the one loader fix. The branch still merges clean onto current main.

Comment and docstring only pass over the PR diff, applying the repo brevity
policy: collapse multi line blocks, drop comments that restate the code, and
keep load bearing rationale. Scope is restricted to lines this PR adds so the
pass cannot conflict with the comment tightening already merged on main.

- unsloth_zoo/llama_cpp.py: 3 line placement comment down to 2.
- unsloth_zoo/mlx/utils.py: file backed safetensors gotcha down to 2 lines,
  dropped a comment restating _copy_source_sidecars.
- unsloth_zoo/mlx/loader.py: _repair_degraded_vlm_processor docstring 8 to 5
  lines keeping the why.
- tests/test_mlx_save_export_regressions.py: module docstring 5 to 3 lines.
- tests/test_mlx_save_export_edge_cases.py: 14 three line section banners
  collapsed to one line each, two line docstrings to one, removed docstrings
  and comments that restate test names or adjacent assertions.

Net 45 lines removed. Every file verified code identical to the previous
commit by AST comparison with docstrings stripped; comment_tools.py check
passes for all files except the one intentional docstring deletion, which
the AST comparison covers. Full MLX suite: 273 passed, 1 skipped, plus the
known pre existing failure tracked in unslothai#700.
@danielhanchen

Copy link
Copy Markdown
Member

Pushed 50309fe7, a comment and docstring only pass over this PR's diff: collapsed multi line blocks, removed comments that restate the code, kept the load bearing rationale (the file backed safetensors gotcha, converter placement reasoning, processor repair context). Net 45 lines removed.

Scope is restricted to lines this PR adds, so it cannot conflict with the comment tightening already merged on main. Every touched file was verified code identical to the previous commit by AST comparison with docstrings stripped, and the full MLX suite still passes (273 passed, 1 skipped, plus the known pre existing failure tracked in #700).

@danielhanchen

Copy link
Copy Markdown
Member

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 50309fe78c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread unsloth_zoo/mlx/utils.py Outdated
Comment on lines 4723 to 4724
first_conversion=None,
token=None,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Move first_conversion after existing upload args

Adding first_conversion before token changes the existing positional API: a caller using the previous form push_to_hub_gguf(model, tok, dir, repo, "q8_0", token) now binds the token string as first_conversion and leaves token=None. In that scenario the export attempts to use the HF token as the intermediate GGUF dtype and the Hub client is created unauthenticated, so existing positional callers fail or upload without credentials; append the new option after private or make it keyword-only to preserve compatibility.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Valid catch. Fixed in 350d790: first_conversion now sits after token and private, so the pre-existing positional form push_to_hub_gguf(model, tok, dir, repo, method, token) keeps binding token as token. Guarded by test_push_to_hub_gguf_positional_token_stays_token.

Comment thread unsloth_zoo/mlx/loader.py
Comment on lines +379 to +383
module_model_type = (model_type or "").replace("-", "_")
module_candidates = (
f"mlx_vlm.models.{module_model_type}.processing",
f"mlx_vlm.models.{module_model_type}.processing_{module_model_type}",
)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Resolve remapped mlx-vlm processor modules

When the saved model_type is one of mlx-vlm's remapped aliases (the loader already accounts for MODEL_REMAPPING in _build_vlm_model_types), this only probes the raw name with hyphens replaced. In that case a degraded tokenizer-only processor cannot be repaired because the custom processor class lives under the remapped mlx-vlm package, so subsequent saves still miss the image processor metadata; include the remapped target module name in the candidates before falling back to Transformers.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. 350d790 adds the MODEL_REMAPPING target package to the module candidates, so aliased model types resolve their custom processor classes before falling back to Transformers. Covered by test_resolve_processor_class_follows_model_remapping.

Comment thread unsloth_zoo/mlx/utils.py Outdated
Comment on lines +3548 to +3549
if model is None or getattr(model, "sanitize", None) is None:
return []

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Allow submodule-only VLM sanitizers

For VLM wrappers whose top-level model object does not define sanitize but a component such as vision_tower or thinker.vision_tower does, this early return prevents the submodule scan below from ever running. _prepare_vlm_gguf_export_directory then sees no sanitizer pipelines and skips the HF-layout rewrite, leaving those GGUF exports with MLX-native tensor names/layouts; build pipelines from available submodule sanitizers even when the wrapper itself has no sanitizer.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. 350d790 makes the wrapper step optional: submodule sanitizers such as vision_tower.sanitize now build replay pipelines even when the wrapper has no sanitize of its own, so those exports get the HF-layout rewrite. Covered by test_sanitize_pipelines_built_from_submodule_only_sanitizers plus a guard that the wrapper-first ordering is unchanged when the wrapper does sanitize.

Comment thread unsloth_zoo/mlx/loader.py
Comment on lines +416 to +427
if image_processor_type:
try:
import transformers
image_processor_class = getattr(transformers, image_processor_type, None)
if image_processor_class is not None:
return image_processor_class(**image_kwargs)
except Exception:
pass

try:
from transformers import AutoImageProcessor
return AutoImageProcessor.from_pretrained(model_path)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Try mlx-vlm image processors before giving up

When the degraded processor came from a model whose image processor is implemented by mlx-vlm rather than transformers, this path cannot rebuild it: it only looks up image_processor_type on the transformers module and then retries AutoImageProcessor.from_pretrained, which is the failure mode this repair is meant to recover from. For those models the function returns None, so _repair_degraded_vlm_processor leaves a tokenizer-only processor and saved exports still miss multimodal processor metadata; use the mlx-vlm model module's ImageProcessor/load_image_processor path as a candidate here.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. 350d790 tries the mlx-vlm processing module for the named image processor class after the Transformers lookup and before the AutoImageProcessor retry, so mlx-vlm native image processors can be rebuilt from sidecar kwargs. Covered by test_image_processor_rebuild_uses_mlx_vlm_module_class.

All four review comments were valid; each is fixed with a regression test
in tests/test_mlx_save_export_edge_cases.py:

- push_to_hub_gguf: first_conversion moved after token and private so
  pre-existing positional callers keep their meaning; it was inserted
  before token and would have bound a positionally passed HF token as the
  intermediate GGUF dtype.
- _get_mlx_vlm_model_sanitize_pipelines: wrappers without a top level
  sanitize no longer return zero pipelines; submodule sanitizers such as
  vision_tower.sanitize now produce replay pipelines on their own.
- _resolve_mlx_vlm_processor_class: probes the MODEL_REMAPPING target
  package as well, so aliased model types resolve their custom processor
  classes.
- _build_vlm_image_processor_from_config: falls back to the mlx-vlm
  processing module for image processor classes that do not exist in
  transformers, before the AutoImageProcessor retry.

Full MLX suite: 278 passed, 1 skipped. Full repo suite: 1114 passed; the
two failures are pre-existing on the branch (FakeModel stub tracked in
unslothai#700, and the PyPI version sync test which lags because this branch
predates the latest release and resolves on merge).
@danielhanchen danielhanchen merged commit b2456db into unslothai:main Jun 10, 2026
14 checks passed
danielhanchen added a commit that referenced this pull request Jun 10, 2026
…tubs (#700)

* Fix mlx-ci.yml torch install gap + finetune-last-n-layers FakeModel stubs

Two unrelated cleanups surfaced while validating PR #697:

1. .github/workflows/mlx-ci.yml fails on every scheduled run.

   pyproject gates `torch` off on darwin+arm64 (mlx is the native path
   on Apple Silicon), so `pip install -e .[mlx]` on macos-14 does not
   pull torch. But tests/mlx_simulation/__init__.py:52 unconditionally
   `import torch` so the autouse shim that powers the test suite can
   monkeypatch torch.Tensor with MLX-only methods. Collection of
   tests/test_mlx_torch_shim_smoke.py blows up with
   `ModuleNotFoundError: No module named 'torch'` before any test runs.

   Reproduces: most recent scheduled cron run on main (run id
   26499388502) failed in 52s with exactly that error at collection
   time. Fix: install the CPU torch wheel explicitly before `.[mlx]`,
   matching the local-dev install order. Three-line patch.

2. tests/test_mlx_finetune_last_n_layers.py FakeModel is incomplete.

   `unsloth_zoo/mlx/loader.py::FastMLXModel.get_peft_model` prints a
   "LoRA applied -- N trainable params (P% of T total)" stats line at
   the tail of the function via
   `mlx.utils.tree_flatten(model.trainable_parameters())` plus
   `model.parameters()`. The synthetic FakeModel in this test never
   implemented those two methods, so the assertion targeting
   `linear_to_lora_layers`'s captured `num_layers` is never reached --
   the test crashes at `loader.py:3712` with
   `AttributeError: 'FakeModel' object has no attribute
   'trainable_parameters'`.

   The test exercises a parameter passthrough; the stats line is
   incidental. Fix: stub `trainable_parameters` and `parameters` on
   FakeModel to return empty dicts so tree_flatten yields zero entries
   and the stats line prints "0.00% of 0 total" without raising.

Verified locally:
  python -m pytest tests/test_mlx_finetune_last_n_layers.py -v
    2 passed
  python -m pytest tests/test_mlx_*.py -q
    180 passed, 1 skipped
  python -m pytest tests/ -q --ignore=tests/security
    1017 passed, 50 skipped

* Match pyproject torch upper bound (<2.13.0, not <2.11.0)

pyproject.toml line 26 declares
  torch>=2.4.0,<2.13.0 ; (sys_platform != 'darwin' or platform_machine != 'arm64')

The mlx-ci.yml install used <2.11.0 by mistake. Bring the workflow into
sync so it does not block torch 2.11 / 2.12 which the project itself
supports.

---------

Co-authored-by: Daniel Han-Chen <michaelhan2050@gmail.com>
@danielhanchen

Copy link
Copy Markdown
Member

Ran a post-merge verification on this since it landed mid-review: the parity fixes are artifact-level correct, not just call-level. Confirmed mlx_lm.utils.save_config pops vision_config (so routing VLMs through mlx_vlm.utils.save_config is the right fix, and the quantization_config mirroring matches), dequantize_model genuinely dequantizes standalone quantized non-LoRA modules and strips quant metadata, and the VLM GGUF shard rewrite keeps model.safetensors.index.json consistent, with the write-beside-then-replace pattern protecting mmap-backed tensors. 99/99 of the PR's tests pass on the Linux CPU shim, the sibling MLX suites (57 tests) still pass, and CUDA-path risk is nil (lazy imports; the one shared llama_cpp.py change defaults to byte-identical legacy behavior). Apple Silicon CI on main including this change is green: https://github.com/unslothai/unsloth-zoo/actions/runs/27282789929

One small follow-up worth a one-liner: the "already saved" check in push_to_hub_merged (unsloth_zoo/utils.py around line 4612) does not account for single-file model.safetensors, causing a redundant re-save. Non-blocking.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants