Skip to content

fix(mlx): save only LoRA adapter tensors#692

Merged
danielhanchen merged 44 commits into
unslothai:mainfrom
Lyxot:fix-mlx-export-adapters
May 27, 2026
Merged

fix(mlx): save only LoRA adapter tensors#692
danielhanchen merged 44 commits into
unslothai:mainfrom
Lyxot:fix-mlx-export-adapters

Conversation

@Lyxot

@Lyxot Lyxot commented May 22, 2026

Copy link
Copy Markdown
Contributor

Summary

Fix MLX LoRA adapter export to match the CUDA/PEFT behavior by saving only adapter tensors instead of all trainable tensors.

MLX save_lora_adapters() now collects tensors from the full model parameter tree and filters by LoRA adapter tensor names using the same principle as PEFT's CUDA path: keep keys containing lora_.

This prevents Studio adapter exports from including base model tensors after a checkpoint reload where some base tensors may be marked trainable.

Changes

  • Add a dedicated MLX LoRA adapter tensor-name filter.
  • Save only tensors whose names contain lora_.
  • Keep training checkpoint saves separate by preserving save_trainable_adapters().
  • Continue writing the same adapter artifacts: adapters.safetensors and adapter_config.json.

The lora_ filter logic comes from PEFT's get_peft_model_state_dict(), which is what the CUDA path uses through model.save_pretrained():

https://github.com/huggingface/peft/blob/a106ff4c7061dd9e59609f88724e4770c3b37293/src/peft/utils/save_and_load.py#L133-L153

Copilot AI review requested due to automatic review settings May 22, 2026 19:57
@gemini-code-assist

Copy link
Copy Markdown
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR refactors MLX adapter checkpoint saving by introducing a shared artifact-saving helper and switching training checkpoints to save the model’s full trainable parameter set (not just LoRA tensors).

Changes:

  • Added _save_adapter_artifacts and save_trainable_adapters utilities to persist trainable adapter weights plus enriched metadata.
  • Updated save_lora_adapters to save only LoRA-named tensors and to error when none are found.
  • Switched trainer.py checkpointing from save_lora_adapters to save_trainable_adapters.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
unsloth_zoo/mlx/utils.py Adds shared save helper + new trainable-adapter save entry point; tightens LoRA saving to LoRA-only tensors.
unsloth_zoo/mlx/trainer.py Updates checkpointing to save all trainable parameters rather than only LoRA tensors.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread unsloth_zoo/mlx/utils.py Outdated
Comment on lines +2571 to +2576
parameters = dict(mlx.utils.tree_flatten(model.parameters()))
adapter_tensors = {
name: value
for name, value in parameters.items()
if "lora_" in name.lower()
}
Comment thread unsloth_zoo/mlx/utils.py Outdated
Comment thread unsloth_zoo/mlx/utils.py Outdated
danielhanchen added a commit to danielhanchen/unsloth-zoo-staging that referenced this pull request May 24, 2026
The seven upstream workflows (consolidated-tests-ci, lint-ci, mlx-ci,
security-audit, stale, studio-export-fix-ci, wheel-smoke) would fire on
every push and PR-event to this throwaway staging branch and burn runner
minutes that have nothing to do with validating MLX PRs unslothai#679, unslothai#682, unslothai#692.

Keep only the three mlx-pr-* workflows on this branch. They stay in
upstream main / origin/main untouched -- this deletion is scoped to the
staging branch only.
… for PR unslothai#692

Three follow-ups from review feedback:

1. save_lora_adapters now filters via a new
   collect_mlx_lora_adapter_tensors() helper that walks named_modules()
   and keeps only tensors belonging to modules that actually expose
   lora_a / lora_b. The previous "lora_" substring filter falsely
   included any path containing the prefix, e.g. router.lora_gate.weight
   or scalar_lora_alpha. Tolerates lora_A / lora_B casing too.

2. MLXTrainer.save_model() now uses the same helper to decide whether
   to call save_lora_adapters. The previous trainable_parameters() scan
   missed adapter tensors after a reload/freeze (LoRA lives in
   parameters() but is not always marked trainable), which caused
   final export to fall through to save_merged_model() instead of
   writing an adapter file.

3. tests/test_mlx_save_lora_adapters_filter.py adds five regressions
   over the split-save semantics: lora-only filter, brittleness
   demonstration (lora_router), no-LoRA ValueError, trainable
   checkpoint preserves everything, post-reload adapter detection.
   Also widens the ValueError message to point users at
   save_trainable_adapters() for non-LoRA checkpoint use.

adapter_config.json write also now pins encoding="utf-8" for
cross-platform parity.
_ensure_lora_frozen now drives LoRA detection from
collect_mlx_lora_adapter_tensors instead of an "lora" substring scan over
trainable_parameters(). The norm-freezing intent from blame commit
82d75e0 ("Improve compiled training loop, CCE memory optimizations, and
LoRA stability" - "Add _ensure_lora_frozen() to prevent NaN from
unfrozen LayerNorm in adaptive optimizers") is preserved verbatim; only
the LoRA presence predicate is swapped so that reloaded/frozen LoRA
models, whose adapter tensors live in parameters() but are not listed as
trainable, still trigger the safeguard. The old predicate silently
returned False in that state and let the LayerNorm NaN bug recur, which
is the exact failure mode the original commit was added to prevent.

save_pretrained_merged now uses the same collect_mlx_lora_adapter_tensors
predicate, so the outer LoRA-vs-merged gate and the inner
save_lora_adapters helper agree on what counts as a LoRA model and the
error path stays consistent.
_ensure_lora_frozen now requires at least one module-anchored LoRA tensor
to be in trainable_parameters() before freezing accidentally trainable
norms. The previous full-tree check would silently freeze a user's
trainable norm when a reloaded model still carried frozen LoRA tensors
in parameters() but the user was running a non-LoRA fine-tune. The
module-anchored predicate is kept (no more "lora" substring false
positives), only the trainability gate is restored, preserving the
blame-cited NaN-protection intent for active LoRA training without
disturbing the non-LoRA reload case.

save_pretrained_merged no longer collects LoRA adapter tensors when
save_method is merged_16bit or merged_4bit, since those branches do not
read has_lora. The call is hoisted into the lora branch so merged saves
skip the full-parameter walk.
Trim the four-line rationale block to the single load-bearing fact:
reloaded LoRA can sit in parameters() without trainable_parameters(),
so the old substring check fell through to save_merged_model().
Extend the existing regression module with five behavior-named
assertions:

- norm-freeze runs when LoRA is actively trained alongside an
  accidentally trainable norm
- norm-freeze skips when adapter tensors are reloaded but not
  trainable (the user is doing a non-LoRA fine-tune)
- norm-freeze skips for non-LoRA models
- save_pretrained_merged raises the user-facing "no LoRA layers"
  message at the gate when adapter tensors are absent
- save_pretrained_merged skips the LoRA collector for merged_16bit
  and merged_4bit save paths

Also drop the unused json + pathlib imports, the unused tmp_path
fixture in test_collect_lora_helper_finds_adapters_after_reload, and
tighten the module docstring.
Restore .github/workflows/* (these match upstream main and were only
scrubbed for the staging push), pull in the regression test additions
for the MLX adapter filter, and keep the source-side LoRA detection
fixes intact.
@chatgpt-codex-connector

Copy link
Copy Markdown

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

@danielhanchen danielhanchen added auto-approved Auto-review approved the PR and removed auto-reviewing Auto-review in progress labels May 24, 2026
@danielhanchen

Copy link
Copy Markdown
Member

Auto-review verdict: Approved

PR 692 splits MLX adapter saving so save_lora_adapters exports only module-anchored lora_a/lora_b tensors (preventing base-weight leakage and reload/freeze fall-through to save_merged_model) while save_trainable_adapters keeps the full trainable tree for in-loop checkpoints. With the review-added refinements to _ensure_lora_frozen and save_pretrained_merged, the change is correct, covered, and a clear improvement over the prior substring-based detection.

Reason: Real reload/freeze export bug fixed with module-anchored detection; review-loop iterations corrected the LoRA-frozen guard semantics and hoisted the merged save check, all backed by 10 passing tests.

… PR unslothai#692

- collect_mlx_lora_adapter_tensors() now requires a complete attribute pair
  (lora_a + lora_b, or lora_A + lora_B). Half-adapter modules no longer slip
  through into adapters.safetensors as unreloadable artifacts.
- New iter_mlx_lora_modules() helper feeds the collector, _enrich_mlx_adapter_config(),
  and MLXTrainer.save_model() so uppercase tensors get matching module paths,
  rank, scale, and dropout metadata instead of falling back to defaults.
- MLXTrainer.save_model() routes mixed LoRA + non-LoRA trainables to
  save_trainable_adapters() so intentionally trainable embeddings, projector,
  vision, or norm weights are not silently dropped from the final artifact.
  Frozen-LoRA + non-LoRA-trainable runs now correctly fall through to
  save_merged_model().
- LoRASwitchLinear rank reads shape[-2] for (num_experts, rank, in_dims)
  instead of writing in_dims into adapter_config["rank"].

Test cases:
- test_collect_lora_helper_accepts_uppercase_pair
- test_collect_lora_helper_drops_half_adapter_module
- test_iter_mlx_lora_modules_reports_attr_pair
- test_enrich_adapter_config_records_uppercase_lora_paths

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 83d0d29291

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread unsloth_zoo/mlx/utils.py
can detect LoRA after reload/freeze when trainable_parameters() no
longer lists adapter tensors.
"""
parameters = dict(mlx.utils.tree_flatten(model.parameters()))

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Guard LoRA collection when model lacks full module API

collect_mlx_lora_adapter_tensors() now unconditionally calls model.parameters() (and later model.named_modules()), but MLXTrainer.__init__ invokes this path for every model via _ensure_lora_frozen(). Models/wrappers that expose only trainable_parameters() (or omit one of these MLX module methods) now fail at trainer construction with AttributeError instead of simply behaving as non-LoRA models, which is a regression from the previous substring-based check. Add a defensive fallback (e.g., return {} when required methods are missing) so non-LoRA or lightweight wrapper models still initialize.

Useful? React with 👍 / 👎.

- trainer.save_model now calls the unsloth_zoo save_lora_adapters utility
  directly, matching the calling convention used for save_trainable_adapters
  on the sibling branch and removing the dependence on the loader-side
  monkeypatch when the trainer is constructed without _patch_mlx_saving.
- _ensure_lora_frozen reuses one collect_mlx_lora_adapter_tensors pass and
  uses the resulting adapter key set for the norm safeguard so paths that
  merely contain "lora" (e.g. router.lora_gate.weight) no longer disable
  the freeze.
- save_trainable_adapters now raises when trainable_parameters() is empty
  so checkpoint directories never carry adapter_config.json without an
  adapters.safetensors next to it.
- Drop unused unpack names (_b_attr, _module) in iter_mlx_lora_modules
  loops.
@chatgpt-codex-connector

Copy link
Copy Markdown

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

…irror lora_parameters top-level for PR unslothai#692

Round-11 review caught three remaining symmetric-fix gaps in the
adapter save and Hub push paths:

* `update_repo_settings` failure used to print-and-continue, which means
  an existing public repo stays public when the caller asked for
  `private=True`. Refuse to upload in that case so adapters never get
  published to a public Hub URL the caller explicitly tried to make
  private. Public-by-request failures still print and continue (their
  intent is unchanged).
* `_enrich_mlx_adapter_config` previously preserved caller-provided
  `fine_tune_type` when the value was `"dora"` but the live model had
  no DoRA modules (or vice versa). mlx-lm would then rebuild DoRA
  wrappers and look for a `.m` magnitude tensor the saved
  adapters.safetensors does not contain, dropping every adapter via
  strict=False. Derive `fine_tune_type` strictly from the live model
  presence of `DoRA*` modules; the caller's stale value yields to
  ground truth.
* Top-level `rank` / `scale` / `dropout` were only backfilled when
  absent, so a caller-supplied stale `rank=99` could shadow the real
  `lora_parameters.rank=4`. mirror `lora_parameters` to the top level
  verbatim so the two shapes mlx-vlm reload checks always agree.
@chatgpt-codex-connector

Copy link
Copy Markdown

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

…nslothai#692

Round-13 Opus review caught that `push_to_hub_gguf` missed the same
visibility hardening applied to `_push_lora_adapters_to_hub` and
`push_to_hub_merged` in R11. `HfApi.create_repo(exist_ok=True)` is a
no-op for the visibility flag on existing repos, so
`model.push_to_hub_gguf(repo_id="me/existing-public-repo", private=True)`
would silently upload the GGUF shards (often multi-GB merged weights)
to a public Hub URL the caller explicitly tried to make private.

Apply the same pattern: paired `update_repo_settings(private=...)` +
hard-fail RuntimeError when `private=True` was requested and the
visibility update fails. Public-by-request failures continue to
print-and-continue.

This is the highest-impact of the three push paths because GGUF files
are typically the ones users distribute most widely, and they contain
full merged weights rather than adapters.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0c27dce30e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread unsloth_zoo/mlx/utils.py
Comment on lines +3173 to +3177
_caller_wants_commit_metadata = bool(
create_pr
or commit_message is not None
or commit_description is not None
)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Ignore default commit strings when gating LoRA fallback

_push_lora_adapters_to_hub() treats any non-None commit_message/commit_description as a hard metadata requirement, so callers that pass the default strings (for example wrappers that always forward "Trained with Unsloth") set _caller_wants_commit_metadata=True and will raise instead of falling back to upload_large_folder() when upload_folder() is unavailable or signature-mismatched. This makes LoRA hub uploads fail in older huggingface_hub environments even though no custom metadata was requested; push_to_hub_merged() already avoids this by comparing against default strings.

Useful? React with 👍 / 👎.

…RAEmbedding for PR unslothai#692

Round-15 review caught two regressions I introduced plus one
asymmetric-fix gap:

* All three create_repo sites (LoRA push, merged push, GGUF push) used
  `private=bool(private) if private is not None else False`, which
  silently forces a brand-new repo to `private=False` whenever the
  caller did not pass the kwarg. That overrides Hugging Face Hub
  organization-level default-private policies: a user inside a
  default-private org would get a public repo on first push. Build the
  create_repo kwargs conditionally so `private` is omitted unless the
  caller actually set it; the Hub then applies the account/org policy
  for initial visibility. The existing update_repo_settings + R11
  hard-fail-on-failure logic still enforces explicit `private=True`.
* `_LORA_WRAPPED_BASE_SUFFIXES` only covered `.linear.*` wrapped-base
  state. LoRAEmbedding and DoRAEmbedding wrap an inner nn.Embedding at
  `.embedding`, so non-root embedding adapters could leak
  `embed_tokens.embedding.weight` through `save_trainable_adapters`
  and `save_pretrained_merged(save_method="lora")`. Add the
  `.embedding.weight` / `.embedding.scales` / `.embedding.biases`
  suffixes alongside the existing `.linear.*` variants.
* `_ROOT_LORA_WRAPPED_BASE_KEYS` had the same gap for root-level
  LoRAEmbedding wrappers. Extend the frozenset to include
  `embedding.weight` / `embedding.scales` / `embedding.biases`.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fef7d541f8

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread unsloth_zoo/mlx/utils.py
can detect LoRA after reload/freeze when trainable_parameters() no
longer lists adapter tensors.
"""
parameters = dict(mlx.utils.tree_flatten(model.parameters()))

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Guard adapter collection against missing model APIs

Fresh evidence: in this revision MLXTrainer._ensure_lora_frozen now always calls collect_mlx_lora_adapter_tensors during trainer setup, so models/wrappers that expose trainable_parameters() but not full MLX module methods will now fail early. collect_mlx_lora_adapter_tensors unconditionally dereferences model.parameters() (and then model.named_modules()), which raises AttributeError on lightweight wrappers and turns non-LoRA initialization into a hard crash instead of a no-op LoRA detection path.

Useful? React with 👍 / 👎.

Comment thread unsloth_zoo/mlx/utils.py
Comment on lines +3186 to +3190
_caller_wants_commit_metadata = bool(
create_pr
or commit_message is not None
or commit_description is not None
)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Treat default commit strings as non-custom in LoRA fallback

Fresh evidence: save_pretrained_merged(..., save_method="lora") forwards caller commit kwargs directly into _push_lora_adapters_to_hub, but _caller_wants_commit_metadata is currently true for any non-None commit string. That means passing the default text (for example from wrappers that always forward defaults) triggers the strict path and can raise instead of falling back to upload_large_folder on older huggingface_hub where upload_folder is unavailable, even though no custom metadata was requested.

Useful? React with 👍 / 👎.

…unslothai#692

Fixes two asymmetric-fix gaps from round 16 review:

1. unsloth_zoo/mlx/utils.py: `_LORA_WRAPPED_BASE_SUFFIXES` and
   `_ROOT_LORA_WRAPPED_BASE_KEYS` already filtered the wrapped base
   weight / scales / biases of an mlx-lm `LoRALinear`, but missed the
   inner `linear.bias` of a bias-bearing wrapped `nn.Linear`. Without
   it, `q_proj.linear.bias` and `lm_head.linear.bias` leaked into
   adapter saves and `_is_lm_head_trainable()` reported True for
   adapter-only training. A bare `.bias` is deliberately NOT in the
   suffix tuple because `q_proj.bias` at the LoRA-module level is
   user-trained bias state that an earlier round explicitly preserved.

2. unsloth_zoo/mlx/loader.py: `_apply_lora_at_paths()` only imported
   `LoRALinear` and wrapped every saved module path as plain LoRA.
   After the PR added DoRA export (fine_tune_type='dora' + saving the
   `.m` magnitude tensor), the path-aware reload branch silently
   downgraded DoRA back to LoRA because the recreated wrapper had no
   `.m` parameter, and the downstream `model.load_weights(
   strict=False)` dropped `q_proj.m`. Honour the saved
   `fine_tune_type`: import `DoRALinear` / `DoRAEmbedding` and wrap
   with DoRA classes when the adapter declares DoRA. Cover
   `nn.Embedding` and `SwitchLinear` paths too (DoRA falls back to
   LoRA on switch modules since mlx-lm has no DoRA switch wrapper).

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b82aaba952

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread unsloth_zoo/mlx/loader.py Outdated
Comment on lines +1097 to +1101
if use_dora:
raise RuntimeError(
"Unsloth MLX: adapter_config declares fine_tune_type='dora' but "
"mlx_lm.tuner.dora is unavailable; install a DoRA-capable mlx-lm "
"or convert the adapter to plain LoRA before reload."

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve DoRA load failure instead of falling back silently

The new DoRA guard in _apply_lora_at_paths raises a RuntimeError when fine_tune_type=='dora' but DoRA classes are unavailable, but this error is currently swallowed by the broad except Exception around _apply_lora_at_paths in the adapter load flow, which then falls back to load_adapters. In that fallback path, unsloth_mlx_lora_module_paths handling is skipped, so vision/projector adapters can be dropped and the model loads in a partially applied state instead of failing loudly as intended. Please re-raise this compatibility error (or narrow the catch) so missing DoRA support cannot silently degrade adapter loading.

Useful? React with 👍 / 👎.

danielhanchen added a commit to danielhanchen/unsloth-zoo-staging that referenced this pull request May 25, 2026
….bias + DoRA reload) into staging

Resolution notes:
- unsloth_zoo/mlx/loader.py: kept the existing rich `_apply_lora_at_paths`
  implementation from staging (per-path fail-loud telemetry, legacy
  rank-from-tensor inference, DoRA / LoRAEmbedding / LoRASwitchLinear
  awareness with ImportError on missing classes). It is strictly a
  superset of the PR-unslothai#692 R16 change, so the inbound minimal version
  was dropped in favour of HEAD.

- unsloth_zoo/mlx/utils.py: kept the inbound `.linear.bias` filter
  additions on both `_LORA_WRAPPED_BASE_SUFFIXES` and
  `_ROOT_LORA_WRAPPED_BASE_KEYS`. Also gated PR-unslothai#692's
  `_extract_mlx_lora_parameters` first-block write on the absence of
  `unsloth_mlx_lora_module_paths`, so the path-filtered second walker
  is the canonical source for rank/scale/dropout when the caller pins
  explicit paths. Without this, PR-unslothai#692's broad first-LoRA-module
  walk borrowed rank from unselected or inconsistently-shaped modules
  before the explicit-filter rejection could run, regressing
  test_enrich_explicit_filter_does_not_borrow_from_unselected.
…ch rank fix for PR unslothai#692

Fixes three asymmetric-fix / regression issues from round 17 review:

1. unsloth_zoo/mlx/loader.py: `_apply_lora_at_paths` unconditionally
   imported `LoRAEmbedding` and `LoRASwitchLinear` from
   `mlx_lm.tuner.lora`. Older mlx-lm wheels that only ship
   `LoRALinear` raised an ImportError before any wrapper was attached,
   so linear-only adapters could not reload. Wrap both imports in
   try/except and treat missing classes as "skip this module type"
   instead of failing the whole adapter load path; LoRALinear (which
   has shipped since the first mlx-lm release we support) is still
   required.

2. unsloth_zoo/mlx/utils.py: the round-16 `update_repo_settings` block
   always raised on private=True when the call failed, even though
   `create_repo(private=True)` had already set visibility on freshly-
   created repos. A token without `write:repo_settings` then blocked
   the upload even though the repo was already private. Refactor the
   three duplicated blocks into `_ensure_hub_repo_visibility(api,
   repo_id, private)` which (a) skips the update when private is None,
   (b) prints-and-continues on private=False, (c) on private=True
   verifies via `repo_info` whether the repo is already private after
   the failed update; only raises when the visibility is confirmed
   non-private.

3. unsloth_zoo/mlx/utils.py: `_extract_mlx_lora_parameters` inferred
   switch LoRA rank from `lora_a.shape[-1]` for 2-D layouts, but
   `mlx-lm==0.22.x` stores switch `lora_a` as
   `(rank * num_experts, in_dims)`, returning ranks like 64 or 4096
   instead of 4 or 8. Prefer `lora_b.shape[-1]` when the module
   declares `num_experts` since both layouts (newer mlx-lm and 0.22.x)
   keep rank as the trailing axis of `lora_b`. Plain LoRALinear keeps
   the existing `shape[-1]` fallback.
…stale adapter_model.safetensors + overwrite stale lora_parameters for PR unslothai#692

Fixes four round 18 findings:

1. unsloth_zoo/mlx/loader.py: the path-aware reload site caught every
   Exception from `_apply_lora_at_paths()` and fell back to upstream
   `load_adapters()`. That fallback rebuilds plain LoRALinear wrappers
   at every saved path, so the explicit `RuntimeError` raised when
   `fine_tune_type='dora'` but `mlx_lm.tuner.dora` is unavailable was
   silently swallowed and DoRA `*.m` magnitude tensors dropped via
   `load_weights(strict=False)`. Catch RuntimeError separately and
   re-raise when the message contains the DoRA-unavailable signal,
   then continue catching every other Exception.

2. unsloth_zoo/mlx/utils.py: round 16 added `.linear.bias` to
   `_LORA_WRAPPED_BASE_SUFFIXES` and `_ROOT_LORA_WRAPPED_BASE_KEYS`
   to drop the wrapped base bias of LoRALinear. That was wrong:
   upstream mlx-lm LoRALinear also stores the legitimate user-trained
   bias at `q_proj.linear.bias`, and dropping it silently loses
   training state when the user unfreezes / trains bias. Remove
   `.linear.bias` from both filter sets and document why bare
   `.bias` and `.linear.bias` must survive trainable checkpoints.

3. unsloth_zoo/mlx/utils.py: the LoRA upload allow-list still
   permitted `adapter_model.safetensors`. The MLX adapter save path
   writes `adapters.safetensors`, never `adapter_model.safetensors`;
   a matching file in the directory is therefore stale by definition
   (PEFT/HF leftover) and would re-upload that stale PEFT adapter
   alongside the current MLX adapter. Drop it from the allow-list.

4. unsloth_zoo/mlx/utils.py: `_enrich_mlx_adapter_config` only wrote
   `lora_parameters` from `_extract_mlx_lora_parameters` when the
   caller dict did not already have it. A reused config with stale
   `lora_parameters={"rank": 99}` then survived against actual
   rank-4 saved tensors, causing reload wrapper mismatch. When the
   live model has any LoRA modules, derive rank/scale/dropout from
   the live walker and overwrite caller-supplied stale values; only
   preserve caller `lora_parameters` when the live model has no
   LoRA modules (declared_lora_artifact-only path).

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: dfe7459371

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread unsloth_zoo/mlx/utils.py
# trainables whose names happen to contain "lora" (e.g.
# lora_router.weight or base.lora_special.lm_head.weight) are not
# silently treated as adapter state.
adapter_keys = set(collect_mlx_lora_adapter_tensors(model))

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Guard CCE LoRA probe for wrapper models

_is_lm_head_trainable() now unconditionally calls collect_mlx_lora_adapter_tensors(model), which requires model.parameters() and model.named_modules(). make_cce_loss_fn() executes this path for CCE training, so lightweight wrappers that only expose trainable_parameters() now fail with AttributeError during trainer setup/loss construction instead of training normally. Please add the same kind of defensive fallback used elsewhere (treat missing module APIs as “no LoRA adapters detected”) so non-standard wrappers do not crash.

Useful? React with 👍 / 👎.

…ter-detection except for PR unslothai#692

Round 18 added a DoRA-unavailable RuntimeError re-raise inside the
inner _apply_lora_at_paths handler so the saved DoRA `*.m` magnitude
tensors would not silently drop into a plain-LoRA reload. The re-raise
worked, but the outer adapter-detection `except Exception as e:` block
at loader.py only re-raised ValueError("Unsloth:") and swallowed every
other exception into a print + standard-load fallback. The DoRA
RuntimeError was therefore re-raised by the inner handler and then
immediately caught and silenced by the outer handler, falling back to
the base-only load path with no warning that fine_tune_type='dora'
was unsupported.

Extend the outer guard to also re-raise namespaced
`RuntimeError("Unsloth MLX:")` so the inner handler's intentional
re-raise reaches the user; unrelated runtime errors keep falling
through to the soft standard-load fallback.
…lothai#692

`_apply_lora_at_paths` re-attached LoRA wrappers via
`setattr(parent, leaf, wrapped)` and `parent = by_name.get(parent_path,
model)`. That silently no-ops when the saved path ends in a numeric
segment such as `vision_tower.merger.layers.0`, which the training-time
`_lora_walk_module` deliberately produces for Qwen2.5-VL's vision merger
and any projector whose Linears live inside a Python list:

- `name.rpartition(".")` -> `parent_path="vision_tower.merger.layers"`,
  `leaf="0"`.
- `by_name.get(parent_path, model)` returns `model` (the fallback)
  because `named_modules()` does not emit list containers themselves.
- `hasattr(model, "0")` is False, so the `setattr` branch is skipped
  silently. The base nn.Linear stays in place.
- The follow-up `load_weights(strict=False)` then silently drops the
  saved `...layers.0.lora_{a,b}` tensors and the user gets a
  "successful" reload with a fully reverted vision/projector LoRA
  and no warning.

Mirror the navigation pattern from `_lora_walk_module`: split the saved
path, walk segments trying `parent[int(seg)]` first then `getattr` for
attribute access, and apply the same `parent[int(leaf)] = wrapped`
fallback to `setattr` for the final segment. List-indexed wrappers now
install correctly and the subsequent `load_weights(strict=False)`
binds the saved tensors instead of dropping them.
danielhanchen added a commit to danielhanchen/unsloth-zoo-staging that referenced this pull request May 25, 2026
…ly_lora_at_paths

Staging's richer _apply_lora_at_paths has the same parent_unreachable bug
for list-indexed paths (e.g. vision_tower.merger.layers.0); the existing
telemetry would warn 'parent_unreachable' but still skip the wrapper and
let load_weights(strict=False) drop the saved tensors. Walk segments via
parent[int(seg)] / getattr and install the leaf via parent[int(leaf)] =
wrapped before falling back to setattr; only mark parent_unreachable when
both navigation paths fail.
danielhanchen added a commit to danielhanchen/unsloth-zoo-staging that referenced this pull request May 25, 2026
…on in _apply_lora_at_paths) into staging

Resolution: kept HEAD's richer _patch_mlx_lora_from_base_compat block (already adopted in earlier rounds). The incoming PR-unslothai#692 list-index navigation diff was already applied to staging's richer _apply_lora_at_paths via the prior commit on this branch; the merge conflict marker landed inside _patch_mlx_lora_from_base_compat (an unrelated function) due to whitespace overlap.
…string as default for PR unslothai#692

Three asymmetric-fix gaps caught by reviewers.

1) FastMLXModel.from_pretrained's adapter-detection inner block had a DoRA
   capability check inside the saved-paths branch (via _apply_lora_at_paths
   raising RuntimeError when mlx_lm.tuner.dora is missing), but the no-
   saved-paths fallback fell straight through to load_adapters(model,
   local_path) without that check. A DoRA adapter without
   unsloth_mlx_lora_module_paths would silently rebuild plain LoRA wrappers,
   then drop the .m magnitude tensors via the strict=False reload. Add the
   same capability check before the fallback load_adapters call so the user
   gets the namespaced "install a DoRA-capable mlx-lm" error.

2) _apply_lora_at_paths silently continued past embedding LoRA paths when
   the installed mlx-lm predated LoRAEmbedding. The saved embed_tokens
   adapter tensors then dropped via the downstream strict=False reload with
   no warning, leaving the user with a partially loaded adapter. Raise a
   namespaced RuntimeError instead, parallel to the DoRA-unavailable guard.

3) _push_lora_adapters_to_hub treated commit_message is not None as
   "caller wants custom metadata", which made the upload_large_folder
   fallback refuse uploads even when the caller forwarded the Unsloth
   default string verbatim (push_to_hub_lora -> _push_lora_adapters_to_hub
   pattern). Compare against the default strings the same way
   _push_merged_model_to_hub already does, so default-string forwarding no
   longer blocks the fallback.
@chatgpt-codex-connector

Copy link
Copy Markdown

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

danielhanchen added a commit to danielhanchen/unsloth-zoo-staging that referenced this pull request May 26, 2026
…string as default for PR unslothai#692

Three asymmetric-fix gaps caught by reviewers.

1) FastMLXModel.from_pretrained's adapter-detection inner block had a DoRA
   capability check inside the saved-paths branch (via _apply_lora_at_paths
   raising RuntimeError when mlx_lm.tuner.dora is missing), but the no-
   saved-paths fallback fell straight through to load_adapters(model,
   local_path) without that check. A DoRA adapter without
   unsloth_mlx_lora_module_paths would silently rebuild plain LoRA wrappers,
   then drop the .m magnitude tensors via the strict=False reload. Add the
   same capability check before the fallback load_adapters call so the user
   gets the namespaced "install a DoRA-capable mlx-lm" error.

2) _apply_lora_at_paths silently continued past embedding LoRA paths when
   the installed mlx-lm predated LoRAEmbedding. The saved embed_tokens
   adapter tensors then dropped via the downstream strict=False reload with
   no warning, leaving the user with a partially loaded adapter. Raise a
   namespaced RuntimeError instead, parallel to the DoRA-unavailable guard.

3) _push_lora_adapters_to_hub treated commit_message is not None as
   "caller wants custom metadata", which made the upload_large_folder
   fallback refuse uploads even when the caller forwarded the Unsloth
   default string verbatim (push_to_hub_lora -> _push_lora_adapters_to_hub
   pattern). Compare against the default strings the same way
   _push_merged_model_to_hub already does, so default-string forwarding no
   longer blocks the fallback.
The fail-loud RuntimeError raised inside _apply_lora_at_paths when
mlx_lm.tuner.lora.LoRAEmbedding is unavailable was being swallowed by
the inner except clause's narrow DoRA-only re-raise filter, and falling
through to load_adapters() silently dropped the saved
embed_tokens.lora_a / lora_b tensors via the strict=False reload.

Broaden the inner filter to re-raise any "Unsloth MLX:" namespaced
RuntimeError so the embedding-LoRA capability gap surfaces to the
caller the same way the DoRA capability gap does. Mirrors the outer
handler's broader check directly below.
@chatgpt-codex-connector

Copy link
Copy Markdown

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

Comments-only refactor: drop pure narration and compress multi-line WHY
explanations to a single line per intent. No behavior change.
@chatgpt-codex-connector

Copy link
Copy Markdown

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

danielhanchen added a commit that referenced this pull request May 27, 2026
* fix: persist MLX LoRA adapter metadata

* fix: preserve MLX LoRA dropout metadata

* fix: sync MLX LoRA adapter config fields

* fix: prefer live MLX LoRA metadata

* fix: handle MLX adapter reload edge cases

* Tighten metadata persistence and reload coverage for PR #679

Reload + metadata follow-ups from review feedback:

1. loader._apply_lora_at_paths() now also recreates LoRASwitchLinear
   for SwitchLinear / QuantizedSwitchLinear and LoRAEmbedding for
   Embedding / QuantizedEmbedding. The previous version saved switch
   rank metadata but the reload helper only wrapped Linear, so MoE
   adapter weights were silently dropped at load time. Switch/embedding
   imports are wrapped in try/except for older mlx-lm.

2. utils._get_mlx_dropout_probability() now reads MLX _p_1 keep-prob
   first and falls back to .p. Compat shims that expose both (.p=0.0
   default plus a real _p_1) previously wrote stale dropout=0.0.

3. utils._infer_mlx_lora_rank() returns None instead of falling back
   to lora_a.shape[-1] when lora_a and lora_b disagree on the rank
   dimension. The previous fallback wrote the input dim as rank for
   partially materialized or non-LoRA shapes.

4. utils._enrich_mlx_adapter_config() only infers rank/scale/dropout
   from modules that appear in the caller-provided
   unsloth_mlx_lora_module_paths set. Previously an unrelated earlier
   LoRA in named_modules() would write the wrong language-tower params
   when the caller selected a vision/projector path. Also distinguishes
   "caller passed nothing" from "caller passed [] or None" so explicit
   empty values are preserved.

5. utils._enrich_mlx_adapter_config() now fills num_layers (derived
   via _get_transformer_layers) and fine_tune_type when callers invoke
   save_lora_adapters() without a trainer-built config. mlx-lm
   load_adapters() dereferences config.num_layers and previously
   crashed with AttributeError on these direct-save artifacts.

* Fix MoE LoRA rank check axis for PR #679

The 3D rank-consistency check in _infer_mlx_lora_rank() used
lora_b_shape[-2] which is the out_features axis. mlx-lm's
LoRASwitchLinear stores lora_b as (num_experts, out_dims, rank)
so rank lives on lora_b_shape[-1]. Using [-2] caused valid MoE
adapters to be rejected as contradicted and rank to be left
unrecorded in adapter_config.json.

* Scrub .github/workflows for staging push (matches staging base)

* mlx: tighten LoRA adapter metadata save and reload

- _enrich_mlx_adapter_config: an explicit empty unsloth_mlx_lora_module_paths
  list now preserves the caller topology but no longer suppresses live
  rank/scale/dropout inference (treat empty as no filter, with a fallback
  module captured before filtering).
- MLXTrainer.save_model: skip lora_a-bearing modules whose rank cannot be
  inferred instead of silently falling back to rank 8 while still copying
  the bad module scale and dropout.
- _apply_lora_at_paths: when from_base does not accept scale/dropout
  kwargs on older mlx-lm, retry with r=rank alone and restore the saved
  scale on the wrapped layer (scale is a Python attribute, not loaded by
  load_weights).

* mlx: scope LoRA metadata fallback and restore dropout on reload

- _enrich_mlx_adapter_config: capture the fallback rank/scale/dropout only
  from modules that pass the explicit_path_set filter, so an explicit
  caller-supplied path list whose modules are not inferable no longer
  borrows metadata from unrelated LoRA layers. Auto-discovery and empty
  explicit-list paths still benefit from the fallback unchanged.
- _apply_lora_at_paths: when from_base() cannot accept the dropout kwarg
  on older mlx-lm, patch wrapped.dropout._p_1 (or .p on shims) after the
  fallback so the saved adapter dropout is faithfully restored alongside
  the scale that was already being patched.

* mlx: harden LoRA rank inference and drop dead metadata fallback

- _infer_mlx_lora_rank: replace getattr(lora_a, 'shape', ()) with explicit
  None and hasattr guards so a None tensor attribute can no longer raise
  AttributeError on callers that share this helper outside the existing
  hasattr-gated sites.
- _enrich_mlx_adapter_config: remove fallback_rank/scale/dropout variables
  and the post-loop rescue. After the explicit-path filter was moved ahead
  of fallback capture, both fallback_* and lora_rank are assigned from the
  same filtered module pool, making the rescue unreachable. The scoping
  decision is intentional: an explicit-filtered save must not borrow
  metadata from unselected modules.

* Add MLX LoRA adapter metadata persistence test coverage

* Sync .github/workflows with upstream author branch

* Harden LoRA wrap fallback and rank/dropout helpers for PR #679

Three follow-ups raised by the round-2 review pass:

1. _lora_walk_module() used LoRALinear.from_base(child, r=rank,
   dropout=..., scale=...) directly, while _apply_lora_at_paths()
   had a TypeError fallback for older mlx-lm releases. Extracted
   the fallback + metadata restoration into a shared
   _lora_from_base_compat() helper and used it from both sites so
   vision/projector LoRA creation survives the same older API
   reload already supports.

2. _get_mlx_dropout_probability() raised TypeError on
   float(_p_1=None). Treat _p_1 missing or non-numeric as absent
   and fall back to .p, then to 0.0, so a half-initialized
   dropout module cannot trip the broad except in
   _enrich_mlx_adapter_config and silently drop all metadata.

3. _infer_mlx_lora_rank() now requires both lora_a and lora_b
   shapes and verifies the expert/batch prefix matches for 3D
   MoE adapters. A module with only lora_a, only lora_b, or
   prefix-mismatched expert tensors now returns None so the caller
   moves on to the next module instead of recording stale rank
   from a half-built first module.

* Recognise uppercase LoRA attribute pair in metadata helpers for PR #679

PEFT-style adapters expose lora_A / lora_B. _infer_mlx_lora_rank now falls
back to the uppercase pair when only those attributes exist, and
_enrich_mlx_adapter_config walks both the lowercase and uppercase variants
so reload sees matching unsloth_mlx_lora_module_paths plus rank / scale /
dropout instead of defaulting silently.

* Scrub .github/workflows for staging push (matches staging base)

* Split: keep only 1 file(s)

* Apply uppercase LoRA pair lookup in MLXTrainer.save_model

The MLX adapter config enrichment helper accepts both lowercase
lora_a/lora_b and PEFT-style uppercase lora_A/lora_B attribute pairs
when inferring rank, scale, and dropout. MLXTrainer.save_model was
still prefiltering on the lowercase pair only, so for an uppercase
adapter the trainer's local rank/scale/dropout fell back to the
8/1.0/0.0 defaults before the config was passed downstream. Mirror
the enrich helper's attribute pair search so the trainer agrees with
the rest of the save path.

* Trim MLX LoRA adapter metadata comments

Collapse multi-line dropout/rank/enrich rationale blocks to their
load-bearing one-liners and shorten the from_base compat docstring.
Also adopt the layer-style is_layer fallback in _infer_mlx_lora_rank
so mlx-lm tuner Linear wrappers exposing .weight rather than .shape
return the correct rank.

* Consolidate MLX LoRA adapter metadata tests

Add coverage for is_layer rank inference on mlx-lm nn.Linear-style LoRA
halves, trainer metadata loop for lowercase + PEFT-style uppercase
attribute pairs, and rewrite the TypeError fallback tests to exercise
the production _lora_from_base_compat / _apply_lora_metadata_to_wrapper
helpers directly so a regression in those helpers actually fails the
suite.

* Drop unreloadable uppercase LoRA paths and normalize explicit paths for PR #679

The earlier uppercase lora_A / lora_B detection got recorded as a
reloadable MLX adapter path, but Unsloth and mlx-lm only ever recreate
lowercase lora_a / lora_b wrappers on reload; uppercase weights would
load with strict=False and silently disappear. mlx-lm itself uses
lowercase exclusively, so dropping the uppercase fallback removes a
save-reload asymmetry without losing any real caller.

Also tighten three rough edges that came out of the same review pass:

- _infer_mlx_lora_rank now rejects a Switch/MoE lora_a (..., rank, in)
  paired with a bare 2D lora_b, matching mlx-lm's LoRASwitchLinear shape
  contract instead of returning a plausible-looking rank.
- _enrich_mlx_adapter_config normalizes a single-string
  unsloth_mlx_lora_module_paths value to a one-element list so the
  loader does not iterate the string character by character.
- The trainer save_model metadata loop and test helper drop the matching
  uppercase pair branch in lockstep.

* Normalize load-side module paths and rank inference edges for PR #679

External adapter_config.json files can store
unsloth_mlx_lora_module_paths as a bare string. The save side normalized
that case in a prior commit, but the load side still iterated the raw
string character-by-character, so no LoRA layer was actually wrapped and
load_weights(..., strict=False) silently dropped the trained tensors.
Add _normalize_mlx_lora_module_paths() and apply it at every reload
entry point.

Adjacent reload fixes from the same review pass:

- Whenever unsloth_mlx_lora_module_paths is present, still call
  mlx_lm.tuner.utils.load_adapters() to rebuild the standard language
  tower. The previous flow took the auxiliary-paths-only branch and
  let strict=False ignore any language-tower LoRA tensors.
- _apply_mlx_lora_initialization() now unwraps mlx-lm nn.Linear-wrapped
  lora_a/lora_b via .weight.shape, matching the unwrap that
  _infer_mlx_lora_rank() already does. get_peft_model(init_lora_weights=
  False) and "gaussian" no longer crash on the layer-wrapped form.
- _infer_mlx_lora_rank() requires both halves to be at least 2-D, so a
  bare 1-D lora_b can no longer leak a plausible-but-wrong rank into
  adapter_config.json.

* Stop persisting placeholder LoRA metadata + surface adapter reload failures

MLXTrainer.save_model previously initialized rank/scale/dropout to 8/1.0/0.0
and num_layers to -1 as fallbacks. When _infer_mlx_lora_rank failed for every
module or _get_transformer_layers returned no layers, those placeholders
silently landed in adapter_config.json. On reload the adapter was rebuilt at
the wrong rank/scale and mlx-lm's load_adapters sliced range(-1) to wrap zero
LoRA layers, so trained weights had no effect with no warning.

Initialize the trainer locals to None and only include num_layers / rank /
scale / dropout / lora_parameters in adapter_config when the source value is
valid. Coerce per-expert mx.array scales from LoRASwitchLinear via .item()
with a safe fallback so the trainer cannot crash building the config.

In loader.py, replace the bare except: pass on _apply_lora_at_paths with
warnings.warn so partial wrapping failures surface. After the
load_weights(strict=False) fallback, diff the saved safetensors keys against
the live module tree and warn on any tensor that cannot bind, so users see
when trained weights are silently dropped.

Update test_trainer_metadata_loop_no_lora_uses_defaults to track the new
contract (None instead of 8/1.0/0.0) and add coverage for: rank-inference
failure returning None, mx.array scale coercion (0-D succeeds, wider arrays
fall back to 1.0), and the trainer's adapter_config dict gating both the
num_layers sentinel and the placeholder rank/scale/dropout trio.

* Coerce mx.array scale in enrich + narrow diff warning to LoRA tensors for PR #679

_enrich_mlx_adapter_config previously did a raw float(getattr(module, "scale", 1.0)).
When a LoRA module exposes scale as a multi-element mx.array (LoRASwitchLinear
per-expert scale), float() raises and the outer try/except: pass swallowed the
entire enrichment block, including unsloth_mlx_lora_module_paths. On reload, the
load-side _apply_lora_at_paths short-circuits without those paths, so
vision/projector/MoE LoRA tensors are silently dropped via load_weights(strict=False)
without firing the new missing-tensor warning. Mirror the trainer-side .item()/
fallback-1.0 coercion in enrich so the path list survives.

Narrow the new safetensors-vs-live-tree diff in loader.py to keys ending in
.lora_a / .lora_b / .lora_A / .lora_B / .lora_embedding_a / .lora_embedding_b.
Before, comparing all keys could fire false-positive "missing" warnings for VLM
adapters where mlx-lm rebinds tensors under a different namespace prefix
(model.layers... vs language_model.model.layers...). The warning's purpose is
to catch silently-dropped LoRA training weights, so filtering to LoRA-suffix
keys keeps the signal and drops the noise.

Add tests: enrich-side mx.array scale coverage with both 0-D (item() returns)
and wider arrays (fallback to 1.0), asserting unsloth_mlx_lora_module_paths
survives in both cases.

* Harden adapter path normalize + surface skipped key diff for PR #679

_normalize_mlx_lora_module_paths now accepts dict-grouped (e.g. {"language":
[...], "vision": [...]}) and pathlib.Path-typed entries instead of silently
returning []. Older or hand-authored adapter_config.json layouts no longer
drop their module paths.

Narrow the safetensors-vs-live-tree diff except block to (ImportError, OSError)
so unrelated bugs in the diff code (e.g. accidental NameError after a refactor)
still surface. When the diff itself is skipped, emit a warning so the user
knows the silent-drop diagnostic was bypassed; the load_weights call still runs
afterwards either way.

Add tests covering dict + pathlib normalization.

* Restore load_weights call when adapter key diff raises + bare PathLike for PR #679

The prior commit narrowed the safetensors-vs-live-tree diff except to
(ImportError, OSError) but left model.load_weights(adapter_weights_file,
strict=False) OUTSIDE the except. Any other exception from safe_open or
tree_flatten (notably SafetensorError on a header issue, or ValueError /
AttributeError on unusual model objects) now escapes the narrowed except
and crashes the entire from_pretrained, even though the actual load was
never attempted. Pre-narrow behavior would have still reached the load.

Widen the diff except back to bare Exception so the diagnostic can never
block the actual load. The skip-warning still fires so a real bug in the
diff code is visible. The narrowing intent (catch unrelated bugs) is
preserved by the warning message.

_normalize_mlx_lora_module_paths now also handles a bare os.PathLike
passed at top level (not wrapped in a list). The dict / list / pathlib-
in-list arms were added previously; this rounds out the input shapes.

_enrich_mlx_adapter_config's outer try/except: pass was masking legitimate
caller-input bugs (e.g. lora_parameters="garbage"). Narrow to
(TypeError, ValueError, AttributeError) and warn so partial-metadata
writes surface instead of silently producing an under-specified
adapter_config.json.

* Fix 5 P1s from reviewer.py round on PR #679

1) Save-side normalize asymmetry: _enrich_mlx_adapter_config now reuses the
   loader-side _normalize_mlx_lora_module_paths so save and load accept the
   same shapes (str / list / tuple / set / dict / pathlib.Path). Before,
   passing {"vision": ["vision.proj"]} or [Path("vision.proj")] erased
   topology before it reached adapter_config.json.

2) Reload nested-wrap / always-fallback: Reorder so mlx-lm's load_adapters
   runs FIRST and rebuilds the language tower via its canonical walk; then
   _apply_lora_at_paths re-attaches only the auxiliary paths (vision /
   projector / MoE / embedding) that mlx-lm does not handle. Added a
   skip-if-already-wrapped guard inside _apply_lora_at_paths so language
   paths land as no-ops when load_adapters has already wrapped them.
   Previously the prewrap forced load_adapters to either fail or produce
   nested LoRALinear(LoRALinear(Linear)).

3) Silent base-model fallback: when load_adapters raises, the fallback
   no longer leaves load_weights unrun nor returns the base model;
   the original load_adapters exception is preserved and re-raised if
   adapters.safetensors is missing or there is nothing to bind against.

4) Reload rank=8 default: _apply_lora_at_paths now raises ValueError when
   rank metadata is missing from adapter_config, matching the save-side
   None-gating contract. No more silent rebuild at rank=8.

5) _apply_mlx_lora_initialization clobbering layers: when lora_a / lora_b
   are nn.Linear-wrapped, the previous `module.lora_a = mx.zeros(...)`
   replaced the layer with a raw array, destroying the wrapper and its
   methods. New _assign_lora_tensor helper writes to .weight when the
   slot is layer-backed and falls back to setattr for bare arrays.

Also expand the saved-vs-live-tree diff suffix tuple to include
.lora_a.weight / .lora_b.weight / .lora_A.weight etc., so the missing-key
warning fires for layer-wrapped saves too. Without this it underreported
drops by half.

Add tests: dict / pathlike / bare-pathlike normalization on both save and
load sides; _enrich preserves dict-grouped and pathlib explicit paths.
Tests now call _load_utils() at the top so they pass under selected /
sharded runs that lack the implicit sys.modules side effects from earlier
tests in the file.

* Bind aux tensors + DoRA reload + lazy rank validation for PR #679

Three real bugs surfaced by the reviewer.py round-8 sweep against
head fc89ba6, plus a follow-on DoRA reload-side hook needed for
PR #692's new fine_tune_type='dora' stamp.

(1) Auxiliary LoRA tensors never loaded after load_adapters success.

The post-R7c reordering ran load_adapters first (binds the language
tower), then _apply_lora_at_paths attached auxiliary LoRA wrappers
(vision / projector / MoE / embedding). But the follow-up load_weights
call only existed in the load_adapters-failed fallback. When the
common case succeeded, the new aux wrappers were left init-only and
the saved auxiliary tensors were silently discarded. Add a second
load_weights(adapter_weights_file, strict=False) after the success
branch when _aux_attached > 0; strict=False is safe because the
language tower is already bound and the aux paths now exist.

(2) Silent base-model fallback when no wrappers were attached.

When load_adapters failed AND _apply_lora_at_paths attached zero
wrappers, the strict=False fallback was still calling load_weights on
a bare model, which silently dropped every LoRA tensor and returned
the base. The caller never knew the adapter failed. Re-raise the
original load_adapters exception in that case.

(3) Missing-rank validation ran before the already-wrapped skip.

Legacy adapters that stored unsloth_mlx_lora_module_paths but no
rank/lora_parameters would crash here even when load_adapters had
already rebuilt the language tower (so the loop would skip every
path). Defer the rank check to the moment we actually need to wrap;
loops over already-wrapped paths now return 0 cleanly.

(4) DoRA reload support: when adapter_config.json carries
fine_tune_type='dora' (the new save-side stamp from PR #692),
_apply_lora_at_paths now selects mlx_lm.tuner.dora.DoRALinear /
DoRAEmbedding instead of plain LoRALinear / LoRAEmbedding. Without
this, the saved q_proj.m magnitude tensor silently dropped via
strict=False and the reloaded model lost DoRA semantics.

Also: _apply_lora_at_paths now returns the number of wrappers
attached so the caller can decide whether the post-success
load_weights is needed.

* Fix asymmetric adapter diagnostics + scale coercion + from_base compat for PR #679

Round-9 review caught several asymmetric / silent-fallback bugs along the
adapter reload + save paths. This commit:

* Adds a shared `_warn_missing_adapter_keys` diagnostic that diffs saved
  vs live LoRA keys (covers raw, layer-backed, embedding, and DoRA `.m`
  variants), and routes both the success and fallback `load_weights`
  branches through it so silent drops surface consistently.
* Refactors the fallback branch's previously inline diagnostic into a
  single call to the shared helper, eliminating duplicated suffix lists
  and ensuring DoRA `.m` tensors are included in the diff.
* Adds `_coerce_mlx_lora_scale` so LoRASwitchLinear's per-expert
  `mx.array` scale no longer silently degrades to `1.0` when `.item()`
  raises. Uses it in `MLXTrainer.save_model` and
  `_enrich_mlx_adapter_config` so both save sides agree.
* Stops `_enrich_mlx_adapter_config` from overwriting caller-provided
  global LoRA metadata when the caller pinned `unsloth_mlx_lora_module_paths`
  to an aux subset; the language tower's global rank/scale/dropout is
  now preserved when the explicit-path filter narrows inference.
* Adds `_patch_mlx_lora_from_base_compat` that monkey-patches mlx-lm's
  `LoRALinear` / `LoRASwitchLinear` / `LoRAEmbedding` `from_base` so the
  upstream `linear_to_lora_layers` walk accepts `scale=` / `dropout=` on
  older mlx-lm wheels (mirrors the aux-path compatibility shim already
  in `_lora_from_base_compat`). Called before each language-LoRA wrap
  site; idempotent; skipped when the class has no `from_base` attribute.

* Patch from_base on reload + preserve namespaced runtime errors + fail loud on switch/embedding for PR #679

Round-10 review caught several asymmetric / silent-fallback bugs along
the adapter reload + auxiliary wrapping path:

* `load_adapters()` is now preceded by `_patch_mlx_lora_from_base_compat()`
  so older `mlx-lm` wheels that reject `scale=` / `dropout=` kwargs on
  `from_base()` succeed during reload the same way the training path
  already does. Previously a clean reload could hard-fail on those
  wheels and force the fallback even when adapter metadata was correct.
* The outer adapter-detection catch now preserves namespaced
  `RuntimeError("Unsloth MLX: ...")` in addition to `ValueError` and
  `ImportError`, so the new "adapter load failed and adapters.safetensors
  is missing" runtime error no longer falls through to a silent base-
  model load.
* `_ensure_metadata()` wraps the `int(raw_rank)` / `float(...)` coercions
  in a namespaced `Unsloth MLX:` `ValueError` so malformed configs
  (`"rank": "not-an-int"`) raise the same prefix the outer catch
  preserves. Previously the plain conversion error was swallowed.
* The switch and embedding aux-wrap branches in `_apply_lora_at_paths`
  now raise `ImportError` when `LoRASwitchLinear` / `LoRAEmbedding` is
  unavailable, mirroring the DoRA hard-fail rule. Saved switch /
  embedding LoRA tensors no longer silently drop via `strict=False`
  on an older mlx-lm.
* Legacy direct-save adapters that wrote `unsloth_mlx_lora_module_paths`
  but did not persist `rank` / `scale` / `dropout` in `adapter_config.json`
  now have their rank inferred from `adapters.safetensors` tensor shapes
  via `_infer_rank_from_saved_adapter`. Previously these adapters
  regressed from "load with placeholder rank=8" to a hard reload failure.

* Patch DoRA from_base + fix uppercase lora_A rank + complete explicit-path metadata + fail-loud partial fallback for PR #679

Round-11 review caught four remaining gaps in the adapter reload + save
metadata paths:

* `_patch_mlx_lora_from_base_compat` now also patches
  `mlx_lm.tuner.dora.{DoRALinear,DoRAEmbedding}`. mlx-lm's load_adapters
  reaches into DoRA's `from_base(..., scale=..., dropout=...)` for
  `fine_tune_type="dora"` and older wheels reject those kwargs the same
  way LoRA does. Setattr is wrapped in try/except so simulation stubs
  with __slots__ or no __dict__ degrade gracefully.
* `_infer_rank_from_saved_adapter` now classifies suffixes by tensor
  orientation (`rank_first_2d_suffixes`) rather than presence of
  `.weight`. PEFT-style uppercase `lora_A` (no `.weight`) is saved as
  `(rank, in_features)` like `lora_a.weight`, not `(in_features, rank)`
  like the raw lowercase `lora_a`. Previously `(4, 512)` was inferred
  as rank 512, building an incompatible wrapper that mis-binds the
  saved rank-4 tensor on strict=False.
* `_enrich_mlx_adapter_config`'s explicit-path + caller-metadata elif
  branch previously skipped backfilling `num_layers` and treated
  caller-supplied partial metadata (e.g. `{"rank": 4}` only) as
  complete. Now mirrors the main branch's `num_layers` backfill and
  fills `scale` / `dropout` from inferred values (or `1.0` / `0.0`
  defaults), so the LoRA dict mlx-lm.load_adapters reads is always
  complete.
* The manual fallback after `load_adapters()` fails now consults the
  saved-vs-live key diff returned by `_warn_missing_adapter_keys()`
  and re-raises a chained `RuntimeError` (preserving the original
  load_adapters exception via __cause__) when language-tower LoRA
  keys remain unbound. Previously `strict=False` silently dropped
  the language LoRA tensors and returned an aux-only adapter that
  trained as base + auxiliary noise.

* Diagnose success-branch aux drops + narrow from_base TypeError catch + log per-path aux skips for PR #679

Round-12 Opus review caught three remaining silent-data-loss surfaces in
the adapter reload path:

* The success branch only ran `_warn_missing_adapter_keys` when
  `_aux_attached > 0`. If a caller declared `unsloth_mlx_lora_module_paths`
  but every aux path was unreachable (live module renamed, removed, or
  an unhandled type the wrap loop skipped), `load_adapters()` would
  succeed for the language tower, `_apply_lora_at_paths` would skip
  every aux path, and the saved aux tensors stayed in
  adapters.safetensors with no live module to bind into; the user got
  a silently incomplete VLM/MoE adapter. Now always diff saved-vs-live
  keys when the saved config declared aux paths, and raise the same
  shape of `RuntimeError` the fallback branch already raises when no
  aux wrapper attached AND saved tensors remain unbound.
* `_apply_lora_at_paths` previously `continue`d silently on three
  classes of per-path failure (`module is None`, `lora_cls is None`,
  `parent is None / not hasattr(parent, leaf)`). The new
  `_skipped_paths` list now tracks each skip with a reason
  (`module_missing`, `unhandled_type:<typename>`,
  `parent_unreachable`) and emits a single consolidated `warnings.warn`
  at the end of the wrap loop so the user can tell WHICH live module
  is missing or unsupported (the safetensors-level diagnostic only
  reports tensor keys, not module reasons).
* `_lora_from_base_compat` and the monkey-patched `_compat_from_base`
  previously caught EVERY `TypeError` from `from_base(..., scale=...,
  dropout=...)` and retried with progressively fewer kwargs, ending at
  the upstream default `r=8`. If the underlying TypeError came from an
  unrelated source (e.g. a future mlx-lm wrapper validating that
  `SwitchLinear` cannot be wrapped by `LoRALinear` and raising
  `TypeError`), the fallback would silently bind a rank=8 wrapper to a
  saved different-rank tensor that `strict=False` then drops.
  Introduce `_is_from_base_kwarg_typeerror` heuristic and only retry on
  signature-mismatch messages; unrelated TypeErrors propagate.

* Fail-loud on partial success branch + tighten TypeError needles + honor caller LoRA metadata for PR #679

Round-13 Opus review caught three follow-on bugs in the R12 fixes:

* Success branch only raised when `_aux_attached == 0` AND saved tensors
  remained unbound. The "mixed partial" case (3 of 5 declared aux paths
  attached, 2 skipped) silently called load_weights(strict=False) and
  returned an adapter with the skipped paths' saved tensors dropped.
  Move the saved-vs-live diagnostic raise outside the `_aux_attached
  == 0` guard so it fires for any non-empty `_missing_after_success`,
  mirroring the fallback branch's semantics.
* `_FROM_BASE_KWARG_TYPEERROR_NEEDLES` previously included bare `scale`
  and `dropout` substrings, which made the heuristic mis-classify any
  unrelated TypeError that mentions those words (e.g. an mlx-lm
  internal `TypeError("scale must be a finite float")` or a
  `Dropout(p=...)` validator) as a signature mismatch. That defeated
  the R12 narrowing intent: such errors would still silently fall
  through to `from_base(module, r=r)` and `_apply_lora_metadata_to_wrapper`,
  reaching the same wrong-rank wrapper R12 was supposed to prevent.
  Drop the bare substrings; keep only the single-quoted CPython forms
  and explicit "not accepted" / "not supported" rejection phrasings.
* `_enrich_mlx_adapter_config` first branch required
  `explicit_filter_narrowed AND has_caller_lora_metadata` to skip the
  overwrite, so a direct `save_lora_adapters(model, path,
  adapter_config={"rank": 4, "scale": 8.0})` call without
  `unsloth_mlx_lora_module_paths` silently replaced the caller's
  rank/scale with whatever the first LoRA module walk happened to
  infer. Honour caller-supplied metadata symmetrically: drop the
  explicit-paths requirement from the skip guard so any caller-
  provided `rank`/`scale`/`dropout` wins over inferred values.

* Backfill rank in elif + num_layers fallback + correct embedding rank axis + drop dead local for PR #679

Round-15 review caught a regression and three follow-on bugs in the
recent metadata + reload helpers:

* `_enrich_mlx_adapter_config` elif branch's inferred_fallbacks only
  filled `scale` and `dropout`; if a direct save_lora_adapters caller
  passed `{"scale": 9.0}` (no rank), the elif gate `if "rank" in
  lora_parameters` was always False and the saved config dropped LoRA
  metadata entirely. Add `rank` to inferred_fallbacks (with `None` as
  the no-default sentinel so the gate still skips when neither caller
  nor inference produced a rank).
* `MLXTrainer.save_model()` omitted `num_layers` when
  `_get_transformer_layers()` returned None / empty. mlx-lm's
  load_adapters does `config.num_layers` attribute access on the
  SimpleNamespace built from adapter_config.json, raising
  AttributeError when the key is missing. Pre-PR wrote `-1` as the
  legacy "all layers" sentinel; restore that fallback so reload no
  longer crashes when layer detection fails.
* `_infer_rank_from_saved_adapter` had `.lora_embedding_a.weight` in
  `_rank_first_2d_suffixes`. mlx-lm's LoRAEmbedding saves the A tensor
  as `(num_embeddings, rank)`, not `(rank, in_dims)`, so taking
  `shape[0]` returned `num_embeddings` (e.g. 32000) as the inferred
  rank. Remove the suffix so it falls through to the default `shape[-1]`
  branch, returning the actual rank.
* The R13 `explicit_filter_narrowed` local in
  `_enrich_mlx_adapter_config` was unused after the R13 guard
  simplification. Drop it to clear the Ruff F841 11/12 reviewers
  flagged.

* Let live LoRA rank override stale caller metadata + restore from_base positional order + narrow TypeError classifier for PR #679

Fixes three regressions / asymmetric-fix gaps from round 16 review:

1. unsloth_zoo/mlx/utils.py: _enrich_mlx_adapter_config previously
   treated caller-supplied rank/scale/dropout as canonical even when
   the live LoRA modules proved different values. Saving a live rank-4
   adapter with a stale `{"rank": 8, "scale": 1.0, "dropout": 0.0}`
   caller config wrote rank=8 / scale=1.0 to adapter_config.json and
   produced reload-time shape mismatches. Live module state describes
   the tensors being saved now, so it must override stale scalar
   metadata; caller metadata is now only used as fallback when no
   trustworthy live rank can be inferred. Also tighten num_layers
   fallback to write -1 (the legacy "all layers" sentinel) instead
   of omitting the key, matching the trainer save path so both halves
   keep mlx-lm load_adapters' attribute access on `config.num_layers`
   working uniformly.

2. unsloth_zoo/mlx/loader.py: _patch_mlx_lora_from_base_compat
   installed `(cls, module, r=8, scale=20.0, dropout=0.0)` while
   upstream mlx-lm `LoRALinear.from_base` is
   `(linear, r=8, dropout=0.0, scale=20.0)`. After the monkey-patch,
   any external positional caller writing
   `LoRALinear.from_base(linear, 4, 0.1, 2.0)` silently received
   `scale=0.1` / `dropout=2.0`. Restore upstream positional order.

3. unsloth_zoo/mlx/loader.py: _is_from_base_kwarg_typeerror used to
   treat ANY `"not supported"` / `"not accepted"` substring as a
   from_base signature mismatch, so a wrapper raising
   `TypeError("rank 4 not supported by this wrapper")` would trigger
   the fallback path that retries without rank and returns a default
   rank-8 wrapper. Require the message to also name one of the kwargs
   we are sending (as either a CPython-quoted name or a standalone
   word) before treating broad needles as signature mismatches. Use
   regex word boundaries so single-char kwargs like "r" do not match
   substrings of "rank" / "wrapper". Updated the retry sites to use
   the new `kwargs=` API.

* Symmetric num_layers fallback + shape-aware adapter diagnostic + fail-loud rank fallback for PR #679

Fixes three asymmetric-fix gaps from round 17 review:

1. unsloth_zoo/mlx/utils.py: the caller-metadata elif branch of
   `_enrich_mlx_adapter_config` only wrote `num_layers` when the layer
   count was positive, while the inferred-rank branch and trainer
   already write `-1` as the legacy "all layers" sentinel when
   discovery fails. A direct `save_lora_adapters()` caller that
   supplied valid rank/scale/dropout but no num_layers would then
   produce a config that mlx-lm.load_adapters() could not read
   (it does `config.num_layers` attr access on a SimpleNamespace).
   Mirror the `-1` fallback in the elif branch so both halves agree.

2. unsloth_zoo/mlx/loader.py: `_warn_missing_adapter_keys()` only
   diffed the saved vs live LoRA key sets. A wrong-rank live wrapper
   at a matching key name (e.g. mlx-lm.load_adapters wrapped the
   language tower at the upstream default rank=8 while saved tensors
   are rank-4) was reported as "fully bound" even though the
   downstream `load_weights(strict=False)` would silently drop the
   shape-incompatible tensors. Compare tensor shapes too and surface
   shape mismatches in the warning preview.

3. unsloth_zoo/mlx/loader.py: both `_lora_from_base_compat()` and the
   monkey-patched `_compat_from_base()` ended their compat ladder with
   a bare `from_base(module)` call. On an older shim that rejects `r=`
   for ANY value, that silently produced a rank-8 wrapper even when
   the caller requested rank-4, and `load_weights(strict=False)` then
   dropped the saved rank-4 tensors. Add `_no_rank_fallback_or_fail()`
   which raises a namespaced ValueError when the requested rank is
   not the upstream default (8); only rank=8 callers may keep the
   no-rank fallback.

* Constrain mlx-lm load_adapters via lora_parameters keys + symmetric raise on success branch + guard caller-fallback on explicit-path saves for PR #679

Fixes three asymmetric-fix gaps from round 18 review:

1. unsloth_zoo/mlx/utils.py: `_enrich_mlx_adapter_config` recorded
   `unsloth_mlx_lora_module_paths` but did not persist a
   `lora_parameters["keys"]` entry. On reload, upstream
   `mlx_lm.tuner.utils.load_adapters` treats missing `keys` as "scan
   every Linear / Embedding / Switch layer" and creates extra
   zero-init LoRA wrappers outside the saved topology. Write the
   saved paths into `lora_parameters["keys"]` in both the
   inferred-rank branch and the elif caller-metadata branch so the
   upstream walker stays constrained to exactly the modules Unsloth
   recorded.

2. unsloth_zoo/mlx/loader.py: the success branch of the reload path
   only raised on `_missing_after_success` when an explicit path list
   was present. A stale language-only adapter (config says rank=8 but
   `adapters.safetensors` carries rank-4 tensors) was caught by the
   shape-aware diagnostic but then ignored because no aux paths were
   declared. Drop the `_saved_lora_paths` guard so the success branch
   raises on any missing-or-shape-incompatible saved tensor, matching
   the fallback branch's behaviour.

3. unsloth_zoo/mlx/utils.py: when the caller pinned explicit module
   paths AND those selected real live LoRA modules that all failed
   trustworthy rank inference (e.g. malformed half-built wrappers),
   the elif caller-metadata branch still persisted the caller's
   stale top-level rank/scale/dropout into `lora_parameters`. Track
   "selected lora seen but inference failed" via a new
   `allow_caller_metadata_fallback` flag and skip the caller-write
   in exactly that case so placeholder metadata can not survive
   against the actual saved tensor shapes.

* Sync lora_parameters keys + refuse silent base fallback on LoRA adapter for PR #679

Two asymmetric-fix gaps caught by reviewers.

1) lora_parameters[keys] drifts from the authoritative
   unsloth_mlx_lora_module_paths in two cases. An explicit empty pin
   (unsloth_mlx_lora_module_paths=[]) skipped the keys write because
   the gate was `if saved_paths`, so the saved config still let mlx-lm
   scan every Linear/Embedding/Switch and reinstate ghost wrappers. A
   stale caller-supplied lora_parameters[keys] also survived the live
   override that overwrites rank/scale/dropout and the path list, so
   mlx-lm reloaded wrappers for paths that no longer existed. Extract
   the sync into _sync_mlx_lora_keys() so both branches mirror the
   authoritative path list (including the empty-pin case) and drop any
   stale caller keys when no path list is present.

2) The adapter-detection fallback in FastMLXModel.from_pretrained only
   re-raised Unsloth-tagged ValueError / ImportError / RuntimeError, so
   an ordinary AttributeError on a malformed lora_parameters block
   (e.g. SimpleNamespace built without the field) fell through and
   silently base-loaded the model with no adapter weights. When the
   adapter_config.json explicitly declares peft_type=LORA or
   fine_tune_type in {lora, dora}, refuse to swallow the error and
   raise a namespaced RuntimeError instead.

* Tighten verbose comments in mlx adapter metadata code

Comments-only refactor: drop pure narration and compress multi-line WHY
explanations to a single line per intent. No behavior change.

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
…-export-adapters

# Conflicts:
#	unsloth_zoo/mlx/loader.py
#	unsloth_zoo/mlx/trainer.py
#	unsloth_zoo/mlx/utils.py
@danielhanchen danielhanchen merged commit 606d2b2 into unslothai:main May 27, 2026
11 checks passed

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 265543db2f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread unsloth_zoo/mlx/utils.py
Comment on lines +2814 to +2815
adapter_keys.add(f"{prefix}lora_a")
adapter_keys.add(f"{prefix}lora_b")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Save layer-backed LoRA adapter weights

When mlx-lm represents lora_a/lora_b as submodules rather than raw arrays, model.parameters() flattens the tensors as keys like q_proj.lora_a.weight and q_proj.lora_b.weight (the loader already has compatibility code for this .weight form). This allow-list only includes the bare q_proj.lora_a/q_proj.lora_b names, so normal layer-backed LoRA modules are collected as empty/missing adapters; save_pretrained_merged(save_method="lora") can then raise “no LoRA layers” or write an adapter missing the actual LoRA factors. Please include the .weight parameter keys for layer-backed adapter attributes before filtering.

Useful? React with 👍 / 👎.

danielhanchen pushed a commit to mmathew23/unsloth-zoo that referenced this pull request May 27, 2026
Merges 5 main-side mlx fixes (unslothai#673 zero-token CCE, unslothai#679 + unslothai#692 LoRA save
metadata, unslothai#682 invalid label NaN-poisoning, unslothai#688 tool mask). All 13
conflict regions in unsloth_zoo/mlx/utils.py resolved to keep PR unslothai#684's
behavior where it conflicts on semantics:

  - half-open `<` length mask (PR unslothai#684 fix) wins over main's inclusive `<=`
  - `if labels is None` branch preserved (PR unslothai#684 generality) alongside
    main's `_normalize_cce_label_dtype` dtype widening
  - `_get_image_token_ids` legacy wrapper kept alongside main's new
    `_normalize_cce_label_dtype` / `_normalize_numpy_cce_labels`
  - `_mask_label_token_ids` calls `_normalize_cce_label_dtype` first so
    image masking honors main's uint-widening contract
  - HEAD's `_expand_token_replacements` dropped; main's three-function
    split (`_normalize_numpy_cce_labels` + `_expand_image_token_sequences`
    + `_expand_token_runs`) is canonical; duplicate HEAD wrappers removed
  - `_collate_vlm_prompt_completion_batch` reads back the masked labels
    in int64 so image + attention masking survives without narrowing
  - prompt-completion VLM collator routes through `_apply_vlm_label_masks`
    after dtype normalisation so ignore_token_ids and wide invalid ids
    both reach runtime CCE intact
  - `_to_mx_vlm_batch` uses main's `_normalize_cce_label_dtype` for labels
    while keeping PR unslothai#684's token_type_ids / mm_token_type_ids handling
  - `_unsloth_*` prefix filter preserved so the new collated_position_ids
    flag and main's raw-input-ids carrier both get stripped

152 MLX tests pass post-merge.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto-approved Auto-review approved the PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants