
fix(mlx): expose finetune_last_n_layers for parity with mlx-lm CLI #669

Merged
danielhanchen merged 2 commits into main from fix-mlx-num-layers-parity
May 19, 2026

Conversation

@danielhanchen

Member

Summary

  • Adds finetune_last_n_layers parameter to FastMLXModel.get_peft_model (default None = all layers, current behavior unchanged).
  • Wired into both VLM and text-only code paths.
  • When set, applies LoRA only to the last N transformer blocks (matching mlx-lm CLI's CONFIG_DEFAULTS['num_layers']=16 semantics at mlx_lm/lora.py:56).
  • Companion PR in unsloth/unsloth exposes the same knob on the CUDA path so a single config value controls layer-selection across CUDA / MLX / mlx-lm CLI.

Why

mlx-lm CLI defaults num_layers=16 -> LoRA on the LAST 16 transformer blocks. unsloth-zoo's get_peft_model historically applied LoRA to ALL transformer layers (matching HF PEFT/CUDA semantics).

On small models the difference can show up as a basin-selection divergence: the extra LoRA modules consume mx.random state during init and change the trainable-parameter set, so two otherwise-identical runs land in different basins of attraction.
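To see why extra LoRA modules shift initialization, the sketch below uses Python's `random.Random` as a stand-in for `mx.random`. This is an assumption: MLX's generator and mlx-lm's real init code are not reproduced. If modules are initialized in block order after a single seed, adapting two extra leading blocks consumes draws before the shared blocks are initialized.

```python
import random

def init_lora_a(num_blocks_total: int, num_layers: int, seed: int) -> dict:
    """Toy init: one scalar "lora_a" per adapted block, drawn in block order.

    Stand-in for seeding once and then initializing LoRA on the last
    `num_layers` blocks; real LoRA matrices are tensors, not scalars.
    """
    rng = random.Random(seed)
    first = num_blocks_total - num_layers
    return {block: rng.gauss(0.0, 1.0) for block in range(first, num_blocks_total)}

all_18 = init_lora_a(18, 18, seed=3407)
last_16 = init_lora_a(18, 16, seed=3407)

# Blocks 0 and 1 only exist in the all-layers run...
print(sorted(set(all_18) - set(last_16)))  # [0, 1]
# ...and because they drew first, the shared blocks start from shifted values.
print(all(all_18[b] != last_16[b] for b in last_16))
```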

Empirical (n=15 seeds, gemma-3-270m-it single-row LoRA memorization fixture):

  • mlx-lm CLI default last-16 layers: 67% greedy-decode pass rate
  • unsloth-zoo defaults (all 18 layers): 47% greedy-decode pass rate
  • Teacher-forced completion loss is 0 in both — the model memorizes either way; only the first-token argmax distribution differs.

This PR keeps the default behavior unchanged (None = all layers) so existing users see no change. Passing finetune_last_n_layers=16 puts the run in the same basin family as mlx-lm CLI for direct comparisons.

The value is clamped to [1, len(model.model.layers)] so callers can't accidentally request more layers than the model has, or zero layers (which would freeze everything).
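A minimal sketch of that clamp as a standalone function, so the edge cases from the test plan can be checked without MLX. The name `resolve_num_layers` is hypothetical; in the PR the logic is inlined in `get_peft_model` as `max(1, min(int(finetune_last_n_layers), num_layers))` and only runs when the layer count was detected (`num_layers > 0`).

```python
from typing import Optional

def resolve_num_layers(total_layers: int, finetune_last_n_layers: Optional[int]) -> int:
    """Hypothetical standalone version of the clamp inlined in get_peft_model.

    total_layers stands for len(model.model.layers); 0 means "not detected".
    """
    # None keeps the historical behavior (LoRA on every block); an undetected
    # layer count is passed through untouched, as in the PR.
    if finetune_last_n_layers is None or total_layers <= 0:
        return total_layers
    # Clamp into [1, total_layers]: never more blocks than exist, never zero.
    return max(1, min(int(finetune_last_n_layers), total_layers))

# Cases from the test plan, for an 18-block model such as gemma-3-270m-it.
assert resolve_num_layers(18, None) == 18  # default: all layers
assert resolve_num_layers(18, 16) == 16    # mlx-lm CLI parity
assert resolve_num_layers(18, 99) == 18    # clamped down to the total
assert resolve_num_layers(18, 0) == 1      # clamped up to one block
```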

Test plan

  • New test tests/test_mlx_finetune_last_n_layers.py covering:
    • parameter exists with default None
    • None -> num_layers = total
    • explicit value -> num_layers = value
    • value > total -> clamped to total
    • value <= 0 -> clamped to 1
  • Empirical MLX smoke (already validated on danielhanchen/unsloth-staging-2 MLX parity probe matrix — probe 31 with num_layers=16 hits 10/15 = 67% matching mlx-lm CLI per-seed).

mlx-lm's lora CLI defaults CONFIG_DEFAULTS['num_layers']=16
(mlx_lm/lora.py:56) which trains LoRA only on the last 16
transformer blocks. unsloth-zoo's FastMLXModel.get_peft_model
applies LoRA to ALL transformer layers (matching HF/PEFT/CUDA
semantics on the GPU path).

On small models the difference shows up as a basin-selection
divergence -- the extra LoRA modules consume mx.random state
during init AND change the trainable-parameter set, so two
otherwise-identical runs land in different basins of attraction.
Empirical, n=15 seeds, gemma-3-270m-it single-row LoRA
memorization fixture: mlx-lm CLI's last-16 hits 67%, training
all 18 layers hits 47%. The teacher-forced completion loss is
0 in both, so memorization succeeds either way -- the gap is
purely on greedy-decode first-token argmax.

This commit adds an opt-in `finetune_last_n_layers` parameter
(default None = all layers, current behavior unchanged). Pass
`finetune_last_n_layers=16` to mirror mlx-lm CLI exactly. Wired
into both the VLM and text-only code paths in get_peft_model.

The bound is clamped to [1, len(model.model.layers)] so callers
can't accidentally request more layers than the model has, or
zero layers (which would freeze everything).

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request introduces the finetune_last_n_layers parameter to FastMLXModel.get_peft_model, enabling users to restrict LoRA application to the last N transformer blocks. This change aligns the library's behavior with mlx-lm CLI defaults while maintaining the existing all-layers default for backward compatibility. Accompanying tests verify the parameter's functionality and edge-case handling. The review feedback highlights a potential issue where the requested layer count might be ignored if the model's total layer count isn't detected; suggestions were provided to ensure the user's intent is honored in such cases.

Comment thread unsloth_zoo/mlx/loader.py
Comment on lines +2828 to +2829
if finetune_last_n_layers is not None and num_layers > 0:
    num_layers = max(1, min(int(finetune_last_n_layers), num_layers))

Contributor

medium

The current logic skips updating num_layers if the total layer count detection fails (num_layers == 0). In such cases, num_layers remains 0, which mlx_lm.tuner.utils.linear_to_lora_layers interprets as applying LoRA to all layers. If a user explicitly requested a specific number of layers via finetune_last_n_layers, falling back to all layers is likely unexpected.

It is better to honor the user's request even if the total count is unknown, as mlx-lm's internal slicing (layers[-num_layers:]) is safe in Python even if the requested number exceeds the actual list length.

Suggested change
- if finetune_last_n_layers is not None and num_layers > 0:
-     num_layers = max(1, min(int(finetune_last_n_layers), num_layers))
+ if finetune_last_n_layers is not None:
+     requested = int(finetune_last_n_layers)
+     num_layers = max(1, min(requested, num_layers) if num_layers > 0 else requested)
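The slicing argument above rests on plain Python list semantics. A minimal sketch, assuming mlx-lm selects blocks with a slice of the form `layers[-num_layers:]` as the comment states (not verified against a specific mlx-lm release): an oversized request returns every block, and `0` also returns every block because `-0 == 0`, so under this slice form a leftover `num_layers = 0` means "all layers".

```python
def last_n_blocks(layers: list, num_layers: int) -> list:
    """Select LoRA targets with the slice form quoted in the review comment."""
    return layers[-num_layers:]

layers = [f"block{i}" for i in range(8)]

print(last_n_blocks(layers, 3))             # ['block5', 'block6', 'block7']
print(last_n_blocks(layers, 16) == layers)  # True: an oversized request is safe
print(last_n_blocks(layers, 0) == layers)   # True: -0 == 0, so this is layers[0:]
```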

Comment thread unsloth_zoo/mlx/loader.py
Comment on lines +2915 to +2916
if finetune_last_n_layers is not None and num_layers > 0:
    num_layers = max(1, min(int(finetune_last_n_layers), num_layers))

Contributor

medium

Similar to the VLM path, the num_layers > 0 check prevents finetune_last_n_layers from being applied if the model's layer count wasn't successfully detected. This results in a fallback to 'all layers' (since num_layers remains 0), which contradicts the user's intent to limit the fine-tuning scope.

Updating num_layers to the requested value regardless of detection success ensures that mlx-lm attempts to slice the layers as requested.

Suggested change
if finetune_last_n_layers is not None and num_layers > 0:
num_layers = max(1, min(int(finetune_last_n_layers), num_layers))
if finetune_last_n_layers is not None:
requested = int(finetune_last_n_layers)
num_layers = max(1, min(requested, num_layers) if num_layers > 0 else requested)

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b137b4058e

Comment thread unsloth_zoo/mlx/loader.py
Comment on lines +2915 to +2916
if finetune_last_n_layers is not None and num_layers > 0:
    num_layers = max(1, min(int(finetune_last_n_layers), num_layers))

P2 Badge Persist the selected MLX LoRA layer count

When finetune_last_n_layers is used, the clamped value is only kept in this local num_layers. The trainer still saves adapter_config.json with "num_layers": len(_get_transformer_layers(self.model)) in unsloth_zoo/mlx/trainer.py:1391-1396, so an adapter trained with e.g. the last 16 layers of an 18-layer model is advertised as covering all 18 layers to mlx-lm's load_adapters. In mlx-lm-compatible reload/resume paths this recreates extra active/trainable LoRA modules that were not part of the training run, so the saved adapter no longer faithfully represents the selected layer scope; please record the selected count on the model or infer it when saving.
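One way to address this, sketched under assumptions: `get_peft_model` records the resolved count on the model, and the adapter-saving code prefers it over the full layer count. The attribute `_unsloth_lora_num_layers` and the functions `record_lora_scope` and `build_adapter_config` are hypothetical. Only the `num_layers` key in `adapter_config.json` comes from the comment above.

```python
import json

class ToyModel:
    """Minimal stand-in exposing .layers like the MLX models in the PR."""
    def __init__(self, n_blocks: int):
        self.layers = [object() for _ in range(n_blocks)]

def record_lora_scope(model, num_layers: int) -> None:
    # Hypothetical: called right after linear_to_lora_layers(...) in get_peft_model.
    model._unsloth_lora_num_layers = num_layers

def build_adapter_config(model, base: dict) -> dict:
    # Prefer the recorded scope; fall back to every block for older models.
    total = len(model.layers)
    num_layers = getattr(model, "_unsloth_lora_num_layers", total)
    return {**base, "num_layers": num_layers}

m = ToyModel(18)
record_lora_scope(m, 16)
# Other adapter_config.json fields are omitted from this sketch.
print(json.dumps(build_adapter_config(m, {})))  # {"num_layers": 16}
```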

# mlx_lm.tuner.utils is imported inside the function:
fake_mod = type(sys)("mlx_lm.tuner.utils")
fake_mod.linear_to_lora_layers = fake_linear_to_lora_layers
sys.modules["mlx_lm.tuner.utils"] = fake_mod

P2 Badge Restore the fake mlx-lm tuner module after the test

This assigns a synthetic mlx_lm.tuner.utils module directly into sys.modules and never restores the original stub/module. In any pytest run where later tests or code under test import other tuner utilities such as load_adapters, they will receive this fake module that only defines linear_to_lora_layers, making the outcome order-dependent. Use pytest's monkeypatch.setitem(sys.modules, ...) or save and restore the previous value.
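When pytest's `monkeypatch` fixture is not convenient, the save-and-restore alternative mentioned above can be written with the standard library alone. This is a minimal sketch. `swap_module` is a hypothetical helper name, and the fake module mirrors the one the test builds.

```python
import sys
import types
from contextlib import contextmanager

_MISSING = object()

@contextmanager
def swap_module(name: str, module: types.ModuleType):
    """Temporarily install `module` as sys.modules[name], then restore the old entry."""
    previous = sys.modules.get(name, _MISSING)
    sys.modules[name] = module
    try:
        yield module
    finally:
        if previous is _MISSING:
            sys.modules.pop(name, None)   # the name was not imported before
        else:
            sys.modules[name] = previous  # put back the real module or earlier stub

fake_mod = types.ModuleType("mlx_lm.tuner.utils")
fake_mod.linear_to_lora_layers = lambda model, num_layers, config, use_dora=False: None

before = sys.modules.get("mlx_lm.tuner.utils")
with swap_module("mlx_lm.tuner.utils", fake_mod):
    assert sys.modules["mlx_lm.tuner.utils"] is fake_mod
assert sys.modules.get("mlx_lm.tuner.utils") is before
```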

what we assert, not the side effects on a real architecture).
"""
import sys
import unsloth_zoo.mlx.loader as loader_mod

# Stub out the helpers get_peft_model uses internally so the test
# doesn't need to walk a real model tree.
import unsloth_zoo.mlx.loader as L
@danielhanchen

Member Author

Empirical bisection (gemma-3-270m-it, single-row LoRA memorization, n=15 seeds)

Filing the latest probe numbers here so reviewers can see the full picture this PR addresses vs what it does not.

| probe | loader | trainer | layers | dtype | greedy pass | cf_loss=0 |
|---|---|---|---|---|---|---|
| 20 | mlx-lm CLI (subprocess) | mlx-lm | last 16 | bf16 | 67% | 15/15 |
| 31 | mlx_lm.load | manual loop | last 16 | bf16 | 67% | 15/15 |
| 30 | mlx_lm.load | manual loop | all 18 | bf16 | 47% | 15/15 |
| 33 | mlx_lm.load | MLXTrainer | last 16 | bf16 | 53% | 15/15 |
| 34 | FastMLXModel(dtype=None) | MLXTrainer | last 16 | bf16 | 47% | 15/15 |
| 32 | FastMLXModel(dtype="float16") | MLXTrainer | last 16 | fp16 | 15% | 15/15 |

The finetune_last_n_layers=16 knob is necessary but not sufficient to close the gap end-to-end through the full zoo stack. Layer selection is one of four factors; three independent ones stack on top of it:

  1. Layer selection (this PR): mlx_lm.load + manual loop swings 47% -> 67% just by switching from all-18 to last-16 layers (probe 30 vs probe 31). This is what finetune_last_n_layers exposes.
  2. MLXTrainer vs manual loop (-14pp): same loader, same layer count, different driver. Likely the extra mx.eval for monitoring + state ordering. Tracked separately.
  3. FastMLXModel loader patches (-6 to -10pp): _fix_missing_no_grad, freeze ordering, etc. Tracked separately.
  4. bf16 -> fp16 dtype cast (-28pp on Gemma3): _convert_mlx_dtype casts gemma3-270m's native bf16 storage to fp16 when dtype="float16" is passed. fp16 max ~6.5e4 vs bf16 max ~3.4e38; quietly lossy for Gemma3. Tracked separately.
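The range gap in item 4 can be checked with the standard library alone. This is a minimal sketch. `struct`'s `e` format is IEEE half precision (fp16). bfloat16 is approximated here by keeping the top 16 bits of a float32, which is an assumption: it is not how `_convert_mlx_dtype` or MLX performs the cast. Values above 65504 cannot be stored in fp16 at all, while bf16 keeps them at reduced precision.

```python
import struct

def to_fp16(x: float) -> float:
    """Round-trip through IEEE half precision; struct raises OverflowError past the fp16 range."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def to_bf16(x: float) -> float:
    """Approximate bfloat16 by keeping the top 16 bits of the float32 bit pattern.

    Truncation is a simplification: real casts usually round to nearest even.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0] & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]

FP16_MAX = struct.unpack("<e", bytes.fromhex("ff7b"))[0]          # 65504.0
BF16_MAX = struct.unpack("<f", struct.pack("<I", 0x7F7F0000))[0]  # about 3.39e38

print(FP16_MAX, BF16_MAX)
print(to_bf16(1e5))  # 99840.0: coarse, but representable
try:
    to_fp16(1e5)
except OverflowError as exc:
    print("fp16 overflow:", exc)
```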

Why this PR is still the right first step: the layer-selection mismatch is the only one of the four that is also a user-facing semantics difference, not just a numerical/perf overhead. mlx-lm CLI users who switch to FastMLXModel.get_peft_model() were previously silently training a different parameter set; the new knob makes that choice explicit and matches mlx-lm CLI when set.

cf_loss safety net: every config above hits teacher-forced completion loss == 0 in 15/15 seeds. The model memorizes either way; only the first-token greedy argmax distribution differs. CI smoke gating on completion_teacher_forced_loss < 0.5 (per unslothai/unsloth#5537) is bulletproof regardless of which basin a given seed lands in.

Will file separate issues / PRs for (2), (3), (4).

@danielhanchen

Member Author

Final empirical summary (Round BO, 75 cells)

Five companion PRs landed in this MLX-vs-mlx-lm-CLI parity series:

  • #669 (this PR) -> finetune_last_n_layers knob on FastMLXModel.get_peft_model.
  • unslothai/unsloth#5564 -> same knob on the CUDA path.
  • #670 -> warn on bf16->fp16 downcast (Gemma3 silent precision loss).
  • #671 -> max_grad_value=None honors disable + default to None for HF/TRL parity (closes #662).
  • #672 -> _create_labeled_batches padding matches mlx-lm's iterate_batches.

Across Rounds BG-BO on danielhanchen/unsloth-staging-2 (15 paired seeds on unsloth/gemma-3-270m-it single-row LoRA memorization fixture), teacher-forced completion loss is 0 in 15/15 seeds for every config tested. So the model memorizes the training row in every case. The greedy-decode-on-prompt pass rate varies between 40% and 67% across configs, but that variation tracks first-token argmax tie-breaking, not training quality. PR unslothai/unsloth#5537's cf_loss < 0.5 gate is the right CI gate and is bulletproof regardless.

What this PR specifically addresses: the most user-visible parity surface, where calling FastMLXModel.get_peft_model() previously applied LoRA to all transformer blocks, while mlx_lm.lora CLI's CONFIG_DEFAULTS['num_layers']=16 applies LoRA to the last 16 blocks only. After this PR users can pass finetune_last_n_layers=16 to mirror mlx-lm CLI semantics on the zoo path, with the CUDA companion in unslothai/unsloth#5564 doing the same via PEFT's layers_to_transform.
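On the CUDA side the same choice is expressed as explicit block indices: PEFT's `LoraConfig` accepts a `layers_to_transform` list, and leaving it unset applies LoRA to every block. The helper below is hypothetical and not taken from unslothai/unsloth#5564. It only shows how a "last N blocks" request maps to that list, and it does not import PEFT.

```python
from typing import List, Optional

def last_n_layer_indices(total_layers: int, n: Optional[int]) -> Optional[List[int]]:
    """Hypothetical helper: block indices for a "last n blocks" request.

    Returns None for "every block", matching an unset layers_to_transform.
    """
    if total_layers <= 0:
        raise ValueError("total_layers must be positive")
    if n is None:
        return None
    n = max(1, min(int(n), total_layers))  # same clamp as the MLX path
    return list(range(total_layers - n, total_layers))

indices = last_n_layer_indices(18, 16)
print(indices[:3], indices[-1], len(indices))  # [2, 3, 4] 17 16
```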

Per code-comment policy: parameter name is self-documenting and the
clamp is obvious from max(1, min(...)). Rationale lives in the commit
message of b137b40 and the PR description.

@danielhanchen danielhanchen left a comment

Member Author

Thank you for the PR! The goal of this PR is to give MLX callers a one-knob way to match mlx-lm CLI's CONFIG_DEFAULTS['num_layers']=16 semantics (LoRA on the last N transformer blocks). As a summary, this PR adds an optional finetune_last_n_layers keyword to FastMLXModel.get_peft_model and, when set, clamps it via max(1, min(int(N), total)) and passes it as num_layers to linear_to_lora_layers on both the VLM language path and the text path. Default None preserves the current all-layers behavior.

Two independent Opus reviewers were run in parallel on this PR.

| Reviewers | Severity | Finding |
|---|---|---|
| 2/2 | Med | get_peft_model docstring is not updated, so the new parameter is invisible to help(...) and inspect-style discovery. |
| 2/2 | Med | int(finetune_last_n_layers) silently accepts True/False, floats (1.7 -> 1), and numeric strings, masking common user typos. |
| 2/2 | Med | Tests skip the VLM branch entirely; only the text-only call site (line 2905) is exercised. |
| 1/2 | Med | When the model lacks .model.layers (so num_layers == 0), finetune_last_n_layers is silently dropped with no warning. |
| 2/2 | Nit | The clamp logic is duplicated verbatim between the VLM and text branches; extract a _resolve_num_layers(num_layers, finetune_last_n_layers) helper. |
| 2/2 | Nit | The test monkeypatches sys.modules['mlx_lm.tuner.utils'] (and L._fix_missing_no_grad etc.) without a teardown, so the stubs leak into later tests if execution order shifts. Prefer pytest's monkeypatch fixture. |

Overall: APPROVE_WITH_NITS.

See inline comments for details and suggested fixes.

Comment thread unsloth_zoo/mlx/loader.py
finetune_language_layers=True,
finetune_attention_modules=True,
finetune_mlp_modules=True,
finetune_last_n_layers=None,

Member Author

[2/2 reviewers] Med: the new parameter is added to the signature but not documented anywhere users can find it. The get_peft_model docstring just above this hunk does not mention finetune_last_n_layers, its meaning ("last N transformer blocks"), the clamp range, or the mlx-lm CLI parity intent.

Suggested change
- finetune_last_n_layers=None,
+ finetune_last_n_layers=None,  # mlx-lm CLI parity: LoRA on last N transformer blocks; default None = all layers

Comment thread unsloth_zoo/mlx/loader.py
if hasattr(lm, "model") and hasattr(lm.model, "layers"):
    num_layers = len(lm.model.layers)
if finetune_last_n_layers is not None and num_layers > 0:
    num_layers = max(1, min(int(finetune_last_n_layers), num_layers))

Member Author

[2/2 reviewers] Med: int(finetune_last_n_layers) happily accepts True (→1), False (→0), floats (1.7 → 1, truncating silently), and numeric strings — all common typos. A True literal silently becomes "last 1 layer" instead of raising. Add an explicit type/range guard with a clear message; consider extracting the whole clamp into a helper since the same lines appear at line 2906 too.

Suggested change
-     num_layers = max(1, min(int(finetune_last_n_layers), num_layers))
+ if finetune_last_n_layers is not None and num_layers > 0:
+     if not isinstance(finetune_last_n_layers, int) or isinstance(finetune_last_n_layers, bool):
+         raise TypeError(
+             f"finetune_last_n_layers must be an int, got {type(finetune_last_n_layers).__name__}"
+         )
+     num_layers = max(1, min(finetune_last_n_layers, num_layers))

Comment thread unsloth_zoo/mlx/loader.py
num_layers = 0
if hasattr(lm, "model") and hasattr(lm.model, "layers"):
    num_layers = len(lm.model.layers)
if finetune_last_n_layers is not None and num_layers > 0:

Member Author

[1/2 reviewers] Med: when hasattr(lm, 'model') is False or lm.model.layers is empty, num_layers stays at 0 and the num_layers > 0 guard silently drops the user's finetune_last_n_layers request. linear_to_lora_layers is then called with num_layers=0, which is a no-op. Surface this rather than silently ignoring the setting.

Suggested change
- if finetune_last_n_layers is not None and num_layers > 0:
+ if finetune_last_n_layers is not None:
+     if num_layers > 0:
+         num_layers = max(1, min(int(finetune_last_n_layers), num_layers))
+     else:
+         import warnings
+         warnings.warn(
+             "Unsloth: finetune_last_n_layers requested but the model does not expose .model.layers; ignoring.",
+             stacklevel=2,
+         )

Comment thread unsloth_zoo/mlx/loader.py
if hasattr(model, "model") and hasattr(model.model, "layers"):
    num_layers = len(model.model.layers)
if finetune_last_n_layers is not None and num_layers > 0:
    num_layers = max(1, min(int(finetune_last_n_layers), num_layers))

Member Author

[2/2 reviewers] Nit: this 2-line clamp is identical to the VLM branch above (line 2823-2824). Extract a tiny helper so the two paths cannot drift:

Suggested change
-     num_layers = max(1, min(int(finetune_last_n_layers), num_layers))
+     num_layers = _resolve_finetune_last_n_layers(num_layers, finetune_last_n_layers)

(with a free function defined once at module level, next to FastMLXModel:

def _resolve_finetune_last_n_layers(num_layers, n):
    if n is None or num_layers <= 0:
        return num_layers
    return max(1, min(int(n), num_layers))
)

class FakeLayer: pass
class FakeInner:
    layers = [FakeLayer() for _ in range(8)]
class FakeModel:

Member Author

[2/2 reviewers] Med: the FakeModel here sets _is_vlm_model = False, so only the text-only call site (loader.py:2905) is exercised. The new clamp on line 2823 (VLM language path) has zero coverage even though the PR description says "wired into both VLM and text-only code paths". Add a second fixture that flips _is_vlm_model = True with a stubbed language_model so both call sites are pinned.

Suggested change
- class FakeModel:
+ class FakeModel:
+     model = FakeInner()
+     _unsloth_full_finetuning = False
+     _is_vlm_model = False
+     def freeze(self): pass
+     def unfreeze(self, **kwargs): pass
+     def trainable_parameters(self): return {}
+     def parameters(self): return {}
+ class FakeVLMInner:
+     layers = [FakeLayer() for _ in range(8)]
+ class FakeVLM:
+     language_model = type("LM", (), {"model": FakeVLMInner()})
+     _unsloth_full_finetuning = False
+     _is_vlm_model = True
+     def freeze(self): pass
+     def unfreeze(self, **kwargs): pass
+     def trainable_parameters(self): return {}
+     def parameters(self): return {}

# Case 1: default (None) -> all 8 layers
captured["calls"].clear()
loader_mod.FastMLXModel.get_peft_model(
    FakeModel(),

Member Author

[2/2 reviewers] Nit: this monkeypatches sys.modules['mlx_lm.tuner.utils'] and four L.* attributes with no teardown. If pytest collection order changes, downstream tests in the same session will see the stubs instead of the real module. Use the monkeypatch fixture so changes are reverted automatically.

Suggested change
- FakeModel(),
+ def test_get_peft_model_passes_finetune_last_n_layers_through(monkeypatch):
+     import sys
+     import unsloth_zoo.mlx.loader as loader_mod
+     # ... build FakeModel ...
+     captured = {"calls": []}
+     def fake_linear_to_lora_layers(model, num_layers, config, use_dora=False):
+         captured["calls"].append(num_layers)
+     monkeypatch.setattr(loader_mod, "_fix_missing_no_grad", lambda m: None)
+     monkeypatch.setattr(loader_mod, "_resolve_lora_keys", lambda m, t: [
+         "model.layers.0.self_attn.q_proj",
+         "model.layers.1.mlp.gate_proj",
+     ])
+     monkeypatch.setattr(loader_mod, "_apply_mlx_lora_initialization", lambda m, init: None)
+     # raising=False: linear_to_lora_layers is imported inside the function, so the
+     # module-level attribute may not exist, and monkeypatch.setattr would then raise.
+     monkeypatch.setattr(loader_mod, "linear_to_lora_layers", fake_linear_to_lora_layers, raising=False)
+     fake_mod = type(sys)("mlx_lm.tuner.utils")
+     fake_mod.linear_to_lora_layers = fake_linear_to_lora_layers
+     monkeypatch.setitem(sys.modules, "mlx_lm.tuner.utils", fake_mod)

@danielhanchen danielhanchen merged commit 049a14d into main May 19, 2026
15 checks passed
Sekinal pushed a commit to Sekinal/unsloth-zoo that referenced this pull request May 19, 2026
…ai#678)

Re-merge of PR unslothai#674 — the original was accidentally merged into the
stale fix-mlx-num-layers-parity branch (after unslothai#669 had already squashed
into main), leaving this fix stranded.

FastMLXModel.get_peft_model previously called
`_seed_mlx_random_state(random_state)` near the top of the method,
~100+ source lines above the actual `linear_to_lora_layers` call.
In between sit target-module normalization, `_fix_missing_no_grad`,
`_resolve_lora_keys`, and (on the VLM branch) the model-tree walk.

Empirically this leaves a window in which lazy MLX state mutations or
implicit `mx.random` consumption can slip in, so the lora_a matrices
initialized inside `linear_to_lora_layers` end up DIFFERENT from
mlx-lm CLI's, which seeds at `mlx_lm/tuner/lora.py` (def train)
immediately before `linear_to_lora_layers`.

Verified by probe 39 on danielhanchen/unsloth-staging-2:
  seeds 1, 42, 999, 3407, 22222: max |dloss| = max |dgrad_norm| = 0.0
  across all 30 steps × 5 seeds (vs non-zero deltas before).

This fix moves `_seed_mlx_random_state(random_state)` to immediately
before each `linear_to_lora_layers(...)` call -- both VLM language
path and text path. API surface unchanged.
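A minimal sketch of the failure mode, with Python's `random.Random` standing in for MLX's global generator. This is an assumption: `_seed_mlx_random_state`, `linear_to_lora_layers`, and the lazy MLX state mutations are not modeled. Seeding immediately before init makes the init depend only on the seed. A single stray draw in between shifts every initialized value by one position in the stream.

```python
import random

def init_lora(rng: random.Random, n_modules: int) -> list:
    """Toy stand-in for linear_to_lora_layers drawing one value per module."""
    return [rng.random() for _ in range(n_modules)]

def seed_then_init(seed: int, intervening_draws: int) -> list:
    rng = random.Random(seed)
    # Anything that consumes the generator between seeding and init shifts the stream.
    for _ in range(intervening_draws):
        rng.random()
    return init_lora(rng, 4)

reference = seed_then_init(3407, intervening_draws=0)  # seed right before init
drifted = seed_then_init(3407, intervening_draws=1)    # one stray draw in between

print(reference == seed_then_init(3407, 0))  # True: reproducible
print(reference == drifted)                  # False
print(reference[1:] == drifted[:3])          # True: same stream, shifted by one
```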

Test `tests/test_mlx_get_peft_model_seed_ordering.py` pins:
  1. Every linear_to_lora_layers call inside get_peft_model is
     preceded by `_seed_mlx_random_state` within 20 lines.
  2. The `random_state` API parameter still exists with default 3407.
  3. The tight pairing matches BOTH the VLM and text LoRA call sites
     (regex tripwire that only allows comment lines between).
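Checks 1 and 3 can be sketched as plain source scans. The functions `seed_precedes_each_call` and `tightly_paired` are hypothetical, and the real test file is not reproduced here. A version run against the actual code could obtain the source with `inspect.getsource` on `get_peft_model`.

```python
import re

SEED = re.compile(r"\b_seed_mlx_random_state\(")
CALL = re.compile(r"\blinear_to_lora_layers\(")

def _call_lines(lines):
    return [i for i, line in enumerate(lines) if CALL.search(line)]

def seed_precedes_each_call(source: str, window: int = 20) -> bool:
    """Check 1: every LoRA call has a seeding call within the previous `window` lines."""
    lines = source.splitlines()
    calls = _call_lines(lines)
    return bool(calls) and all(
        any(SEED.search(prev) for prev in lines[max(0, i - window):i]) for i in calls
    )

def tightly_paired(source: str) -> bool:
    """Check 3: only blank or comment lines may sit between the seed and the call."""
    lines = source.splitlines()
    calls = _call_lines(lines)
    if not calls:
        return False
    for i in calls:
        j = i - 1
        while j >= 0 and (not lines[j].strip() or lines[j].lstrip().startswith("#")):
            j -= 1
        if j < 0 or not SEED.search(lines[j]):
            return False
    return True

GOOD = '''
def get_peft_model(model, random_state=3407):
    keys = _resolve_lora_keys(model, targets)
    _seed_mlx_random_state(random_state)
    # comments are allowed between the two calls
    linear_to_lora_layers(model, num_layers, config)
'''

BAD = '''
def get_peft_model(model, random_state=3407):
    _seed_mlx_random_state(random_state)
    keys = _resolve_lora_keys(model, targets)
    linear_to_lora_layers(model, num_layers, config)
'''

print(tightly_paired(GOOD), tightly_paired(BAD))  # True False
print(seed_precedes_each_call(BAD))               # True: check 1 alone is looser
```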
Erland366 pushed a commit to Erland366/unsloth-zoo that referenced this pull request Jun 10, 2026
…nslothai#739)

test_get_peft_model_passes_finetune_last_n_layers_through has failed
since it was introduced in unslothai#669: the trainable parameter summary that
get_peft_model prints (added in unslothai#634) calls model.trainable_parameters()
and model.parameters(), which the synthetic FakeModel never stubbed.
CI never executed the test body (collect-only plus exclusion list), so
the failure stayed hidden. Give the fixture the two methods returning
empty trees, matching the fixtures in test_mlx_save_lora_adapters_filter,
so the summary computes 0 of 0 params and the num_layers assertions are
exercised as intended.
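A minimal sketch of that failure and fix. `trainable_summary` is a hypothetical stand-in for the summary added in #634, and its output string is invented for illustration. Parameter trees are modeled as nested dicts and lists whose leaves expose `.size`. A fixture without the two methods raises `AttributeError`, while empty trees give a summary of 0 of 0.

```python
def count_params(tree) -> int:
    """Sum `.size` over the leaves of a nested dict/list parameter tree."""
    if isinstance(tree, dict):
        return sum(count_params(v) for v in tree.values())
    if isinstance(tree, (list, tuple)):
        return sum(count_params(v) for v in tree)
    return int(getattr(tree, "size", 0))

def trainable_summary(model) -> str:
    # Hypothetical stand-in for the summary get_peft_model prints.
    trainable = count_params(model.trainable_parameters())
    total = count_params(model.parameters())
    return f"Trainable parameters: {trainable} of {total}"

class FakeModelBefore:  # fixture as originally written in #669
    pass

class FakeModelAfter:   # fixture after #739: empty parameter trees
    def trainable_parameters(self):
        return {}
    def parameters(self):
        return {}

print(trainable_summary(FakeModelAfter()))  # Trainable parameters: 0 of 0
try:
    trainable_summary(FakeModelBefore())
except AttributeError as exc:
    print("fixture fails:", exc)
```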
danielhanchen added a commit that referenced this pull request Jun 11, 2026
…gate (#755)

test_mlx_finetune_last_n_layers was born broken in #669 and stayed
invisible until #739 because no CI job executed it: the version matrix
only collects, the macOS MLX job runs the shim smoke test alone, and
the zoo-specific CPU list does not include it. Add a small hard-gate
step in repo-tests-cpu running it together with
test_training_utils_use_cache (the use_cache disable/restore contract
from #715). Both files are CPU-pure and run in under a second, and the
job already installs the deps they need.