LoRA training crashes on first backward pass with Qwen3.5-9B on M5 Max (applegpu_g17s) #1206

Environment:

macOS, Apple M5 Max (applegpu_g17s), 36GB unified memory
mlx 0.31.2, mlx-lm 0.31.3, mlx-metal 0.31.2
Model: mlx-community/Qwen3.5-9B-4bit

What happens:
Validation (forward pass only) completes successfully, but the first training iteration crashes immediately with:


[METAL] Command buffer execution failed: Insufficient Memory
(00000008:kIOGPUCommandBufferCallbackErrorOutOfMemory)
The crash occurs regardless of batch size (1 or 2), sequence length (2048–8192), number of LoRA layers (4–16), or whether --grad-checkpoint is set. Total system memory usage is low at the time of the crash, so the machine does not appear to be genuinely out of memory.
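
For anyone who wants to re-check that the crash is independent of these settings, here is a minimal sketch that enumerates the combinations as mlx_lm lora command lines. It only builds and prints the commands. The flags are the ones from the reproduce command in this report; the intermediate values (sequence length 4096, 8 LoRA layers) are assumptions chosen to fill the stated ranges, not settings confirmed to have been tested.

```python
import itertools
import shlex

MODEL = "mlx-community/Qwen3.5-9B-4bit"


def build_commands(
    batch_sizes=(1, 2),
    seq_lengths=(2048, 4096, 8192),  # 4096 is an assumed midpoint
    num_layers=(4, 8, 16),           # 8 is an assumed midpoint
    grad_checkpoint=(True, False),
):
    """Return one argv list per combination of the settings above."""
    commands = []
    for bs, seq, layers, ckpt in itertools.product(
        batch_sizes, seq_lengths, num_layers, grad_checkpoint
    ):
        cmd = [
            "mlx_lm", "lora",
            "--model", MODEL,
            "--train", "--data", "data/",
            "--batch-size", str(bs),
            "--num-layers", str(layers),
            "--max-seq-length", str(seq),
            "--val-batches", "0",
        ]
        if ckpt:
            cmd.append("--grad-checkpoint")
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    # Print shell-ready command lines; nothing is executed here.
    for cmd in build_commands():
        print(shlex.join(cmd))
```

Each printed line can be pasted into a terminal as-is, which makes it easy to report exactly which combinations were tried.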

Workaround:
Switching to mlx-community/Qwen3-8B-4bit (same architecture family, previous generation) trains successfully with identical settings. This suggests the issue is specific to Qwen3.5's architecture changes as handled by this mlx version.
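
One way to narrow down which architecture change matters might be to diff the config.json files of the two models. The sketch below is a generic dictionary diff and assumes both files have already been downloaded locally; the paths in the usage comment are hypothetical placeholders, and no actual config values are assumed.

```python
import json
from pathlib import Path

MISSING = "<missing>"  # marker for a key present in only one config


def load_config(path):
    """Read a model's config.json into a dict."""
    return json.loads(Path(path).read_text())


def diff_configs(a, b):
    """Return {key: (value_in_a, value_in_b)} for every top-level key that differs."""
    diffs = {}
    for key in sorted(set(a) | set(b)):
        va, vb = a.get(key, MISSING), b.get(key, MISSING)
        if va != vb:
            diffs[key] = (va, vb)
    return diffs


# Usage (hypothetical local paths):
#   working = load_config("qwen3-8b-4bit/config.json")
#   failing = load_config("qwen3.5-9b-4bit/config.json")
#   for key, (old, new) in diff_configs(working, failing).items():
#       print(f"{key}: {old!r} -> {new!r}")
```

Keys that appear only in the failing model's config are likely the first places to look, though this compares top-level keys only.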

Reproduce:


mlx_lm lora \
  --model mlx-community/Qwen3.5-9B-4bit \
  --train --data data/ \
  --batch-size 1 --num-layers 4 \
  --max-seq-length 2048 \
  --grad-checkpoint --val-batches 0
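
To tell this specific failure apart from other errors when re-running the command, the captured output can be checked for the Metal error text quoted above. This is a minimal sketch: the marker strings come from the error message in this report, while the helper names and the result labels are illustrative, not part of mlx-lm.

```python
import subprocess

# Substrings taken from the error message in this report.
OOM_MARKERS = (
    "kIOGPUCommandBufferCallbackErrorOutOfMemory",
    "Insufficient Memory",
)


def is_metal_oom(output: str) -> bool:
    """True if the text contains the Metal out-of-memory error."""
    return any(marker in output for marker in OOM_MARKERS)


def run_and_classify(cmd):
    """Run a command (argv list) and label the outcome."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    combined = proc.stdout + proc.stderr
    if is_metal_oom(combined):
        return "metal_oom"
    return "ok" if proc.returncode == 0 else "other_failure"
```

For example, `run_and_classify(["mlx_lm", "lora", "--model", "mlx-community/Qwen3.5-9B-4bit", "--train", "--data", "data/", "--batch-size", "1", "--num-layers", "4", "--max-seq-length", "2048", "--grad-checkpoint", "--val-batches", "0"])` should return "metal_oom" on an affected machine.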
