Skip to content

LoRA training crashes on first backward pass with Qwen3.5-9B on M5 Max (applegpu_g17s) #1206

@sivavenkatay

Description

@sivavenkatay

Environment:

macOS, Apple M5 Max (applegpu_g17s), 36GB unified memory
mlx 0.31.2, mlx-lm 0.31.3, mlx-metal 0.31.2
Model: mlx-community/Qwen3.5-9B-4bit
What happens:
Validation (forward pass) completes successfully. The first training iteration crashes immediately with:

[METAL] Command buffer execution failed: Insufficient Memory
(00000008:kIOGPUCommandBufferCallbackErrorOutOfMemory)
This happens regardless of batch size (1 or 2), sequence length (2048–8192), number of LoRA layers (4–16), or whether --grad-checkpoint is set. Total system memory usage is low at the time of crash.

Workaround:
Switching to mlx-community/Qwen3-8B-4bit (same architecture family, previous generation) trains successfully with identical settings. Suggests the issue is specific to Qwen3.5's architecture changes in this mlx version.

Reproduce:

mlx_lm lora
--model mlx-community/Qwen3.5-9B-4bit
--train --data data/
--batch-size 1 --num-layers 4
--max-seq-length 2048
--grad-checkpoint --val-batches 0

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions