Frequent CPU OOM / Process Killed After Several Hundred Steps #4122

@camposs1979

  1. Did you update? pip install --upgrade unsloth unsloth_zoo: Yes, 2026.2.1
  2. Colab or Kaggle or local / cloud: Cloud
  3. Number GPUs used, use nvidia-smi: 1 × RTX PRO 6000 96GB
  4. Which notebook? Please link!
  5. Which Unsloth version, TRL version, transformers version, PyTorch version?
    hf_transfer 0.1.9
    torch 2.9.0
    torchao 0.16.0.dev20260123+cu128
    torchaudio 2.9.0
    torchvision 0.24.0
    transformers 4.57.6
    unsloth 2026.2.1
    unsloth_zoo 2026.2.1
    vllm 0.13.0
  6. Which trainer? SFTTrainer, GRPOTrainer etc
    GRPOTrainer

Current Situation

I am currently training a model with GRPOTrainer. My training script is below:

def main():
    gc.collect()
    torch.cuda.empty_cache()

    print("=== GRPO Training (HTTP RM Mode + Local Guardrails) ===")

    # 1. Load model
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name = MODEL_NAME,
        max_seq_length = MAX_SEQ_LENGTH,
        load_in_4bit = False,
        fast_inference = True,
        dtype=torch.float16,
    )
    tokenizer.padding_side = "left"  # Left padding for generation
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
    
    # 2. Add LoRA
    model = FastLanguageModel.get_peft_model(
        model,
        r = 16,
        target_modules = ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
        lora_alpha = 16,
        use_gradient_checkpointing = "unsloth",
        random_state = 3407
    )

    # 3. Dataset
    dataset = prepare_dataset(TRAIN_DATA_PATH, tokenizer)
    train_dataset = dataset

    # 4. vLLM sampling parameters
    vllm_sampling_params = SamplingParams(
        temperature = 1.1,
        top_p = 0.7,
        repetition_penalty = 1.05,
        max_tokens = MAX_COMPLETION_LENGTH,
        stop = [tokenizer.eos_token, "<|im_end|>", "<|endoftext|>"]
    )

    # 5. Trainer configuration
    training_args = GRPOConfig(
        output_dir = OUTPUT_DIR,
        learning_rate = LEARNING_RATE,
        per_device_train_batch_size = PER_DEVICE_BATCH_SIZE,
        num_generations = NUM_GENERATIONS,
        gradient_accumulation_steps = GRADIENT_ACCUMULATION,
        max_prompt_length = MAX_PROMPT_LENGTH,
        max_completion_length = MAX_COMPLETION_LENGTH,
        warmup_steps = WARMUP_STEPS,
        unsloth_grpo_mini_batch = 16,
        unsloth_logit_chunk_multiplier = 4,
        warmup_ratio = 0.0,
        weight_decay = 0.01,
        num_train_epochs = 1,
        save_steps = 25,
        logging_steps = 1,
        max_grad_norm = 0.1,
        bf16 = False,
        fp16 = True,
        optim = "adamw_8bit",
        seed = 42,
        report_to = "none",
        use_vllm = True,
        scale_rewards="group",
        vllm_sampling_params = vllm_sampling_params,
        vllm_gpu_memory_utilization = 0.85,
        beta = 0.01,
        loss_type = "dr_grpo",
        importance_sampling_level = "sequence",
        epsilon = 3e-4,
        delta = None,
        epsilon_high = 4e-4,
    )

    # 6. Initialize Reward Function
    reward_func = create_reward_fn(
        model=model,
        tokenizer=tokenizer,
        training_state=training_state_tracker
    )

    # 7. Initialize Trainer
    trainer = GRPOTrainer(
        model = model,
        processing_class = tokenizer,
        reward_funcs = [reward_func], 
        args = training_args,
        train_dataset = train_dataset,
        callbacks = [GlobalStepUpdater(training_state_tracker)], 
        generation_kwargs = dict(
            temperature = 1.1,
            top_p = 0.7,
            repetition_penalty = 1.05,
            max_new_tokens = MAX_COMPLETION_LENGTH,
            stop = ["<|im_end|>", "<|endoftext|>", tokenizer.eos_token],
            stop_token_ids = [151643, 151645],
        ),
    )

    # 8. Start training
    try:
        if RESUME_FROM_CHECKPOINT and os.path.exists(RESUME_FROM_CHECKPOINT):
            print(f"Resuming from checkpoint: {RESUME_FROM_CHECKPOINT}")
            trainer.train(resume_from_checkpoint=RESUME_FROM_CHECKPOINT)
        else:
            trainer.train()
    except Exception as e:
        print(f"Training error: {e}")
        raise e

    print(f"Saving model to {OUTPUT_DIR}...")
    model.save_pretrained_merged(OUTPUT_DIR, tokenizer, save_method="lora")
    print("Training finished.")

The training process is killed by the system due to insufficient CPU memory, but the step at which this happens varies:
Sometimes after 100+ steps
Sometimes after 200+ steps

This behavior is:
Highly reproducible
Happens very frequently
Almost guaranteed to occur after enough steps

Attached are GPU and CPU resource utilization screenshots.
GPU utilization:

Image

CPU utilization:

Image

Additional explanation: Under normal conditions, CPU memory usage stays relatively stable at a little over 40 GB.
However, when the issue occurs (marked by the red circle in the screenshot), CPU memory usage spikes to around 90 GB. At that point the system runs out of available memory and the OOM killer terminates the training process.
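
To narrow down which step triggers the jump from roughly 40 GB to 90 GB, logging host memory once per training step might help. The following is only a minimal sketch, assuming a Linux host; `current_rss_gib`, `RssTracker`, and the 60 GiB warning threshold are hypothetical names and values chosen for illustration, not part of Unsloth or TRL. It measures the resident set size of the current process only, so memory held by child processes would not show up.

```python
import os
import sys


def current_rss_gib() -> float:
    """Resident set size of this process in GiB (assumes Linux /proc)."""
    try:
        with open("/proc/self/statm") as f:
            resident_pages = int(f.read().split()[1])  # 2nd field: resident pages
        return resident_pages * os.sysconf("SC_PAGE_SIZE") / 2**30
    except (OSError, ValueError, IndexError):
        # Fallback: peak RSS instead of current RSS.
        # ru_maxrss is reported in KiB on Linux and in bytes on macOS.
        import resource
        peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        scale = 1 if sys.platform == "darwin" else 1024
        return peak * scale / 2**30


class RssTracker:
    """Records RSS per step and reports the largest step-to-step increase."""

    def __init__(self, warn_gib: float = 60.0, read_rss=current_rss_gib):
        self.warn_gib = warn_gib      # hypothetical threshold between 40 and 90
        self.read_rss = read_rss      # injectable so the logic can be tested
        self.history = []             # list of (step, rss_gib)

    def record(self, step: int) -> float:
        rss = self.read_rss()
        self.history.append((step, rss))
        if rss >= self.warn_gib:
            print(f"[rss] step {step}: {rss:.1f} GiB >= {self.warn_gib:.1f} GiB",
                  flush=True)
        return rss

    def largest_jump(self):
        """Return (step, delta_gib) of the biggest increase, or None."""
        best = None
        for (_, r0), (s1, r1) in zip(self.history, self.history[1:]):
            delta = r1 - r0
            if best is None or delta > best[1]:
                best = (s1, delta)
        return best
```

One possible way to use it would be to call `tracker.record(state.global_step)` from the `on_step_end` hook of a `transformers.TrainerCallback`, next to the existing callback in the script, and compare the step reported by `largest_jump()` with what happened at that step (for example a checkpoint save, since `save_steps = 25`).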
