Frequent CPU OOM / Process Killed After Several Hundred Steps

1. Did you update? `pip install --upgrade unsloth unsloth_zoo`: Yes, 2026.2.1
3. `Colab` or `Kaggle` or local / cloud: Cloud
4. Number GPUs used, use `nvidia-smi`: 1 × RTX PRO 6000 96GB
6. Which notebook? Please link!
7. Which Unsloth version, TRL version, transformers version, PyTorch version?
hf_transfer                       0.1.9
torch                             2.9.0
torchao                           0.16.0.dev20260123+cu128
torchaudio                        2.9.0
torchvision                       0.24.0
transformers                      4.57.6
unsloth                           2026.2.1
unsloth_zoo                       2026.2.1
vllm                              0.13.0
9. Which trainer? `SFTTrainer`, `GRPOTrainer` etc
GRPOTrainer


Current Situation

I am training a model with GRPO. Below is my training script:

```python
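import gc
import os

from unsloth import FastLanguageModel  # import unsloth before trl so its patches apply
import torch
from trl import GRPOConfig, GRPOTrainer
from vllm import SamplingParams

# Constants (MODEL_NAME, OUTPUT_DIR, ...) and helpers (prepare_dataset,
# create_reward_fn, GlobalStepUpdater, training_state_tracker) are defined
# elsewhere in the script and are not shown here.


def main():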
    gc.collect()
    torch.cuda.empty_cache()

    print("=== GRPO Training (HTTP RM Mode + Local Guardrails) ===")

    # 1. Load model
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name = MODEL_NAME,
        max_seq_length = MAX_SEQ_LENGTH,
        load_in_4bit = False,
        fast_inference = True,
        dtype=torch.float16,
    )
    tokenizer.padding_side = "left"  # Left padding for generation
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
    
    # 2. Add LoRA
    model = FastLanguageModel.get_peft_model(
        model,
        r = 16,
        target_modules = ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
        lora_alpha = 16,
        use_gradient_checkpointing = "unsloth",
        random_state = 3407
    )

    # 3. Dataset
    dataset = prepare_dataset(TRAIN_DATA_PATH, tokenizer)
    train_dataset = dataset

    # 4. vLLM sampling parameters
    vllm_sampling_params = SamplingParams(
        temperature = 1.1,
        top_p = 0.7,
        repetition_penalty = 1.05,
        max_tokens = MAX_COMPLETION_LENGTH,
        stop = [tokenizer.eos_token, "<|im_end|>", "<|endoftext|>"]
    )

    # 5. Trainer configuration
    training_args = GRPOConfig(
        output_dir = OUTPUT_DIR,
        learning_rate = LEARNING_RATE,
        per_device_train_batch_size = PER_DEVICE_BATCH_SIZE,
        num_generations = NUM_GENERATIONS,
        gradient_accumulation_steps = GRADIENT_ACCUMULATION,
        max_prompt_length = MAX_PROMPT_LENGTH,
        max_completion_length = MAX_COMPLETION_LENGTH,
        warmup_steps = WARMUP_STEPS,
        unsloth_grpo_mini_batch = 16,
        unsloth_logit_chunk_multiplier = 4,
        warmup_ratio = 0.0,
        weight_decay = 0.01,
        num_train_epochs = 1,
        save_steps = 25,
        logging_steps = 1,
        max_grad_norm = 0.1,
        bf16 = False,
        fp16 = True,
        optim = "adamw_8bit",
        seed = 42,
        report_to = "none",
        use_vllm = True,
        scale_rewards="group",
        vllm_sampling_params = vllm_sampling_params,
        vllm_gpu_memory_utilization = 0.85,
        beta = 0.01,
        loss_type = "dr_grpo",
        importance_sampling_level = "sequence",
        epsilon = 3e-4,
        delta = None,
        epsilon_high = 4e-4,
    )

    # 6. Initialize Reward Function
    reward_func = create_reward_fn(
        model=model,
        tokenizer=tokenizer,
        training_state=training_state_tracker
    )

    # 7. Initialize Trainer
    trainer = GRPOTrainer(
        model = model,
        processing_class = tokenizer,
        reward_funcs = [reward_func], 
        args = training_args,
        train_dataset = train_dataset,
        callbacks = [GlobalStepUpdater(training_state_tracker)], 
        generation_kwargs = dict(
           temperature = 1.1,
           top_p = 0.7,
           repetition_penalty = 1.05,
           max_new_tokens = MAX_COMPLETION_LENGTH,
           stop = ["<|im_end|>", "<|endoftext|>", tokenizer.eos_token],
           stop_token_ids = [151643, 151645], 
       ),
    )

    # 8. Start training
    try:
        if RESUME_FROM_CHECKPOINT and os.path.exists(RESUME_FROM_CHECKPOINT):
            print(f"Resuming from checkpoint: {RESUME_FROM_CHECKPOINT}")
            trainer.train(resume_from_checkpoint=RESUME_FROM_CHECKPOINT)
        else:
            trainer.train()
    except Exception as e:
        print(f"Training error: {e}")
        raise e

    print(f"Saving model to {OUTPUT_DIR}...")
    model.save_pretrained_merged(OUTPUT_DIR, tokenizer, save_method="lora")
    print("Training finished.")
```

After a variable number of training steps (sometimes 100+, sometimes 200+), the system kills the training process because it runs out of CPU memory.

The failure is highly reproducible: it happens very frequently and is almost guaranteed to occur once training has run for enough steps.
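
To confirm that the kernel OOM killer ended the run (rather than a crash inside the process), one option is to read the cgroup v2 memory counters from inside the container. This is only a sketch under assumptions: it presumes a Linux machine with cgroup v2 mounted at `/sys/fs/cgroup`, which may not match this cloud instance, and `parse_memory_events` and `read_cgroup_memory` are hypothetical helper names, not part of Unsloth or TRL.

```python
from pathlib import Path

# Assumption: cgroup v2 is mounted here (typical for recent Linux containers).
CGROUP_ROOT = Path("/sys/fs/cgroup")


def parse_memory_events(text):
    """Parse the 'name count' lines of a cgroup v2 memory.events file."""
    events = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            events[parts[0]] = int(parts[1])
    return events


def read_cgroup_memory(root=CGROUP_ROOT):
    """Return the memory limit, current usage, and event counters if readable."""
    info = {}
    for name in ("memory.max", "memory.current"):
        try:
            raw = (root / name).read_text().strip()
        except OSError:
            continue  # file missing or unreadable on this system
        # memory.max contains the literal string "max" when no limit is set
        info[name] = None if raw == "max" else int(raw)
    try:
        info["events"] = parse_memory_events((root / "memory.events").read_text())
    except OSError:
        pass
    return info


if __name__ == "__main__":
    print(read_cgroup_memory())
```

If the `oom_kill` counter under `events` increases after a failed run, that would indicate the container's memory limit was hit; comparing `memory.max` with the ~90GB spike would show how much headroom the instance actually has.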

Attached are GPU and CPU resource utilization screenshots.
GPU utilization:

<img width="1445" height="455" alt="Image" src="https://github.com/user-attachments/assets/89becb19-75b7-4a78-806b-9de88f25e09e" />

CPU utilization:

<img width="1436" height="279" alt="Image" src="https://github.com/user-attachments/assets/8d3f21d6-1332-4291-9cf4-67a238125faf" />

Additional explanation: Under normal conditions, CPU memory usage stays relatively stable at around **40GB+**.
However, when the issue occurs (circled in red in the screenshot), CPU memory usage spikes to around **90GB**. At that point, the system runs out of available memory and the **OOM killer terminates the training process**.
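
To narrow down which step triggers the jump from ~40GB to ~90GB, host memory could be sampled once per training step. The following is a minimal, stdlib-only sketch, assuming a Linux host (it reads `/proc/self/status`); `rss_gib`, `peak_rss_gib`, and `MemoryLog` are hypothetical names introduced for illustration. It only measures the calling process, so memory held by child processes would not be counted.

```python
import resource
import sys


def rss_gib(pid="self"):
    """Current resident set size of a process in GiB, or None if unavailable."""
    try:
        with open(f"/proc/{pid}/status") as f:
            for line in f:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1]) / 1024**2  # value is reported in kB
    except OSError:
        pass  # not Linux, or the process is not readable
    return None


def peak_rss_gib():
    """Peak RSS of this process in GiB (ru_maxrss is KiB on Linux, bytes on macOS)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    divisor = 1024**3 if sys.platform == "darwin" else 1024**2
    return peak / divisor


class MemoryLog:
    """Collects (step, current RSS, peak RSS) samples for later inspection."""

    def __init__(self):
        self.samples = []

    def record(self, step):
        sample = (step, rss_gib(), peak_rss_gib())
        self.samples.append(sample)
        return sample

    def growth(self):
        """RSS difference in GiB between the first and last usable samples."""
        values = [rss for _, rss, _ in self.samples if rss is not None]
        return values[-1] - values[0] if len(values) >= 2 else None


if __name__ == "__main__":
    log = MemoryLog()
    for step in range(3):
        print(log.record(step))
    print("growth (GiB):", log.growth())
```

One way to use this would be to call `log.record(state.global_step)` from the `on_step_end` hook of a `TrainerCallback`, assuming the existing `GlobalStepUpdater` is such a callback. Steady growth would suggest a leak, while a sudden jump at a specific step (for example near `save_steps = 25` boundaries) would point at whatever runs on that step.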






Frequent CPU OOM / Process Killed After Several Hundred Steps #4122
