- Did you update? pip install --upgrade unsloth unsloth_zoo: Yes, 2026.2.1
- Colab or Kaggle or local / cloud: Cloud
- Number GPUs used, use nvidia-smi: 1 * RTX PRO 6000 96GB
- Which notebook? Please link!
- Which Unsloth version, TRL version, transformers version, PyTorch version?
hf_transfer 0.1.9
torch 2.9.0
torchao 0.16.0.dev20260123+cu128
torchaudio 2.9.0
torchvision 0.24.0
transformers 4.57.6
unsloth 2026.2.1
unsloth_zoo 2026.2.1
vllm 0.13.0
- Which trainer? SFTTrainer, GRPOTrainer etc: GRPOTrainer
Current Situation
I am currently training a model with GRPOTrainer. Below is my training script:
def main():
    gc.collect()
    torch.cuda.empty_cache()
    print("=== GRPO Training (HTTP RM Mode + Local Guardrails) ===")

    # 1. Load model
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name = MODEL_NAME,
        max_seq_length = MAX_SEQ_LENGTH,
        load_in_4bit = False,
        fast_inference = True,
        dtype = torch.float16,
    )
    tokenizer.padding_side = "left"  # Left padding for generation
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token

    # 2. Add LoRA
    model = FastLanguageModel.get_peft_model(
        model,
        r = 16,
        target_modules = ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
        lora_alpha = 16,
        use_gradient_checkpointing = "unsloth",
        random_state = 3407
    )

    # 3. Dataset
    dataset = prepare_dataset(TRAIN_DATA_PATH, tokenizer)
    train_dataset = dataset

    # 4. vLLM sampling parameters
    vllm_sampling_params = SamplingParams(
        temperature = 1.1,
        top_p = 0.7,
        repetition_penalty = 1.05,
        max_tokens = MAX_COMPLETION_LENGTH,
        stop = [tokenizer.eos_token, "<|im_end|>", "<|endoftext|>"]
    )

    # 5. Trainer configuration
    training_args = GRPOConfig(
        output_dir = OUTPUT_DIR,
        learning_rate = LEARNING_RATE,
        per_device_train_batch_size = PER_DEVICE_BATCH_SIZE,
        num_generations = NUM_GENERATIONS,
        gradient_accumulation_steps = GRADIENT_ACCUMULATION,
        max_prompt_length = MAX_PROMPT_LENGTH,
        max_completion_length = MAX_COMPLETION_LENGTH,
        warmup_steps = WARMUP_STEPS,
        unsloth_grpo_mini_batch = 16,
        unsloth_logit_chunk_multiplier = 4,
        warmup_ratio = 0.0,
        weight_decay = 0.01,
        num_train_epochs = 1,
        save_steps = 25,
        logging_steps = 1,
        max_grad_norm = 0.1,
        bf16 = False,
        fp16 = True,
        optim = "adamw_8bit",
        seed = 42,
        report_to = "none",
        use_vllm = True,
        scale_rewards = "group",
        vllm_sampling_params = vllm_sampling_params,
        vllm_gpu_memory_utilization = 0.85,
        beta = 0.01,
        loss_type = "dr_grpo",
        importance_sampling_level = "sequence",
        epsilon = 3e-4,
        delta = None,
        epsilon_high = 4e-4,
    )

    # 6. Initialize Reward Function
    reward_func = create_reward_fn(
        model = model,
        tokenizer = tokenizer,
        training_state = training_state_tracker
    )

    # 7. Initialize Trainer
    trainer = GRPOTrainer(
        model = model,
        processing_class = tokenizer,
        reward_funcs = [reward_func],
        args = training_args,
        train_dataset = train_dataset,
        callbacks = [GlobalStepUpdater(training_state_tracker)],
        generation_kwargs = dict(
            temperature = 1.1,
            top_p = 0.7,
            repetition_penalty = 1.05,
            max_new_tokens = MAX_COMPLETION_LENGTH,
            stop = ["<|im_end|>", "<|endoftext|>", tokenizer.eos_token],
            stop_token_ids = [151643, 151645],
        ),
    )

    # 8. Start training
    try:
        if RESUME_FROM_CHECKPOINT and os.path.exists(RESUME_FROM_CHECKPOINT):
            print(f"Resuming from checkpoint: {RESUME_FROM_CHECKPOINT}")
            trainer.train(resume_from_checkpoint=RESUME_FROM_CHECKPOINT)
        else:
            trainer.train()
    except Exception as e:
        print(f"Training error: {e}")
        raise e

    print(f"Saving model to {OUTPUT_DIR}...")
    model.save_pretrained_merged(OUTPUT_DIR, tokenizer, save_method="lora")
    print("Training finished.")
The training process is killed by the system due to insufficient CPU memory after a varying number of training steps:
- Sometimes after 100+ steps
- Sometimes after 200+ steps
This behavior is:
- Highly reproducible
- Very frequent
- Almost guaranteed to occur after enough steps
Attached are GPU and CPU resource utilization screenshots.
GPU utilization: [screenshot]
CPU utilization: [screenshot]
Additional explanation: Under normal conditions, CPU memory usage stays relatively stable at a little over 40 GB.
However, when the issue occurs (circled in red in the screenshot), CPU memory usage spikes to around 90 GB. At that point, the system runs out of available memory and the OOM killer terminates the training process.
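To narrow down which step the spike starts at, the process's resident memory could be logged once per training step. The following is only a minimal sketch and is not part of the training script above: `read_rss_gb` and `RssMonitor` are hypothetical helper names, and it assumes a Linux host where `/proc/self/status` is readable. It measures the current process only, so memory held by any child processes would not be counted.

```python
def read_rss_gb() -> float:
    """Return this process's resident set size in GB (assumes Linux /proc)."""
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                # Line format: "VmRSS:    123456 kB"
                return int(line.split()[1]) / (1024 ** 2)
    raise RuntimeError("VmRSS not found in /proc/self/status")


class RssMonitor:
    """Hypothetical helper: records RSS per step and flags sudden jumps."""

    def __init__(self, spike_gb: float = 10.0, read_fn=read_rss_gb):
        self.spike_gb = spike_gb  # step-to-step increase treated as a spike
        self.read_fn = read_fn    # injectable so the monitor can be tested
        self.prev_gb = None
        self.history = []         # list of (step, rss_gb) tuples

    def record(self, step: int) -> bool:
        """Record RSS for this step; return True if it jumped by spike_gb or more."""
        rss = self.read_fn()
        spiked = self.prev_gb is not None and (rss - self.prev_gb) >= self.spike_gb
        self.history.append((step, rss))
        if spiked:
            print(f"[step {step}] RSS jumped {self.prev_gb:.1f} GB -> {rss:.1f} GB")
        self.prev_gb = rss
        return spiked
```

One way to use it would be to call `monitor.record(state.global_step)` from the `on_step_end` hook of a `transformers.TrainerCallback`, alongside the existing `GlobalStepUpdater` callback, so that the step at which memory jumps shows up in the training log.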
🦥 You can also ask via our Reddit page: https://reddit.com/r/unsloth/