
Adds Online DPO #1605

Closed
edbeeching wants to merge 14 commits into main from online-dpo

@edbeeching edbeeching commented Apr 30, 2024

Collaborator

WIP ignore for now

usage

accelerate launch --config_file deepspeed_zero3.yaml examples/scripts/dpo_online.py --model_name_or_path=HuggingFaceH4/mistral-7b-ift --model_revision=v25.2 --output_dir=data/mistral-7b-odpo --dataset_name=HuggingFaceH4/ultrafeedback_binarized --dataset_train_split=train_gen --dataset_test_split=test_gen --gradient_accumulation_steps=1 --bf16=True --attn_implementation=flash_attention_2 --per_device_train_batch_size=2

olgavrou commented May 2, 2024

This is cool. I was doing the same thing, but by extending the training_step of the existing DPO trainer and generating the new pairs there before calling super().training_step. This looks like a more complete solution.
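For readers unfamiliar with the subclassing approach mentioned in this comment, here is a minimal sketch of the idea. It is not code from this PR and does not use the TRL API: `BaseTrainer`, `OnlinePairTrainer`, `generate`, and `score` are hypothetical stand-ins, and the loss is a dummy value. It only illustrates building fresh chosen/rejected pairs inside `training_step` and then delegating to the parent step.

```python
from typing import Callable, Dict, List

Batch = Dict[str, List[str]]


class BaseTrainer:
    """Hypothetical stand-in for an offline DPO trainer (NOT the TRL class)."""

    def training_step(self, batch: Batch) -> float:
        # A real trainer would compute the DPO loss here. This stub only checks
        # that the batch has the fields an offline DPO step expects, records it,
        # and returns a dummy loss.
        assert {"prompt", "chosen", "rejected"} <= batch.keys()
        self.last_batch = batch
        return 0.0


class OnlinePairTrainer(BaseTrainer):
    """Generates new preference pairs on the fly, then reuses the parent step."""

    def __init__(
        self,
        generate: Callable[[str], str],      # assumed: samples one completion
        score: Callable[[str, str], float],  # assumed: rates (prompt, completion)
    ) -> None:
        self.generate = generate
        self.score = score

    def training_step(self, batch: Batch) -> float:
        chosen: List[str] = []
        rejected: List[str] = []
        for prompt in batch["prompt"]:
            # Sample two completions from the current policy.
            first = self.generate(prompt)
            second = self.generate(prompt)
            # The higher-scoring completion becomes "chosen".
            if self.score(prompt, first) < self.score(prompt, second):
                first, second = second, first
            chosen.append(first)
            rejected.append(second)
        online_batch = {
            "prompt": batch["prompt"],
            "chosen": chosen,
            "rejected": rejected,
        }
        # Delegate the actual loss computation to the existing trainer.
        return super().training_step(online_batch)
```

In a real setup, `generate` would sample from the policy being trained and `score` would come from a reward model or judge; both are left abstract here.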

@github-actions

Contributor

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

@github-actions github-actions Bot closed this Jun 8, 2024
@lewtun lewtun reopened this Jun 25, 2024
@github-actions github-actions Bot closed this Jul 4, 2024
@qgallouedec qgallouedec deleted the online-dpo branch August 29, 2025 17:27
