Adds Online DPO by edbeeching · Pull Request #1605 · huggingface/trl

edbeeching · 2024-04-30T19:46:00Z

WIP, ignore for now.

usage

accelerate launch --config_file deepspeed_zero3.yaml examples/scripts/dpo_online.py --model_name_or_path=HuggingFaceH4/mistral-7b-ift --model_revision=v25.2 --output_dir=data/mistral-7b-odpo --dataset_name=HuggingFaceH4/ultrafeedback_binarized --dataset_train_split=train_gen --dataset_test_split=test_gen --gradient_accumulation_steps=1 --bf16=True --attn_implementation=flash_attention_2 --per_device_train_batch_size=2

olgavrou · 2024-05-02T15:02:13Z

This is cool. I was doing the same, but by extending the training_step of the existing DPO trainer and generating the new pairs there before calling super().training_step. This looks like a more complete solution.
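
The subclassing approach described in this comment can be sketched roughly as follows. This is a minimal, library-free illustration, not TRL's API and not the code in this PR: `BaseTrainerSketch`, `OnlinePairsTrainerSketch`, `toy_generate`, and `toy_judge` are all hypothetical names, and the base class is a stub that only records the batch where a real DPO trainer would compute a loss.

```python
from typing import Callable, Dict, List

Batch = Dict[str, List[str]]


class BaseTrainerSketch:
    """Hypothetical stand-in for an existing DPO trainer (not TRL's class)."""

    def __init__(self) -> None:
        self.seen_batches: List[Batch] = []

    def training_step(self, batch: Batch) -> float:
        # A real trainer would compute the DPO loss on (chosen, rejected) here.
        # This stub only records the batch and returns a dummy loss.
        self.seen_batches.append(batch)
        return 0.0


class OnlinePairsTrainerSketch(BaseTrainerSketch):
    """Builds fresh preference pairs, then defers to the parent training_step."""

    def __init__(
        self,
        generate: Callable[[str, int], str],
        judge: Callable[[str, str, str], int],
    ) -> None:
        super().__init__()
        self.generate = generate  # (prompt, sample_index) -> completion
        self.judge = judge        # (prompt, a, b) -> 0 if a is preferred, else 1

    def training_step(self, batch: Batch) -> float:
        chosen: List[str] = []
        rejected: List[str] = []
        for prompt in batch["prompt"]:
            # Sample two candidate completions for the same prompt.
            a = self.generate(prompt, 0)
            b = self.generate(prompt, 1)
            # Let the judge decide which one becomes "chosen".
            if self.judge(prompt, a, b) == 0:
                chosen.append(a)
                rejected.append(b)
            else:
                chosen.append(b)
                rejected.append(a)
        online_batch = {
            "prompt": list(batch["prompt"]),
            "chosen": chosen,
            "rejected": rejected,
        }
        # Hand the freshly built pairs to the unchanged parent step.
        return super().training_step(online_batch)


def toy_generate(prompt: str, sample_index: int) -> str:
    # Toy generator (assumption): the second sample is one character longer.
    return prompt + " answer" + "!" * sample_index


def toy_judge(prompt: str, a: str, b: str) -> int:
    # Toy judge (assumption): prefer the longer completion.
    return 0 if len(a) >= len(b) else 1


trainer = OnlinePairsTrainerSketch(toy_generate, toy_judge)
loss = trainer.training_step({"prompt": ["What is DPO?"]})
print(trainer.seen_batches[0])
```

In a real setup the generator would presumably be the policy model being trained and the judge something like the PairRM or HF judge mentioned in the commits; the stubs here only show the control flow of overriding training_step.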

github-actions · 2024-05-31T15:04:58Z

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

lewtun and others added 14 commits April 29, 2024 12:31

Add WinRateCallback

42e1170

Enable PairRM

52ae31e

Refactor

2065c3f

Streamline

efa2e21

port initial implementation

2dcd1a7

debugging

7f8916f

make it run

090b59c

testing multi gpu

cbdaa68

Add HF judge

27be35d

moved annotator to class init, cleanup

72f2b77

ensure the judge model is instantiated after the other models are wrapped with deepspeed

7cc517c

Merge branch 'add-winrate-cb' into online-dpo

dc6e5a5

fix merge

bf07417

fix train_sft -> train_gen

3880a7a

lewtun mentioned this pull request May 3, 2024

Fix ZeRO-3 generation context manager #1617

Merged

github-actions Bot closed this Jun 8, 2024

lewtun reopened this Jun 25, 2024

github-actions Bot closed this Jul 4, 2024

qgallouedec deleted the online-dpo branch August 29, 2025 17:27

edbeeching wants to merge 14 commits into main from online-dpo

3 participants
