Add CPOTrainer by fe1ixxu · Pull Request #1382 · huggingface/trl

fe1ixxu · 2024-02-29T14:49:26Z

Hi! This PR adds CPOTrainer, proposed in the paper Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation.

The CPO method is one of the algorithms used to build the state-of-the-art LLM-based translation model ALMA.

fe1ixxu · 2024-02-29T14:57:28Z

cc @kashif @lewtun

kashif · 2024-02-29T15:18:05Z

@fe1ixxu how close is the trainer in terms of code to the DPOTrainer? Can one subclass from it?

fe1ixxu · 2024-02-29T16:03:32Z

@kashif Thanks for the quick response! CPO is an approximation of DPO. The key differences between CPOTrainer and DPOTrainer are:

CPOTrainer removes the need for a reference model
CPOTrainer adds an extra NLL loss on the preferred data

I'm not sure that subclassing CPOTrainer from DPOTrainer is a good idea, as DPOTrainer introduces numerous features related to reference models that CPOTrainer does not need.
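To make the two differences concrete, here is a minimal per-pair sketch in plain Python, not the code from this PR. It assumes the sequence log-probabilities are already computed; the function names, the `nll_weight` parameter, and the default `beta` are hypothetical and chosen for illustration. The actual trainer works on batched tensors and may normalize the NLL term differently.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO-style loss for one preference pair; needs reference log-probs."""
    policy_margin = policy_chosen_logp - policy_rejected_logp
    ref_margin = ref_chosen_logp - ref_rejected_logp
    return -log_sigmoid(beta * (policy_margin - ref_margin))


def cpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             beta: float = 0.1, nll_weight: float = 1.0) -> float:
    """CPO-style loss for one preference pair (illustrative sketch).

    Difference 1: no reference model terms appear in the preference loss.
    Difference 2: an NLL term on the preferred (chosen) response is added.
    `nll_weight` is a hypothetical knob for this sketch only.
    """
    prefer = -log_sigmoid(beta * (policy_chosen_logp - policy_rejected_logp))
    nll = -policy_chosen_logp
    return prefer + nll_weight * nll


if __name__ == "__main__":
    # Made-up sequence log-probabilities for a single pair.
    print("DPO:", dpo_loss(-2.0, -5.0, -3.0, -4.0))
    print("CPO:", cpo_loss(-2.0, -5.0))
```

In this sketch, when the reference model assigns the same log-probability to both responses, the DPO term reduces to the CPO preference term, which is one way to read "CPO is an approximation of DPO".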

HuggingFaceDocBuilderDev · 2024-03-05T19:59:50Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

fe1ixxu · 2024-03-07T12:48:57Z

Hi @kashif, the CPO docs are now finished! Thanks!

fe1ixxu · 2024-03-21T13:51:52Z

Hi @kashif and @lewtun, the CPOTrainer implementation has been finished for a while and passes all checks. If it looks good to you, could it be merged into the main branch? Thanks!

lewtun

Thank you for this very nice implementation of CPO @fe1ixxu 🔥 ! I left a few small comments and a suggestion to remove a deepspeed function I don't think we need. Apart from that LGTM!

* add CPOTrainer
* add docs
* fix formatting
* removed precompute_ref_log_probs arg
* remove precompute_ref_log_probs
* typos
* finish cpo trainer doc
* remove redundant lines
* typo
* formatting
* compute chosen nll loss also for enc-dec models
* fix gradient error of inplace operation for enc-dec models
* formatting
* use CPOConfig
* formatting
* use model_init_kwargs from CPOConfig
* comments in example
* fix doc string
* fix typo in docstring
* update year
* fixed typo
* use preference dataset
* fix learning rate
* move dataset_num_proc to configs
* Update cpo paper link from HF: cpo_trainer.mdx
  Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* update description for CPO: cpo_trainer.mdx
  Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* remove _prepare_deepspeed for cpo
  Because CPO does not need init for reference model
* Add explanation to CPO loss
* format
* fix bug when lengths are given
* add CPOTrainer to README
* fix grammer

---------

Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

kashif force-pushed the cpo-trainer branch from d17049f to 55e1b90 Compare March 15, 2024 08:43

fe1ixxu and others added 10 commits March 15, 2024 10:09

add CPOTrainer

9d45ce3

add docs

ed1e7c6

fix formatting

0e76df1

removed precompute_ref_log_probs arg

8c95468

remove precompute_ref_log_probs

551a0df

typos

4850ac9

finish cpo trainer doc

7c4e6b4

remove redundant lines

ba760de

typo

44256a5

formatting

f4d07cb

kashif force-pushed the cpo-trainer branch from 55e1b90 to f4d07cb Compare March 15, 2024 09:09

fe1ixxu and others added 5 commits March 15, 2024 20:48

compute chosen nll loss also for enc-dec models

8ca08f8

fix gradient error of inplace operation for enc-dec models

4d2c292

formatting

abd4a45

use CPOConfig

2137ac9

formatting

606f294

kashif requested review from kashif and lewtun March 17, 2024 12:58

kashif approved these changes Mar 17, 2024

kashif added 5 commits March 17, 2024 14:15

use model_init_kwargs from CPOConfig

afd02c2

comments in example

a1836ac

fix doc string

da4a4ee

fix typo in docstring

ef1cd23

update year

2aabcf4

kashif added 6 commits March 17, 2024 16:43

Merge branch 'huggingface:main' into cpo-trainer

18975f1

fixed typo

bcb6cd7

Merge branch 'main' into cpo-trainer

27672c3

use preference dataset

39c8e61

fix learning rate

93591bd

move dataset_num_proc to configs

6b24fd1

lewtun approved these changes Mar 21, 2024

Comment thread docs/source/cpo_trainer.mdx Outdated

Comment thread docs/source/cpo_trainer.mdx Outdated

Comment thread trl/trainer/cpo_trainer.py Outdated

Comment thread trl/trainer/cpo_trainer.py

fe1ixxu and others added 9 commits March 21, 2024 11:14

Update cpo paper link from HF: cpo_trainer.mdx

6def5dd

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

update description for CPO: cpo_trainer.mdx

d9a48de

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

remove _prepare_deepspeed for cpo

b300473

Because CPO does not need init for reference model

Add explanation to CPO loss

4a0dcd0

format

25c5495

fix bug when lengths are given

328434e

Merge remote-tracking branch 'upstream/main' into cpo-trainer

dd9344a

add CPOTrainer to README

8c842be

fix grammer

2ff65bd

kashif merged commit d1df79f into huggingface:main Mar 22, 2024

kashif deleted the cpo-trainer branch March 22, 2024 20:32

kashif merged 35 commits into huggingface:main from fe1ixxu:cpo-trainer
