Skip to content

[DPO] add reference log-prob outputs in DPO#521

Merged
Tcc0403 merged 6 commits into
linkedin:mainfrom
kashif:dpo-fix
Jan 30, 2025
Merged

[DPO] add reference log-prob outputs in DPO#521
Tcc0403 merged 6 commits into
linkedin:mainfrom
kashif:dpo-fix

Conversation

@kashif

@kashif kashif commented Jan 14, 2025

Copy link
Copy Markdown
Contributor

Summary

Since the DPO uses a reference model we also need to return the reference logprobs in DPO

Testing Done

  • Hardware Type:
  • run make test to ensure correctness
  • run make checkstyle to ensure code style
  • run make test-convergence to ensure convergence

austin362667
austin362667 previously approved these changes Jan 18, 2025

@austin362667 austin362667 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks

@Tcc0403 Tcc0403 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Tcc0403 Tcc0403 merged commit eed8af3 into linkedin:main Jan 30, 2025
@Tcc0403 Tcc0403 mentioned this pull request Feb 1, 2025
3 tasks
austin362667 added a commit that referenced this pull request Feb 9, 2025
## Summary
<!--- This is a required section; please describe the main purpose of
this proposed code change. --->
#521 introduced a unit test failure caused by the numerical issue in
bf16.

This PR relaxes the atol for `chosen_rewards` and `rejected_rewards`
only.
Note: It's just a temporary fix for ci. 

This PR also removes duplicated calculations on both terms.

<!---
## Details
This is an optional section; is there anything specific that reviewers
should be aware of?
--->

## Testing Done
<!--- This is a required section; please describe how this change was
tested. --->

<!-- 
Replace BLANK with your device type. For example, A100-80G-PCIe

Complete the following tasks before sending your PR, and replace `[ ]`
with
`[x]` to indicate you have done them. 
-->

- Hardware Type: <BLANK>
- [ ] run `make test` to ensure correctness
- [ ] run `make checkstyle` to ensure code style
- [ ] run `make test-convergence` to ensure convergence

---------

Co-authored-by: Austin Liu <austin362667@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants