AlignProp Support for direct reward finetuning

**Is your feature request related to a problem? Please describe.**
No. AlignProp makes reward finetuning much faster than DDPO (about 25x) because it backpropagates gradients directly from the reward function.
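The speedup comes from the kind of gradient estimator each method uses. Below is a toy sketch, not AlignProp or diffusers code. It assumes a hypothetical one-parameter Gaussian "generator" (`x = theta + sigma * eps`) and a hypothetical differentiable quadratic reward. It compares backpropagating through the reward (pathwise, AlignProp-style) with a black-box score-function estimator of the kind policy-gradient methods such as DDPO rely on. Both estimate the same gradient, but in this toy the pathwise estimate has far lower variance.

```python
import random
import statistics

# Toy setup (hypothetical): a "generator" with one parameter theta draws
# samples x = theta + SIGMA * eps, eps ~ N(0, 1), and a differentiable reward
# r(x) = -(x - TARGET)^2 scores each sample.
SIGMA = 1.0
TARGET = 3.0


def reward(x: float) -> float:
    return -(x - TARGET) ** 2


def reward_grad(x: float) -> float:
    # dr/dx, available only because the reward is differentiable.
    return -2.0 * (x - TARGET)


def pathwise_grad(theta: float, eps: float) -> float:
    # Direct reward backprop: chain rule through the sampler, dx/dtheta = 1.
    x = theta + SIGMA * eps
    return reward_grad(x) * 1.0


def score_function_grad(theta: float, eps: float) -> float:
    # Score-function (REINFORCE-style) estimate: the reward is treated as a
    # black box and weighted by d/dtheta log N(x; theta, SIGMA^2).
    x = theta + SIGMA * eps
    return reward(x) * (x - theta) / SIGMA ** 2


def estimator_stats(grad_fn, theta: float, n: int, seed: int = 0):
    # Sample n per-sample gradient estimates; return their mean and variance.
    rng = random.Random(seed)
    grads = [grad_fn(theta, rng.gauss(0.0, 1.0)) for _ in range(n)]
    return statistics.fmean(grads), statistics.variance(grads)


if __name__ == "__main__":
    theta = 0.0
    # True gradient of E[r(x)] w.r.t. theta is -2 * (theta - TARGET) = 6.0.
    for name, fn in [("pathwise", pathwise_grad),
                     ("score-function", score_function_grad)]:
        mean, var = estimator_stats(fn, theta, n=10_000)
        print(f"{name:>15}: mean={mean:.2f} variance={var:.2f}")
```

Lower-variance gradients need fewer samples per update, which is the intuition behind the speedup; the toy does not reproduce the 25x figure.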

**Describe the solution you'd like.**
An integration similar to the existing DDPO one:
https://github.com/huggingface/diffusers/blob/main/docs/source/en/training/ddpo.md


**Describe alternatives you've considered.**
There is an existing implementation, but it is not well supported and is not integrated into the diffusers pipeline.
https://github.com/mihirp1998/AlignProp/

@mihirp1998