AlignProp Support for direct reward finetuning #7312

@parthos86

Is your feature request related to a problem? Please describe.
No. AlignProp makes reward finetuning much faster than DDPO (about 25x) because it backpropagates gradients directly from the reward function.
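
To illustrate what "backpropagating directly from the reward" means, here is a toy sketch in plain Python. It is not the AlignProp implementation: the scalar "denoising" chain, the quadratic reward, and all names (`sample`, `reward`, `reward_grad_wrt_theta`, `finetune`) are hypothetical and chosen only for illustration. The point is that the reward gradient flows back through every sampling step to the model parameter, rather than being estimated with a policy-gradient method as in DDPO.

```python
def sample(theta, x0, steps, decay=0.5):
    """Run a toy deterministic 'denoising' chain; return every state."""
    xs = [x0]
    for _ in range(steps):
        # One step of the hypothetical sampler: x_{t+1} = decay * x_t + theta
        xs.append(decay * xs[-1] + theta)
    return xs


def reward(x, target):
    """Toy differentiable reward: highest when the final sample hits target."""
    return -(x - target) ** 2


def reward_grad_wrt_theta(theta, x0, steps, target, decay=0.5):
    """Backpropagate d(reward)/d(theta) through the whole sampling chain."""
    xs = sample(theta, x0, steps, decay)
    grad_x = -2.0 * (xs[-1] - target)  # d(reward) / d(x_T)
    grad_theta = 0.0
    for _ in range(steps):  # walk the chain backwards, one step at a time
        grad_theta += grad_x  # d(x_{t+1}) / d(theta) = 1
        grad_x *= decay       # d(x_{t+1}) / d(x_t) = decay
    return grad_theta


def finetune(theta, x0, steps, target, lr=0.05, iters=200):
    """Plain gradient ascent on the reward using the backpropagated gradient."""
    for _ in range(iters):
        theta += lr * reward_grad_wrt_theta(theta, x0, steps, target)
    return theta


if __name__ == "__main__":
    theta = finetune(theta=0.0, x0=1.0, steps=4, target=2.0)
    final = sample(theta, x0=1.0, steps=4)[-1]
    print(f"theta={theta:.4f}, final sample={final:.4f}, "
          f"reward={reward(final, 2.0):.6f}")
```

In a real integration the chain would be the diffusion sampler and `theta` the model weights (or adapter weights), with an autograd framework doing the backward pass; the sketch only shows the direction of gradient flow.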

Describe the solution you'd like.
An integration similar to the existing one for DDPO:
https://github.com/huggingface/diffusers/blob/main/docs/source/en/training/ddpo.md

Describe alternatives you've considered.
There is an existing implementation, but it is not well supported and is not integrated into the diffusers pipeline.
https://github.com/mihirp1998/AlignProp/

@mihirp1998
