Is your feature request related to a problem? Please describe.
No. AlignProp makes reward finetuning much faster than DDPO (about 25x) because it backpropagates gradients directly from the reward function.
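To illustrate the mechanism, here is a toy sketch. It is not AlignProp's code and uses no diffusers API. It assumes a hypothetical scalar "sampler" with one learnable parameter `theta` and a differentiable reward. Because the reward is differentiable, its gradient with respect to `theta` can be pushed back through every sampling step with the chain rule. The names `sample_with_grad`, `reward`, `reward_grad`, `finetune` and the update rule are all made up for illustration. For brevity the derivative is tracked in forward mode; reverse-mode backpropagation computes the same gradient.

```python
# Toy illustration only (hypothetical names, not AlignProp or diffusers code):
# a scalar "sampler" runs a fixed number of steps controlled by theta, and the
# reward gradient is propagated through every step via the chain rule.

def sample_with_grad(theta, x_init=0.0, steps=10):
    """Run the toy sampler; return the final sample and d(sample)/d(theta)."""
    x, dx = x_init, 0.0  # dx tracks d x / d theta (forward-mode chain rule)
    for _ in range(steps):
        # Hypothetical update rule: x <- x + theta * (1 - x).
        # Differentiate it before overwriting x, using the old x and dx.
        dx = dx * (1 - theta) + (1 - x)
        x = x + theta * (1 - x)
    return x, dx


def reward(x, target=0.8):
    # Differentiable toy reward: highest when the sample hits the target.
    return -(x - target) ** 2


def reward_grad(x, target=0.8):
    # Analytic derivative of reward() with respect to x.
    return -2 * (x - target)


def finetune(theta=0.05, lr=0.02, iters=200):
    """Gradient ascent on theta using the exact reward gradient."""
    for _ in range(iters):
        x, dx = sample_with_grad(theta)
        # dR/dtheta = dR/dx * dx/dtheta: no sampling-based gradient estimator needed.
        theta += lr * reward_grad(x) * dx
    return theta


if __name__ == "__main__":
    theta = finetune()
    x, _ = sample_with_grad(theta)
    print(f"theta={theta:.4f} sample={x:.4f} reward={reward(x):.6f}")
```

DDPO, by contrast, treats the reward as a black box and estimates the gradient from sampled trajectories with a policy-gradient estimator. The toy above only shows the direct-gradient idea and makes no claim about the reported speedup.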
Describe the solution you'd like.
An integration similar to the existing DDPO one:
https://github.com/huggingface/diffusers/blob/main/docs/source/en/training/ddpo.md
Describe alternatives you've considered.
An implementation already exists, but it is not well supported and is not integrated into the diffusers pipeline.
https://github.com/mihirp1998/AlignProp/
@mihirp1998