Very excited to share this work @davidrmcall did with the fantastic NVIDIA Finland team last year. We have a surprisingly simple yet sample-efficient way to post-train a flow model with RL.
We developed a simple, sample-efficient online RL technique for post-training image generation models. We see it as a possible steerable alternative to CFG, driven by any scalar reward, including human preference.