Pinned
(1/9) Most LM fine-tuning optimizes next-token loss or scalar rewards.
What if we fine-tune language models so that feature statistics of partial rollouts match those of ground-truth completions?
That leads to Energy-Based Fine-Tuning (EBFT).
arXiv: arxiv.org/abs/2603.12248
00:00

