Looking at the examples, I noticed that the DPO example script applies `apply_chat_template` to `chosen` and `rejected` but not to `prompt`:
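```python
def process(row):
    # `tokenizer` is created earlier in the example script
    row["chosen"] = tokenizer.apply_chat_template(row["chosen"], tokenize=False)
    row["rejected"] = tokenizer.apply_chat_template(row["rejected"], tokenize=False)
    return row  # the mapped function returns the updated row
```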
It also seems that `chosen` is a complete conversation, including the user turns:
```json
[
  { "content": "Hi, I want to learn to play horseshoes. Can you teach me?", "role": "user" },
  { "content": "I can, but maybe I should begin by telling you that a typical game consists of 2 players and 6 or 8 horseshoes.", "role": "assistant" },
  { "content": "Okay. What else is needed to play, and what are the rules?", "role": "user" },
  { "content": "A horseshoe is usually made out of metal and is about 3 to 3.5 inches long and around 1 inch thick. The horseshoe should also have a 2 inch by 3 inch flat at the bottom where the rubber meets the metal. We also need two stakes and six horseshoes.", "role": "assistant" }
]
```
I think it would be more consistent with the inference phase to apply the chat template to the prompt (the conversation history) and keep only the final assistant response as `chosen`/`rejected`.
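A rough sketch of what that could look like, assuming the prompt is every turn before the final assistant message and that `chosen` and `rejected` share that history (as in the example above). `split_conversation` and this variant of `process` are hypothetical helpers, not part of TRL; the only library call is the tokenizer's `apply_chat_template(..., tokenize=False, add_generation_prompt=True)`.

```python
from typing import Any, Dict, List, Tuple

Message = Dict[str, str]


def split_conversation(messages: List[Message]) -> Tuple[List[Message], Message]:
    """Hypothetical helper: split into (history turns, final assistant turn)."""
    if not messages or messages[-1]["role"] != "assistant":
        raise ValueError("expected the conversation to end with an assistant turn")
    return messages[:-1], messages[-1]


def process(row: Dict[str, Any], tokenizer: Any) -> Dict[str, Any]:
    """Hypothetical DPO preprocessing: template the prompt, keep only the replies.

    Assumes row["chosen"] and row["rejected"] are lists of {"role", "content"}
    dicts that share every turn except the last one.
    """
    chosen_history, chosen_reply = split_conversation(row["chosen"])
    rejected_history, rejected_reply = split_conversation(row["rejected"])
    if chosen_history != rejected_history:
        raise ValueError("chosen and rejected do not share the same prompt")

    # Render the shared history the way it would be rendered at inference time:
    # add_generation_prompt=True appends the header that opens the assistant
    # turn, if the model's chat template defines one.
    row["prompt"] = tokenizer.apply_chat_template(
        chosen_history, tokenize=False, add_generation_prompt=True
    )
    # Keep only the final assistant message text as the completion.
    row["chosen"] = chosen_reply["content"]
    row["rejected"] = rejected_reply["content"]
    return row
```

With a Hugging Face `datasets.Dataset`, this could be applied with something like `ds.map(process, fn_kwargs={"tokenizer": tokenizer})`. One caveat: taking only `content` drops any end-of-turn tokens the template would normally emit after the assistant message; whether those need to be appended depends on the model's template and on how the trainer tokenizes completions.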
The `process` function quoted at the top of this issue is from `trl/examples/scripts/dpo.py`, lines 150 to 152 at commit d1ed730.