Question about apply_chat_template in examples #1752

@egangu

Looking at the examples, I noticed that the DPO example script applies apply_chat_template to chosen and rejected but not to prompt.

trl/examples/scripts/dpo.py

Lines 150 to 152 in d1ed730

def process(row):
    row["chosen"] = tokenizer.apply_chat_template(row["chosen"], tokenize=False)
    row["rejected"] = tokenizer.apply_chat_template(row["rejected"], tokenize=False)

It also seems that chosen is a complete conversation, not just the final response:

[
  { "content": "Hi, I want to learn to play horseshoes. Can you teach me?", "role": "user" },
  { "content": "I can, but maybe I should begin by telling you that a typical game consists of 2 players and 6 or 8 horseshoes.", "role": "assistant" },
  { "content": "Okay. What else is needed to play, and what are the rules?", "role": "user" },
  { "content": "A horseshoe is usually made out of metal and is about 3 to 3.5 inches long and around 1 inch thick. The horseshoe should also have a 2 inch by 3 inch flat at the bottom where the rubber meets the metal. We also need two stakes and six horseshoes.", "role": "assistant" }
]

I think that applying the chat template to the input prompt, and keeping only the final assistant output as chosen/rejected, would be consistent with the inference phase.
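A minimal sketch of this proposal, not the script's actual code. It assumes that chosen and rejected share every turn except the last and that both end with an assistant turn. The helper name `split_last_turn` and the extra `tokenizer` argument to `process` are hypothetical; `apply_chat_template` with `tokenize=False` and `add_generation_prompt=True` is the Transformers tokenizer method.

```python
def split_last_turn(messages):
    """Split a conversation into (prompt messages, final assistant reply text)."""
    if not messages or messages[-1]["role"] != "assistant":
        raise ValueError("expected the conversation to end with an assistant turn")
    return messages[:-1], messages[-1]["content"]


def process(row, tokenizer):
    # Assumption: chosen and rejected differ only in their final assistant turn,
    # so the prompt can be taken from either one.
    prompt_messages, chosen_reply = split_last_turn(row["chosen"])
    _, rejected_reply = split_last_turn(row["rejected"])

    # Format only the shared context. add_generation_prompt=True appends the
    # assistant header, which is how the prompt would look at inference time.
    row["prompt"] = tokenizer.apply_chat_template(
        prompt_messages, tokenize=False, add_generation_prompt=True
    )
    # Keep the raw assistant replies as the completions to be compared.
    row["chosen"] = chosen_reply
    row["rejected"] = rejected_reply
    return row
```

With this layout the completions carry no end-of-turn or EOS marker from the template, so whether one needs to be appended depends on the chat template and on how the trainer tokenizes the pairs.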
