Currently, https://huggingface.co/docs/trl/main/en/clis?command_line=Reward#basic-usage shows basic usage examples only for SFT, DPO, and Reward. We should add them for the remaining supported CLIs (i.e., GRPO, RLOO, KTO).