Refactor KTO [2/N]: Improve config validation in KTOConfig#4787
Refactor KTO [2/N]: Improve config validation in KTOConfig#4787albertvillanova wants to merge 2 commits into
Conversation
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
| if args.generate_during_eval and not (is_wandb_available() or is_comet_available()): | ||
| raise ValueError( | ||
| "`generate_during_eval=True` requires Weights and Biases or Comet to be installed." | ||
| " Please install `wandb` or `comet-ml` to resolve." | ||
| ) | ||
|
|
There was a problem hiding this comment.
What do you think about this change? Before it was checked at trainer instantiation and now at config creation. This has an import time side effect: it imports wandb/comet checkers immediately.
There was a problem hiding this comment.
In general, I prefer to keep all these kinds of checks in one place (in MyTrainer.__init__), so that the configuration remains minimal, and we avoid unintended duplication. That said, the codebase isn’t entirely consistent on this point, see for example:
trl/trl/trainer/grpo_config.py
Lines 845 to 888 in 1a93971
For this specific argument, I think it could probably be removed. See point 7 here:
#3906 (comment)
| # Validate beta | ||
| if self.beta <= 0: | ||
| raise ValueError( | ||
| f"beta must be positive, got {self.beta}. Higher β means less deviation from the reference model." | ||
| ) | ||
|
|
||
| # Validate weights | ||
| if self.desirable_weight <= 0: | ||
| raise ValueError( | ||
| f"desirable_weight must be positive, got {self.desirable_weight}. " | ||
| "This weight is used to balance desirable and undesirable examples." | ||
| ) | ||
|
|
||
| if self.undesirable_weight <= 0: | ||
| raise ValueError( | ||
| f"undesirable_weight must be positive, got {self.undesirable_weight}. " | ||
| "This weight is used to balance desirable and undesirable examples." | ||
| ) |
There was a problem hiding this comment.
I’m generally in favor of keeping validation as minimal as possible. Users experimenting with parameters are expected to understand the semantics, and overly defensive validation tends to grow without bound. I’d rather keep checks limited to hard invariants (i.e. impossible combinations), like here:
trl/trl/trainer/grpo_config.py
Lines 876 to 882 in 1a93971
This one is required by the sampling logic: if you remove the check, you can end up in a failure mode that would be very hard to debug. Beyond that, I’d rather let mistakes fail naturally.
Also, strict value checks can unnecessarily block experimentation. For example, negative (un)desirable weights aren’t mathematically invalid: they change the objective (and are almost certainly not what we want), but I’m not sure we should hard-forbid them at the config level unless they actually cause breakage.
Refactor KTO [2/N]: Improve config validation in KTOConfig.
This PR moves validation logic from
KTOTrainer.__init__()toKTOConfig.__post_init__()for earlier error detection, better user experience, and cleaner separation of concerns.Part of:
Problem
Before:
After:
KTOConfig.__post_init__()