Fix _get_observation_pairs for conditional parameters. (#1166)
ytsmiling merged 7 commits into optuna:master from y0z:conditional-ei
Codecov Report
@@ Coverage Diff @@
## master #1166 +/- ##
==========================================
+ Coverage 90.99% 91.02% +0.03%
==========================================
Files 142 142
Lines 12179 12280 +101
==========================================
+ Hits 11082 11178 +96
- Misses 1097 1102 +5
Continue to review full report at Codecov.
ytsmiling
left a comment
Sincere thanks for your contribution.
optuna/samplers/tpe/sampler.py
Outdated
| if not self._parzen_estimator_parameters.consider_prior and (n_below == 0 or n_above == 0): | ||
| return self._random_sampler.sample_independent( | ||
| study, trial, param_name, param_distribution | ||
| ) | ||
|
|
Is this fall-back consistent with the hyperopt implementation? cc: @HideakiImamura
I have looked through the HyperOpt code.
It seems that HyperOpt does not have an option to disable the use of a prior.
Therefore, HyperOpt has no fall-back for a situation like this.
The current fall-back is the same as the one used when n < self._n_startup_trials:
https://github.com/optuna/optuna/blob/master/optuna/samplers/tpe/sampler.py#L139
Another idea is to simply ignore the self._parzen_estimator_parameters.consider_prior flag and enable a prior in this situation.
I agree with @y0z. HyperOpt does not have any option to disable the prior.
I think ignoring the consider_prior flag is a good idea in this situation, since it is consistent with the normal situation.
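A minimal sketch of the behavior agreed on above (the function name and signature are illustrative, not the actual optuna internals): when one side of the split has no observations, a user-supplied consider_prior=False is overridden so the Parzen estimator still has at least the prior component.

```python
# Hypothetical sketch of the agreed fix: force the prior on when there are no
# observations, ignoring a user-supplied consider_prior=False.
# The function name is illustrative only.
def resolve_consider_prior(n_observations, consider_prior):
    # A kernel-density estimate over zero observations is undefined unless a
    # prior component is mixed in, so enable the prior in that case.
    if n_observations == 0:
        return True
    return consider_prior
```

This keeps the sampler well-defined without introducing a separate random-sampling fall-back path.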
When there are multiple workers, a similar problem will remain even after this PR is merged. Another PR should address this issue. Related issue: #1170.
…flag when the number of observations is zero.
Thank you for your feedback.

This pull request has not seen any recent activity.
HideakiImamura
left a comment
Sorry for the delay in reviewing your update. LGTM!
Sorry for the confusion. I think we need some benchmark experiments to quantify the proposed improvement.
@y0z I think we need a benchmark comparison between the existing TPE and the proposed one to quantify the improvement, but it may take quite a lot of time. I think it's good enough to leave a comment in the code without doing any benchmark experiments like this.
HideakiImamura
left a comment
Thank you for your quick response! LGTM!
ytsmiling
left a comment
Thank you for your PR (and sorry for the late decision).
LGTM!
_get_observation_pairs for conditional parameters.
This PR fixes the split_observations procedure for conditional parameters so that EI values are calculated correctly.
Motivation
The current implementation of split_observations in the TPESampler takes param_name when it gets observation pairs, i.e., it only collects pairs from trials that contain param_name. This causes a different len(config_vals) for each conditional parameter.
(See optuna/optuna/samplers/tpe/sampler.py, Line 562 in f614d69, and Line 183 in f614d69.)
Based on my understanding, y*, which determines the integration interval for the EI calculation, should be the same value for all parameters (ref. the TPE paper).
However, the current strategy may set different y* values for conditional parameters.
Here is an example of this situation (note: this example is maximization).
In this example, y* for params_classifier is 0.9466666666666667, whereas y* for params_svc_c is 0.32.
This seems to cause a curious EI calculation, i.e., a different integration interval is set for each parameter.
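The mismatch can be reproduced with a toy calculation (minimization here for simplicity; all names and numbers below are illustrative, not taken from this PR): filtering the trials per parameter before taking the gamma-quantile yields a different y* for a conditional parameter than for a top-level one.

```python
# Illustrative sketch (not optuna code): computing the gamma-quantile
# threshold y* after filtering observations per parameter gives different
# thresholds for different parameters.
import math

def y_star(losses, gamma=0.25):
    # Threshold separating the "good" gamma fraction of trials from the rest.
    n_below = max(1, math.ceil(gamma * len(losses)))
    return sorted(losses)[n_below - 1]

all_losses = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]  # every trial
svc_losses = [0.3, 0.5, 0.7, 0.8]  # only trials where the SVC branch ran

print(y_star(all_losses))  # threshold from all trials: 0.2
print(y_star(svc_losses))  # a different threshold from the subset: 0.3
```

Because the conditional parameter only appears in a subset of trials, its quantile is taken over a different loss distribution, which is exactly the inconsistency this PR removes.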
Description of the changes
This PR changes the split_observation procedure as follows:
- It no longer takes param_name when it gets observations.
- It removes observations that do not contain param_name from the observations for l(x) and g(x).

I have also confirmed that the split_observation procedure in the original TPE implementation in HyperOpt works like this:
https://github.com/hyperopt/hyperopt/blob/master/hyperopt/tpe.py#L625
In this implementation, o_vals is the set of config_vals that contain the parameter named param_name only, whereas l_vals is the set of all of the observed losses so far.

My apologies if I am misunderstanding the algorithm; please close this PR if so.
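This split strategy can be sketched as follows (illustrative names, not the actual optuna code): the below/above split is computed over all trials so that y* is shared by every parameter, and only afterwards are trials lacking param_name dropped from the per-parameter value lists.

```python
# Hedged sketch of the HyperOpt-style split (illustrative, not optuna's
# actual code): split over ALL trials first, then filter values by parameter.
def split_observations(trials, param_name, gamma=0.25):
    # trials: list of (params_dict, loss) pairs; lower loss is better.
    trials = sorted(trials, key=lambda t: t[1])
    n_below = max(1, int(gamma * len(trials)))  # y* is shared by all params
    below, above = trials[:n_below], trials[n_below:]
    # Only now drop trials that did not sample param_name.
    below_vals = [p[param_name] for p, _ in below if param_name in p]
    above_vals = [p[param_name] for p, _ in above if param_name in p]
    return below_vals, above_vals

trials = [
    ({"classifier": "svc", "svc_c": 1.0}, 0.1),
    ({"classifier": "rf"}, 0.2),
    ({"classifier": "svc", "svc_c": 10.0}, 0.4),
    ({"classifier": "rf"}, 0.5),
]
print(split_observations(trials, "svc_c"))  # ([1.0], [10.0])
```

Even though only two of the four trials sampled svc_c, the split point is determined by all four losses, so every parameter sees the same y*.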