
Fix _get_observation_pairs for conditional parameters. #1166

Merged
ytsmiling merged 7 commits into optuna:master from y0z:conditional-ei
May 14, 2020

Conversation

@y0z
Member

@y0z y0z commented Apr 25, 2020

This PR fixes split_observations so that EI values are calculated consistently for conditional parameters.

Motivation

The current implementation of split_observations in the TPESampler works as follows:

  1. The sampler discards the trials that do not contain the parameter named param_name when it collects observation pairs. As a result, len(config_vals) differs across conditional parameters.
        if param_name not in trial.params:
  2. The sampler then calculates the y* value based on gamma = min(int(np.ceil(0.1 * len(config_vals))), 25).
        n_below = self._gamma(len(config_vals))
  3. Finally, the sampler splits config_vals into the observations with loss_vals < y* (i.e., the observations for l(x)) and the remaining ones (i.e., the observations for g(x)).

Based on my understanding, y*, which determines the integration interval for the EI calculation, should be the same value for all parameters (cf. the TPE paper).
However, the current strategy may set a different y* value for each conditional parameter.
Here is an example of this situation (note: this example is a maximization problem).
In this example, y* for params_classifier is 0.9466666666666667, whereas y* for params_svc_c is 0.32.
This seems to cause a curious EI calculation, i.e., a different integration interval is used for each parameter.
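The mismatch above can be sketched in a few lines (a minimal, self-contained illustration, not Optuna's actual code: the loss values, the y_star helper, and the smaller-is-better ordering are all hypothetical simplifications):

```python
import numpy as np

def gamma(n):
    # TPESampler's default split rule: n_below = min(ceil(0.1 * n), 25).
    return min(int(np.ceil(0.1 * n)), 25)

def y_star(loss_vals):
    # Hypothetical helper: the boundary loss separating l(x) from g(x),
    # assuming smaller losses are better.
    loss_vals = sorted(loss_vals)
    return loss_vals[gamma(len(loss_vals)) - 1]

# Hypothetical data: every trial has params_classifier, but only the
# trials that chose SVC have params_svc_c.
all_trial_losses = [0.1 * i for i in range(1, 13)]   # 12 trials
svc_trial_losses = [0.3, 0.7, 1.1]                   # 3 trials with params_svc_c

# Because trials lacking the parameter are discarded *before* the split,
# each parameter ends up with its own y*:
print(y_star(all_trial_losses))   # threshold for params_classifier
print(y_star(svc_trial_losses))   # a different threshold for params_svc_c
```

The two printed thresholds differ, so each parameter's l(x)/g(x) split uses a different integration interval.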

Description of the changes

This PR changes the split_observations procedure as follows:

  1. The sampler does not discard the trials that lack the parameter named param_name when it collects observation pairs.
  2. The sampler then calculates the y* value based on gamma = min(int(np.ceil(0.1 * len(config_vals))), 25). This y* is consistent across all hyperparameters.
  3. After that, the sampler splits config_vals into the observations for l(x) and the observations for g(x).
  4. Finally, the sampler discards, from the observations for l(x) and for g(x), the config_vals whose trials do not contain the parameter named param_name.
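The reordered steps can be sketched as follows (an illustrative simplification, not the actual patch; encoding parameter absence as None/NaN is an assumption made for this sketch):

```python
import numpy as np

def gamma(n):
    # Same split rule as before: n_below = min(ceil(0.1 * n), 25).
    return min(int(np.ceil(0.1 * n)), 25)

def split_observation_pairs(config_vals, loss_vals):
    """Sketch of the proposed order: config_vals[i] is None when the
    i-th trial does not contain param_name (smaller loss is better)."""
    config_vals = np.asarray(config_vals, dtype=float)  # None becomes NaN
    loss_vals = np.asarray(loss_vals)

    # Steps 1-2: y* is computed from ALL trials, so it is shared by
    # every parameter.
    n_below = gamma(len(config_vals))
    order = np.argsort(loss_vals)
    below = config_vals[order[:n_below]]  # observations for l(x)
    above = config_vals[order[n_below:]]  # observations for g(x)

    # Steps 3-4: split first, then drop trials where the parameter is
    # absent (NaN entries).
    return below[~np.isnan(below)], above[~np.isnan(above)]

below, above = split_observation_pairs(
    [3.0, None, 1.0, None, 2.0],   # param value, or None if absent
    [0.1, 0.2, 0.3, 0.4, 0.5],     # corresponding losses
)
```

With five trials, gamma(5) is 1, so only the best trial lands in l(x); trials missing the parameter are filtered out of both sides only after the shared split.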

I have also confirmed that the split_observations procedure in the original TPE implementation in HyperOpt works this way.
https://github.com/hyperopt/hyperopt/blob/master/hyperopt/tpe.py#L625
In that implementation, o_vals contains only the config_vals from trials that include the parameter named param_name, whereas l_vals contains all of the losses observed so far.

My apologies if I am misunderstanding the algorithm and please close this PR.

@codecov-io

codecov-io commented Apr 25, 2020

Codecov Report

Merging #1166 into master will increase coverage by 0.03%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1166      +/-   ##
==========================================
+ Coverage   90.99%   91.02%   +0.03%     
==========================================
  Files         142      142              
  Lines       12179    12280     +101     
==========================================
+ Hits        11082    11178      +96     
- Misses       1097     1102       +5     
Impacted Files Coverage Δ
optuna/samplers/tpe/parzen_estimator.py 96.96% <100.00%> (+0.09%) ⬆️
optuna/samplers/tpe/sampler.py 88.40% <100.00%> (+0.23%) ⬆️
.../samplers_tests/tpe_tests/test_parzen_estimator.py 90.90% <100.00%> (ø)
tests/samplers_tests/tpe_tests/test_sampler.py 99.57% <100.00%> (+0.01%) ⬆️
optuna/pruners/hyperband.py 86.95% <0.00%> (-5.24%) ⬇️
optuna/samplers/random.py 84.44% <0.00%> (-1.56%) ⬇️
tests/samplers_tests/test_cmaes.py 100.00% <0.00%> (ø)
.../integration_tests/allennlp_tests/test_allennlp.py 100.00% <0.00%> (ø)
tests/test_study.py 98.26% <0.00%> (+0.02%) ⬆️
... and 11 more

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f614d69...f3ebf31. Read the comment docs.

@crcrpar crcrpar added the optuna.samplers label (Related to the `optuna.samplers` submodule; automatically labeled by github-actions) Apr 25, 2020
Member

@ytsmiling ytsmiling left a comment


Sincere thanks for your contribution.

Comment on lines +143 to +147

    if not self._parzen_estimator_parameters.consider_prior and (n_below == 0 or n_above == 0):
        return self._random_sampler.sample_independent(
            study, trial, param_name, param_distribution
        )

Member


Is this fall-back consistent with the hyperopt implementation? cc: @HideakiImamura

Member Author


I have looked through the HyperOpt code.
It seems that HyperOpt does not provide an option to disable the use of a prior.
Therefore, HyperOpt has no fall-back for a situation like this.

The current fall-back is the same as the one used when n < self._n_startup_trials.
https://github.com/optuna/optuna/blob/master/optuna/samplers/tpe/sampler.py#L139

Another idea is to simply ignore the self._parzen_estimator_parameters.consider_prior flag and enable the prior in this situation.

Member


I agree with @y0z. hyperopt does not have any option to disable the prior.
I think ignoring the consider_prior flag is a good idea in this situation, since it is consistent with the normal situation.

@ytsmiling
Member

When there are multiple workers, a similar problem will remain even after this PR is merged. Another PR should address this issue. Related issue: #1170.

…flag when the number of observations is zero.
@y0z
Member Author

y0z commented Apr 28, 2020

Thank you for your feedback.
I have changed the fall-back procedure.
Now, when the number of observations is zero, the Parzen estimator ignores the consider_prior flag and utilizes a prior.
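The agreed fall-back can be sketched as follows (illustrative only: build_estimator_weights and its uniform weighting are hypothetical stand-ins, not the ParzenEstimator's real weighting scheme):

```python
def build_estimator_weights(observations, consider_prior):
    # Force the prior when there is nothing to estimate from, so the
    # resulting mixture is never empty; otherwise respect the flag.
    use_prior = consider_prior or len(observations) == 0
    n_components = len(observations) + (1 if use_prior else 0)
    # Uniform mixture weights over the observations plus the optional prior.
    return [1.0 / n_components] * n_components
```

For example, with zero observations and consider_prior=False, the mixture still contains exactly one component (the prior), so sampling remains well defined.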

@github-actions
Contributor

This pull request has not seen any recent activity.

@github-actions github-actions bot added the stale label (Exempt from stale bot labeling) May 12, 2020
HideakiImamura
HideakiImamura previously approved these changes May 13, 2020
Member

@HideakiImamura HideakiImamura left a comment


Sorry for the delay in reviewing your update. LGTM!

@HideakiImamura HideakiImamura removed the stale label (Exempt from stale bot labeling) May 13, 2020
@HideakiImamura HideakiImamura dismissed their stale review May 13, 2020 07:47

Sorry for confusion. I think we need some benchmark experiments to quantify the proposed improvement.

@HideakiImamura
Member

@y0z I think we need a benchmark comparison between the existing TPE and the proposed one to quantify the improvement, but that may take quite a lot of time. I think it is fine to leave a comment like the following in the code without running any benchmark experiments:

Note: When the number of observations is zero, the Parzen estimator ignores the consider_prior flag and utilizes a prior. Validation of this approach is future work.

Member

@HideakiImamura HideakiImamura left a comment


Thank you for your quick response! LGTM!

Member

@ytsmiling ytsmiling left a comment


Thank you for your PR (and sorry for the late decision).
LGTM!

@ytsmiling ytsmiling merged commit 8e813c9 into optuna:master May 14, 2020
@hvy hvy added this to the v1.5.0 milestone May 14, 2020
@hvy hvy added the enhancement label (Change that does not break compatibility and does not affect public interfaces, but improves performance) May 14, 2020
@hvy hvy changed the title Fix split_observations for conditional parameters Fix _get_observation_pairs for conditional parameters. May 14, 2020

6 participants