TPE performs poorly compared to the TPE in BOHB

## Environment and experiments
- Python 3.8
- Ubuntu 18.04
- Optuna 2.8.0

The table below compares the TPE in [my repo](https://github.com/nabenabe0928/hpo_basement/tree/stable/optimizer), which is based on the BOHB implementation, with the TPE in Optuna v2.8.0.

| Method | 10D Rosenbrock ($-5 \leq x_i \leq 5$) | 10D Sphere ($-5 \leq x_i \leq 5$) |
|--|--|--|
| Optuna TPE | $4697.90 \pm 1756.70$ | $10.84 \pm 2.69$ |
| BOHB TPE | $163.46 \pm 179.60$ | $0.4025 \pm 0.33$ |
| Random Search | $16783.1095 \pm 6523.66$ | $27.09 \pm 8.18$ |

Each experiment is repeated 10 times with different random seeds, and each run uses 100 evaluations, including 10 initial random evaluations.
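For reference, the protocol above can be sketched for the random search baseline using only the standard library. This is a minimal sketch, not the script that produced the table: it assumes the reported numbers are the mean and sample standard deviation of the best objective value per run, and the names `random_search` and `summarize` are hypothetical helpers introduced here.

```python
import random
import statistics


def sphere(xs):
    """Sum of squares; the global minimum is 0 at the origin."""
    return sum(x ** 2 for x in xs)


def random_search(objective, dim, bound, n_evals, seed):
    """Return the best (lowest) value found by uniform random search."""
    rng = random.Random(seed)
    best = float("inf")
    for _ in range(n_evals):
        xs = [rng.uniform(-bound, bound) for _ in range(dim)]
        best = min(best, objective(xs))
    return best


def summarize(objective, dim=10, bound=5.0, n_evals=100, n_seeds=10):
    """Run one search per seed and return (mean, std) of the best values."""
    bests = [
        random_search(objective, dim, bound, n_evals, seed)
        for seed in range(n_seeds)
    ]
    # Assumption: sample standard deviation; the issue does not say which was used.
    return statistics.mean(bests), statistics.stdev(bests)


mean, std = summarize(sphere)
print(f"random search, 10D sphere: {mean:.2f} +/- {std:.2f}")
```

The exact numbers depend on the random number generator and seeds, so they are not expected to match the table digit for digit.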

**NOTE**
I also roughly checked the performance on `ackley`, and the BOHB TPE outperformed the Optuna TPE there as well.
I think it is worth trying `Griewank`, `Michalewicz`, `Rastrigin`, `Schwefel`, `Xin-She Yang`, and `Styblinski-Tang` too. All of these functions are already available [here](https://github.com/nabenabe0928/hpo_basement/blob/stable/obj_functions/benchmarks.py), and the search domains are defined [here](https://github.com/nabenabe0928/hpo_basement/blob/stable/params.json).

## Experiment code
**For Optuna**
```python
import optuna
from optuna.samplers import TPESampler


V, dim = 5, 10  # search range is [-V, V] in each of the `dim` dimensions


def sphere(**kwargs):
    # Sum of squares; minimum 0 at the origin.
    return sum(x ** 2 for x in kwargs.values())


def rosen(**kwargs):
    # Rosenbrock function; minimum 0 at x = (1, ..., 1).
    # Relies on kwargs preserving insertion order (x0, x1, ...).
    val = 0
    xs = list(kwargs.values())
    for d in range(dim - 1):
        t1 = 100 * (xs[d + 1] - xs[d] ** 2) ** 2
        t2 = (xs[d] - 1) ** 2
        val += t1 + t2
    return val


def func(trial):
    xs = {f'x{d}': trial.suggest_uniform(f'x{d}', -V, V) for d in range(dim)}
    return rosen(**xs)  # or sphere(**xs)


# n_startup_trials defaults to 10, matching the 10 random initial evaluations.
study = optuna.create_study(sampler=TPESampler(multivariate=True))
study.optimize(func, n_trials=100)
```
**For BOHB**
```
# Repeat each command 10 times with different random seeds
$ python mvtpe_main.py -fuc sphere -dim 10 -eva 100 -ini 10
$ python mvtpe_main.py -fuc rosenbrock -dim 10 -eva 100 -ini 10
```

## Why?
TPE is intrinsically a local search method, and its performance is highly sensitive to the choice of bandwidth. If I understand correctly, the Optuna implementation uses the same bandwidth factor `sigma0` for all dimensions. I am not sure this is a good strategy, for the following two reasons:
1. The density of observations differs across dimensions (some dimensions may be densely packed while others are not).
2. Low intrinsic dimensionality: observations are likely to be less densely packed in unimportant dimensions.
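The two points above can be illustrated with a small sketch that contrasts a domain-proportional bandwidth with a per-dimension, data-driven one. Both formulas are assumptions made for illustration: the fixed-factor form `factor * n^(-1/(d+4)) * (high - low)` with `factor=0.2` is a stand-in for "the same factor in every dimension", and the data-driven form is the common normal-reference rule of thumb. Neither is copied from the Optuna or HpBandSter source, and the function names are hypothetical.

```python
import random
import statistics


def fixed_factor_bandwidths(n_obs, lows, highs, factor=0.2):
    """Domain-proportional bandwidth: equal for dimensions with equal range.

    The functional form and factor=0.2 are illustrative assumptions.
    """
    dim = len(lows)
    scale = factor * max(n_obs, 1) ** (-1.0 / (dim + 4))
    return [scale * (hi - lo) for lo, hi in zip(lows, highs)]


def data_driven_bandwidths(observations):
    """Normal-reference rule per dimension: 1.06 * std_d * n^(-1/(d+4))."""
    n_obs, dim = len(observations), len(observations[0])
    scale = 1.06 * n_obs ** (-1.0 / (dim + 4))
    return [scale * statistics.stdev(col) for col in zip(*observations)]


rng = random.Random(0)
# Dimension 0 is "important": good observations cluster tightly around 1.0.
# Dimension 1 is "unimportant": good observations are spread over [-5, 5].
obs = [(rng.gauss(1.0, 0.1), rng.uniform(-5, 5)) for _ in range(20)]

fixed = fixed_factor_bandwidths(len(obs), lows=[-5, -5], highs=[5, 5])
adaptive = data_driven_bandwidths(obs)
print("fixed   :", [round(b, 3) for b in fixed])
print("adaptive:", [round(b, 3) for b in adaptive])
```

Under these assumptions the fixed-factor rule assigns both dimensions the same bandwidth, while the data-driven rule gives the densely packed dimension a much smaller one.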

Based on my observations, the bandwidth in the Optuna TPE is somewhat large, so the search behaves much like random search. Although the KDE ratio tells you which candidate is the best among those sampled, the number of possible candidates grows combinatorially with the dimension, so a small number of samples usually fails to include good ones.
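A one-dimensional sketch of this effect: sample candidates from a Gaussian KDE built on a tight cluster of good observations and measure how often a candidate lands near that cluster. The observations, bandwidths, and radius are made-up illustrative values, and the sampler below is a plain truncated Gaussian mixture, not Optuna's or BOHB's code.

```python
import random


def sample_kde(centers, bandwidth, low, high, rng):
    """Draw one point from a Gaussian KDE truncated to [low, high] by rejection."""
    while True:
        x = rng.gauss(rng.choice(centers), bandwidth)
        if low <= x <= high:
            return x


def fraction_near(centers, bandwidth, target, radius, n_samples=2000, seed=0):
    """Fraction of KDE samples that fall within `radius` of `target`."""
    rng = random.Random(seed)
    hits = sum(
        abs(sample_kde(centers, bandwidth, -5.0, 5.0, rng) - target) <= radius
        for _ in range(n_samples)
    )
    return hits / n_samples


good = [0.9, 0.95, 1.0, 1.05, 1.1]  # hypothetical good observations near x = 1
narrow = fraction_near(good, bandwidth=0.2, target=1.0, radius=0.5)
wide = fraction_near(good, bandwidth=3.0, target=1.0, radius=0.5)
uniform = 1.0 / 10.0  # chance that a uniform draw on [-5, 5] lands in [0.5, 1.5]
print(f"narrow={narrow:.3f} wide={wide:.3f} uniform={uniform:.3f}")
```

With the wide bandwidth, the hit rate is only slightly above the uniform baseline, which is the sense in which a large bandwidth makes candidate sampling resemble random search.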

Note that a smaller bandwidth makes the search more exploitative, so it is often effective to add a regularizer, such as the mutation used in genetic algorithms.

## The lines of doubt
This report focuses only on multivariate TPE.

- [Bandwidth calculation part 1](https://github.com/optuna/optuna/blob/master/optuna/samplers/_tpe/parzen_estimator.py#L401-L403)
- [Bandwidth calculation part 2](https://github.com/optuna/optuna/blob/master/optuna/samplers/_tpe/parzen_estimator.py#L336-L339)
- [The corresponding part in BOHB](https://github.com/automl/HpBandSter/blob/master/hpbandster/optimizers/kde/mvkde.py#L97-L103)
