Parameter suggestion with PartialFixedSampler and TPE fails mid-study #6427

@ju-kreber

Expected behavior

Study runs without errors.

Environment

  • Optuna version: 4.7.0
  • Python version: 3.12.12
  • OS: Linux-4.18.0-553.75.1.el8_10.x86_64-x86_64-with-glibc2.39

Error messages, stack traces, or logs

Eventually, after several trials have finished, I get:


File /opt/conda/envs/XXX/lib/python3.12/site-packages/optuna/trial/_trial.py:163, in Trial.suggest_float(self, name, low, high, step, log)
    162 distribution = FloatDistribution(low, high, log=log, step=step)
--> 163 suggested_value = self._suggest(name, distribution)
    164 self._check_distribution(name, distribution)

File /opt/conda/envs/XXX/lib/python3.12/site-packages/optuna/trial/_trial.py:633, in Trial._suggest(self, name, distribution)
    631 elif distribution.single():
    632     param_value = distributions._get_single_value(distribution)
--> 633 elif self._is_relative_param(name, distribution):
    634     param_value = self.relative_params[name]
    635 else:

File /opt/conda/envs/XXX/lib/python3.12/site-packages/optuna/trial/_trial.py:672, in Trial._is_relative_param(self, name, distribution)
    669 assert self.relative_search_space is not None
    671 if name not in self.relative_search_space:
--> 672     raise ValueError(...)
    677 relative_distribution = self.relative_search_space[name]

ValueError: The parameter 'lr' was sampled by `sample_relative` method but it is not contained in the relative search space.
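
The check that raises here can be modeled in a few lines of plain Python. This is only a simplified stand-in built from the frames and the message shown in the trace, not Optuna's actual code. The function name and its arguments are hypothetical, and the first branch (a name absent from the relative params is not a relative param) is an assumption inferred from the wording of the error message.

```python
def is_relative_param(name, relative_params, relative_search_space):
    """Simplified model of the failing check (hypothetical, not Optuna's code)."""
    # Assumption: a parameter that sample_relative did not return
    # is handled by independent sampling instead.
    if name not in relative_params:
        return False
    # The situation from the trace: a value exists for the parameter,
    # but the parameter is missing from the relative search space.
    if name not in relative_search_space:
        raise ValueError(
            f"The parameter '{name}' was sampled by `sample_relative` method "
            "but it is not contained in the relative search space."
        )
    return True
```

In this model the error means the two inputs disagree: a value for lr was returned by relative sampling even though lr was not part of the search space that was inferred for the trial.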

Steps to reproduce

Unfortunately, I have not yet been able to track down what ultimately triggers this error.
The first few trials (more than n_startup_trials) complete without problems; some are pruned.
The last trial before the error is unremarkable.
At some point, a trial fails at its first parameter suggestion, and so do all following trials.

  1. Create the study with no trials and save it as an SQLite db.
  2. On a Slurm cluster, each job opens the db, calls load_study() with the same sampler, and runs study.optimize().
  3. The study is loaded with a TPESampler(n_startup_trials=20, multivariate=True, group=True), which is then replaced by a PartialFixedSampler that wraps the TPESampler.
  4. The parameter in question (lr) is one of these fixed parameters. I never do fancy stuff with it, only trial.suggest_float('lr', 1e-5, 1e-2, log=True) in each trial.
  5. I do fancy stuff with other parameters (conditional on values of other parameters, changing bounds depending on other parameters).

What bugs me is that when I look at study.get_trials(), the distributions contain the bounds from the suggest call, not the fixed value. Also, judging from the stack trace, lr does not appear to be in trial._fixed_params. I wonder whether the PartialFixedSampler is working as intended.

I would appreciate any guidance on narrowing down this issue. Thanks!

Additional context (optional)

No response

Labels: bug
