Closed
Labels
bug: Issue/PR about behavior that is broken. Not for typos/examples/CI/test but for Optuna itself.
Description
Expected behavior
Study runs without errors.
Environment
- Optuna version: 4.7.0
- Python version: 3.12.12
- OS: Linux-4.18.0-553.75.1.el8_10.x86_64-x86_64-with-glibc2.39
Error messages, stack traces, or logs
Eventually, after several trials have finished, I get:
File /opt/conda/envs/XXX/lib/python3.12/site-packages/optuna/trial/_trial.py:163, in Trial.suggest_float(self, name, low, high, step, log)
162 distribution = FloatDistribution(low, high, log=log, step=step)
--> 163 suggested_value = self._suggest(name, distribution)
164 self._check_distribution(name, distribution)
File /opt/conda/envs/XXX/lib/python3.12/site-packages/optuna/trial/_trial.py:633, in Trial._suggest(self, name, distribution)
631 elif distribution.single():
632 param_value = distributions._get_single_value(distribution)
--> 633 elif self._is_relative_param(name, distribution):
634 param_value = self.relative_params[name]
635 else:
File /opt/conda/envs/XXX/lib/python3.12/site-packages/optuna/trial/_trial.py:672, in Trial._is_relative_param(self, name, distribution)
669 assert self.relative_search_space is not None
671 if name not in self.relative_search_space:
--> 672 raise ValueError(...)
677 relative_distribution = self.relative_search_space[name]
ValueError: The parameter 'lr' was sampled by `sample_relative` method but it is not contained in the relative search space.
Steps to reproduce
Unfortunately, so far I could not track down what ultimately triggers this error.
The first few trials (more than n_startup_trials) complete nicely; some are pruned.
The last trial before the error is unremarkable.
At some point, a trial will fail at the first parameter suggestion and all following trials also.
- Create the study with no trials, save as sqlite db.
- On a slurm cluster, each job accesses the db, calls `load_study()` with the same sampler and runs `study.optimize()`.
- The study is loaded with a `TPESampler(n_startup_trials=20, multivariate=True, group=True)`, which is then replaced by a `PartialFixedSampler` that wraps the `TPESampler`.
- The parameter in question (`lr`) is one of these fixed parameters. I never do fancy stuff with it, only `trial.suggest_float('lr', 1e-5, 1e-2, log=True)` in each trial.
- I do fancy stuff with other parameters (conditional on values of other parameters, changing bounds depending on other parameters).
What kinda bugs me is that when I look at `study.get_trials()`, the distributions contain the bounds as passed to the suggest call, not the fixed value. Also, from the stack trace, it appears that `lr` is not in `trial._fixed_params`. I wonder whether the `PartialFixedSampler` is working as intended?
I would be happy for any guidance towards narrowing down this issue. Thanks!
Additional context (optional)
No response