Closed
Labels
bug: Issue/PR about behavior that is broken. Not for typos/examples/CI/test but for Optuna itself.
needs-discussion: Issue/PR which needs discussion.
Description
Expected behavior
Some worker processes are killed nondeterministically when JournalStorage is used behind GrpcStorageProxy.
The problem appears to stem from a design assumption of JournalStorage: a single thread in a single process handles each trial from beginning to end.
GrpcStorageProxy breaks this assumption in principle, because the process that samples a trial is separate from the process that evaluates it.
Note
This issue typically occurs when using enqueue_trial, but the same problem may also arise with other operations.
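The log in this report shows Trial 2 reported as finished twice, which is consistent with two workers picking up the same enqueued trial. The sketch below is a toy model of that kind of race, not a reproduction of Optuna's internals: `ToyStorage`, `first_waiting`, `set_running`, and `claim` are hypothetical names introduced only for illustration. It contrasts an unconditional state write, where both workers believe they own the trial, with a compare-and-set, where only the first claim succeeds.

```python
import threading
from typing import Optional


class ToyStorage:
    """Hypothetical in-memory trial store; not Optuna's implementation."""

    def __init__(self, n_trials: int) -> None:
        self._states = {trial_id: "WAITING" for trial_id in range(n_trials)}
        self._lock = threading.Lock()

    def first_waiting(self) -> Optional[int]:
        """Return the lowest-numbered WAITING trial, or None if there is none."""
        for trial_id, state in self._states.items():
            if state == "WAITING":
                return trial_id
        return None

    def set_running(self, trial_id: int) -> None:
        """Unconditional write: succeeds even if the trial was already claimed."""
        self._states[trial_id] = "RUNNING"

    def claim(self, trial_id: int) -> bool:
        """Compare-and-set: only the first caller moves WAITING to RUNNING."""
        with self._lock:
            if self._states[trial_id] != "WAITING":
                return False
            self._states[trial_id] = "RUNNING"
            return True


def simulate_unsafe() -> tuple:
    """Both workers read before either writes, so both take the same trial."""
    storage = ToyStorage(n_trials=1)
    seen_by_a = storage.first_waiting()
    seen_by_b = storage.first_waiting()  # B reads before A has written
    storage.set_running(seen_by_a)
    storage.set_running(seen_by_b)
    return seen_by_a, seen_by_b


def simulate_safe() -> tuple:
    """Same interleaving, but the claim itself rejects the second worker."""
    storage = ToyStorage(n_trials=1)
    seen_by_a = storage.first_waiting()
    seen_by_b = storage.first_waiting()
    return storage.claim(seen_by_a), storage.claim(seen_by_b)
```

In this model `simulate_unsafe()` hands trial 0 to both workers, while `simulate_safe()` lets only the first claim through. Whether the real failure follows this exact path is an assumption and would need to be confirmed against the storage implementation.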
Environment
- Optuna version: 4.3
- Python version: 3.11
- OS: Ubuntu 20.04
Error messages, stack traces, or logs
[I 2025-05-20 07:15:00,216] Using an existing study with name 'b99b661c-4229-4a54-92bf-f29b0ed7db25' instead of creating a new one.
[I 2025-05-20 07:15:00,222] Using an existing study with name 'b99b661c-4229-4a54-92bf-f29b0ed7db25' instead of creating a new one.
[I 2025-05-20 07:15:00,227] Using an existing study with name 'b99b661c-4229-4a54-92bf-f29b0ed7db25' instead of creating a new one.
[I 2025-05-20 07:15:00,231] Using an existing study with name 'b99b661c-4229-4a54-92bf-f29b0ed7db25' instead of creating a new one.
[I 2025-05-20 07:15:00,242] Using an existing study with name 'b99b661c-4229-4a54-92bf-f29b0ed7db25' instead of creating a new one.
[I 2025-05-20 07:15:00,245] Using an existing study with name 'b99b661c-4229-4a54-92bf-f29b0ed7db25' instead of creating a new one.
[I 2025-05-20 07:15:00,246] Using an existing study with name 'b99b661c-4229-4a54-92bf-f29b0ed7db25' instead of creating a new one.
[I 2025-05-20 07:15:00,254] Using an existing study with name 'b99b661c-4229-4a54-92bf-f29b0ed7db25' instead of creating a new one.
[I 2025-05-20 07:15:00,294] Trial 0 finished with value: 114.0 and parameters: {'y': 3.8116090154495232, 'x': 0.08687552782907204}. Best is trial 0 with value: 114.0.
[I 2025-05-20 07:15:00,298] Trial 1 finished with value: 205.0 and parameters: {'y': -1.2209212453908358, 'x': -2.021884130213263}. Best is trial 0 with value: 114.0.
[I 2025-05-20 07:15:00,308] Trial 2 finished with value: 311.0 and parameters: {'y': 0.7764554661270058, 'x': 3.346412592844274}. Best is trial 0 with value: 114.0.
[I 2025-05-20 07:15:00,314] Trial 2 finished with value: 311.0 and parameters: {'y': 0.7764554661270058, 'x': 3.346412592844274}. Best is trial 0 with value: 114.0.
Process Process-7:
[I 2025-05-20 07:15:00,315] Trial 3 finished with value: 534.0 and parameters: {'y': -3.0491387551257243, 'x': 4.990802171581327}. Best is trial 0 with value: 114.0.
Traceback (most recent call last):
  File "/usr/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.11/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/shuhei/pfn-work/optuna-dev/optuna/simple_grpc.py", line 27, in main
    study.optimize(objective, n_trials=1)
  File "/home/shuhei/pfn-work/optuna-dev/optuna/optuna/study/study.py", line 475, in optimize
    _optimize(
  File "/home/shuhei/pfn-work/optuna-dev/optuna/optuna/study/_optimize.py", line 63, in _optimize
    _optimize_sequential(
  File "/home/shuhei/pfn-work/optuna-dev/optuna/optuna/study/_optimize.py", line 160, in _optimize_sequential
    frozen_trial = _run_trial(study, func, catch)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/shuhei/pfn-work/optuna-dev/optuna/optuna/study/_optimize.py", line 209, in _run_trial
    frozen_trial = _tell_with_warning(
                   ^^^^^^^^^^^^^^^^^^^
  File "/home/shuhei/pfn-work/optuna-dev/optuna/optuna/study/_tell.py", line 120, in _tell_with_warning
    raise ValueError(f"Cannot tell a {frozen_trial.state.name} trial.")
ValueError: Cannot tell a COMPLETE trial.
[I 2025-05-20 07:15:00,320] Trial 4 finished with value: 426.0 and parameters: {'y': -4.9408864997561945, 'x': 1.5103378665562222}. Best is trial 0 with value: 114.0.
[I 2025-05-20 07:15:00,322] Trial 5 finished with value: 629.0 and parameters: {'y': -4.889657514236255, 'x': -2.4055138468929904}. Best is trial 0 with value: 114.0.
[I 2025-05-20 07:15:00,324] Trial 6 finished with value: 826.0 and parameters: {'y': 4.624483520664185, 'x': 2.162965026496132}. Best is trial 0 with value: 114.0.
Steps to reproduce
Install the dependencies:
$ pip install optuna grpcio protobuf
Then build a proxy server with the following code:
import os

from optuna.storages import run_grpc_proxy_server
from optuna.storages.journal import JournalFileBackend
from optuna.storages.journal import JournalStorage

# Start from a fresh journal file on every run.
try:
    os.remove("test-grpc.log")
except FileNotFoundError:
    pass

# Expose the journal-backed storage through the gRPC proxy server.
storage = JournalStorage(JournalFileBackend("test-grpc.log"))
run_grpc_proxy_server(storage, host="localhost", port=13000)
Launch another process and run the following code:
from collections.abc import Callable
import multiprocessing
import os
import time
import uuid

import numpy as np
import optuna


def load_study(study_name: str, storage_builder: Callable[[], optuna.storages.BaseStorage]) -> optuna.Study:
    sampler = optuna.samplers.RandomSampler()
    return optuna.create_study(
        study_name=study_name, sampler=sampler, storage=storage_builder(), load_if_exists=True
    )


def main(study_name: str, worker_id: int, storage_builder: Callable[[], optuna.storages.BaseStorage]) -> None:
    def objective(trial: optuna.Trial) -> float:
        time.sleep(0.01)
        x = trial.suggest_float("x", -5, 5)
        y = trial.suggest_float("y", -5, 5)
        # The hundreds digit of the value identifies the worker that evaluated the trial.
        return float(int((worker_id + 1) * 100 + x**2 + y**2))

    # Each worker loads the shared study and evaluates exactly one trial.
    study = load_study(study_name, storage_builder)
    study.optimize(objective, n_trials=1)


def enqueue(study_name: str, storage_builder: Callable[[], optuna.storages.BaseStorage]) -> None:
    study = load_study(study_name, storage_builder)
    # Enqueue 50 random points drawn uniformly from [-5, 5) x [-5, 5).
    XY = np.random.random((50, 2)) * 10 - 5
    for xy in XY:
        study.enqueue_trial({"x": float(xy[0]), "y": float(xy[1])})


def execute(storage_builder: Callable[[], optuna.storages.BaseStorage]) -> None:
    study_name = str(uuid.uuid4())
    enqueue(study_name, storage_builder)
    # Run 8 worker processes against the same study.
    procs = []
    for i in range(8):
        proc = multiprocessing.Process(target=main, args=(study_name, i, storage_builder))
        procs.append(proc)
        proc.start()
    for proc in procs:
        proc.join()


if __name__ == "__main__":
    execute(lambda: optuna.storages.GrpcStorageProxy(host="localhost", port=13000))
Additional context (optional)
No response
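For triage, the symptom visible in the log above (the same trial number reported as finished more than once, here Trial 2) can be detected mechanically. The helper below is a minimal sketch that assumes the default Optuna log line format shown in this report; `duplicated_trials` is a hypothetical name, not part of Optuna.

```python
import re
from collections import Counter

# Matches lines such as "... Trial 2 finished with value: 311.0 and parameters: ...".
# The capital "T" keeps the trailing "Best is trial 0 ..." part from matching.
FINISHED = re.compile(r"Trial (\d+) finished with value")


def duplicated_trials(lines):
    """Return the sorted trial numbers that were reported as finished more than once."""
    counts = Counter()
    for line in lines:
        match = FINISHED.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return sorted(number for number, count in counts.items() if count > 1)
```

Feeding the combined worker output to `duplicated_trials` gives a quick check of whether a run hit the problem even when no worker happened to crash.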
