Skip to content

Fix incremental update algorithm in _CachedStorage's _read_trials_from_remote_storage#6310

Merged
c-bata merged 3 commits intooptuna:masterfrom
not522:fix-cached-storage
Oct 27, 2025
Merged

Fix incremental update algorithm in _CachedStorage's _read_trials_from_remote_storage#6310
c-bata merged 3 commits intooptuna:masterfrom
not522:fix-cached-storage

Conversation

@not522
Copy link
Copy Markdown
Member

@not522 not522 commented Oct 22, 2025

Motivation

#5704 changed the incremental update algorithm in _CachedStorage's _read_trials_from_remote_storage. It introduced last_finished_trial_id and unfinished_trial_ids, but the handling of them is buggy. When updating last_finished_trial_id, if unfinished_trial_ids isn't properly set, it may fail to retrieve all trials.

The following is a minimal reproducible example. Running Script 1 and then executing Script 2 in a separate process allows Script 2 to retrieve only one trial.

  • Script 1
import time
import optuna
from optuna.trial import TrialState

study = optuna.create_study(study_name="test", storage="sqlite:///tmp.db", load_if_exists=True)
time.sleep(5)
trial = optuna.trial.create_trial(state=TrialState.RUNNING)
study.add_trial(trial)
time.sleep(5)
for trial in study.trials:
    print(trial)
[I 2025-10-22 12:54:23,848] A new study created in RDB with name: test
FrozenTrial(number=0, state=<TrialState.RUNNING: 0>, values=None, datetime_start=datetime.datetime(2025, 10, 22, 12, 54, 28, 854576), datetime_complete=None, params={}, user_attrs={}, system_attrs={}, intermediate_values={}, distributions={}, trial_id=1, value=None)
FrozenTrial(number=1, state=<TrialState.COMPLETE: 1>, values=[0.0], datetime_start=datetime.datetime(2025, 10, 22, 12, 54, 32, 694488), datetime_complete=datetime.datetime(2025, 10, 22, 12, 54, 32, 694488), params={}, user_attrs={}, system_attrs={}, intermediate_values={}, distributions={}, trial_id=2, value=None)
  • Script 2
import time
import optuna
from optuna.trial import TrialState

study = optuna.create_study(study_name="test", storage="sqlite:///tmp.db", load_if_exists=True)
time.sleep(5)
trial = optuna.trial.create_trial(state=TrialState.COMPLETE, value=0.0)
study.add_trial(trial)
time.sleep(5)
for trial in study.trials:
    print(trial)
[I 2025-10-22 12:54:27,687] Using an existing study with name 'test' instead of creating a new one.
FrozenTrial(number=1, state=<TrialState.COMPLETE: 1>, values=[0.0], datetime_start=datetime.datetime(2025, 10, 22, 12, 54, 32, 694488), datetime_complete=datetime.datetime(2025, 10, 22, 12, 54, 32, 694488), params={}, user_attrs={}, system_attrs={}, intermediate_values={}, distributions={}, trial_id=2, value=None)

For fixing this problem, updating last_finished_trial_id should only be performed in _read_trials_from_remote_storage because they must be executed only when all trials have been considered.

Description of the changes

@not522 not522 added the bug Issue/PR about behavior that is broken. Not for typos/examples/CI/test but for Optuna itself. label Oct 22, 2025
@not522 not522 added this to the v4.6.0 milestone Oct 23, 2025
Copy link
Copy Markdown
Collaborator

@sawa3030 sawa3030 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the detailed PR description. LGTM

@sawa3030 sawa3030 removed their assignment Oct 23, 2025
storage._read_trials_from_remote_storage(study_id2)

storage.delete_study(study_id1)
assert storage._get_cached_trial(trial_id1) is None
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add the following test case that reproduces the bug you found? I confirmed that the test case below fails on the master branch and passes with this PR.

def test_unfinished_trial_ids() -> None:
    # This test reproduces the bug fixed in https://github.com/optuna/optuna/pull/6310
    with NamedTemporaryFilePool() as tempfile:
        storage_url = f"sqlite:///{tempfile.name}"
        study_name = "test-unfinished-trial-ids"

        study1 = optuna.create_study(
            study_name=study_name,
            storage=_CachedStorage(RDBStorage(storage_url)),
            load_if_exists=True,
        )
        study2 = optuna.create_study(
            study_name=study_name,
            storage=_CachedStorage(RDBStorage(storage_url)),
            load_if_exists=True,
        )

        study1.add_trial(optuna.trial.create_trial(state=TrialState.RUNNING))
        study2.add_trial(optuna.trial.create_trial(state=TrialState.COMPLETE, value=0.0))
        assert len(study2.trials) == 2

c-bata@784fb00

Copy link
Copy Markdown
Member

@c-bata c-bata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I executed the benchmark script, which is the same one I prepared for PR #5704. I confirmed that this PR does not introduce performance degradation.

study.optimize (10 threads)

n_trials master This PR
10000 667.70 679.47

storage.get_all_trials x 20 (10 threads)

n_trials master This PR
10000 236.03 238.17

LGTM with a nit!

@not522
Copy link
Copy Markdown
Member Author

not522 commented Oct 27, 2025

Thank you for your review. I have added a test as your comment.

Copy link
Copy Markdown
Member

@c-bata c-bata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@c-bata c-bata merged commit 7908991 into optuna:master Oct 27, 2025
12 checks passed
@not522 not522 deleted the fix-cached-storage branch October 27, 2025 12:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Issue/PR about behavior that is broken. Not for typos/examples/CI/test but for Optuna itself.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants