Skip to content

Sync owned trials when calling study.ask and study.get_trials#4431

Closed
c-bata wants to merge 6 commits intooptuna:masterfrom
c-bata:cached-storage-sync-owned-trials
Closed

Sync owned trials when calling study.ask and study.get_trials#4431
c-bata wants to merge 6 commits intooptuna:masterfrom
c-bata:cached-storage-sync-owned-trials

Conversation

@c-bata
Copy link
Copy Markdown
Member

@c-bata c-bata commented Feb 17, 2023

Motivation

read_trials_from_remote_storage method in _CachedStorage does not sync trials owned by itself. This behavior causes problems when another worker (e.g. Optuna Dashboard) make the trial completed as follows:
https://gist.github.com/c-bata/69e81a34c395d3ea368e24b0d5e0d367

Description of the changes

This PR changes excluded_trial_ids from owned_or_finished_trial_ids to finished_trial_ids only.

@github-actions github-actions bot added the optuna.storages Related to the `optuna.storages` submodule. This is automatically labeled by github-actions. label Feb 17, 2023
@c-bata c-bata force-pushed the cached-storage-sync-owned-trials branch from d6af03f to 94dcd58 Compare February 17, 2023 04:23
@c-bata c-bata added the enhancement Change that does not break compatibility and not affect public interfaces, but improves performance. label Feb 17, 2023

# Should fail because the trial0 is already finished.
with pytest.raises(RuntimeError):
with pytest.raises(ValueError):
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI: Previously, check_trial_is_updatable() method in optuna/storages/_rdb/storage.py raises RuntimeError. However, from this change, optuna/study/_tell.py" raises ValueError like the following CI output.
https://github.com/optuna/optuna/actions/runs/4200669694/jobs/7286925478

@c-bata c-bata marked this pull request as ready for review February 17, 2023 05:55
@c-bata
Copy link
Copy Markdown
Member Author

c-bata commented Feb 17, 2023

@amylase @contramundum53 Could you review this PR?

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Feb 17, 2023

Codecov Report

Merging #4431 (b7dcb02) into master (cc7f873) will increase coverage by 0.37%.
The diff coverage is 100.00%.

📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

@@            Coverage Diff             @@
##           master    #4431      +/-   ##
==========================================
+ Coverage   90.37%   90.74%   +0.37%     
==========================================
  Files         173      185      +12     
  Lines       13583    15037    +1454     
==========================================
+ Hits        12275    13645    +1370     
- Misses       1308     1392      +84     
Impacted Files Coverage Δ
optuna/storages/_cached_storage.py 98.75% <100.00%> (+0.03%) ⬆️
optuna/study/study.py 95.46% <100.00%> (+0.50%) ⬆️

... and 35 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Copy link
Copy Markdown
Collaborator

@amylase amylase left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, could you update the section of _CachedStorage's docstring that mentions read_trials_from_remote_storage?

https://github.com/optuna/optuna/pull/4431/files#diff-8ce5c3176c6b5fa3a21ed11da3f19d07c7c0291b7b7b6b67221b057855b7ac50R55

return copy.deepcopy(trials) if deepcopy else trials

def read_trials_from_remote_storage(self, study_id: int) -> None:
def read_trials_from_remote_storage(self, study_id: int, sync_owned_trials: bool) -> None:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you consider to implement two separate methods instead of having a boolean parameter?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind motivating this change a bit?

) -> List[FrozenTrial]:
if study_id not in self._studies:
self.read_trials_from_remote_storage(study_id)
self.read_trials_from_remote_storage(study_id, sync_owned_trials=False)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can we skip syncing owned trials here? If external processes can access & update them, I think we should always sync them.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because it may cause a performance degradation for users who use study.optimize() api. However, on second thought, Optuna samplers call study.get_trials() so there seems to be a performance degradation either way. It may be better to remove the sync_owned_trials option and always sync them.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Mar 6, 2023

This pull request has not seen any recent activity.

@github-actions github-actions bot added the stale Exempt from stale bot labeling. label Mar 6, 2023
@github-actions
Copy link
Copy Markdown
Contributor

This pull request was closed automatically because it had not seen any recent activity. If you want to discuss it, you can reopen it freely.

@github-actions github-actions bot closed this Mar 21, 2023
@c-bata c-bata reopened this Mar 22, 2023
@c-bata c-bata removed the stale Exempt from stale bot labeling. label Mar 22, 2023
@c-bata c-bata marked this pull request as draft March 23, 2023 05:44
Comment on lines -197 to -228
with self._lock:
cached_trial = self._get_cached_trial(trial_id)
if cached_trial is not None:
self._check_trial_is_updatable(cached_trial)

study_id, _ = self._trial_id_to_study_id_and_number[trial_id]
cached_dist = self._studies[study_id].param_distribution.get(param_name, None)
if cached_dist:
distributions.check_distribution_compatibility(cached_dist, distribution)
else:
# On cache miss, check compatibility against previous trials in the database
# and INSERT immediately to prevent other processes from creating incompatible
# ones. By INSERT, it is assumed that no previous entry has been persisted
# already.
self._backend._check_and_set_param_distribution(
study_id, trial_id, param_name, param_value_internal, distribution
)
self._studies[study_id].param_distribution[param_name] = distribution

params = copy.copy(cached_trial.params)
params[param_name] = distribution.to_external_repr(param_value_internal)
cached_trial.params = params

dists = copy.copy(cached_trial.distributions)
dists[param_name] = distribution
cached_trial.distributions = dists

if cached_dist: # Already persisted in case of cache miss so no need to update.
self._backend.set_trial_param(
trial_id, param_name, param_value_internal, distribution
)
return
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic updates the cached owned trials. However, from this PR, owned trials are always synced, so this logic is no longer needed.

Comment on lines -70 to -77
trial_id = storage.create_new_trial(study_id)
with patch.object(
base_storage, "_check_and_set_param_distribution", return_value=True
) as set_mock:
storage.set_trial_param(
trial_id, "paramA", 1.2, optuna.distributions.FloatDistribution(-0.2, 2.3)
)
assert set_mock.call_count == 1
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test case checks that the owned trial cache is updated. However, as explained in https://github.com/optuna/optuna/pull/4431/files#r1147191207, the owned trials always synced.

@c-bata
Copy link
Copy Markdown
Member Author

c-bata commented Mar 24, 2023

FYI: I ran the following benchmark to compare how CachedStorage slows down with this change.

https://gist.github.com/c-bata/16d654471393f59e064482ffb8b52bf8

master This PR
RDBStorage 47.9438s 51.0639s
CachedStorage(RDBStorage) 36.6144s 54.9168s

@c-bata
Copy link
Copy Markdown
Member Author

c-bata commented Mar 30, 2023

Let me unassign reviewers since the performance become slower than I expected. I will mark this ready for review after fixing the performance issue.

@github-actions
Copy link
Copy Markdown
Contributor

This pull request has not seen any recent activity.

@github-actions github-actions bot added the stale Exempt from stale bot labeling. label Apr 11, 2023
@c-bata c-bata closed this Apr 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement Change that does not break compatibility and not affect public interfaces, but improves performance. optuna.storages Related to the `optuna.storages` submodule. This is automatically labeled by github-actions. stale Exempt from stale bot labeling.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants