Sync owned trials when calling study.ask and study.get_trials#4431
Sync owned trials when calling study.ask and study.get_trials#4431c-bata wants to merge 6 commits intooptuna:masterfrom
study.ask and study.get_trials#4431Conversation
d6af03f to
94dcd58
Compare
|
|
||
| # Should fail because the trial0 is already finished. | ||
| with pytest.raises(RuntimeError): | ||
| with pytest.raises(ValueError): |
There was a problem hiding this comment.
FYI: Previously, check_trial_is_updatable() method in optuna/storages/_rdb/storage.py raises RuntimeError. However, from this change, optuna/study/_tell.py" raises ValueError like the following CI output.
https://github.com/optuna/optuna/actions/runs/4200669694/jobs/7286925478
|
@amylase @contramundum53 Could you review this PR? |
Codecov Report
📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more @@ Coverage Diff @@
## master #4431 +/- ##
==========================================
+ Coverage 90.37% 90.74% +0.37%
==========================================
Files 173 185 +12
Lines 13583 15037 +1454
==========================================
+ Hits 12275 13645 +1370
- Misses 1308 1392 +84
... and 35 files with indirect coverage changes 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
amylase
left a comment
There was a problem hiding this comment.
Also, could you update the section of _CachedStorage's docstring that mentions read_trials_from_remote_storage?
optuna/storages/_cached_storage.py
Outdated
| return copy.deepcopy(trials) if deepcopy else trials | ||
|
|
||
| def read_trials_from_remote_storage(self, study_id: int) -> None: | ||
| def read_trials_from_remote_storage(self, study_id: int, sync_owned_trials: bool) -> None: |
There was a problem hiding this comment.
Did you consider to implement two separate methods instead of having a boolean parameter?
There was a problem hiding this comment.
Would you mind motivating this change a bit?
optuna/storages/_cached_storage.py
Outdated
| ) -> List[FrozenTrial]: | ||
| if study_id not in self._studies: | ||
| self.read_trials_from_remote_storage(study_id) | ||
| self.read_trials_from_remote_storage(study_id, sync_owned_trials=False) |
There was a problem hiding this comment.
Why can we skip syncing owned trials here? If external processes can access & update them, I think we should always sync them.
There was a problem hiding this comment.
Because it may cause a performance degradation for users who use study.optimize() api. However, on second thought, Optuna samplers call study.get_trials() so there seems to be a performance degradation either way. It may be better to remove the sync_owned_trials option and always sync them.
|
This pull request has not seen any recent activity. |
|
This pull request was closed automatically because it had not seen any recent activity. If you want to discuss it, you can reopen it freely. |
| with self._lock: | ||
| cached_trial = self._get_cached_trial(trial_id) | ||
| if cached_trial is not None: | ||
| self._check_trial_is_updatable(cached_trial) | ||
|
|
||
| study_id, _ = self._trial_id_to_study_id_and_number[trial_id] | ||
| cached_dist = self._studies[study_id].param_distribution.get(param_name, None) | ||
| if cached_dist: | ||
| distributions.check_distribution_compatibility(cached_dist, distribution) | ||
| else: | ||
| # On cache miss, check compatibility against previous trials in the database | ||
| # and INSERT immediately to prevent other processes from creating incompatible | ||
| # ones. By INSERT, it is assumed that no previous entry has been persisted | ||
| # already. | ||
| self._backend._check_and_set_param_distribution( | ||
| study_id, trial_id, param_name, param_value_internal, distribution | ||
| ) | ||
| self._studies[study_id].param_distribution[param_name] = distribution | ||
|
|
||
| params = copy.copy(cached_trial.params) | ||
| params[param_name] = distribution.to_external_repr(param_value_internal) | ||
| cached_trial.params = params | ||
|
|
||
| dists = copy.copy(cached_trial.distributions) | ||
| dists[param_name] = distribution | ||
| cached_trial.distributions = dists | ||
|
|
||
| if cached_dist: # Already persisted in case of cache miss so no need to update. | ||
| self._backend.set_trial_param( | ||
| trial_id, param_name, param_value_internal, distribution | ||
| ) | ||
| return |
There was a problem hiding this comment.
This logic updates the cached owned trials. However, from this PR, owned trials are always synced, so this logic is no longer needed.
| trial_id = storage.create_new_trial(study_id) | ||
| with patch.object( | ||
| base_storage, "_check_and_set_param_distribution", return_value=True | ||
| ) as set_mock: | ||
| storage.set_trial_param( | ||
| trial_id, "paramA", 1.2, optuna.distributions.FloatDistribution(-0.2, 2.3) | ||
| ) | ||
| assert set_mock.call_count == 1 |
There was a problem hiding this comment.
This test case checks that the owned trial cache is updated. However, as explained in https://github.com/optuna/optuna/pull/4431/files#r1147191207, the owned trials always synced.
|
FYI: I ran the following benchmark to compare how CachedStorage slows down with this change. https://gist.github.com/c-bata/16d654471393f59e064482ffb8b52bf8
|
|
Let me unassign reviewers since the performance become slower than I expected. I will mark this ready for review after fixing the performance issue. |
|
This pull request has not seen any recent activity. |
Motivation
read_trials_from_remote_storagemethod in_CachedStoragedoes not sync trials owned by itself. This behavior causes problems when another worker (e.g. Optuna Dashboard) make the trial completed as follows:https://gist.github.com/c-bata/69e81a34c395d3ea368e24b0d5e0d367
Description of the changes
This PR changes
excluded_trial_idsfromowned_or_finished_trial_idstofinished_trial_idsonly.