Sync owned trials when calling study.ask and study.get_trials#4631
Sync owned trials when calling study.ask and study.get_trials#4631c-bata merged 15 commits intooptuna:masterfrom
study.ask and study.get_trials#4631Conversation
Codecov Report
📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more @@ Coverage Diff @@
## master #4631 +/- ##
==========================================
+ Coverage 90.89% 90.93% +0.03%
==========================================
Files 184 184
Lines 13959 13910 -49
==========================================
- Hits 12688 12649 -39
+ Misses 1271 1261 -10
... and 2 files with indirect coverage changes 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
|
This PR slows down the optimization by about 15 seconds for 1,000 trials with
|
|
This pull request has not seen any recent activity. |
Co-authored-by: Gen <54583542+gen740@users.noreply.github.com>
gen740
left a comment
There was a problem hiding this comment.
Almost LGTM! I left a minor comment.
tests/trial_tests/test_trial.py
Outdated
| # _LazyTrialSystemAttrs gets attrs the first time it is needed. | ||
| # Then, we create the instance for each method, and test the first and second use. | ||
|
|
||
| system_attrs = optuna.trial._trial._LazyTrialSystemAttrs(trial._trial_id, storage) |
There was a problem hiding this comment.
My pyright language server raised the issue that "_trial" is not a known member of module "optuna.trial". This is because pyright recognizes _trial as a private module, and it could not resolve it. Importing directly (from optuna.trial._trial import _LazyTrialSystemAttrs) could resolve this issue.
Although this is not a fatal issue, do you think it should be fixed?
| self._add_trials_to_cache(study_id, [self._backend.get_trial(trial_id)]) | ||
| self._studies[study_id].owned_or_finished_trial_ids.add(trial_id) | ||
| return ret | ||
| return self._backend.set_trial_state_values(trial_id, state=state, values=values) |
There was a problem hiding this comment.
Just for the record, I compared the speed when applying the following change, but found no performance improvement. So the current logic seems to be reasonable.
Changes:
ret = self._backend.set_trial_state_values(trial_id, state=state, values=values)
if state.is_finished() and trial_id in self._trial_id_to_study_id_and_number:
backend_trial = self._backend.get_trial(trial_id)
study_id, trial_number = self._trial_id_to_study_id_and_number[trial_id]
with self._lock:
study = self._studies[study_id]
study.trials[trial_number] = backend_trial
study.finished_trial_ids.add(trial_id)
return retBenchmark script:
https://gist.github.com/c-bata/fa8efc075e86a1f7e6dd6d83b85c1017
Result:
| Branch | n_params=5 (mean ± std) (s) |
n_params=30 (mean ± std) (s) |
|---|---|---|
| PR 4631 | 10.1026 ± 0.1046 | 35.8279 ± 0.1306 |
| cache-set-trial-state-values | 9.6567 ± 0.1974 | 35.8301 ± 0.1761 |
| main | 7.6167 ± 0.0959 | 32.8131 ± 0.1010 |
| v3.1.1 | 9.0216 ± 0.1590 | 42.7193 ± 0.1940 |
(Repeated 5 times per each)
Motivation
This PR takes over #4431 and #4603.
#4431
read_trials_from_remote_storagemethod in_CachedStoragedoes not sync trials owned by itself. This behavior causes problems when another worker (e.g. Optuna Dashboard) make the trial completed as follows:https://gist.github.com/c-bata/69e81a34c395d3ea368e24b0d5e0d367
#4603
This is part of the work to deprecate
_CachedStorage.When directly using
RDBStorage,_get_latest_trialis a bottleneck because it readssystem_attrsfrom the remote storage.Description of the changes
#4431
This PR changes
excluded_trial_idsfromowned_or_finished_trial_idstofinished_trial_idsonly.#4603
Getting
system_attrslazily. This change eliminates access to remote storage forsystem_attrsby samplers and pruners exceptGridSampler,SuccessiveHalvingPruner, andHyperbandPruner.