Skip to content

Sync owned trials when calling study.ask and study.get_trials#4631

Merged
c-bata merged 15 commits intooptuna:masterfrom
not522:cached-storage-sync-owned-trials
May 12, 2023
Merged

Sync owned trials when calling study.ask and study.get_trials#4631
c-bata merged 15 commits intooptuna:masterfrom
not522:cached-storage-sync-owned-trials

Conversation

@not522
Copy link
Copy Markdown
Member

@not522 not522 commented Apr 24, 2023

Motivation

This PR takes over #4431 and #4603.

#4431

read_trials_from_remote_storage method in _CachedStorage does not sync trials owned by itself. This behavior causes problems when another worker (e.g. Optuna Dashboard) make the trial completed as follows:
https://gist.github.com/c-bata/69e81a34c395d3ea368e24b0d5e0d367

#4603

This is part of the work to deprecate _CachedStorage.

When directly using RDBStorage, _get_latest_trial is a bottleneck because it reads system_attrs from the remote storage.

Description of the changes

#4431

This PR changes excluded_trial_ids from owned_or_finished_trial_ids to finished_trial_ids only.

#4603

Getting system_attrs lazily. This change eliminates access to remote storage for system_attrs by samplers and pruners except GridSampler, SuccessiveHalvingPruner, and HyperbandPruner.

@github-actions github-actions bot added optuna.storages Related to the `optuna.storages` submodule. This is automatically labeled by github-actions. optuna.trial Related to the `optuna.trial` submodule. This is automatically labeled by github-actions. labels Apr 24, 2023
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Apr 24, 2023

Codecov Report

Merging #4631 (82f4211) into master (792754e) will increase coverage by 0.03%.
The diff coverage is 100.00%.

📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

@@            Coverage Diff             @@
##           master    #4631      +/-   ##
==========================================
+ Coverage   90.89%   90.93%   +0.03%     
==========================================
  Files         184      184              
  Lines       13959    13910      -49     
==========================================
- Hits        12688    12649      -39     
+ Misses       1271     1261      -10     
Impacted Files Coverage Δ
optuna/storages/_cached_storage.py 98.25% <100.00%> (-0.47%) ⬇️
optuna/study/study.py 94.17% <100.00%> (+0.04%) ⬆️
optuna/trial/_trial.py 95.67% <100.00%> (+0.29%) ⬆️

... and 2 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@not522
Copy link
Copy Markdown
Member Author

not522 commented Apr 24, 2023

This PR slows down the optimization by about 15 seconds for 1,000 trials with TPESampler.
https://github.com/not522/optuna/actions/runs/4783015010/jobs/8502860477

  • master
  ============================= =========
    storage, sampler, n_trials           
  ----------------------------- ---------
      inmemory, random, 1000     697±0ms 
     inmemory, random, 10000     14.3±0s 
       inmemory, tpe, 1000       28.8±0s 
      inmemory, cmaes, 1000      2.07±0s 
       sqlite, random, 1000      35.6±0s 
        sqlite, tpe, 1000        1.21±0m 
       sqlite, cmaes, 1000       53.7±0s 
   cached_sqlite, random, 1000   36.3±0s 
     cached_sqlite, tpe, 1000    1.18±0m 
    cached_sqlite, cmaes, 1000   49.5±0s 
      journal, random, 1000      6.62±0s 
        journal, tpe, 1000       36.3±0s 
       journal, cmaes, 1000      11.6±0s 
  ============================= =========
  • PR
  ============================= =========
    storage, sampler, n_trials           
  ----------------------------- ---------
      inmemory, random, 1000     762±0ms 
     inmemory, random, 10000     15.0±0s 
       inmemory, tpe, 1000       28.7±0s 
      inmemory, cmaes, 1000      2.15±0s 
       sqlite, random, 1000      46.2±0s 
        sqlite, tpe, 1000        1.38±0m 
       sqlite, cmaes, 1000       1.20±0m 
   cached_sqlite, random, 1000   45.2±0s 
     cached_sqlite, tpe, 1000    1.42±0m
    cached_sqlite, cmaes, 1000   1.24±0m 
      journal, random, 1000      6.70±0s 
        journal, tpe, 1000       35.8±0s 
       journal, cmaes, 1000      11.5±0s 
  ============================= =========

@not522
Copy link
Copy Markdown
Member Author

not522 commented Apr 24, 2023

@c-bata @gen740 Could you review this PR?

@not522 not522 marked this pull request as ready for review April 24, 2023 07:09
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented May 7, 2023

This pull request has not seen any recent activity.

@github-actions github-actions bot added the stale Exempt from stale bot labeling. label May 7, 2023
not522 and others added 2 commits May 8, 2023 13:02
Co-authored-by: Gen <54583542+gen740@users.noreply.github.com>
Copy link
Copy Markdown
Member

@gen740 gen740 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost LGTM! I left a minor comment.

# _LazyTrialSystemAttrs gets attrs the first time it is needed.
# Then, we create the instance for each method, and test the first and second use.

system_attrs = optuna.trial._trial._LazyTrialSystemAttrs(trial._trial_id, storage)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My pyright language server raised the issue that "_trial" is not a known member of module "optuna.trial". This is because pyright recognizes _trial as a private module, and it could not resolve it. Importing directly (from optuna.trial._trial import _LazyTrialSystemAttrs) could resolve this issue.

Although this is not a fatal issue, do you think it should be fixed?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I updated it.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much!

@not522 not522 added the enhancement Change that does not break compatibility and not affect public interfaces, but improves performance. label May 8, 2023
@github-actions github-actions bot removed the stale Exempt from stale bot labeling. label May 8, 2023
Copy link
Copy Markdown
Member

@gen740 gen740 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Copy Markdown
Member

@c-bata c-bata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@not522 Thank you for your pull request!
Looks almost good to me. I left one question though.

Just for your information, I found a bug in delete_study method that does not related to this PR and fixed it in #4670. I will make it ready for review after merging this PR.

self._add_trials_to_cache(study_id, [self._backend.get_trial(trial_id)])
self._studies[study_id].owned_or_finished_trial_ids.add(trial_id)
return ret
return self._backend.set_trial_state_values(trial_id, state=state, values=values)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for the record, I compared the speed when applying the following change, but found no performance improvement. So the current logic seems to be reasonable.

Changes:

        ret = self._backend.set_trial_state_values(trial_id, state=state, values=values)
        if state.is_finished() and trial_id in self._trial_id_to_study_id_and_number:
            backend_trial = self._backend.get_trial(trial_id)
            study_id, trial_number = self._trial_id_to_study_id_and_number[trial_id]
            with self._lock:
                study = self._studies[study_id]
                study.trials[trial_number] = backend_trial
                study.finished_trial_ids.add(trial_id)
        return ret

Benchmark script:
https://gist.github.com/c-bata/fa8efc075e86a1f7e6dd6d83b85c1017

Result:

Branch n_params=5 (mean ± std) (s) n_params=30 (mean ± std) (s)
PR 4631 10.1026 ± 0.1046 35.8279 ± 0.1306
cache-set-trial-state-values 9.6567 ± 0.1974 35.8301 ± 0.1761
main 7.6167 ± 0.0959 32.8131 ± 0.1010
v3.1.1 9.0216 ± 0.1590 42.7193 ± 0.1940

(Repeated 5 times per each)

@c-bata c-bata added this to the v3.2.0 milestone May 12, 2023
@c-bata c-bata merged commit 466cddf into optuna:master May 12, 2023
@not522 not522 deleted the cached-storage-sync-owned-trials branch May 12, 2023 07:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement Change that does not break compatibility and not affect public interfaces, but improves performance. optuna.storages Related to the `optuna.storages` submodule. This is automatically labeled by github-actions. optuna.trial Related to the `optuna.trial` submodule. This is automatically labeled by github-actions.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants