Skip to content

Add storage cache#1140

Merged
toshihikoyanase merged 17 commits intooptuna:masterfrom
ytsmiling:add-storage-cache
May 21, 2020
Merged

Add storage cache#1140
toshihikoyanase merged 17 commits intooptuna:masterfrom
ytsmiling:add-storage-cache

Conversation

@ytsmiling
Copy link
Copy Markdown
Member

@ytsmiling ytsmiling commented Apr 20, 2020

Motivation

Flushing updates of trials to persistent storages on every update (including parameter suggests) can be a significant bottleneck. This PR mitigates the bottleneck by introducing a wrapper class that wraps the `RDBStorage/ class and providing the lazy-sync capability.

Description of the changes

  • Add CachedStorage, which is a wrapper of RDBStorage and provide the lazy-sync capability.
    • While CachedStorage only supports RDBStorage, it is straightforward to support other classes such as RedisStorage. Thus, CachedStorage file is placed directly under the optuna/storages class.
  • Introduce two additional public methods to RDBStorage.
    • One returns FrozenTrial when creating a new trial. Note, create_new_trial in BaseStorage only returns the trial-id.
    • The other receives a FrozenTrial and flush the updates to persistent storages.

Current Status

  • Implement CachedStorage class.
  • Add tests.
  • Add docs where necessary.

Microbench

(DB: official mysql docker image (tag:8.0.19), storage: SSD, CPU: Intel Core i7-6700K, python: 3.5.2)

  • perf:optimize-study: time taken for a single optimization

This PR

{'n_study': 1, 'n_trial': 100, 'n_param': 3}
perf:optimize-study                               0:00:10.169289
{'n_study': 1, 'n_trial': 100, 'n_param': 30}
perf:optimize-study                               0:00:47.566834

Current master

{'n_study': 1, 'n_trial': 100, 'n_param': 3}
perf:optimize-study                               0:00:22.499407
{'n_study': 1, 'n_trial': 100, 'n_param': 30}
perf:optimize-study                               0:02:33.151298

Benchmark code:

import argparse
from collections import defaultdict
from datetime import datetime
import math

import sqlalchemy
import optuna

profile_result = defaultdict(lambda: datetime.now() - datetime.now())


class Profile:
    def __init__(self, name):
        self.name = name

    def __enter__(self):
        self.start = datetime.now()

    def __exit__(self, exc_type, exc_val, exc_tb):
        profile_result[self.name] += datetime.now() - self.start


def print_profile():
    for key, value in sorted(profile_result.items(), key=lambda i: i[1],
                             reverse=True):
        print(key.ljust(50) + '{}'.format(value))


def build_objective_fun(n_param):
    def objective(trial):
        return sum([
            math.sin(trial.suggest_float('param-{}'.format(i), 0, math.pi * 2))
            for i in range(n_param)
        ])

    return objective


def define_flags(parser):
    parser.add_argument('mysql_user', type=str)
    parser.add_argument('mysql_password', type=str)
    parser.add_argument('mysql_host', type=str)
    parser.add_argument('n_study', type=int)
    parser.add_argument('n_trial', type=int)
    parser.add_argument('n_param', type=int)
    return parser


if __name__ == '__main__':
    parser = define_flags(argparse.ArgumentParser())
    args = parser.parse_args()
    print(vars(args))

    storage_str = 'mysql+pymysql://{}:{}@{}/'.format(
        args.mysql_user,
        args.mysql_password,
        args.mysql_host,
    )
    engine = sqlalchemy.create_engine(storage_str)
    sampler = optuna.samplers.TPESampler()
    conn = engine.connect()
    conn.execute("commit")
    database_str = 'profile_storage_s{}_t{}_p{}'.format(
        args.n_study, args.n_trial, args.n_param)

    try:
        conn.execute(
            "drop database {}".format(database_str))
    except:
        pass
    conn.execute("create database {}".format(
        database_str
    ))
    conn.close()

    for i in range(args.n_study):
        storage = optuna.storages.get_storage(storage_str + database_str)
        study = optuna.create_study(sampler=sampler, storage=storage)
        study_id = study.study_id
        with Profile('perf:optimize-study'):
            study.optimize(build_objective_fun(args.n_param), n_trials=args.n_trial, gc_after_trial=False)

    print_profile()

@ytsmiling ytsmiling changed the title Add storage cache [WIP] Add storage cache Apr 20, 2020
@ytsmiling
Copy link
Copy Markdown
Member Author

After fixing some bugs, I updated the microbenchmark.

{'n_study': 1, 'n_trial': 100, 'n_param': 30}
perf:optimize-study                               0:00:45.771852
{'n_study': 1, 'n_trial': 100, 'n_param': 3}
perf:optimize-study                               0:00:13.391903

@ytsmiling ytsmiling self-assigned this Apr 20, 2020
@ytsmiling ytsmiling marked this pull request as ready for review April 21, 2020 04:38
@ytsmiling ytsmiling changed the title [WIP] Add storage cache Add storage cache Apr 21, 2020
Copy link
Copy Markdown
Member

@sile sile left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR! I took the first look and left some review comments.

@codecov-io
Copy link
Copy Markdown

codecov-io commented Apr 27, 2020

Codecov Report

Merging #1140 into master will increase coverage by 0.62%.
The diff coverage is 95.61%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1140      +/-   ##
==========================================
+ Coverage   90.50%   91.13%   +0.62%     
==========================================
  Files         131      144      +13     
  Lines       11396    12628    +1232     
==========================================
+ Hits        10314    11508    +1194     
- Misses       1082     1120      +38     
Impacted Files Coverage Δ
optuna/storages/rdb/storage.py 95.35% <93.22%> (-0.87%) ⬇️
optuna/storages/cached_storage.py 93.51% <93.51%> (ø)
tests/storages_tests/rdb_tests/test_storage.py 97.41% <96.42%> (+0.34%) ⬆️
optuna/storages/__init__.py 93.75% <100.00%> (+1.44%) ⬆️
optuna/testing/storage.py 84.61% <100.00%> (+1.75%) ⬆️
tests/storages_tests/test_cached_storage.py 100.00% <100.00%> (ø)
tests/storages_tests/test_storages.py 98.89% <100.00%> (+<0.01%) ⬆️
optuna/integration/chainermn.py 75.70% <0.00%> (-1.70%) ⬇️
optuna/samplers/random.py 84.44% <0.00%> (-1.56%) ⬇️
... and 33 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4f1b4e3...28008cc. Read the comment docs.

@ytsmiling ytsmiling requested a review from sile April 28, 2020 03:41
Copy link
Copy Markdown
Member

@sile sile left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for addressing my review comments. This PR seems almost okay but I left a comment to follow the storage specification change.

param_value_internal
)
cached_trial.distributions[param_name] = distribution
if cached_dist:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the line

self._dirty_trials.add(trial_id)

should be moved under if cached_dist: .

The logic here is a bit difficult too. Some comments, like "If cached_dist isn't available, we have to access the storage in order to check distribution compatibility." would help readers.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your comment. Indeed, the line should be moved, and the whole logic requires more readability. I'll address them.

@ytsmiling
Copy link
Copy Markdown
Member Author

This PR should not be merged before #1191.

@toshihikoyanase toshihikoyanase mentioned this pull request May 15, 2020
2 tasks
@ytsmiling ytsmiling force-pushed the add-storage-cache branch from 28008cc to 9ba2cf6 Compare May 19, 2020 19:49
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented May 19, 2020

Codecov Report

Merging #1140 into master will increase coverage by 0.40%.
The diff coverage is 95.34%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1140      +/-   ##
==========================================
+ Coverage   86.61%   87.01%   +0.40%     
==========================================
  Files          93       94       +1     
  Lines        6924     7233     +309     
==========================================
+ Hits         5997     6294     +297     
- Misses        927      939      +12     
Impacted Files Coverage Δ
optuna/storages/rdb/storage.py 95.83% <93.15%> (-0.24%) ⬇️
optuna/storages/cached_storage.py 95.83% <95.83%> (ø)
optuna/storages/__init__.py 100.00% <100.00%> (ø)
optuna/testing/storage.py 82.35% <100.00%> (+2.35%) ⬆️
optuna/integration/chainermn.py 75.70% <0.00%> (-1.70%) ⬇️
optuna/storages/base.py 68.22% <0.00%> (+11.90%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1c9d868...c926774. Read the comment docs.

@ytsmiling
Copy link
Copy Markdown
Member Author

I rearranged commits and now this PR passed the updated tests #1191.
I addressed review comments and would like to ask re-reviews.

Some implementation has been changed from the latest reviews:

  • CachedStorage uses _StudyInfo, which is similar to the current implementation of InMemoryStorage.

@sile @toshihikoyanase Please review this PR when you have time.

@ytsmiling ytsmiling requested review from sile and toshihikoyanase May 19, 2020 22:00
Copy link
Copy Markdown
Member

@sile sile left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ytsmiling Thanks! Now this RP seems great but I left a few very minor comments.

ytsmiling and others added 2 commits May 20, 2020 15:14
Copy link
Copy Markdown
Member

@sile sile left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thank you for your swift fix.

@sile sile added the enhancement Change that does not break compatibility and not affect public interfaces, but improves performance. label May 20, 2020
@sile sile added this to the v1.5.0 milestone May 20, 2020
Copy link
Copy Markdown
Member

@toshihikoyanase toshihikoyanase left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your great PR. I'm in the middle of the review of logic, but let me share some comments about styles.

This is just a comment, but I noticed two things:

  1. FrozenTrial is not frozen anymore. The attribute values are updated to keep the cache up-to-date. Not now, but we may rename it in the future.
  2. This PR directly manipulates SQLAlchemy queries while the previous implementation tried to capslate them in optuna/rdb/models.py. It may be inevitable to speed up storages.

Copy link
Copy Markdown
Member

@toshihikoyanase toshihikoyanase left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes regarding the logic seem good to me.
I'll approve the PR after you check my comments about styles.

ytsmiling and others added 2 commits May 21, 2020 04:21
Co-authored-by: Toshihiko Yanase <toshihiko.yanase@gmail.com>
Co-authored-by: Toshihiko Yanase <toshihiko.yanase@gmail.com>
@ytsmiling
Copy link
Copy Markdown
Member Author

@toshihikoyanase Thank you for reviewing this PR.

  1. FrozenTrial is not frozen anymore. The attribute values are updated to keep the cache up-to-date. Not now, but we may rename it in the future.

Actually, the mutation of FrozenTrial was not introduced in this PR, but it has been a long history in the InMemoryStorage class.

  1. This PR directly manipulates SQLAlchemy queries while the previous implementation tried to capslate them in optuna/rdb/models.py. It may be inevitable to speed up storages.

Yes, I think we should remove methods in the models package and directly manipulate sqlalchemy. It can result in a significant performance gain (e.g. get_all_study_summaries method). From a pure performance perspective, it's even better to stop using orm and directly use SQL, but it requires much more code changes and I think we do not need to switch to SQL for now.

Copy link
Copy Markdown
Member

@toshihikoyanase toshihikoyanase left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your swift actions. LGTM!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement Change that does not break compatibility and not affect public interfaces, but improves performance.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants