Introduce UPSERT in set_trial_user_attr#5703
Conversation
UPSERT in set_trial_user_attr
|
@not522 @nabenabe0928 |
There was a problem hiding this comment.
I checked the code changes and LGTM!
import os
import time
import optuna
mysql_url = os.environ["OPTUNA_STORAGE"]
sqlite_url = "sqlite:///upsert-test.db"
study_name = "test-upsert"
sqlite_study = optuna.create_study(storage=sqlite_url, study_name=study_name)
mysql_study = optuna.create_study(storage=mysql_url, study_name=study_name)
sqlite_trial = sqlite_study.ask()
mysql_trial = mysql_study.ask()
start = time.time()
for i in range(1000):
sqlite_trial.set_user_attr("foo", i)
print(f"{sqlite_trial.user_attrs=}")
print(f"SQLite: {(time.time() - start) * 1000:.2f}[ms]")
start = time.time()
for i in range(1000):
mysql_trial.set_user_attr("foo", i)
print(f"MySQL: {(time.time() - start) * 1000:.2f}[ms]")
print(f"{mysql_trial.user_attrs=}")The time measured:
# This PR
[I 2024-10-22 15:05:07,173] A new study created in RDB with name: test-upsert
[I 2024-10-22 15:05:07,259] A new study created in RDB with name: test-upsert
sqlite_trial.user_attrs={'foo': 999}
SQLite: 5680.27[ms]
MySQL: 4420.31[ms]
mysql_trial.user_attrs={'foo': 999}
# The master branch
[I 2024-10-22 15:05:25,677] A new study created in RDB with name: test-upsert
[I 2024-10-22 15:05:25,783] A new study created in RDB with name: test-upsert
sqlite_trial.user_attrs={'foo': 999}
SQLite: 6599.58[ms]
MySQL: 4955.39[ms]
mysql_trial.user_attrs={'foo': 999}
| if attribute is None: | ||
| attribute = models.TrialUserAttributeModel( | ||
| trial_id=trial_id, key=key, value_json=json.dumps(value) | ||
| if self.engine.name == "mysql": |
There was a problem hiding this comment.
I checked the doc here to verify the code:
https://dev.mysql.com/doc/refman/8.4/en/insert-on-duplicate.html
| session.add(attribute) | ||
| session.execute(mysql_upsert_stmt) | ||
| elif self.engine.name == "sqlite": | ||
| sqlite_insert_stmt = sqlalchemy_dialects_sqlite.insert( |
There was a problem hiding this comment.
I checked the docs here to verify the code:
| models.TrialUserAttributeModel.trial_id, | ||
| models.TrialUserAttributeModel.key, | ||
| ], | ||
| set_=dict(value_json=sqlite_insert_stmt.excluded.value_json), |
There was a problem hiding this comment.
|
Note As in the last comment, this PR does not speed up for MySQL. |
@nabenabe0928 It looks that your benchmark results shows this PR speeds up |
sqlite_pr = [5725.75, 5891.18, 5837.73, 5904.36, 5686.25, 5548.71, 5537.34, 5380.70, 5567.75, 5666.13]
mysql_pr = [4411.56, 4386.89, 4507.84, 4388.80, 4434.83, 4484.78, 4477.91, 4637.17, 4672.88, 4323.81]
sqlite_master = [5801.52, 5888.66, 7761.65, 6197.31, 6175.02, 6288.58, 6482.06, 5978.97, 5893.87, 5895.68]
mysql_master = [4588.37, 4585.37, 4594.20, 5047.10, 4829.42, 4913.89, 4271.38, 4795.67, 4378.23, 4953.27]@c-bata @porink0424 |
|
In my local environment (Mac), the speed has not improved and is equivalent to master. But introducing this PR has the benefit of reducing the number of queries, so I think it is good to merge. |
|
@not522 |
Motivation
When adding or updating an attribute, the method first checks whether the attribute already exists in the trial. If it does, it updates the value; if not, it adds a new one. This process results in two SQL calls for every single attribute addition or update.
This PR addresses this issue by combining two queries into one by using
UPSERT.Description of the changes
UPSERTdiffers across databases, I implemented conditonal handling within_set_trial_user_attr_without_commit.insertfunction insqlalchemy.dialects.sqlite, which causes an error. To fix this, I have updated SQLAlchemy to v1.4.2.Verification of SQL Changes
mysql
I confirmed that the
SELECTstatement issued with each call toset_trial_user_attrhas been eliminated and integrated into theINSERT INTO ... ON DUPLICATE KEY UPDATE ...statement.Only the differences are shown below:
Before
After
sqlite
Before
After