Skip to content

Add upsert in _set_trial_attr_without_commit for PostgreSQL#6282

Merged
gen740 merged 3 commits intooptuna:masterfrom
jaikumarm:fix/postgresql-upsert-in-set_trial_system_attr
Oct 3, 2025
Merged

Add upsert in _set_trial_attr_without_commit for PostgreSQL#6282
gen740 merged 3 commits intooptuna:masterfrom
jaikumarm:fix/postgresql-upsert-in-set_trial_system_attr

Conversation

@jaikumarm
Copy link
Copy Markdown
Contributor

Motivation

This PR adds upsert in _set_trial_attr_without_commit for postgresql. It is an expansion of #5741 and #5703. These two PR's added upsert for sqlite and mysql for trial attrs, but postgresql was marked as TODO, this adds support for postgresql.

Description of the changes

From my research it seems sqlalchemy already has the upsert function on_conflict_do_update similar to sqlite and mysql. I just had to add import dialects and a conditional call for postgres.

I checked sqlalchemy docs it looks like 1.4.2 supports postgresql on_conflict_do_update, so don't think we need to change any dependencies.
https://docs.sqlalchemy.org/en/14/dialects/postgresql.html#sqlalchemy.dialects.postgresql.Insert.on_conflict_do_update

Verification of SQL changes

I followed the same verification example provided in #5741 and #5703 for verification

Code

import optuna
import logging

optuna.logging.set_verbosity(optuna.logging.WARNING)

postgresql_url = os.environ["OPTUNA_PG_TEST_STORAGE"]

study = optuna.create_study(storage=postgresql_url)
storage = study._storage


def objective(trial: optuna.Trial) -> float:
    storage.set_trial_system_attr(trial._trial_id, "system_attr_1", "value_1")

    logging.basicConfig()
    logging.getLogger("sqlalchemy.engine").setLevel(logging.INFO)
    storage.set_trial_system_attr(trial._trial_id, "system_attr_1", "new_value_1")
    storage.set_trial_system_attr(trial._trial_id, "system_attr_2", "value_2")
    logging.getLogger("sqlalchemy.engine").setLevel(logging.WARNING)

    return trial.suggest_float("x", 0.0, 1.0) ** 2


study.optimize(objective, n_trials=1)

Output

Before

INFO:sqlalchemy.engine.Engine:BEGIN (implicit)
INFO:sqlalchemy.engine.Engine:SELECT trials.trial_id AS trials_trial_id, trials.number AS trials_number, trials.study_id AS trials_study_id, trials.state AS trials_state, trials.datetime_start AS trials_datetime_start, trials.datetime_complete AS trials_datetime_complete 
FROM trials 
WHERE trials.trial_id = %(trial_id_1)s
INFO:sqlalchemy.engine.Engine:[cached since 0.02755s ago] {'trial_id_1': 3}
INFO:sqlalchemy.engine.Engine:SELECT trial_system_attributes.trial_system_attribute_id AS trial_system_attributes_trial_system_attribute_id, trial_system_attributes.trial_id AS trial_system_attributes_trial_id, trial_system_attributes.key AS trial_system_attributes_key, trial_system_attributes.value_json AS trial_system_attributes_value_json 
FROM trial_system_attributes 
WHERE trial_system_attributes.trial_id = %(trial_id_1)s AND trial_system_attributes.key = %(key_1)s
INFO:sqlalchemy.engine.Engine:[cached since 0.006932s ago] {'trial_id_1': 3, 'key_1': 'system_attr_1'}
INFO:sqlalchemy.engine.Engine:UPDATE trial_system_attributes SET value_json=%(value_json)s WHERE trial_system_attributes.trial_system_attribute_id = %(trial_system_attributes_trial_system_attribute_id)s
INFO:sqlalchemy.engine.Engine:[generated in 0.00016s] {'value_json': '"new_value_1"', 'trial_system_attributes_trial_system_attribute_id': 7}
INFO:sqlalchemy.engine.Engine:COMMIT
INFO:sqlalchemy.engine.Engine:BEGIN (implicit)
INFO:sqlalchemy.engine.Engine:SELECT trials.trial_id AS trials_trial_id, trials.number AS trials_number, trials.study_id AS trials_study_id, trials.state AS trials_state, trials.datetime_start AS trials_datetime_start, trials.datetime_complete AS trials_datetime_complete 
FROM trials 
WHERE trials.trial_id = %(trial_id_1)s
INFO:sqlalchemy.engine.Engine:[cached since 0.03367s ago] {'trial_id_1': 3}
INFO:sqlalchemy.engine.Engine:SELECT trial_system_attributes.trial_system_attribute_id AS trial_system_attributes_trial_system_attribute_id, trial_system_attributes.trial_id AS trial_system_attributes_trial_id, trial_system_attributes.key AS trial_system_attributes_key, trial_system_attributes.value_json AS trial_system_attributes_value_json 
FROM trial_system_attributes 
WHERE trial_system_attributes.trial_id = %(trial_id_1)s AND trial_system_attributes.key = %(key_1)s
INFO:sqlalchemy.engine.Engine:[cached since 0.01297s ago] {'trial_id_1': 3, 'key_1': 'system_attr_2'}
INFO:sqlalchemy.engine.Engine:INSERT INTO trial_system_attributes (trial_id, key, value_json) VALUES (%(trial_id)s, %(key)s, %(value_json)s) RETURNING trial_system_attributes.trial_system_attribute_id
INFO:sqlalchemy.engine.Engine:[cached since 0.01201s ago] {'trial_id': 3, 'key': 'system_attr_2', 'value_json': '"value_2"'}
INFO:sqlalchemy.engine.Engine:COMMIT

After

INFO:sqlalchemy.engine.Engine:BEGIN (implicit)
INFO:sqlalchemy.engine.Engine:SELECT trials.trial_id AS trials_trial_id, trials.number AS trials_number, trials.study_id AS trials_study_id, trials.state AS trials_state, trials.datetime_start AS trials_datetime_start, trials.datetime_complete AS trials_datetime_complete 
FROM trials 
WHERE trials.trial_id = %(trial_id_1)s
INFO:sqlalchemy.engine.Engine:[cached since 0.1034s ago] {'trial_id_1': 1}
INFO:sqlalchemy.engine.Engine:INSERT INTO trial_system_attributes (trial_id, key, value_json) VALUES (%(trial_id)s, %(key)s, %(value_json)s) ON CONFLICT (trial_id, key) DO UPDATE SET value_json = excluded.value_json RETURNING trial_system_attributes.trial_system_attribute_id
INFO:sqlalchemy.engine.Engine:[no key 0.00019s] {'trial_id': 1, 'key': 'system_attr_1', 'value_json': '"new_value_1"'}
INFO:sqlalchemy.engine.Engine:COMMIT
INFO:sqlalchemy.engine.Engine:BEGIN (implicit)
INFO:sqlalchemy.engine.Engine:SELECT trials.trial_id AS trials_trial_id, trials.number AS trials_number, trials.study_id AS trials_study_id, trials.state AS trials_state, trials.datetime_start AS trials_datetime_start, trials.datetime_complete AS trials_datetime_complete 
FROM trials 
WHERE trials.trial_id = %(trial_id_1)s
INFO:sqlalchemy.engine.Engine:[cached since 0.1089s ago] {'trial_id_1': 1}
INFO:sqlalchemy.engine.Engine:INSERT INTO trial_system_attributes (trial_id, key, value_json) VALUES (%(trial_id)s, %(key)s, %(value_json)s) ON CONFLICT (trial_id, key) DO UPDATE SET value_json = excluded.value_json RETURNING trial_system_attributes.trial_system_attribute_id
INFO:sqlalchemy.engine.Engine:[no key 0.00019s] {'trial_id': 1, 'key': 'system_attr_2', 'value_json': '"value_2"'}
INFO:sqlalchemy.engine.Engine:COMMIT

Performance Check

Code

import os
import time

import optuna


postgresql_url = os.environ["OPTUNA_PG_TEST_STORAGE"]
sqlite_url = "sqlite:///test-pg-upsert.db"
study_name = "test-pg-upsert"

sqlite_study = optuna.create_study(storage=sqlite_url, study_name=study_name)
postgresql_study = optuna.create_study(storage=postgresql_url, study_name=study_name)

sqlite_trial = sqlite_study.ask()
postgresql_trial = postgresql_study.ask()

start = time.time()
for i in range(1000):
    sqlite_trial.set_user_attr("foo", i)
print(f"{sqlite_trial.user_attrs=}")
print(f"SQLite: {(time.time() - start) * 1000:.2f}[ms]")

start = time.time()
for i in range(1000):
    postgresql_trial.set_user_attr("foo", i)
print(f"PostgreSQL: {(time.time() - start) * 1000:.2f}[ms]")
print(f"{postgresql_trial.user_attrs=}")

Output

Release 4.5.0

[I 2025-09-18 10:36:20,446] A new study created in RDB with name: test-pg-upsert
[I 2025-09-18 10:36:20,842] A new study created in RDB with name: test-pg-upsert
sqlite_trial.user_attrs={'foo': 999}
SQLite: 3091.52[ms]
PostgreSQL: 6331.63[ms]
postgresql_trial.user_attrs={'foo': 999}

Master 4.6.0.dev0 with fix

[I 2025-09-18 10:41:08,638] A new study created in RDB with name: test-pg-upsert
[I 2025-09-18 10:41:08,882] A new study created in RDB with name: test-pg-upsert
sqlite_trial.user_attrs={'foo': 999}
SQLite: 2858.08[ms]
PostgreSQL: 3689.49[ms]
postgresql_trial.user_attrs={'foo': 999}

this is my first PR for optuna, so not sure If it has everything thats needed, if not please let me know.

@nabenabe0928 nabenabe0928 changed the title adding upsert in _set_trial_attr_without_commit for postgresql Add upsert in _set_trial_attr_without_commit for PostgreSQL Sep 19, 2025
@c-bata c-bata added the enhancement Change that does not break compatibility and not affect public interfaces, but improves performance. label Sep 19, 2025
@c-bata
Copy link
Copy Markdown
Member

c-bata commented Sep 19, 2025

@kAIto47802 Could you review this PR?

@github-actions
Copy link
Copy Markdown
Contributor

This pull request has not seen any recent activity.

@github-actions github-actions bot added the stale Exempt from stale bot labeling. label Sep 28, 2025
@c-bata c-bata removed the stale Exempt from stale bot labeling. label Sep 29, 2025
@y0z
Copy link
Copy Markdown
Member

y0z commented Oct 2, 2025

@kAIto47802, @c-bata
Let me reassign this PR, as you don't have time to review it.

@nabenabe0928 @not522, could you review this PR?

@y0z y0z assigned not522 and nabenabe0928 and unassigned c-bata and kAIto47802 Oct 2, 2025
Copy link
Copy Markdown
Member

@not522 not522 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the late response. It's almost LGTM. Could you update the TODO comment?

Co-authored-by: Naoto Mizuno <naotomizuno@preferred.jp>
Copy link
Copy Markdown
Member

@not522 not522 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! LGTM!

@not522 not522 removed their assignment Oct 3, 2025
@gen740 gen740 assigned gen740 and unassigned nabenabe0928 Oct 3, 2025
Copy link
Copy Markdown
Member

@gen740 gen740 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

TBH, in my environment (MacOS 26 Tahoe, MacBook Pro M3 Max), there is not much difference in performance between master and this PR.

This PR

Benchmark 1: python3 pr6282.py
  Time (mean ± σ):     727.3 ms ±  11.3 ms    [User: 590.3 ms, System: 58.1 ms]
  Range (min … max):   715.5 ms … 745.9 ms    10 runs

Master

Benchmark 1: python3 pr6282.py
  Time (mean ± σ):     745.4 ms ±  18.8 ms    [User: 581.6 ms, System: 63.4 ms]
  Range (min … max):   710.2 ms … 782.1 ms    10 runs

@gen740 gen740 added this to the v4.6.0 milestone Oct 3, 2025
@gen740 gen740 merged commit b7e92b4 into optuna:master Oct 3, 2025
14 checks passed
@gen740 gen740 removed their assignment Oct 3, 2025
@not522
Copy link
Copy Markdown
Member

not522 commented Oct 3, 2025

Note: I confirmed that this PR ensures faster performance if value updates occur. (8.02s -> 7.87s)

import optuna

postgresql_url = "postgresql+psycopg2://user:test@127.0.0.1/optunatest"
study = optuna.create_study(storage=postgresql_url, study_name="test")
storage = study._storage

def objective(trial: optuna.Trial) -> float:
    storage.set_trial_system_attr(trial._trial_id, "system_attr_1", "value_1")
    storage.set_trial_system_attr(trial._trial_id, "system_attr_1", "new_value_1")
    storage.set_trial_system_attr(trial._trial_id, "system_attr_2", "value_2")
    return trial.suggest_float("x", 0.0, 1.0) ** 2

study.optimize(objective, n_trials=1000)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement Change that does not break compatibility and not affect public interfaces, but improves performance.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants