Set precision of sqlalchemy.Float in RDBStorage table definition#3327
Conversation
|
This pull request has not seen any recent activity. |
|
I'm still working on this PR. |
Codecov Report
@@ Coverage Diff @@
## master #3327 +/- ##
==========================================
- Coverage 91.48% 91.47% -0.02%
==========================================
Files 158 159 +1
Lines 12241 12270 +29
==========================================
+ Hits 11199 11224 +25
- Misses 1042 1046 +4
📣 Codecov can now indicate which changes are the most critical in Pull Requests. Learn more |
|
This pull request has not seen any recent activity. |
|
[memo] [at]c-bata suggested to use filter warning in a migration script, which avoids to show deprecation messages to users. |
bc74f1f to
8ea3644
Compare
| from sqlalchemy import pool | ||
|
|
||
| import optuna | ||
| import optuna.storages._rdb.models |
There was a problem hiding this comment.
This change was required to avoid the following import error.
$ alembic revision --autogenerate --rev-id v3.0.0.b
Traceback (most recent call last):
File "/Users/yanase/pfn/code/optuna/.venv/bin/alembic", line 8, in <module>
sys.exit(main())
File "/Users/yanase/pfn/code/optuna/.venv/lib/python3.9/site-packages/alembic/config.py", line 588, in main
CommandLine(prog=prog).main(argv=argv)
File "/Users/yanase/pfn/code/optuna/.venv/lib/python3.9/site-packages/alembic/config.py", line 582, in main
self.run_cmd(cfg, options)
File "/Users/yanase/pfn/code/optuna/.venv/lib/python3.9/site-packages/alembic/config.py", line 559, in run_cmd
fn(
File "/Users/yanase/pfn/code/optuna/.venv/lib/python3.9/site-packages/alembic/command.py", line 227, in revision
script_directory.run_env()
File "/Users/yanase/pfn/code/optuna/.venv/lib/python3.9/site-packages/alembic/script/base.py", line 563, in run_env
util.load_python_file(self.dir, "env.py")
File "/Users/yanase/pfn/code/optuna/.venv/lib/python3.9/site-packages/alembic/util/pyfiles.py", line 92, in load_python_file
module = load_module_py(module_id, path)
File "/Users/yanase/pfn/code/optuna/.venv/lib/python3.9/site-packages/alembic/util/pyfiles.py", line 108, in load_module_py
spec.loader.exec_module(module) # type: ignore
File "<frozen importlib._bootstrap_external>", line 850, in exec_module
File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
File "alembic/env.py", line 25, in <module>
target_metadata = optuna.storages._rdb.models.BaseModel.metadata
AttributeError: module 'optuna.storages._rdb' has no attribute 'models'
|
|
I confirmed the change by using @himkt 's optuna-2e. The corresponding branch is this one. To dump DB data, please use the following commands: |
|
@c-bata @contramundum53 Could you review this PR, please? |
|
I guess this means we should relax following limits to actually store double precision values. optuna/optuna/storages/_rdb/storage.py Lines 59 to 60 in f00abac Ref #3238 (comment) |
|
@xadrianzetx Yes, you are correct 👍 Actually I'm planning to fix the problem you pointed. |
|
Yeah, that's what I had in mind. My first solution was a bit too hacky. |
There was a problem hiding this comment.
Thank you for your pull request. LGTM with nits!
It might be a bit controversial that this PR also contains a schema migration for #3348. I think it's acceptable since those schema changes are executed in one query like the following.
SQL queries of forward migration
2022-05-10T05:12:41.383794Z 26 Query ALTER TABLE trial_intermediate_values MODIFY intermediate_value FLOAT(53) NULL
2022-05-10T05:12:41.456748Z 26 Query ALTER TABLE trial_params MODIFY param_value FLOAT(53) NULL
2022-05-10T05:12:41.532337Z 26 Query ALTER TABLE trial_values MODIFY value FLOAT(53) NOT NULL
2022-05-10T05:12:41.605596Z 26 Query UPDATE alembic_version SET version_num='v3.0.0.b' WHERE alembic_version.version_num = 'v3.0.0.a'
2022-05-10T05:12:41.608029Z 26 Query COMMIT
SQL queries of backward migration
2022-05-10T05:11:38.376333Z 25 Query ALTER TABLE trial_intermediate_values MODIFY intermediate_value FLOAT NOT NULL
2022-05-10T05:11:38.466837Z 25 Query ALTER TABLE trial_params MODIFY param_value FLOAT NULL
2022-05-10T05:11:38.539438Z 25 Query ALTER TABLE trial_values MODIFY value FLOAT NOT NULL
2022-05-10T05:11:38.613731Z 25 Query UPDATE alembic_version SET version_num='v3.0.0.a' WHERE alembic_version.version_num = 'v3.0.0.b'
2022-05-10T05:11:38.616350Z 25 Query COMMIT
Here is a note about how table schemas are changed on SQLite3, MySQL, and PostgreSQL.
SQLite3
In SQLite3, the only data type that stores floating point numbers is REAL. REAL is a double precision floating point number, so the precision does not change before or after this PR.
Before running migration
sqlite> .schema trial_intermediate_values
CREATE TABLE trial_intermediate_values (
trial_intermediate_value_id INTEGER NOT NULL,
trial_id INTEGER NOT NULL,
step INTEGER NOT NULL,
intermediate_value FLOAT NOT NULL,
PRIMARY KEY (trial_intermediate_value_id),
UNIQUE (trial_id, step),
FOREIGN KEY(trial_id) REFERENCES trials (trial_id)
);
sqlite> .schema trial_values
CREATE TABLE trial_values (
trial_value_id INTEGER NOT NULL,
trial_id INTEGER NOT NULL,
objective INTEGER NOT NULL,
value FLOAT NOT NULL,
PRIMARY KEY (trial_value_id),
UNIQUE (trial_id, objective),
FOREIGN KEY(trial_id) REFERENCES trials (trial_id)
);
sqlite> .schema trial_params
CREATE TABLE trial_params (
param_id INTEGER NOT NULL,
trial_id INTEGER,
param_name VARCHAR(512),
param_value FLOAT,
distribution_json TEXT,
PRIMARY KEY (param_id),
UNIQUE (trial_id, param_name),
FOREIGN KEY(trial_id) REFERENCES trials (trial_id)
);
After running migration
sqlite> .schema trial_intermediate_values
CREATE TABLE IF NOT EXISTS "trial_intermediate_values" (
trial_intermediate_value_id INTEGER NOT NULL,
trial_id INTEGER NOT NULL,
step INTEGER NOT NULL,
intermediate_value FLOAT,
PRIMARY KEY (trial_intermediate_value_id),
FOREIGN KEY(trial_id) REFERENCES trials (trial_id),
UNIQUE (trial_id, step)
);
sqlite> .schema trial_values
CREATE TABLE IF NOT EXISTS "trial_values" (
trial_value_id INTEGER NOT NULL,
trial_id INTEGER NOT NULL,
objective INTEGER NOT NULL,
value FLOAT NOT NULL,
PRIMARY KEY (trial_value_id),
FOREIGN KEY(trial_id) REFERENCES trials (trial_id),
UNIQUE (trial_id, objective)
);
sqlite> .schema trial_params
CREATE TABLE IF NOT EXISTS "trial_params" (
param_id INTEGER NOT NULL,
trial_id INTEGER,
param_name VARCHAR(512),
param_value FLOAT,
distribution_json TEXT,
PRIMARY KEY (param_id),
FOREIGN KEY(trial_id) REFERENCES trials (trial_id),
UNIQUE (trial_id, param_name)
);
PostgreSQL
In PostgreSQL, real is a 4 bytes floating point data type and double precision is a 8 bytes floating point data type. SQLAlchemy's Float field actually uses double precision data type by default. so the precision does not change before or after this PR.
Before running migration
optuna=# \d trial_params
Table "public.trial_params"
Column | Type | Collation | Nullable | Default
-------------------+------------------------+-----------+----------+------------------------------------------------
param_id | integer | | not null | nextval('trial_params_param_id_seq'::regclass)
trial_id | integer | | |
param_name | character varying(512) | | |
param_value | double precision | | |
distribution_json | text | | |
Indexes:
"trial_params_pkey" PRIMARY KEY, btree (param_id)
"trial_params_trial_id_param_name_key" UNIQUE CONSTRAINT, btree (trial_id, param_name)
Foreign-key constraints:
"trial_params_trial_id_fkey" FOREIGN KEY (trial_id) REFERENCES trials(trial_id)
optuna=# \d trial_values
Table "public.trial_values"
Column | Type | Collation | Nullable | Default
----------------+------------------+-----------+----------+------------------------------------------------------
trial_value_id | integer | | not null | nextval('trial_values_trial_value_id_seq'::regclass)
trial_id | integer | | not null |
objective | integer | | not null |
value | double precision | | not null |
Indexes:
"trial_values_pkey" PRIMARY KEY, btree (trial_value_id)
"trial_values_trial_id_objective_key" UNIQUE CONSTRAINT, btree (trial_id, objective)
Foreign-key constraints:
"trial_values_trial_id_fkey" FOREIGN KEY (trial_id) REFERENCES trials(trial_id)
optuna=# \d trial_intermediate_values
Table "public.trial_intermediate_values"
Column | Type | Collation | Nullable | Default
-----------------------------+------------------+-----------+----------+--------------------------------------------------------------------------------
trial_intermediate_value_id | integer | | not null | nextval('trial_intermediate_values_trial_intermediate_value_id_seq'::regclass)
trial_id | integer | | not null |
step | integer | | not null |
intermediate_value | double precision | | not null |
Indexes:
"trial_intermediate_values_pkey" PRIMARY KEY, btree (trial_intermediate_value_id)
"trial_intermediate_values_trial_id_step_key" UNIQUE CONSTRAINT, btree (trial_id, step)
Foreign-key constraints:
"trial_intermediate_values_trial_id_fkey" FOREIGN KEY (trial_id) REFERENCES trials(trial_id
After running migration
optuna=# \d trial_params
Table "public.trial_params"
Column | Type | Collation | Nullable | Default
-------------------+------------------------+-----------+----------+------------------------------------------------
param_id | integer | | not null | nextval('trial_params_param_id_seq'::regclass)
trial_id | integer | | |
param_name | character varying(512) | | |
param_value | double precision | | |
distribution_json | text | | |
Indexes:
"trial_params_pkey" PRIMARY KEY, btree (param_id)
"trial_params_trial_id_param_name_key" UNIQUE CONSTRAINT, btree (trial_id, param_name)
Foreign-key constraints:
"trial_params_trial_id_fkey" FOREIGN KEY (trial_id) REFERENCES trials(trial_id)
optuna=# \d trial_intermediate_values
Table "public.trial_intermediate_values"
Column | Type | Collation | Nullable | Default
-----------------------------+------------------+-----------+----------+--------------------------------------------------------------------------------
trial_intermediate_value_id | integer | | not null | nextval('trial_intermediate_values_trial_intermediate_value_id_seq'::regclass)
trial_id | integer | | not null |
step | integer | | not null |
intermediate_value | double precision | | |
Indexes:
"trial_intermediate_values_pkey" PRIMARY KEY, btree (trial_intermediate_value_id)
"trial_intermediate_values_trial_id_step_key" UNIQUE CONSTRAINT, btree (trial_id, step)
Foreign-key constraints:
"trial_intermediate_values_trial_id_fkey" FOREIGN KEY (trial_id) REFERENCES trials(trial_id)
optuna=# \d trial_values
Table "public.trial_values"
Column | Type | Collation | Nullable | Default
----------------+------------------+-----------+----------+------------------------------------------------------
trial_value_id | integer | | not null | nextval('trial_values_trial_value_id_seq'::regclass)
trial_id | integer | | not null |
objective | integer | | not null |
value | double precision | | not null |
Indexes:
"trial_values_pkey" PRIMARY KEY, btree (trial_value_id)
"trial_values_trial_id_objective_key" UNIQUE CONSTRAINT, btree (trial_id, objective)
Foreign-key constraints:
"trial_values_trial_id_fkey" FOREIGN KEY (trial_id) REFERENCES trials(trial_id)
MySQL
In MySQL, we can specify precision of floating point number like FLOAT(p). According to a MySQL documentation, a precision from 24 to 53 results in an 8-byte double-precision DOUBLE column. 53 seems to be reasonable precision number.
I confirmed that the precision of each columns are changed from FLOAT to DOUBLE. Please see the following schemas for details.
Before running migration
mysql> desc trial_intermediate_values;
+-----------------------------+-------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------------------------+-------+------+-----+---------+----------------+
| trial_intermediate_value_id | int | NO | PRI | NULL | auto_increment |
| trial_id | int | NO | MUL | NULL | |
| step | int | NO | | NULL | |
| intermediate_value | float | NO | | NULL | |
+-----------------------------+-------+------+-----+---------+----------------+
4 rows in set (0.04 sec)
mysql> desc trial_values;
+----------------+-------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------------+-------+------+-----+---------+----------------+
| trial_value_id | int | NO | PRI | NULL | auto_increment |
| trial_id | int | NO | MUL | NULL | |
| objective | int | NO | | NULL | |
| value | float | NO | | NULL | |
+----------------+-------+------+-----+---------+----------------+
4 rows in set (0.00 sec)
mysql> desc trial_params;
+-------------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------------+--------------+------+-----+---------+----------------+
| param_id | int | NO | PRI | NULL | auto_increment |
| trial_id | int | YES | MUL | NULL | |
| param_name | varchar(512) | YES | | NULL | |
| param_value | float | YES | | NULL | |
| distribution_json | text | YES | | NULL | |
+-------------------+--------------+------+-----+---------+----------------+
5 rows in set (0.03 sec)
After running migration
mysql> desc trial_intermediate_values;
+-----------------------------+--------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------------------------+--------+------+-----+---------+----------------+
| trial_intermediate_value_id | int | NO | PRI | NULL | auto_increment |
| trial_id | int | NO | MUL | NULL | |
| step | int | NO | | NULL | |
| intermediate_value | double | YES | | NULL | |
+-----------------------------+--------+------+-----+---------+----------------+
4 rows in set (0.01 sec)
mysql> desc trial_values;
+----------------+--------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------------+--------+------+-----+---------+----------------+
| trial_value_id | int | NO | PRI | NULL | auto_increment |
| trial_id | int | NO | MUL | NULL | |
| objective | int | NO | | NULL | |
| value | double | NO | | NULL | |
+----------------+--------+------+-----+---------+----------------+
4 rows in set (0.00 sec)
mysql> desc trial_params;
+-------------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------------+--------------+------+-----+---------+----------------+
| param_id | int | NO | PRI | NULL | auto_increment |
| trial_id | int | YES | MUL | NULL | |
| param_name | varchar(512) | YES | | NULL | |
| param_value | double | YES | | NULL | |
| distribution_json | text | YES | | NULL | |
+-------------------+--------------+------+-----+---------+----------------+
5 rows in set (0.01 sec)
Co-authored-by: Masashi Shibata <c-bata@users.noreply.github.com>
| with op.batch_alter_table("trial_params") as batch_op: | ||
| batch_op.alter_column( | ||
| "param_value", | ||
| type_=Float(precision=FLOAT_PRECISION), | ||
| existing_nullable=True, | ||
| ) |
There was a problem hiding this comment.
Do you need param_value to be nullable?
There was a problem hiding this comment.
I'm sorry but I couldn't understand your question. I didn't change the nullable setting of param_value, and I set existing_nullable=True.
Do you mean we need to replace existing_nullable=True with nullable=True?
|
Otherwise this looks good to me. I couldn't test this change on mssql, but for other databases I confirmed that the migration works as intended. |
Motivation
Related to #3227.
I found pandas set the
precisionoption ofsqlalchemy.Floatto 53 in .https://github.com/pandas-dev/pandas/blob/af8ad6d8b0db8ea98dab98ae77a48092ba3832f8/pandas/io/sql.py#L1149-L1186. This change will change the column type of the trial value fromfloattodoublewhen I confirmed the table definition in MySQL 5.8.I guess the value
53comes from IEEE 754, but I'm not sure.Current master
This PR
Description of the changes
precision=53tosqlalchemy.Floatintermediate_valueoftrial_intermediate_valuestableparam_valueoftrial_paramstablevalueoftrial_valuestablenullableofintermediate_valueoftrial_intermediate_valuestablenanintrial.report#3348Data types of the other RDB
SQLite3
PostgreSQL
TODO