Open
Labels
P2: Important issue, but not time-critical
bug: Something that is supposed to be working; but isn't
docs: An issue or change related to documentation
tune: Tune-related issues
windows
Description
What happened + What you expected to happen
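Logging to MLflow through MLflowLoggerCallback is incompatible with population-based training schedulers such as PB2. PB2 changes hyperparameters at runtime, and when the trial is started again with its new config, Tune tries to log the changed values as parameters of the same MLflow run. MLflow does not allow a parameter's value to be changed once it has been logged, so it raises an exception: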
Traceback (most recent call last):
  File "C:\Users\user\Anaconda3\envs\repro\lib\site-packages\ray\tune\execution\trial_runner.py", line 819, in _wait_and_handle_event
    self._on_pg_ready(next_trial)
  File "C:\Users\user\Anaconda3\envs\repro\lib\site-packages\ray\tune\execution\trial_runner.py", line 909, in _on_pg_ready
    if not _start_trial(next_trial) and next_trial.status != Trial.ERROR:
  File "C:\Users\user\Anaconda3\envs\repro\lib\site-packages\ray\tune\execution\trial_runner.py", line 901, in _start_trial
    self._callbacks.on_trial_start(
  File "C:\Users\user\Anaconda3\envs\repro\lib\site-packages\ray\tune\callback.py", line 317, in on_trial_start
    callback.on_trial_start(**info)
  File "C:\Users\user\Anaconda3\envs\repro\lib\site-packages\ray\tune\logger\logger.py", line 135, in on_trial_start
    self.log_trial_start(trial)
  File "C:\Users\user\Anaconda3\envs\repro\lib\site-packages\ray\air\callbacks\mlflow.py", line 118, in log_trial_start
    self.mlflow_util.log_params(run_id=run_id, params_to_log=config)
  File "C:\Users\user\Anaconda3\envs\repro\lib\site-packages\ray\air\_internal\mlflow.py", line 280, in log_params
    client.log_param(run_id=run_id, key=key, value=value)
  File "C:\Users\user\Anaconda3\envs\repro\lib\site-packages\mlflow\tracking\client.py", line 743, in log_param
    self._tracking_client.log_param(run_id, key, value)
  File "C:\Users\user\Anaconda3\envs\repro\lib\site-packages\mlflow\tracking\_tracking_service\client.py", line 248, in log_param
    raise MlflowException(msg, INVALID_PARAMETER_VALUE)
mlflow.exceptions.MlflowException: Changing param values is not allowed. Param with key='rollout_fragment_length' was already logged with value='590' for run ID='3d0a25a70dcc4a6b9c96374e908b0ad8'. Attempted logging new value '4590'.
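To make the failure mode concrete, here is a minimal, self-contained sketch that does not import Ray or MLflow. WriteOnceParamStore is a hypothetical stand-in that imitates only the rule visible in the error message above (a key cannot be re-logged with a different value), and log_params_skipping_changes is a hypothetical guard showing one possible way a logger could avoid the exception. Neither name exists in Ray or MLflow, and this is not how the callback is implemented.

```python
from typing import Any, Dict, List


class WriteOnceParamStore:
    """Hypothetical stand-in for the parameter store of a single run.

    Assumption: it imitates only the rule seen in the traceback, namely
    that a key cannot be logged again with a different value.
    """

    def __init__(self) -> None:
        self._params: Dict[str, str] = {}

    def log_param(self, key: str, value: Any) -> None:
        new_value = str(value)
        old_value = self._params.get(key)
        if old_value is not None and old_value != new_value:
            raise ValueError(
                f"Changing param values is not allowed. Param with key={key!r} "
                f"was already logged with value={old_value!r}. "
                f"Attempted logging new value {new_value!r}."
            )
        self._params[key] = new_value

    def logged(self) -> Dict[str, str]:
        # Return a copy so callers cannot mutate the store directly.
        return dict(self._params)


def log_params_skipping_changes(
    store: WriteOnceParamStore, config: Dict[str, Any]
) -> List[str]:
    """Log every key that does not conflict; return the keys that were skipped."""
    already = store.logged()
    skipped: List[str] = []
    for key, value in config.items():
        if key in already and already[key] != str(value):
            # The scheduler changed this value; logging it would raise.
            skipped.append(key)
            continue
        store.log_param(key, value)
    return skipped


store = WriteOnceParamStore()

# First trial start: every value is new, so nothing is skipped.
first_skipped = log_params_skipping_changes(
    store, {"rollout_fragment_length": 590, "lr": 1e-4}
)

# Trial start after a perturbation: one value has changed.
second_skipped = log_params_skipping_changes(
    store, {"rollout_fragment_length": 4590, "lr": 1e-4}
)

print(first_skipped)
print(second_skipped)
```

With a guard like this the run keeps the originally logged value, so the perturbed values would still need to be recorded somewhere else. Whether skipping, or recording them in another form, is the right behavior for the callback is a design question this sketch does not settle.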
Versions / Dependencies
Ray 3.0.0
Python 3.9
Windows 10 Enterprise 20H2
Reproduction script
import os
import random
import tempfile
import uuid

from ray.air.callbacks.mlflow import MLflowLoggerCallback
from ray.tune import run, sample_from
from ray.tune.schedulers.pb2 import PB2

if __name__ == "__main__":
    pb2 = PB2(
        time_attr="timesteps_total",
        metric="episode_reward_mean",
        mode="max",
        perturbation_interval=50000,
        hyperparam_bounds={
            "lambda": [0.9, 1.0],
            "clip_param": [0.1, 0.5],
            "lr": [1e-3, 1e-5],
            "train_batch_size": [1000, 60000],
        },
    )

    analysis = run(
        "PPO",
        scheduler=pb2,
        verbose=1,
        num_samples=4,
        stop={"timesteps_total": 1000000},
        config={
            "framework": "torch",
            "env": "CartPole-v0",
            "log_level": "INFO",
            "seed": 0,
            "kl_coeff": 1.0,
            "num_gpus": 0,
            "horizon": 1600,
            "observation_filter": "MeanStdFilter",
            "model": {
                "free_log_std": True,
            },
            "num_sgd_iter": 10,
            "sgd_minibatch_size": 128,
            "lambda": sample_from(lambda spec: random.uniform(0.9, 1.0)),
            "clip_param": sample_from(lambda spec: random.uniform(0.1, 0.5)),
            "lr": sample_from(lambda spec: random.uniform(1e-3, 1e-5)),
            "train_batch_size": sample_from(lambda spec: random.randint(1000, 60000)),
        },
        callbacks=[
            MLflowLoggerCallback(
                experiment_name=str(uuid.uuid4()),
                tracking_uri=f'file:{os.path.join(tempfile.gettempdir(), "mlruns")}',
            )
        ],
    )
Issue Severity
Low: It annoys or frustrates me.