
Issues for reproducing DDPG in 0.8.0.dev1 #4972

@wsjeon

Description


System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): macOS Mojave
  • Ray installed from (source or binary): binary
  • Ray version: 0.8.0.dev1
  • Python version: 3.6.7
  • Exact command to reproduce:
$ rllib train -f tuned_examples/pendulum-ddpg.yaml
$ rllib train -f tuned_examples/mountaincarcontinuous-ddpg.yaml

Describe the problem

Hi. I'm having trouble reproducing DDPG on simple continuous control tasks. I believe the problem is caused by this line, which no longer appears to be supported in 0.8.0.dev1.
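For context, here is a minimal sketch of the merge behavior that appears to trigger the error below. The helper is a simplified stand-in modeled on `ray.tune.util.deep_update` from the traceback, not RLlib's actual implementation: when the tuned-example YAML still contains a key (here `optimizer_class`) that the trainer's default config no longer defines, the merge raises.

```python
# Simplified stand-in for ray.tune.util.deep_update: merging a user config
# into the trainer defaults raises when the user dict contains a key the
# defaults no longer define.
def deep_update(original, new_dict, new_keys_allowed=False):
    for k, v in new_dict.items():
        if k not in original and not new_keys_allowed:
            raise Exception("Unknown config parameter `{}` ".format(k))
        if isinstance(original.get(k), dict) and isinstance(v, dict):
            # Recurse into nested config sections (e.g. "optimizer").
            deep_update(original[k], v, new_keys_allowed)
        else:
            original[k] = v
    return original

# Hypothetical excerpt of the DDPG defaults vs. the tuned-example config.
defaults = {"actor_lr": 0.001, "optimizer": {"debug": False}}
user_config = {"actor_lr": 0.0005, "optimizer_class": "SyncReplayOptimizer"}

try:
    deep_update(defaults, user_config)
except Exception as e:
    print(e)  # Unknown config parameter `optimizer_class`
```

This matches the `Exception: Unknown config parameter 'optimizer_class'` in the logs: the key seems to have been removed from the DDPG defaults, but the shipped tuned examples still set it.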

Source code / logs

WARNING: Logging before flag parsing goes to stderr.
W0613 00:16:11.083568 4554786240 deprecation.py:323] From /anaconda3/envs/marl-rllib/lib/python3.6/site-packages/tensorflow/python/compat/v2_compat.py:61: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
{'max_weight_sync_delay': 400, 'num_replay_buffer_shards': 4, 'debug': False}
dict_items([('max_weight_sync_delay', 400), ('num_replay_buffer_shards', 4), ('debug', False)])
{'optimizer': {'max_weight_sync_delay': 400, 'num_replay_buffer_shards': 4, 'debug': False}, 'n_step': 3, 'num_gpus': 1, 'num_workers': 32, 'buffer_size': 2000000, 'learning_starts': 50000, 'train_batch_size': 512, 'sample_batch_size': 50, 'target_network_update_freq': 500000, 'timesteps_per_iteration': 25000, 'per_worker_exploration': True, 'worker_side_prioritization': True, 'min_iter_time_s': 30}
dict_items([('optimizer', {'max_weight_sync_delay': 400, 'num_replay_buffer_shards': 4, 'debug': False}), ('n_step', 3), ('num_gpus', 1), ('num_workers', 32), ('buffer_size', 2000000), ('learning_starts', 50000), ('train_batch_size', 512), ('sample_batch_size', 50), ('target_network_update_freq', 500000), ('timesteps_per_iteration', 25000), ('per_worker_exploration', True), ('worker_side_prioritization', True), ('min_iter_time_s', 30)])
{'max_weight_sync_delay': 400, 'num_replay_buffer_shards': 4, 'debug': False}
dict_items([('max_weight_sync_delay', 400), ('num_replay_buffer_shards', 4), ('debug', False)])
{'max_weight_sync_delay': 400, 'num_replay_buffer_shards': 4, 'debug': False}
dict_items([('max_weight_sync_delay', 400), ('num_replay_buffer_shards', 4), ('debug', False)])
{'optimizer': {'max_weight_sync_delay': 400, 'num_replay_buffer_shards': 4, 'debug': False}, 'n_step': 3, 'num_gpus': 0, 'num_workers': 32, 'buffer_size': 2000000, 'learning_starts': 50000, 'train_batch_size': 512, 'sample_batch_size': 50, 'target_network_update_freq': 500000, 'timesteps_per_iteration': 25000, 'per_worker_exploration': True, 'worker_side_prioritization': True, 'min_iter_time_s': 30}
dict_items([('optimizer', {'max_weight_sync_delay': 400, 'num_replay_buffer_shards': 4, 'debug': False}), ('n_step', 3), ('num_gpus', 0), ('num_workers', 32), ('buffer_size', 2000000), ('learning_starts', 50000), ('train_batch_size', 512), ('sample_batch_size', 50), ('target_network_update_freq', 500000), ('timesteps_per_iteration', 25000), ('per_worker_exploration', True), ('worker_side_prioritization', True), ('min_iter_time_s', 30)])
{'max_weight_sync_delay': 400, 'num_replay_buffer_shards': 4, 'debug': False}
dict_items([('max_weight_sync_delay', 400), ('num_replay_buffer_shards', 4), ('debug', False)])
{'twin_q': True, 'policy_delay': 2, 'smooth_target_policy': True, 'target_noise': 0.2, 'target_noise_clip': 0.5, 'exploration_should_anneal': False, 'exploration_noise_type': 'gaussian', 'exploration_gaussian_sigma': 0.1, 'learning_starts': 10000, 'pure_exploration_steps': 10000, 'actor_hiddens': [400, 300], 'critic_hiddens': [400, 300], 'n_step': 1, 'gamma': 0.99, 'actor_lr': 0.001, 'critic_lr': 0.001, 'l2_reg': 0.0, 'tau': 0.005, 'train_batch_size': 100, 'use_huber': False, 'target_network_update_freq': 0, 'num_workers': 0, 'num_gpus_per_worker': 0, 'per_worker_exploration': False, 'worker_side_prioritization': False, 'buffer_size': 1000000, 'prioritized_replay': False, 'clip_rewards': False, 'use_state_preprocessor': False}
dict_items([('twin_q', True), ('policy_delay', 2), ('smooth_target_policy', True), ('target_noise', 0.2), ('target_noise_clip', 0.5), ('exploration_should_anneal', False), ('exploration_noise_type', 'gaussian'), ('exploration_gaussian_sigma', 0.1), ('learning_starts', 10000), ('pure_exploration_steps', 10000), ('actor_hiddens', [400, 300]), ('critic_hiddens', [400, 300]), ('n_step', 1), ('gamma', 0.99), ('actor_lr', 0.001), ('critic_lr', 0.001), ('l2_reg', 0.0), ('tau', 0.005), ('train_batch_size', 100), ('use_huber', False), ('target_network_update_freq', 0), ('num_workers', 0), ('num_gpus_per_worker', 0), ('per_worker_exploration', False), ('worker_side_prioritization', False), ('buffer_size', 1000000), ('prioritized_replay', False), ('clip_rewards', False), ('use_state_preprocessor', False)])
{'sample_batch_size': 20, 'min_iter_time_s': 10, 'sample_async': False}
dict_items([('sample_batch_size', 20), ('min_iter_time_s', 10), ('sample_async', False)])
/anaconda3/envs/marl-rllib/lib/python3.6/site-packages/ray/rllib/train.py:100: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  experiments = yaml.load(f)
2019-06-13 00:16:11,618	WARNING worker.py:1340 -- WARNING: Not updating worker name since `setproctitle` is not installed. Install this with `pip install setproctitle` (or ray[debug]) to enable monitoring of worker processes.
2019-06-13 00:16:11,620	INFO node.py:498 -- Process STDOUT and STDERR is being redirected to /tmp/ray/session_2019-06-13_00-16-11_619077_10206/logs.
2019-06-13 00:16:11,728	INFO services.py:409 -- Waiting for redis server at 127.0.0.1:48724 to respond...
2019-06-13 00:16:11,842	INFO services.py:409 -- Waiting for redis server at 127.0.0.1:33113 to respond...
2019-06-13 00:16:11,845	INFO services.py:806 -- Starting Redis shard with 3.44 GB max memory.
2019-06-13 00:16:11,859	INFO node.py:512 -- Process STDOUT and STDERR is being redirected to /tmp/ray/session_2019-06-13_00-16-11_619077_10206/logs.
2019-06-13 00:16:11,860	INFO services.py:1442 -- Starting the Plasma object store with 5.15 GB memory using /tmp.
2019-06-13 00:16:12,412	INFO tune.py:61 -- Tip: to resume incomplete experiments, pass resume='prompt' or resume=True to run()
2019-06-13 00:16:12,413	INFO tune.py:232 -- Starting a new experiment.
== Status ==
Using FIFO scheduling algorithm.
Resources requested: 0/12 CPUs, 0/0 GPUs
Memory usage on this node: 11.4/17.2 GB

2019-06-13 00:16:12,449	WARNING signature.py:108 -- The function with_updates has a **kwargs argument, which is currently not supported.
W0613 00:16:12.453155 4554786240 deprecation_wrapper.py:119] From /anaconda3/envs/marl-rllib/lib/python3.6/site-packages/ray/tune/logger.py:136: The name tf.VERSION is deprecated. Please use tf.version.VERSION instead.

W0613 00:16:12.453705 4554786240 deprecation_wrapper.py:119] From /anaconda3/envs/marl-rllib/lib/python3.6/site-packages/ray/tune/logger.py:141: The name tf.summary.FileWriter is deprecated. Please use tf.compat.v1.summary.FileWriter instead.

== Status ==
Using FIFO scheduling algorithm.
Resources requested: 1/12 CPUs, 0/0 GPUs
Memory usage on this node: 11.4/17.2 GB
Result logdir: /Users/wsjeon/ray_results/pendulum-ddpg
Number of trials: 1 ({'RUNNING': 1})
RUNNING trials:
 - DDPG_Pendulum-v0_0:	RUNNING

(pid=10223) WARNING: Logging before flag parsing goes to stderr.
(pid=10223) W0613 00:16:13.760989 4569867712 deprecation.py:323] From /anaconda3/envs/marl-rllib/lib/python3.6/site-packages/tensorflow/python/compat/v2_compat.py:61: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
(pid=10223) Instructions for updating:
(pid=10223) non-resource variables are not supported in the long term
(pid=10223) {'max_weight_sync_delay': 400, 'num_replay_buffer_shards': 4, 'debug': False}
(pid=10223) dict_items([('max_weight_sync_delay', 400), ('num_replay_buffer_shards', 4), ('debug', False)])
(pid=10223) {'optimizer': {'max_weight_sync_delay': 400, 'num_replay_buffer_shards': 4, 'debug': False}, 'n_step': 3, 'num_gpus': 1, 'num_workers': 32, 'buffer_size': 2000000, 'learning_starts': 50000, 'train_batch_size': 512, 'sample_batch_size': 50, 'target_network_update_freq': 500000, 'timesteps_per_iteration': 25000, 'per_worker_exploration': True, 'worker_side_prioritization': True, 'min_iter_time_s': 30}
(pid=10223) dict_items([('optimizer', {'max_weight_sync_delay': 400, 'num_replay_buffer_shards': 4, 'debug': False}), ('n_step', 3), ('num_gpus', 1), ('num_workers', 32), ('buffer_size', 2000000), ('learning_starts', 50000), ('train_batch_size', 512), ('sample_batch_size', 50), ('target_network_update_freq', 500000), ('timesteps_per_iteration', 25000), ('per_worker_exploration', True), ('worker_side_prioritization', True), ('min_iter_time_s', 30)])
(pid=10223) {'max_weight_sync_delay': 400, 'num_replay_buffer_shards': 4, 'debug': False}
(pid=10223) dict_items([('max_weight_sync_delay', 400), ('num_replay_buffer_shards', 4), ('debug', False)])
(pid=10223) {'max_weight_sync_delay': 400, 'num_replay_buffer_shards': 4, 'debug': False}
(pid=10223) dict_items([('max_weight_sync_delay', 400), ('num_replay_buffer_shards', 4), ('debug', False)])
(pid=10223) {'optimizer': {'max_weight_sync_delay': 400, 'num_replay_buffer_shards': 4, 'debug': False}, 'n_step': 3, 'num_gpus': 0, 'num_workers': 32, 'buffer_size': 2000000, 'learning_starts': 50000, 'train_batch_size': 512, 'sample_batch_size': 50, 'target_network_update_freq': 500000, 'timesteps_per_iteration': 25000, 'per_worker_exploration': True, 'worker_side_prioritization': True, 'min_iter_time_s': 30}
(pid=10223) dict_items([('optimizer', {'max_weight_sync_delay': 400, 'num_replay_buffer_shards': 4, 'debug': False}), ('n_step', 3), ('num_gpus', 0), ('num_workers', 32), ('buffer_size', 2000000), ('learning_starts', 50000), ('train_batch_size', 512), ('sample_batch_size', 50), ('target_network_update_freq', 500000), ('timesteps_per_iteration', 25000), ('per_worker_exploration', True), ('worker_side_prioritization', True), ('min_iter_time_s', 30)])
(pid=10223) {'max_weight_sync_delay': 400, 'num_replay_buffer_shards': 4, 'debug': False}
(pid=10223) dict_items([('max_weight_sync_delay', 400), ('num_replay_buffer_shards', 4), ('debug', False)])
(pid=10223) {'twin_q': True, 'policy_delay': 2, 'smooth_target_policy': True, 'target_noise': 0.2, 'target_noise_clip': 0.5, 'exploration_should_anneal': False, 'exploration_noise_type': 'gaussian', 'exploration_gaussian_sigma': 0.1, 'learning_starts': 10000, 'pure_exploration_steps': 10000, 'actor_hiddens': [400, 300], 'critic_hiddens': [400, 300], 'n_step': 1, 'gamma': 0.99, 'actor_lr': 0.001, 'critic_lr': 0.001, 'l2_reg': 0.0, 'tau': 0.005, 'train_batch_size': 100, 'use_huber': False, 'target_network_update_freq': 0, 'num_workers': 0, 'num_gpus_per_worker': 0, 'per_worker_exploration': False, 'worker_side_prioritization': False, 'buffer_size': 1000000, 'prioritized_replay': False, 'clip_rewards': False, 'use_state_preprocessor': False}
(pid=10223) dict_items([('twin_q', True), ('policy_delay', 2), ('smooth_target_policy', True), ('target_noise', 0.2), ('target_noise_clip', 0.5), ('exploration_should_anneal', False), ('exploration_noise_type', 'gaussian'), ('exploration_gaussian_sigma', 0.1), ('learning_starts', 10000), ('pure_exploration_steps', 10000), ('actor_hiddens', [400, 300]), ('critic_hiddens', [400, 300]), ('n_step', 1), ('gamma', 0.99), ('actor_lr', 0.001), ('critic_lr', 0.001), ('l2_reg', 0.0), ('tau', 0.005), ('train_batch_size', 100), ('use_huber', False), ('target_network_update_freq', 0), ('num_workers', 0), ('num_gpus_per_worker', 0), ('per_worker_exploration', False), ('worker_side_prioritization', False), ('buffer_size', 1000000), ('prioritized_replay', False), ('clip_rewards', False), ('use_state_preprocessor', False)])
(pid=10223) {'sample_batch_size': 20, 'min_iter_time_s': 10, 'sample_async': False}
(pid=10223) dict_items([('sample_batch_size', 20), ('min_iter_time_s', 10), ('sample_async', False)])
2019-06-13 00:16:14,056	ERROR trial_runner.py:487 -- Error processing event.
Traceback (most recent call last):
  File "/anaconda3/envs/marl-rllib/lib/python3.6/site-packages/ray/tune/trial_runner.py", line 436, in _process_trial
    result = self.trial_executor.fetch_result(trial)
  File "/anaconda3/envs/marl-rllib/lib/python3.6/site-packages/ray/tune/ray_trial_executor.py", line 323, in fetch_result
    result = ray.get(trial_future[0])
  File "/anaconda3/envs/marl-rllib/lib/python3.6/site-packages/ray/worker.py", line 2198, in get
    raise value
ray.exceptions.RayTaskError: ray_worker (pid=10223, host=wsjeonMCBOOKPRO)
  File "/anaconda3/envs/marl-rllib/lib/python3.6/site-packages/ray/rllib/agents/trainer_template.py", line 87, in __init__
    Trainer.__init__(self, config, env, logger_creator)
  File "/anaconda3/envs/marl-rllib/lib/python3.6/site-packages/ray/rllib/agents/trainer.py", line 323, in __init__
    Trainable.__init__(self, config, logger_creator)
  File "/anaconda3/envs/marl-rllib/lib/python3.6/site-packages/ray/tune/trainable.py", line 87, in __init__
    self._setup(copy.deepcopy(self.config))
  File "/anaconda3/envs/marl-rllib/lib/python3.6/site-packages/ray/rllib/agents/trainer.py", line 424, in _setup
    self._allow_unknown_subkeys)
  File "/anaconda3/envs/marl-rllib/lib/python3.6/site-packages/ray/tune/util.py", line 93, in deep_update
    raise Exception("Unknown config parameter `{}` ".format(k))
Exception: Unknown config parameter `optimizer_class`

2019-06-13 00:16:14,060	INFO ray_trial_executor.py:187 -- Destroying actor for trial DDPG_Pendulum-v0_0. If your trainable is slow to initialize, consider setting reuse_actors=True to reduce actor creation overheads.
== Status ==
Using FIFO scheduling algorithm.
Resources requested: 0/12 CPUs, 0/0 GPUs
Memory usage on this node: 11.3/17.2 GB
Result logdir: /Users/wsjeon/ray_results/pendulum-ddpg
Number of trials: 1 ({'ERROR': 1})
ERROR trials:
 - DDPG_Pendulum-v0_0:	ERROR, 1 failures: /Users/wsjeon/ray_results/pendulum-ddpg/DDPG_Pendulum-v0_0_2019-06-13_00-16-12xifin2zq/error_2019-06-13_00-16-14.txt

Traceback (most recent call last):
  File "/anaconda3/envs/marl-rllib/bin/rllib", line 10, in <module>
    sys.exit(cli())
  File "/anaconda3/envs/marl-rllib/lib/python3.6/site-packages/ray/rllib/scripts.py", line 38, in cli
    train.run(options, train_parser)
  File "/anaconda3/envs/marl-rllib/lib/python3.6/site-packages/ray/rllib/train.py", line 147, in run
    resume=args.resume)
  File "/anaconda3/envs/marl-rllib/lib/python3.6/site-packages/ray/tune/tune.py", line 330, in run_experiments
    raise_on_failed_trial=raise_on_failed_trial)
  File "/anaconda3/envs/marl-rllib/lib/python3.6/site-packages/ray/tune/tune.py", line 272, in run
    raise TuneError("Trials did not complete", errored_trials)
ray.tune.error.TuneError: ('Trials did not complete', [DDPG_Pendulum-v0_0])
(pid=10223) {'actor_hiddens': [64, 64], 'critic_hiddens': [64, 64], 'n_step': 1, 'model': {}, 'gamma': 0.99, 'env_config': {}, 'exploration_should_anneal': True, 'schedule_max_timesteps': 100000, 'timesteps_per_iteration': 600, 'exploration_fraction': 0.1, 'exploration_final_scale': 0.02, 'exploration_ou_noise_scale': 0.1, 'exploration_ou_theta': 0.15, 'exploration_ou_sigma': 0.2, 'target_network_update_freq': 0, 'tau': 0.001, 'buffer_size': 10000, 'prioritized_replay': True, 'prioritized_replay_alpha': 0.6, 'prioritized_replay_beta': 0.4, 'prioritized_replay_eps': 1e-06, 'clip_rewards': False, 'actor_lr': 0.001, 'critic_lr': 0.001, 'use_huber': True, 'huber_threshold': 1.0, 'l2_reg': 1e-06, 'learning_starts': 500, 'sample_batch_size': 1, 'train_batch_size': 64, 'num_workers': 0, 'num_gpus_per_worker': 0, 'optimizer_class': 'SyncReplayOptimizer', 'per_worker_exploration': False, 'worker_side_prioritization': False, 'evaluation_interval': 5, 'evaluation_num_episodes': 10, 'env': 'Pendulum-v0'}
(pid=10223) dict_items([('actor_hiddens', [64, 64]), ('critic_hiddens', [64, 64]), ('n_step', 1), ('model', {}), ('gamma', 0.99), ('env_config', {}), ('exploration_should_anneal', True), ('schedule_max_timesteps', 100000), ('timesteps_per_iteration', 600), ('exploration_fraction', 0.1), ('exploration_final_scale', 0.02), ('exploration_ou_noise_scale', 0.1), ('exploration_ou_theta', 0.15), ('exploration_ou_sigma', 0.2), ('target_network_update_freq', 0), ('tau', 0.001), ('buffer_size', 10000), ('prioritized_replay', True), ('prioritized_replay_alpha', 0.6), ('prioritized_replay_beta', 0.4), ('prioritized_replay_eps', 1e-06), ('clip_rewards', False), ('actor_lr', 0.001), ('critic_lr', 0.001), ('use_huber', True), ('huber_threshold', 1.0), ('l2_reg', 1e-06), ('learning_starts', 500), ('sample_batch_size', 1), ('train_batch_size', 64), ('num_workers', 0), ('num_gpus_per_worker', 0), ('optimizer_class', 'SyncReplayOptimizer'), ('per_worker_exploration', False), ('worker_side_prioritization', False), ('evaluation_interval', 5), ('evaluation_num_episodes', 10), ('env', 'Pendulum-v0')])
(pid=10223) {}
(pid=10223) dict_items([])
(pid=10223) {}
(pid=10223) dict_items([])
