[rllib] Adding DDPG (adapted from baselines) #1868
qyccc wants to merge 38 commits into ray-project:master
Conversation
Test FAILed.
Thanks for contributing this! Do you have any performance numbers/charts?
richardliaw left a comment:
Thanks a bunch for contributing this! Do you mind running flake8 on rllib/ddpg/ and fixing the errors that show up?
python/ray/rllib/__init__.py (outdated)

     def _register_all():
         for key in ["PPO", "ES", "DQN", "APEX", "A3C", "BC", "PG", "__fake",
    -                "__sigmoid_fake_data", "__parameter_tuning"]:
    +                "__sigmoid_fake_data", "__parameter_tuning", "DDPG"]:
do you mind moving this before the private keys?
python/ray/rllib/ddpg/models.py (outdated)

    def _build_q_network(self, registry, inputs, state_space, ac_space, act_t, config):
        x = inputs
        x = tf.layers.dense(x, 64)
can you use tf.slim instead? it would be great to avoid mixing higher level libraries
    import os

    import numpy as np
    import tensorflow as tf
Would it be possible to move all of the TF code out of this file? We would like to keep the main algorithm framework agnostic
    @@ -0,0 +1,56 @@
    # --------------------------------------
Shall we move this into rllib/utils ?
    from ray.rllib.ddpg.ou_noise import AdaptiveParamNoiseSpec


    def _huber_loss(x, delta=1.0):
Some of this seems duplicated with DQN, we should probably move it to a util file.
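For context, the _huber_loss duplicated between DQN and DDPG is the standard Huber loss: quadratic for small errors, linear for large ones. A pure-Python sketch of the formula (the actual rllib versions operate on TensorFlow tensors, so this is only illustrative):

```python
def huber_loss(x, delta=1.0):
    """Huber loss: 0.5 * x^2 for |x| <= delta, else delta * (|x| - 0.5 * delta).

    Illustrative scalar sketch; the rllib implementation is a TF op.
    """
    ax = abs(x)
    if ax <= delta:
        return 0.5 * x * x              # quadratic region near zero
    return delta * (ax - 0.5 * delta)   # linear region for large errors
```

The linear tail is what makes this loss less sensitive to outlier TD errors than a plain squared error, which is why both DQN and DDPG use it.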
    """The base DDPG Evaluator that does not include the replay buffer.

    TODO(rliaw): Support observation/reward filters?"""
This comment seems out of date and could be removed.
python/ray/rllib/ddpg/ddpg.py (outdated)

        return result

    def _populate_replay_buffer(self):
python/ray/rllib/agent.py (outdated)

        return _ParameterTuningAgent
    elif alg == "DDPG":
        from ray.rllib import ddpg
        return ddpg.DDPGAgent
Along with registering it here, could you also register DDPG in these automated tests?
- The generic test for supported obs/action spaces: https://github.com/ray-project/ray/blob/master/python/ray/rllib/test/test_supported_spaces.py#L132
- The regression tests folder: https://github.com/ray-project/ray/tree/master/python/ray/rllib/tuned_examples/regression_tests (probably MountainCarContinuous-v0 is the right env for this test)
- The checkpoint/restore test: https://github.com/ray-project/ray/blob/master/python/ray/rllib/test/test_checkpoint_restore.py

You should be able to basically copy-paste an existing entry for, e.g., DQN in each of these cases. The tests can be run locally (and will also be run by Travis/Jenkins once pushed here).
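For reference, a regression-test entry is a small YAML file in that folder. A hypothetical sketch of what the DDPG entry could look like (the file name, stopping criterion, and config values here are illustrative, not tuned):

```yaml
# tuned_examples/regression_tests/mountaincarcontinuous-ddpg.yaml (sketch)
mountaincarcontinuous-ddpg:
    env: MountainCarContinuous-v0
    run: DDPG
    stop:
        episode_reward_mean: -15
    config:
        num_workers: 0
```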
python/ray/rllib/ddpg/ddpg.py (outdated)

    # Whether to compute priorities on workers.
    worker_side_prioritization=False,
    # Whether to force evaluator actors to be placed on remote machines.
    force_evaluators_remote=False)
Could we drop this config? It used to be used for Ape-X but is no longer needed due to improvements in Ray's actor scheduling.
Move all of the TF code out of ddpg.py; remove all unused functions; use tf.slim.
Test PASSed.
Test FAILed.
ericl left a comment:
Looks close. I just ran ./train.py -f tuned_examples/regression_tests/mountaincarcontinuous-ddpg.yaml and got
    Remote function __init__ failed with:
    Traceback (most recent call last):
      File "/home/eric/Desktop/ray-private/python/ray/worker.py", line 832, in _process_task
        *arguments)
      File "/home/eric/Desktop/ray-private/python/ray/actor.py", line 212, in actor_method_executor
        method_returns = method(actor, *args)
      File "/home/eric/Desktop/ray-private/python/ray/rllib/agent.py", line 84, in __init__
        Trainable.__init__(self, config, registry, logger_creator)
      File "/home/eric/Desktop/ray-private/python/ray/tune/trainable.py", line 89, in __init__
        self._setup()
      File "/home/eric/Desktop/ray-private/python/ray/rllib/agent.py", line 107, in _setup
        self._init()
      File "/home/eric/Desktop/ray-private/python/ray/rllib/ddpg/ddpg.py", line 105, in _init
        for i in range(self.config["num_workers"])]
      File "/home/eric/Desktop/ray-private/python/ray/rllib/ddpg/ddpg.py", line 105, in <listcomp>
        for i in range(self.config["num_workers"])]
      File "/home/eric/Desktop/ray-private/python/ray/actor.py", line 822, in remote
        dependency=actor_cursor)
      File "/home/eric/Desktop/ray-private/python/ray/actor.py", line 646, in _actor_method_call
        args = signature.extend_args(function_signature, args, kwargs)
      File "/home/eric/Desktop/ray-private/python/ray/signature.py", line 212, in extend_args
        .format(function_name))
    Exception: Too many arguments were passed to the function '__init__'
I think this is because the evaluator class doesn't take a logdir argument? Could you make sure that example runs to the -15 result?
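In general, this "Too many arguments" error means the remote class's __init__ accepts fewer parameters than the caller supplies. A minimal sketch of the likely fix (the class and parameter names here are assumptions based on the traceback, not the PR's actual code):

```python
class DDPGEvaluator:
    """Hypothetical evaluator constructor; names are illustrative only."""

    def __init__(self, registry, env_creator, config, logdir):
        # Accepting logdir is the fix: the agent passes it when constructing
        # remote evaluators, and a missing parameter here raises
        # "Too many arguments were passed to the function '__init__'".
        self.registry = registry
        self.config = config
        self.logdir = logdir
        self.env = env_creator(config.get("env_config", {}))
```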
python/ray/rllib/ddpg/ddpg.py (outdated)

    # Number of env steps to optimize for before returning
    timesteps_per_iteration=1000,

    exploration_noise=0.2,
Could you add comments for exploration_noise and action_noise?
python/ray/rllib/ddpg/ddpg.py (outdated)

    # Smooth the current average reward over this many previous episodes.
    smoothing_num_episodes=100,
python/ray/rllib/ddpg/ddpg.py (outdated)

        return result

    # def _populate_replay_buffer(self):
python/ray/rllib/ddpg/ou_noise.py (outdated)

    import numpy as np


    class OUNoise:
I think you forgot to remove this file after moving it to rllib/utils
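For context, the OU noise being moved is an Ornstein-Uhlenbeck process, which produces temporally correlated exploration noise for continuous actions. A minimal pure-Python sketch (parameter names and defaults are illustrative; the version in rllib/utils may differ, e.g. it is vectorized with numpy):

```python
import random


class OUNoise:
    """Ornstein-Uhlenbeck process: mean-reverting, temporally correlated noise."""

    def __init__(self, theta=0.15, sigma=0.2, dt=1e-2, x0=0.0, seed=0):
        self.theta, self.sigma, self.dt = theta, sigma, dt
        self.x = x0
        self.rng = random.Random(seed)

    def sample(self):
        # dx = -theta * x * dt + sigma * sqrt(dt) * N(0, 1); reverts toward 0,
        # so successive samples are correlated rather than independent.
        dx = (-self.theta * self.x * self.dt
              + self.sigma * (self.dt ** 0.5) * self.rng.gauss(0.0, 1.0))
        self.x += dx
        return self.x
```

The correlation between successive samples is what distinguishes OU noise from plain Gaussian noise and makes it a common default for DDPG exploration.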
python/ray/rllib/__init__.py (outdated)

     def _register_all():
    -    for key in ["PPO", "ES", "DQN", "APEX", "A3C", "BC", "PG", "__fake",
    +    for key in ["PPO", "ES", "DQN", "APEX", "A3C", "BC", "PG", "DDPG", "__fake",
Could we rename this to "DDPG_baselines", and change the directory to rllib/ddpg_baselines, in order to not conflict with the other DDPG PR here: https://github.com/ray-project/ray/pull/1877/files ?
The main feature of this PR is that it's adapted from the baselines code, so we should probably keep that in the naming (or we could also call it DDPG2 and just leave the details in the README).
Test FAILed.
ericl left a comment:
Bunch of lint errors here: https://api.travis-ci.org/v3/job/365429239/log.txt
Could you also make sure the HalfCheetah environment trains, and add a tuned example for this env? This one should be fairly fast and so is a good sanity check for DDPG.
python/ray/rllib/__init__.py (outdated)

         for key in ["PPO", "ES", "DQN", "APEX", "A3C", "BC", "PG", "DDPG",
    -                "__fake", "__sigmoid_fake_data", "__parameter_tuning"]:
    +                "DDPG_beselines", "__fake", "__sigmoid_fake_data", "__parameter_tuning"]:
    self.set_weights(objs["weights"])

    def sync_filters(self, new_filters):
        """Changes self's filter to given and rebases any accumulated delta.
Why does continuous-integration/travis-ci/pr always fail? I don't know if my code is wrong.
@qyccc Unfortunately we have a bunch of flaky tests right now. We really need to address this. Keeping track of them at https://github.com/ray-project/ray/issues?q=is%3Aissue+is%3Aopen+label%3A%22test+failure%22.
Hm, now that the other two DDPGs have been consolidated (as they achieve the exact same performance), it probably makes sense for this one to be as well.

What do these changes do?
Add the DDPG algorithm to rllib.
It can be run with the usual train script, e.g. python ray/python/ray/rllib/train.py --run DDPG --env MountainCarContinuous-v0.
Related issue number