[rllib] Adding DDPG (adapted from baselines) #1868
qyccc wants to merge 38 commits into ray-project:master
Conversation
Test FAILed.
Thanks for contributing this! Do you have any performance numbers/charts?
richardliaw left a comment:
Thanks a bunch for contributing this! Do you mind running flake8 on rllib/ddpg/ and fixing the errors that show up?
python/ray/rllib/__init__.py (outdated)

     def _register_all():
         for key in ["PPO", "ES", "DQN", "APEX", "A3C", "BC", "PG", "__fake",
    -                "__sigmoid_fake_data", "__parameter_tuning"]:
    +                "__sigmoid_fake_data", "__parameter_tuning", "DDPG"]:
do you mind moving this before the private keys?
python/ray/rllib/ddpg/models.py (outdated)

    def _build_q_network(self, registry, inputs, state_space, ac_space, act_t, config):
        x = inputs
        x = tf.layers.dense(x, 64)
can you use tf.slim instead? it would be great to avoid mixing higher level libraries
    import os

    import numpy as np
    import tensorflow as tf
Would it be possible to move all of the TF code out of this file? We would like to keep the main algorithm framework agnostic
    @@ -0,0 +1,56 @@
    # --------------------------------------
Shall we move this into rllib/utils ?
    from ray.rllib.ddpg.ou_noise import AdaptiveParamNoiseSpec


    def _huber_loss(x, delta=1.0):
Some of this seems duplicated with DQN, we should probably move it to a util file.
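For context, the _huber_loss duplicated between DQN and DDPG is the standard Huber loss: quadratic for small errors, linear for large ones. A pure-Python sketch of the formula (the actual rllib versions operate on TensorFlow tensors, so this is only illustrative):

```python
def huber_loss(x, delta=1.0):
    """Huber loss: 0.5 * x^2 for |x| <= delta, else delta * (|x| - 0.5 * delta).

    Illustrative scalar sketch; the rllib implementation is a TF op.
    """
    ax = abs(x)
    if ax <= delta:
        return 0.5 * x * x              # quadratic region near zero
    return delta * (ax - 0.5 * delta)   # linear region for large errors
```

The linear tail is what makes this loss less sensitive to outlier TD errors than a plain squared error, which is why both DQN and DDPG use it.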
    """The base DDPG Evaluator that does not include the replay buffer.

    TODO(rliaw): Support observation/reward filters?"""
This comment seems out of date and could be removed.
python/ray/rllib/ddpg/ddpg.py (outdated)

        return result

    def _populate_replay_buffer(self):
python/ray/rllib/agent.py (outdated)

        return _ParameterTuningAgent
    elif alg == "DDPG":
        from ray.rllib import ddpg
        return ddpg.DDPGAgent
Along with registering it here, could you also register DDPG in these automated tests?
- The generic test for supported obs/action spaces: https://github.com/ray-project/ray/blob/master/python/ray/rllib/test/test_supported_spaces.py#L132
- The regression tests folder: https://github.com/ray-project/ray/tree/master/python/ray/rllib/tuned_examples/regression_tests (probably MountainCarContinuous-v0 is the right env for this test)
- The checkpoint/restore test: https://github.com/ray-project/ray/blob/master/python/ray/rllib/test/test_checkpoint_restore.py

You should be able to basically copy-paste an existing entry for, e.g., DQN in each of these cases. The tests can be run locally (and will also be run by Travis/Jenkins once pushed here).
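For reference, a regression-test entry is a small YAML file in that folder. A hypothetical sketch of what the DDPG entry could look like (the file name, stopping criterion, and config values here are illustrative, not tuned):

```yaml
# tuned_examples/regression_tests/mountaincarcontinuous-ddpg.yaml (sketch)
mountaincarcontinuous-ddpg:
    env: MountainCarContinuous-v0
    run: DDPG
    stop:
        episode_reward_mean: -15
    config:
        num_workers: 0
```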
python/ray/rllib/ddpg/ddpg.py (outdated)

    # Whether to compute priorities on workers.
    worker_side_prioritization=False,
    # Whether to force evaluator actors to be placed on remote machines.
    force_evaluators_remote=False)
Could we drop this config? It used to be used for Ape-X but is no longer needed due to improvements in Ray's actor scheduling.
Move all of the TF code out of ddpg.py; remove all unused functions; use tf.slim.
Test PASSed.
Test FAILed.
ericl left a comment:
Looks close. I just ran ./train.py -f tuned_examples/regression_tests/mountaincarcontinuous-ddpg.yaml and got
    Remote function __init__ failed with:
    Traceback (most recent call last):
      File "/home/eric/Desktop/ray-private/python/ray/worker.py", line 832, in _process_task
        *arguments)
      File "/home/eric/Desktop/ray-private/python/ray/actor.py", line 212, in actor_method_executor
        method_returns = method(actor, *args)
      File "/home/eric/Desktop/ray-private/python/ray/rllib/agent.py", line 84, in __init__
        Trainable.__init__(self, config, registry, logger_creator)
      File "/home/eric/Desktop/ray-private/python/ray/tune/trainable.py", line 89, in __init__
        self._setup()
      File "/home/eric/Desktop/ray-private/python/ray/rllib/agent.py", line 107, in _setup
        self._init()
      File "/home/eric/Desktop/ray-private/python/ray/rllib/ddpg/ddpg.py", line 105, in _init
        for i in range(self.config["num_workers"])]
      File "/home/eric/Desktop/ray-private/python/ray/rllib/ddpg/ddpg.py", line 105, in <listcomp>
        for i in range(self.config["num_workers"])]
      File "/home/eric/Desktop/ray-private/python/ray/actor.py", line 822, in remote
        dependency=actor_cursor)
      File "/home/eric/Desktop/ray-private/python/ray/actor.py", line 646, in _actor_method_call
        args = signature.extend_args(function_signature, args, kwargs)
      File "/home/eric/Desktop/ray-private/python/ray/signature.py", line 212, in extend_args
        .format(function_name))
    Exception: Too many arguments were passed to the function '__init__'
I think this is because the evaluator class doesn't take a logdir argument? Could you make sure that example runs to the -15 result?
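In general, this "Too many arguments" error means the remote class's __init__ accepts fewer parameters than the caller supplies. A minimal sketch of the likely fix (the class and parameter names here are assumptions based on the traceback, not the PR's actual code):

```python
class DDPGEvaluator:
    """Hypothetical evaluator constructor; names are illustrative only."""

    def __init__(self, registry, env_creator, config, logdir):
        # Accepting logdir is the fix: the agent passes it when constructing
        # remote evaluators, and a missing parameter here raises
        # "Too many arguments were passed to the function '__init__'".
        self.registry = registry
        self.config = config
        self.logdir = logdir
        self.env = env_creator(config.get("env_config", {}))
```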
python/ray/rllib/ddpg/ddpg.py (outdated)

    # Number of env steps to optimize for before returning
    timesteps_per_iteration=1000,

    exploration_noise=0.2,
Could you add comments for exploration_noise and action_noise?
python/ray/rllib/ddpg/ddpg.py (outdated)

    # Smooth the current average reward over this many previous episodes.
    smoothing_num_episodes=100,
python/ray/rllib/ddpg/ddpg.py (outdated)

        return result

    # def _populate_replay_buffer(self):
python/ray/rllib/ddpg/ou_noise.py (outdated)

    import numpy as np


    class OUNoise:
I think you forgot to remove this file after moving it to rllib/utils
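For context, the OU noise being moved is an Ornstein-Uhlenbeck process, which produces temporally correlated exploration noise for continuous actions. A minimal pure-Python sketch (parameter names and defaults are illustrative; the version in rllib/utils may differ, e.g. it is vectorized with numpy):

```python
import random


class OUNoise:
    """Ornstein-Uhlenbeck process: mean-reverting, temporally correlated noise."""

    def __init__(self, theta=0.15, sigma=0.2, dt=1e-2, x0=0.0, seed=0):
        self.theta, self.sigma, self.dt = theta, sigma, dt
        self.x = x0
        self.rng = random.Random(seed)

    def sample(self):
        # dx = -theta * x * dt + sigma * sqrt(dt) * N(0, 1); reverts toward 0,
        # so successive samples are correlated rather than independent.
        dx = (-self.theta * self.x * self.dt
              + self.sigma * (self.dt ** 0.5) * self.rng.gauss(0.0, 1.0))
        self.x += dx
        return self.x
```

The correlation between successive samples is what distinguishes OU noise from plain Gaussian noise and makes it a common default for DDPG exploration.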
python/ray/rllib/__init__.py (outdated)

     def _register_all():
    -    for key in ["PPO", "ES", "DQN", "APEX", "A3C", "BC", "PG", "__fake",
    +    for key in ["PPO", "ES", "DQN", "APEX", "A3C", "BC", "PG", "DDPG", "__fake",
Could we rename this to "DDPG_baselines", and change the directory to rllib/ddpg_baselines, in order to not conflict with the other DDPG PR here: https://github.com/ray-project/ray/pull/1877/files ?
The main feature of this PR is that it's adapted from the baselines code, so we should probably keep that in the naming (or we could also call it DDPG2 and just leave the details in the README).
Test FAILed.
ericl left a comment:
Bunch of lint errors here: https://api.travis-ci.org/v3/job/365429239/log.txt
Could you also make sure the HalfCheetah environment trains, and add a tuned example for this env? This one should be fairly fast and so is a good sanity check for DDPG.
python/ray/rllib/__init__.py (outdated)

         for key in ["PPO", "ES", "DQN", "APEX", "A3C", "BC", "PG", "DDPG",
    -                "__fake", "__sigmoid_fake_data", "__parameter_tuning"]:
    +                "DDPG_beselines", "__fake", "__sigmoid_fake_data", "__parameter_tuning"]:
    self.set_weights(objs["weights"])

    def sync_filters(self, new_filters):
        """Changes self's filter to given and rebases any accumulated delta.
Why does continuous-integration/travis-ci/pr always fail? I don't know if my code is wrong.
@qyccc Unfortunately we have a bunch of flaky tests right now. We really need to address this. Keeping track of them at https://github.com/ray-project/ray/issues?q=is%3Aissue+is%3Aopen+label%3A%22test+failure%22.
Hm, now that the other two DDPGs have been consolidated (as they achieve the exact same performance), it probably makes sense for this one to be as well.

What do these changes do?
Add the DDPG algorithm to rllib.
It can be run with the usual train script, e.g. python ray/python/ray/rllib/train.py --run DDPG --env MountainCarContinuous-v0.
Related issue number