[rllib] Contribute DDPG to RLlib #1877
Conversation
contribute DDPG and related test configurations to Ray RLlib
Test PASSed.

cc @vlad17
python/ray/rllib/__init__.py
Outdated
def _register_all():
-    for key in ["PPO", "ES", "DQN", "APEX", "A3C", "BC", "PG", "__fake",
+    for key in ["DDPG", "APEX_DDPG", "PPO", "ES", "DQN", "APEX", "A3C", "BC", "PG", "__fake",
Let's call this DDPG2 and APEX_DDPG2 for now, since there is a conflicting PR that was merged earlier. We will resolve the differences later to combine the implementations.
The package directory should also be moved to rllib/ddpg2.
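To make the renaming concrete, here is a minimal sketch of a name-to-class registry. The stub classes, the dict, and `get_agent_class` are illustrative stand-ins for RLlib's actual dispatch, not its real API:

```python
# Illustrative registry keyed by algorithm name. The DDPG2/APEX_DDPG2 keys
# follow the renaming suggested above; the agent classes are stubs.
class DDPGAgent(object):
    pass


class ApexDDPGAgent(DDPGAgent):
    pass


AGENT_CLASSES = {
    "DDPG2": DDPGAgent,
    "APEX_DDPG2": ApexDDPGAgent,
}


def get_agent_class(alg):
    """Look up an agent class by its registry key."""
    if alg not in AGENT_CLASSES:
        raise ValueError("Unknown algorithm: {}".format(alg))
    return AGENT_CLASSES[alg]
```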
from __future__ import print_function

from ray.rllib.ddpg.apex import ApexDDPGAgent
from ray.rllib.ddpg.ddpg import DDPGAgent, DEFAULT_CONFIG
@@ -0,0 +1,108 @@
"""This file is used for specifying various schedules that evolve over
This file is literally a copy of dqn/schedules.py. Let's move that to common to avoid this code duplication.
To avoid making any changes to dqn, I just reuse the schedules of dqn for now.
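For readers unfamiliar with dqn/schedules.py, its core is a linear interpolation schedule along these lines (a sketch of the common Baselines-style schedule, not a verbatim copy of the file):

```python
class LinearSchedule(object):
    """Sketch of a linear schedule: interpolates from initial_p to final_p
    over schedule_timesteps, then stays at final_p afterwards."""

    def __init__(self, schedule_timesteps, final_p, initial_p=1.0):
        self.schedule_timesteps = schedule_timesteps
        self.final_p = final_p
        self.initial_p = initial_p

    def value(self, t):
        # Fraction of the schedule completed, clamped to [0, 1].
        fraction = min(float(t) / self.schedule_timesteps, 1.0)
        return self.initial_p + fraction * (self.final_p - self.initial_p)
```

This is the kind of utility the comment above suggests moving to a common location, since both DQN and DDPG anneal exploration parameters the same way.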
# Override atari default to use the deepmind wrappers.
# TODO(ekl) this logic should be pushed to the catalog.
if is_atari and "custom_preprocessor" not in options:
    return wrap_deepmind(env, random_starts=random_starts)
Since DDPG isn't typically used with discrete action spaces (e.g., Atari), how about we remove this wrapper and just use ModelCatalog.get_preprocessor...?
This means we can remove this file.
I tried your solution. However, without a complicated assertion on the observation type (say, Box(0.0, 1.0, (80, 80, 1), dtype=np.float32) is allowed while Box(0.0, 1.0, (210, 160, 3), dtype=np.float32) is unsupported), DDPG can NOT pass the test/test_supported_spaces.py test. Thus, I removed this file according to your comment, but I directly import and use the wrapper from dqn instead of following your proposal. If you have any concerns, please let me know and I will revise accordingly.
I see, that sounds fine then; we can clean up the Atari handling later.
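The kind of observation-type assertion being discussed could be sketched like this. The helper name and the size threshold are assumptions for illustration only, not code from the PR:

```python
def is_supported_obs_shape(shape, max_dim=84):
    """Hypothetical check: accept small image-like Boxes such as (80, 80, 1)
    but reject raw Atari frames such as (210, 160, 3), which DDPG's default
    network cannot handle without the deepmind wrapper."""
    return all(d <= max_dim for d in shape)
```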
python/ray/rllib/ddpg/ddpg.py
Outdated
smoothing_num_episodes=100,
OK, got your conventions.
-    print(str(result))
+    pretty_print(result)

if __name__=="__main__":
Is it intended for this test to be run as part of the automated tests?
If so, consider adding it as an entry in run_multi_node_tests.sh; otherwise, remove it.
Sorry for making it messy; I removed these files.
print(mean_100ep_reward)
"""

if __name__=="__main__":
Is this intended to be an automated test?
ob = new_ob

if __name__=="__main__":
    main()
Is this intended to be an automated test?
self.saved_mean_reward = data[3]
self.obs = data[4]
self.global_timestep = data[5]
self.local_timestep = data[6]
I'm a little concerned this file has too much in common with dqn_evaluator.py. However, it's not clear if coupling DQN and DDPG would be a good idea either. @richardliaw any thoughts?
You are right. Both of them follow a Q-learning pattern; the differences can be expressed by DQNGraph and DDPGGraph. We can consider distilling a parent class like QAgent/QEvaluator later.
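The "distill a parent class" idea could look like this: the shared checkpoint bookkeeping lives in a QEvaluator base, while DQN and DDPG subclasses plug in their own graphs. Class and field names here are an illustrative sketch, not the PR's code:

```python
class QEvaluator(object):
    """Hypothetical shared base for DQN/DDPG evaluators."""

    def __init__(self, graph):
        self.graph = graph  # DQNGraph or DDPGGraph supplied by the subclass
        self.global_timestep = 0
        self.local_timestep = 0

    def restore(self, data):
        # Mirrors the positional checkpoint layout quoted above.
        self.saved_mean_reward = data[3]
        self.obs = data[4]
        self.global_timestep = data[5]
        self.local_timestep = data[6]


class DDPGEvaluator(QEvaluator):
    pass
```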
@@ -0,0 +1,15 @@
pendulum-ddpg:
    env: Pendulum-v0
    run: DDPG
DDPG2 here and elsewhere in the YAML examples.
Btw, how long does this usually take to complete?
Done. It takes around 650 to 750 seconds.
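After the rename, the tuned example could be expressed as the following config (shown here as a Python dict mirroring the YAML keys; the stopping criterion is an assumption for illustration, based on the reward target mentioned in the PR description):

```python
# Hypothetical rendering of tuned_examples/pendulum-ddpg.yaml after the
# DDPG -> DDPG2 rename requested in review.
PENDULUM_DDPG = {
    "pendulum-ddpg": {
        "env": "Pendulum-v0",
        "run": "DDPG2",  # renamed per the review comment
        "stop": {"episode_reward_mean": -160},  # assumed target
    }
}
```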
Two other requests for tests:

Test PASSed.
@@ -0,0 +1,108 @@
"""This file is used for specifying various schedules that evolve over

@@ -0,0 +1,408 @@
from __future__ import absolute_import
from __future__ import division
@vladf can you take a look over the models here?
python/ray/rllib/ddpg2/models.py
Outdated
s_func_vars = _scope_vars(scope.name)

# Actor: P (policy) network
p_scope_name = TOWER_SCOPE_NAME + "/p_func"
We can drop TOWER_SCOPE_NAME; that was only used for the multi-GPU optimizer, which is being refactored to not need it.
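In practical terms, dropping TOWER_SCOPE_NAME means actor variables live directly under "p_func" instead of "tower/p_func". A sketch of the scope-filtering pattern, with a plain dict standing in for TensorFlow's variable collection (names are illustrative):

```python
def scope_vars(variables, scope_name):
    """Return the subset of variables whose names fall under scope_name,
    analogous to the _scope_vars helper quoted above."""
    prefix = scope_name + "/"
    return {k: v for k, v in variables.items() if k.startswith(prefix)}


# Stand-in for tf.global_variables(): flat name -> value mapping.
variables = {
    "p_func/w1": 0.1,  # actor (policy) weights
    "p_func/b1": 0.0,
    "q_func/w1": 0.2,  # critic weights
}
actor_vars = scope_vars(variables, "p_func")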
Hi teachers,

Test PASSed.
Looks like there is a conflicting file. I think we can merge this once that's fixed, but let's make sure to add results for HalfCheetah experiments afterwards.

Test PASSed.
@joneswong the lint tests are failing in travis. Can you run
CONFIGS = {
    "ES": {"episodes_per_batch": 10, "timesteps_per_batch": 100},
    "DQN": {},
    "DDPG2": {"noise_scale": 0.0},
We should add @alvkao58's DDPG to this file too (in a separate PR, that is).
def testAll(self):
    ray.init()
    stats = {}
    check_support("DDPG2", {"timesteps_per_iteration": 1}, stats)
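Conceptually, check_support tries to run each algorithm on a given space/config and records whether it succeeds. A minimal stand-in showing just the bookkeeping pattern (the real test in test_supported_spaces.py exercises actual gym spaces; this sketch does not):

```python
def check_support(alg, config, stats, train_fn=None):
    """Record whether `alg` can train under `config`; on failure, store the
    error instead of raising, so all algorithms get exercised."""
    try:
        if train_fn is not None:
            train_fn(alg, config)
        stats[alg] = "ok"
    except Exception as e:
        stats[alg] = "unsupported: {}".format(e)


stats = {}
check_support("DDPG2", {"timesteps_per_iteration": 1}, stats)
```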
Test PASSed.
@ericl I formatted the .py files in the ddpg2 folder with yapf by executing the command you provided.
Hm, there seem to be some lint errors still: https://api.travis-ci.org/v3/job/367974621/log.txt (click the travis details -> go to the LINT job)
Test FAILed.

Test PASSed.
Merged, thanks!
Nice work!
Thanks for Eric's patience. I learned some conventions of open-source projects from this PR. I will adhere to the code style in the following PRs.
Thanks for contributing this!
What do these changes do?
Implemented DDPG (see ./rllib/ddpg) in a style consistent with DQN.
Validated on Pendulum-v0 and MountainCarContinuous-v0 with LocalSyncReplayOptimizer and ApeXOptimizer:
- Using LocalSyncReplayOptimizer on Pendulum-v0 (see ./rllib/tuned_examples/pendulum-ddpg.yaml), mean100rewards reaches -160 in around 30k to 40k timesteps.
- Using ApeXOptimizer on Pendulum-v0 (see ./rllib/tuned_examples/pendulum-apex-ddpg.yaml), mean100reward reaches -160 within around 15 minutes with 16 workers.
- Using LocalSyncReplayOptimizer on MountainCarContinuous-v0 (see ./rllib/tuned_examples/mountaincarcontinuous-ddpg.yaml), mean100rewards reaches 90 in around 20k to 30k timesteps.
- Using ApeXOptimizer on MountainCarContinuous-v0 (see ./rllib/tuned_examples/mountaincarcontinuous-apex-ddpg.yaml), mean100reward reaches 90 within around 15 minutes with 16 workers.
Some functionality, e.g., the OU process for generating noise, the schedules, etc., can be refactored into common utilities (i.e., moved into ./rllib/utils). However, we want to keep each pull request clean and focused on one function.
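The Ornstein-Uhlenbeck exploration noise mentioned above generates temporally correlated noise by pulling the state back toward a mean while adding Gaussian perturbations. A sketch with commonly used parameter defaults (these defaults are assumptions, not necessarily the PR's exact values):

```python
import numpy as np


class OUProcess(object):
    """Sketch of OU exploration noise:
    x_{t+1} = x_t + theta * (mu - x_t) + sigma * N(0, 1)."""

    def __init__(self, dim, theta=0.15, sigma=0.2, mu=0.0):
        self.theta = theta
        self.sigma = sigma
        self.mu = mu
        self.state = np.ones(dim) * mu

    def sample(self):
        # Mean-reverting drift plus Gaussian diffusion.
        dx = self.theta * (self.mu - self.state) \
            + self.sigma * np.random.randn(*self.state.shape)
        self.state = self.state + dx
        return self.state
```

The correlation across consecutive samples is what makes this noise useful for exploration in physical control tasks like Pendulum, where independent per-step noise tends to average out.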
Related issue number
#1868