[rllib] Contribute DDPG to RLlib #1877
Conversation
contribute DDPG and related test configurations to Ray RLlib
Test PASSed.

cc @vlad17
python/ray/rllib/__init__.py
Outdated
def _register_all():
-    for key in ["PPO", "ES", "DQN", "APEX", "A3C", "BC", "PG", "__fake",
+    for key in ["DDPG", "APEX_DDPG", "PPO", "ES", "DQN", "APEX", "A3C", "BC", "PG", "__fake",
Let's call this DDPG2 and APEX_DDPG2 for now, since there is a conflicting PR that was merged earlier. We will resolve the differences later to combine the implementations.
The package directory should also be moved to rllib/ddpg2.
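To make the renaming concrete, here is a minimal sketch of a name-to-class registry. The stub classes, the dict, and `get_agent_class` are illustrative stand-ins for RLlib's actual dispatch, not its real API:

```python
# Illustrative registry keyed by algorithm name. The DDPG2/APEX_DDPG2 keys
# follow the renaming suggested above; the agent classes are stubs.
class DDPGAgent(object):
    pass


class ApexDDPGAgent(DDPGAgent):
    pass


AGENT_CLASSES = {
    "DDPG2": DDPGAgent,
    "APEX_DDPG2": ApexDDPGAgent,
}


def get_agent_class(alg):
    """Look up an agent class by its registry key."""
    if alg not in AGENT_CLASSES:
        raise ValueError("Unknown algorithm: {}".format(alg))
    return AGENT_CLASSES[alg]
```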
from __future__ import print_function

from ray.rllib.ddpg.apex import ApexDDPGAgent
from ray.rllib.ddpg.ddpg import DDPGAgent, DEFAULT_CONFIG
@@ -0,0 +1,108 @@
"""This file is used for specifying various schedules that evolve over
This file is literally a copy of dqn/schedules.py. Let's move that to common to avoid this code duplication.
To avoid making any changes to dqn, I just reuse the schedules of dqn for now.
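For readers unfamiliar with dqn/schedules.py, its core is a linear interpolation schedule along these lines (a sketch of the common Baselines-style schedule, not a verbatim copy of the file):

```python
class LinearSchedule(object):
    """Sketch of a linear schedule: interpolates from initial_p to final_p
    over schedule_timesteps, then stays at final_p afterwards."""

    def __init__(self, schedule_timesteps, final_p, initial_p=1.0):
        self.schedule_timesteps = schedule_timesteps
        self.final_p = final_p
        self.initial_p = initial_p

    def value(self, t):
        # Fraction of the schedule completed, clamped to [0, 1].
        fraction = min(float(t) / self.schedule_timesteps, 1.0)
        return self.initial_p + fraction * (self.final_p - self.initial_p)
```

This is the kind of utility the comment above suggests moving to a common location, since both DQN and DDPG anneal exploration parameters the same way.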
# Override atari default to use the deepmind wrappers.
# TODO(ekl) this logic should be pushed to the catalog.
if is_atari and "custom_preprocessor" not in options:
    return wrap_deepmind(env, random_starts=random_starts)
Since DDPG isn't typically used with discrete action spaces (e.g., Atari), how about we remove this wrapper and just use ModelCatalog.get_preprocessor...?
This means we can remove this file.
I tried your solution. However, without a complicated assertion on the observation type (say, Box(0.0, 1.0, (80, 80, 1), dtype=np.float32) is allowed while Box(0.0, 1.0, (210, 160, 3), dtype=np.float32) is unsupported), DDPG can NOT pass the test/test_supported_spaces.py test. Thus, I removed this file according to your comment, but I directly import and use the wrapper from dqn instead of following your proposal. If you have any concerns, please let me know and I will revise accordingly.
I see, that sounds fine then; we can clean up the Atari handling later.
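The kind of observation-type assertion being discussed could be sketched like this. The helper name and the size threshold are assumptions for illustration only, not code from the PR:

```python
def is_supported_obs_shape(shape, max_dim=84):
    """Hypothetical check: accept small image-like Boxes such as (80, 80, 1)
    but reject raw Atari frames such as (210, 160, 3), which DDPG's default
    network cannot handle without the deepmind wrapper."""
    return all(d <= max_dim for d in shape)
```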
python/ray/rllib/ddpg/ddpg.py
Outdated
smoothing_num_episodes=100,
OK, got your conventions.
-    print(str(result))
+    pretty_print(result)

if __name__=="__main__":
Is it intended for this test to be run as part of the automated tests?
If so, consider adding it as an entry in run_multi_node_tests.sh; otherwise, remove it.
Sorry for making it messy; I removed these files.
print(mean_100ep_reward)
"""

if __name__=="__main__":
Is this intended to be an automated test?
ob = new_ob

if __name__=="__main__":
    main()
Is this intended to be an automated test?
self.saved_mean_reward = data[3]
self.obs = data[4]
self.global_timestep = data[5]
self.local_timestep = data[6]
I'm a little concerned this file has too much in common with dqn_evaluator.py. However, it's not clear if coupling DQN and DDPG would be a good idea either. @richardliaw any thoughts?
You are right. Both of them follow a Q-learning pattern; the differences can be expressed by DQNGraph and DDPGGraph. We can consider distilling a parent class like QAgent/QEvaluator later.
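The "distill a parent class" idea could look like this: the shared checkpoint bookkeeping lives in a QEvaluator base, while DQN and DDPG subclasses plug in their own graphs. Class and field names here are an illustrative sketch, not the PR's code:

```python
class QEvaluator(object):
    """Hypothetical shared base for DQN/DDPG evaluators."""

    def __init__(self, graph):
        self.graph = graph  # DQNGraph or DDPGGraph supplied by the subclass
        self.global_timestep = 0
        self.local_timestep = 0

    def restore(self, data):
        # Mirrors the positional checkpoint layout quoted above.
        self.saved_mean_reward = data[3]
        self.obs = data[4]
        self.global_timestep = data[5]
        self.local_timestep = data[6]


class DDPGEvaluator(QEvaluator):
    pass
```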
@@ -0,0 +1,15 @@
pendulum-ddpg:
    env: Pendulum-v0
    run: DDPG
DDPG2 here and elsewhere in the YAML examples.
Btw, how long does this usually take to complete?
Done. It takes around 650 to 750 seconds.
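After the rename, the tuned example could be expressed as the following config (shown here as a Python dict mirroring the YAML keys; the stopping criterion is an assumption for illustration, based on the reward target mentioned in the PR description):

```python
# Hypothetical rendering of tuned_examples/pendulum-ddpg.yaml after the
# DDPG -> DDPG2 rename requested in review.
PENDULUM_DDPG = {
    "pendulum-ddpg": {
        "env": "Pendulum-v0",
        "run": "DDPG2",  # renamed per the review comment
        "stop": {"episode_reward_mean": -160},  # assumed target
    }
}
```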
Two other requests for tests:

Test PASSed.
@@ -0,0 +1,108 @@
"""This file is used for specifying various schedules that evolve over

@@ -0,0 +1,408 @@
from __future__ import absolute_import
from __future__ import division
@vladf can you take a look over the models here?
python/ray/rllib/ddpg2/models.py
Outdated
s_func_vars = _scope_vars(scope.name)

# Actor: P (policy) network
p_scope_name = TOWER_SCOPE_NAME + "/p_func"
We can drop TOWER_SCOPE_NAME; that was only used for the multi-GPU optimizer, which is being refactored to not need it.
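In practical terms, dropping TOWER_SCOPE_NAME means actor variables live directly under "p_func" instead of "tower/p_func". A sketch of the scope-filtering pattern, with a plain dict standing in for TensorFlow's variable collection (names are illustrative):

```python
def scope_vars(variables, scope_name):
    """Return the subset of variables whose names fall under scope_name,
    analogous to the _scope_vars helper quoted above."""
    prefix = scope_name + "/"
    return {k: v for k, v in variables.items() if k.startswith(prefix)}


# Stand-in for tf.global_variables(): flat name -> value mapping.
variables = {
    "p_func/w1": 0.1,  # actor (policy) weights
    "p_func/b1": 0.0,
    "q_func/w1": 0.2,  # critic weights
}
actor_vars = scope_vars(variables, "p_func")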
Hi teachers,

Test PASSed.
Looks like there is a conflicting file. I think we can merge this once that's fixed, but let's make sure to add results for HalfCheetah experiments afterwards.

Test PASSed.
@joneswong the lint tests are failing in travis. Can you run
CONFIGS = {
    "ES": {"episodes_per_batch": 10, "timesteps_per_batch": 100},
    "DQN": {},
    "DDPG2": {"noise_scale": 0.0},
We should add @alvkao58's DDPG to this file too (in a separate PR, that is).
def testAll(self):
    ray.init()
    stats = {}
    check_support("DDPG2", {"timesteps_per_iteration": 1}, stats)
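Conceptually, check_support tries to run each algorithm on a given space/config and records whether it succeeds. A minimal stand-in showing just the bookkeeping pattern (the real test in test_supported_spaces.py exercises actual gym spaces; this sketch does not):

```python
def check_support(alg, config, stats, train_fn=None):
    """Record whether `alg` can train under `config`; on failure, store the
    error instead of raising, so all algorithms get exercised."""
    try:
        if train_fn is not None:
            train_fn(alg, config)
        stats[alg] = "ok"
    except Exception as e:
        stats[alg] = "unsupported: {}".format(e)


stats = {}
check_support("DDPG2", {"timesteps_per_iteration": 1}, stats)
```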
Test PASSed.
@ericl I formatted the .py files in the ddpg2 folder with yapf by executing the command you provided.
Hm, there seem to be some lint errors still: https://api.travis-ci.org/v3/job/367974621/log.txt (click the travis details -> go to the LINT job)
Test FAILed.

Test PASSed.
Merged, thanks!
Nice work!
Thanks for Eric's patience. I learned some conventions of open-source projects from this PR. I will adhere to the code style in the following PRs.
Thanks for contributing this!
What do these changes do?
Implemented DDPG (see ./rllib/ddpg) in a style consistent with DQN.
Validated on Pendulum-v0 and MountainCarContinuous-v0 with LocalSyncReplayOptimizer and ApeXOptimizer:
- Using LocalSyncReplayOptimizer on Pendulum-v0 (see ./rllib/tuned_examples/pendulum-ddpg.yaml), mean100rewards reaches -160 in around 30k to 40k timesteps.
- Using ApeXOptimizer on Pendulum-v0 (see ./rllib/tuned_examples/pendulum-apex-ddpg.yaml), mean100reward reaches -160 within around 15 minutes with 16 workers.
- Using LocalSyncReplayOptimizer on MountainCarContinuous-v0 (see ./rllib/tuned_examples/mountaincarcontinuous-ddpg.yaml), mean100rewards reaches 90 in around 20k to 30k timesteps.
- Using ApeXOptimizer on MountainCarContinuous-v0 (see ./rllib/tuned_examples/mountaincarcontinuous-apex-ddpg.yaml), mean100reward reaches 90 within around 15 minutes with 16 workers.
Some functionality, e.g., the OU process for generating noise, the schedules, etc., can be refactored into common utilities (i.e., moved into ./rllib/utils). However, we want to keep each pull request clean and focused on one function.
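The Ornstein-Uhlenbeck exploration noise mentioned above generates temporally correlated noise by pulling the state back toward a mean while adding Gaussian perturbations. A sketch with commonly used parameter defaults (these defaults are assumptions, not necessarily the PR's exact values):

```python
import numpy as np


class OUProcess(object):
    """Sketch of OU exploration noise:
    x_{t+1} = x_t + theta * (mu - x_t) + sigma * N(0, 1)."""

    def __init__(self, dim, theta=0.15, sigma=0.2, mu=0.0):
        self.theta = theta
        self.sigma = sigma
        self.mu = mu
        self.state = np.ones(dim) * mu

    def sample(self):
        # Mean-reverting drift plus Gaussian diffusion.
        dx = self.theta * (self.mu - self.state) \
            + self.sigma * np.random.randn(*self.state.shape)
        self.state = self.state + dx
        return self.state
```

The correlation across consecutive samples is what makes this noise useful for exploration in physical control tasks like Pendulum, where independent per-step noise tends to average out.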
Related issue number
#1868