[RLlib] DDPG refactor and Exploration API action noise classes. by sven1977 · Pull Request #7314 · ray-project/ray

sven1977 · 2020-02-25T12:31:37Z

The Exploration API is still missing the following to complete the P0 tasks:
GaussianNoise and OrnsteinUhlenbeckNoise Exploration classes. This PR adds both (plus test cases).
DDPG (and TD3) are refactored to use these two classes.
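
For readers unfamiliar with the two noise types, here is a minimal NumPy sketch of the underlying idea. It is not the RLlib implementation: `gaussian_noise`, `OUNoiseSketch`, and the `theta`/`sigma` defaults are hypothetical names and values chosen for illustration only.

```python
import numpy as np


def gaussian_noise(action, stddev, low, high, rng):
    """Add zero-mean Gaussian noise to a deterministic action, then clip."""
    noisy = action + rng.normal(0.0, stddev, size=action.shape)
    return np.clip(noisy, low, high)


class OUNoiseSketch:
    """Hypothetical Ornstein-Uhlenbeck process: temporally correlated noise."""

    def __init__(self, shape, theta=0.15, sigma=0.2, seed=0):
        self.theta = theta  # strength of the pull back toward zero
        self.sigma = sigma  # scale of the random perturbation per step
        self.state = np.zeros(shape)
        self.rng = np.random.default_rng(seed)

    def sample(self):
        # Discretized OU step (dt = 1): decay toward zero, then perturb.
        drift = -self.theta * self.state
        diffusion = self.sigma * self.rng.normal(size=self.state.shape)
        self.state = self.state + drift + diffusion
        return self.state


# Usage: perturb a deterministic policy output with either noise type.
rng = np.random.default_rng(0)
action = np.array([0.5, -0.5])
noisy_gaussian = gaussian_noise(action, 0.1, -1.0, 1.0, rng)
ou = OUNoiseSketch(action.shape)
noisy_ou = np.clip(action + ou.sample(), -1.0, 1.0)
```

The practical difference is that Gaussian samples are independent across timesteps, while the OU state carries over, so consecutive perturbations are correlated.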

Related issue number

Checks

I've run scripts/format.sh to lint the changes in this PR.
I've included any doc changes needed for https://ray.readthedocs.io/en/latest/.
I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failure rates at https://ray-travis-tracker.herokuapp.com/.

…oration_api_action_noises

AmplabJenkins · 2020-02-25T12:34:43Z

Can one of the admins verify this patch?

AmplabJenkins · 2020-02-25T13:21:25Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/22383/
Test FAILed.

AmplabJenkins · 2020-02-25T13:52:38Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/22386/
Test FAILed.

AmplabJenkins · 2020-02-25T14:03:41Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/22387/
Test FAILed.

AmplabJenkins · 2020-02-25T14:33:11Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/22388/
Test FAILed.

AmplabJenkins · 2020-02-25T15:24:17Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/22389/
Test FAILed.

AmplabJenkins · 2020-02-25T16:57:46Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/22390/
Test FAILed.

AmplabJenkins · 2020-02-25T20:17:59Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/22395/
Test FAILed.

AmplabJenkins · 2020-02-26T08:40:54Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/22426/
Test FAILed.

AmplabJenkins · 2020-02-26T09:19:52Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/22427/
Test PASSed.

AmplabJenkins · 2020-02-26T11:49:30Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/22429/
Test FAILed.

AmplabJenkins · 2020-02-26T12:03:54Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/22431/
Test FAILed.

…nto exploration_api_action_noises

AmplabJenkins · 2020-02-26T15:23:23Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/22433/
Test FAILed.

ericl

What's the testing strategy here? Should we rerun a few of the MuJoCo benchmarks?

ericl · 2020-02-26T22:18:51Z

rllib/tests/test_checkpoint_restore.py

    assert not failures, failures
    print("All checkpoint restore tests passed!")

+    quit()


Revert the changes in this file for merge?

Oops, sorry, thanks for catching this!

ericl · 2020-02-26T22:27:41Z

rllib/utils/exploration/gaussian_noise.py

+        return action, logp
+
+    @override(Exploration)
+    def get_info(self):


I think it makes more sense for this to be a separate get_scale_value method rather than overload the existing info method (which seems like it's only for debugging).

Not sure I agree. Yes, right now it's only used for recording this "info" (whatever that is for each Exploration class) in the train results: Policy calls it generically and Tune then serializes it to JSON. This is useful, just as it is for the current learning rate, memory consumption, etc. (a debugging/reporting snapshot of the state of the exploration object). I don't see any use for a specific get_scale_value method, as no one would ever call it (Policy should only use the top-level Exploration class methods, as it doesn't know which specific subclass it's holding).

I was referring to the comment you had here:

clean_actions, cur_noise_scale = self.sess.run( [self.output_actions, self.exploration.get_info()],

It seemed odd to be passing self.exploration.get_info() to the session run, but if this is a temporary hack I guess it's fine. Otherwise, it seems clearer to assert that the exploration is of type GaussianNoise and then call a Gaussian-noise-specific method, rather than sneak the tensor through via get_info().

Ah yes, absolutely. This is only a problem though right now as we don't have a parameter-noise exploration class yet and the parameter-noise logic piggybacks on that cur_noise_scale value (which it shouldn't, it should be separate exploration classes or a merged one). It's tagged as a TODO(sven) and will be fixed once we add ParameterNoise exploration classes.
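
The alternative suggested in this thread could look roughly like the sketch below. All classes here are hypothetical stand-ins written in plain Python, not the RLlib classes; `get_scale_value` is the method name proposed in the review, and `noise_scale_for_param_noise` is an invented helper.

```python
class Exploration:
    """Hypothetical stand-in for a generic exploration base class."""

    def get_info(self):
        # Generic reporting hook: returns whatever the subclass wants logged.
        return None


class GaussianNoiseSketch(Exploration):
    """Hypothetical subclass that tracks a noise scale."""

    def __init__(self, scale):
        self.scale = scale

    def get_info(self):
        # Still reported generically for debugging and train results.
        return self.scale

    def get_scale_value(self):
        # Type-specific accessor, as proposed in the review comment.
        return self.scale


def noise_scale_for_param_noise(exploration):
    # Fail loudly instead of assuming get_info() happens to hold the scale.
    assert isinstance(exploration, GaussianNoiseSketch), type(exploration)
    return exploration.get_scale_value()


scale = noise_scale_for_param_noise(GaussianNoiseSketch(0.3))
```

The assertion makes the coupling explicit: code that needs the noise scale states which exploration type it depends on, while `get_info()` stays a pure reporting hook.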

sven1977 · 2020-02-27T11:09:02Z

Yes, I'll rerun HalfCheetah vs DDPG. ...

AmplabJenkins · 2020-02-27T11:56:51Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/22494/
Test FAILed.

AmplabJenkins · 2020-02-27T12:14:27Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/22495/
Test FAILed.

AmplabJenkins · 2020-02-27T12:25:51Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/22496/
Test FAILed.

AmplabJenkins · 2020-02-27T17:10:20Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/22501/
Test FAILed.

AmplabJenkins · 2020-02-27T17:25:29Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/22500/
Test PASSed.

AmplabJenkins · 2020-02-27T17:27:45Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/22502/
Test FAILed.

ericl

Looks good to me, pending verification of DDPG/TD3 results. We should probably add a couple of MuJoCo regression tests to our standard compact-regression-test suite, though it will be a little annoying because of the licensing.

Any suggestions on alternative OSS continuous benchmarks?

…oration_api_action_noises Conflicts: rllib/policy/tf_policy.py, rllib/rollout.py

AmplabJenkins · 2020-03-01T11:37:30Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/22591/
Test PASSed.

AmplabJenkins · 2020-03-01T11:42:17Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/22592/
Test FAILed.

sven1977 · 2020-03-01T13:18:28Z

@ericl Please merge. DDPG still looks good. I'll run a longer benchmark on HalfCheetah and update the rl-experiments repo.

== Status ==
Memory usage on this node: 7.9/120.0 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/16 CPUs, 0/1 GPUs, 0.0/88.48 GiB heap, 0.0/12.84 GiB objects
Result logdir: /home/ubuntu/ray_results/halfcheetah-ddpg
Number of trials: 1 (1 TERMINATED)
+---------------------------+------------+-------+----------+------------------+--------+--------+
| Trial name                | status     | loc   |   reward |   total time (s) |     ts |   iter |
|---------------------------+------------+-------+----------+------------------+--------+--------|
| DDPG_HalfCheetah-v2_00000 | TERMINATED |       |  2006.98 |          3003.82 | 211000 |    211 |
+---------------------------+------------+-------+----------+------------------+--------+--------+

…project#7314) * WIP. * WIP. * WIP. * WIP. * WIP. * Fix * WIP. * Add TD3 quick Pendulum regresison. * Cleanup. * Fix. * LINT. * Fix. * Sort quick_learning test cases, add TD3. * Sort quick_learning test cases, add TD3. * Revert test_checkpoint_restore.py (debugging) changes. * Fix old soft_q settings in documentation and test configs. * More doc fixes. * Fix test case. * Fix test case. * Lower test load. * WIP.

sven1977 added 3 commits February 25, 2020 12:06

WIP.

9251d14

WIP.

a9d972b

Merge branch 'master' of https://github.com/ray-project/ray into expl…

32d2d9c

…oration_api_action_noises

WIP.

d78d7f4

sven1977 added 2 commits February 25, 2020 14:27

WIP.

9076afc

WIP.

9f21fa3

Fix

e4c909f

sven1977 added 2 commits February 25, 2020 17:20

WIP.

4f58628

Add TD3 quick Pendulum regresison.

afe11df

Cleanup.

e9228b3

sven1977 added 2 commits February 26, 2020 08:58

Fix.

c1ec71e

LINT.

507f588

sven1977 added 2 commits February 26, 2020 12:02

Fix.

9a01ed7

Sort quick_learning test cases, add TD3.

8dd1c00

sven1977 added 2 commits February 26, 2020 15:35

Sort quick_learning test cases, add TD3.

fc1085b

Merge remote-tracking branch 'origin/exploration_api_action_noises' i…

6194766

…nto exploration_api_action_noises

sven1977 marked this pull request as ready for review February 26, 2020 14:36

sven1977 requested a review from ericl February 26, 2020 15:26

sven1977 added the tests-ok label ("The tagger certifies test failures are unrelated and assumes personal liability.") Feb 26, 2020

ericl self-assigned this Feb 26, 2020

ericl reviewed Feb 26, 2020

Revert test_checkpoint_restore.py (debugging) changes.

6b54dc4

sven1977 added 2 commits February 27, 2020 12:36

Fix old soft_q settings in documentation and test configs.

03d6764

More doc fixes.

19b44c0

sven1977 added 3 commits February 27, 2020 17:02

Fix test case.

711bee4

Fix test case.

1c617cd

Lower test load.

549f25a

ericl reviewed Feb 27, 2020

ericl approved these changes Feb 27, 2020

sven1977 added 2 commits March 1, 2020 11:36

Merge branch 'master' of https://github.com/ray-project/ray into expl…

2327a70

…oration_api_action_noises Conflicts: rllib/policy/tf_policy.py, rllib/rollout.py

WIP.

5a8c722

ericl merged commit 83e06cd into ray-project:master Mar 1, 2020

sven1977 deleted the exploration_api_action_noises branch March 3, 2020 10:15
