Skip to content

[RLlib] DDPG refactor and Exploration API action noise classes.#7314

Merged
ericl merged 24 commits intoray-project:masterfrom
sven1977:exploration_api_action_noises
Mar 1, 2020
Merged

[RLlib] DDPG refactor and Exploration API action noise classes.#7314
ericl merged 24 commits intoray-project:masterfrom
sven1977:exploration_api_action_noises

Conversation

@sven1977
Copy link
Copy Markdown
Contributor

@sven1977 sven1977 commented Feb 25, 2020

  • The Exploration API is missing to complete P0 tasks:
    GaussianNoise and UrnsteinUhlenbeckNoise Exploration classes. These two are added with this PR (plus test cases).
  • DDPG (and TD3) are refactored to use these two classes.

Related issue number

Checks

@AmplabJenkins
Copy link
Copy Markdown

Can one of the admins verify this patch?

@AmplabJenkins
Copy link
Copy Markdown

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/22383/
Test FAILed.

@AmplabJenkins
Copy link
Copy Markdown

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/22386/
Test FAILed.

@AmplabJenkins
Copy link
Copy Markdown

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/22387/
Test FAILed.

@AmplabJenkins
Copy link
Copy Markdown

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/22388/
Test FAILed.

@AmplabJenkins
Copy link
Copy Markdown

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/22389/
Test FAILed.

@AmplabJenkins
Copy link
Copy Markdown

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/22390/
Test FAILed.

@AmplabJenkins
Copy link
Copy Markdown

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/22395/
Test FAILed.

@AmplabJenkins
Copy link
Copy Markdown

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/22426/
Test FAILed.

@AmplabJenkins
Copy link
Copy Markdown

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/22427/
Test PASSed.

@AmplabJenkins
Copy link
Copy Markdown

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/22429/
Test FAILed.

@AmplabJenkins
Copy link
Copy Markdown

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/22431/
Test FAILed.

@sven1977 sven1977 marked this pull request as ready for review February 26, 2020 14:36
@AmplabJenkins
Copy link
Copy Markdown

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/22433/
Test FAILed.

@sven1977 sven1977 requested a review from ericl February 26, 2020 15:26
@sven1977 sven1977 added the tests-ok The tagger certifies test failures are unrelated and assumes personal liability. label Feb 26, 2020
@ericl ericl self-assigned this Feb 26, 2020
Copy link
Copy Markdown
Contributor

@ericl ericl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the testing strategy here, should we rerun a few of the mujoco benchmarks?

assert not failures, failures
print("All checkpoint restore tests passed!")

quit()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revert the changes in this file for merge?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ups, sorry, thanks for catching this!

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

return action, logp

@override(Exploration)
def get_info(self):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes more sense for this to be a separate get_scale_value method rather than overload the existing info method (which seems like it's only for debugging).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I agree. Yes, right now, it's only used for recording this "info" (whatever it is for each Exploration class) in the train-results (Policy calls this generically and then tune json's it). This is useful, just like it is for current learning rate, memory consumption, etc.. (debugging/reporting snapshot of the state of the exploration object). I don't see any use for a specific get_scale_value method, as it would never be used by anyone (Policy should only use the top-level Exploration class methods, as it doesn't know, which specific sub-class it's holding).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was referring to the comment you had here:

            clean_actions, cur_noise_scale = self.sess.run(
                [self.output_actions,
                 self.exploration.get_info()],

It seemed odd to be passing self.exploration.get_info() to the session run, but if this is a temporary hack I guess it's fine. Otherwise, it seems more clear to assert the exploration is of type GaussianNoise, and then call a gaussian-noise specific method, rather than sneak the tensor through via get_info().

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, absolutely. This is only a problem though right now as we don't have a parameter-noise exploration class yet and the parameter-noise logic piggybacks on that cur_noise_scale value (which it shouldn't, it should be separate exploration classes or a merged one). It's tagged as a TODO(sven) and will be fixed once we add ParameterNoise exploration classes.

@sven1977
Copy link
Copy Markdown
Contributor Author

Yes, I'll rerun HalfCheetah vs DDPG. ...

@AmplabJenkins
Copy link
Copy Markdown

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/22494/
Test FAILed.

@AmplabJenkins
Copy link
Copy Markdown

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/22495/
Test FAILed.

@AmplabJenkins
Copy link
Copy Markdown

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/22496/
Test FAILed.

@AmplabJenkins
Copy link
Copy Markdown

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/22501/
Test FAILed.

@AmplabJenkins
Copy link
Copy Markdown

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/22500/
Test PASSed.

@AmplabJenkins
Copy link
Copy Markdown

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/22502/
Test FAILed.

Copy link
Copy Markdown
Contributor

@ericl ericl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, pending verification of DDPG/TD3 results. We should probably add a couple MuJoCo regression tests to our standard compact-regression-test suite, it will be a little annoying because of the licensing.

Any suggestions on alternative OSS continuous benchmarks?

sven1977 added 2 commits March 1, 2020 11:36
…oration_api_action_noises

� Conflicts:
�	rllib/policy/tf_policy.py
�	rllib/rollout.py
@AmplabJenkins
Copy link
Copy Markdown

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/22591/
Test PASSed.

@AmplabJenkins
Copy link
Copy Markdown

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/22592/
Test FAILed.

@sven1977
Copy link
Copy Markdown
Contributor Author

sven1977 commented Mar 1, 2020

@ericl Please merge. DDPG looks still good. I'll run a longer benchmark vs HalfCheetah and update the rl-experiments repo.

== Status ==
Memory usage on this node: 7.9/120.0 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/16 CPUs, 0/1 GPUs, 0.0/88.48 GiB heap, 0.0/12.84 GiB objects
Result logdir: /home/ubuntu/ray_results/halfcheetah-ddpg
Number of trials: 1 (1 TERMINATED)
+---------------------------+------------+-------+----------+------------------+--------+--------+
| Trial name                | status     | loc   |   reward |   total time (s) |     ts |   iter |
|---------------------------+------------+-------+----------+------------------+--------+--------|
| DDPG_HalfCheetah-v2_00000 | TERMINATED |       |  2006.98 |          3003.82 | 211000 |    211 |
+---------------------------+------------+-------+----------+------------------+--------+--------+

@ericl ericl merged commit 83e06cd into ray-project:master Mar 1, 2020
@sven1977 sven1977 deleted the exploration_api_action_noises branch March 3, 2020 10:15
ffbin pushed a commit to antgroup/ant-ray that referenced this pull request Mar 20, 2020
…project#7314)

* WIP.

* WIP.

* WIP.

* WIP.

* WIP.

* Fix

* WIP.

* Add TD3 quick Pendulum regresison.

* Cleanup.

* Fix.

* LINT.

* Fix.

* Sort quick_learning test cases, add TD3.

* Sort quick_learning test cases, add TD3.

* Revert test_checkpoint_restore.py (debugging) changes.

* Fix old soft_q settings in documentation and test configs.

* More doc fixes.

* Fix test case.

* Fix test case.

* Lower test load.

* WIP.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

tests-ok The tagger certifies test failures are unrelated and assumes personal liability.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants