[RLlib] DDPG refactor and Exploration API action noise classes.#7314
[RLlib] DDPG refactor and Exploration API action noise classes.#7314ericl merged 24 commits intoray-project:masterfrom
Conversation
…oration_api_action_noises
|
Can one of the admins verify this patch? |
|
Test FAILed. |
|
Test FAILed. |
|
Test FAILed. |
|
Test FAILed. |
|
Test FAILed. |
|
Test FAILed. |
|
Test FAILed. |
|
Test FAILed. |
|
Test PASSed. |
|
Test FAILed. |
|
Test FAILed. |
…nto exploration_api_action_noises
|
Test FAILed. |
ericl
left a comment
There was a problem hiding this comment.
What's the testing strategy here, should we rerun a few of the mujoco benchmarks?
| assert not failures, failures | ||
| print("All checkpoint restore tests passed!") | ||
|
|
||
| quit() |
There was a problem hiding this comment.
Revert the changes in this file for merge?
There was a problem hiding this comment.
Ups, sorry, thanks for catching this!
| return action, logp | ||
|
|
||
| @override(Exploration) | ||
| def get_info(self): |
There was a problem hiding this comment.
I think it makes more sense for this to be a separate get_scale_value method rather than overload the existing info method (which seems like it's only for debugging).
There was a problem hiding this comment.
Not sure I agree. Yes, right now, it's only used for recording this "info" (whatever it is for each Exploration class) in the train-results (Policy calls this generically and then tune json's it). This is useful, just like it is for current learning rate, memory consumption, etc.. (debugging/reporting snapshot of the state of the exploration object). I don't see any use for a specific get_scale_value method, as it would never be used by anyone (Policy should only use the top-level Exploration class methods, as it doesn't know, which specific sub-class it's holding).
There was a problem hiding this comment.
I was referring to the comment you had here:
clean_actions, cur_noise_scale = self.sess.run(
[self.output_actions,
self.exploration.get_info()],
It seemed odd to be passing self.exploration.get_info() to the session run, but if this is a temporary hack I guess it's fine. Otherwise, it seems more clear to assert the exploration is of type GaussianNoise, and then call a gaussian-noise specific method, rather than sneak the tensor through via get_info().
There was a problem hiding this comment.
Ah yes, absolutely. This is only a problem though right now as we don't have a parameter-noise exploration class yet and the parameter-noise logic piggybacks on that cur_noise_scale value (which it shouldn't, it should be separate exploration classes or a merged one). It's tagged as a TODO(sven) and will be fixed once we add ParameterNoise exploration classes.
|
Yes, I'll rerun HalfCheetah vs DDPG. ... |
|
Test FAILed. |
|
Test FAILed. |
|
Test FAILed. |
|
Test FAILed. |
|
Test PASSed. |
|
Test FAILed. |
ericl
left a comment
There was a problem hiding this comment.
Looks good to me, pending verification of DDPG/TD3 results. We should probably add a couple MuJoCo regression tests to our standard compact-regression-test suite, it will be a little annoying because of the licensing.
Any suggestions on alternative OSS continuous benchmarks?
…oration_api_action_noises � Conflicts: � rllib/policy/tf_policy.py � rllib/rollout.py
|
Test PASSed. |
|
Test FAILed. |
|
@ericl Please merge. DDPG looks still good. I'll run a longer benchmark vs HalfCheetah and update the rl-experiments repo. |
…project#7314) * WIP. * WIP. * WIP. * WIP. * WIP. * Fix * WIP. * Add TD3 quick Pendulum regresison. * Cleanup. * Fix. * LINT. * Fix. * Sort quick_learning test cases, add TD3. * Sort quick_learning test cases, add TD3. * Revert test_checkpoint_restore.py (debugging) changes. * Fix old soft_q settings in documentation and test configs. * More doc fixes. * Fix test case. * Fix test case. * Lower test load. * WIP.
GaussianNoise and UrnsteinUhlenbeckNoise Exploration classes. These two are added with this PR (plus test cases).
Related issue number
Checks
scripts/format.shto lint the changes in this PR.