
[RLlib] First attempt at cleaning up algo code in RLlib: PG.#10115

Merged

sven1977 merged 24 commits into ray-project:master from sven1977:code_cleanup_pg on Aug 20, 2020

Conversation

sven1977 (author) commented Aug 14, 2020:

A common piece of user feedback to RLlib developers is that its learning curve is quite steep. This PR starts a clean-up and code-clarification process, using the PG algorithm as the first example.

  • adds experimental jsonschema checking.
  • adds lots of comments to the algo code.
  • adds a README.md to the PG directory, linking to the correct doc section(s).
  • minor other cleanups (docstrings, type annotations, etc.).

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/latest/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failure rates at https://ray-travis-tracker.herokuapp.com/.
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested (please justify below)

@sven1977 sven1977 requested a review from ericl August 14, 2020 14:57

def pg_torch_loss(policy, model, dist_class, train_batch):
"""The basic policy gradients loss."""
def pg_torch_loss(
sven1977 (author):

This is framework-agnostic, so it shouldn't live in either of the two framework-specific policy files.
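For readers following along, the loss under discussion is the plain REINFORCE objective, -E[log pi(a|s) * R]. A dependency-free sketch of that computation (the function name and the toy batch are illustrative, not RLlib's API):

```python
import math

def pg_loss(log_probs, returns):
    """Vanilla policy-gradient (REINFORCE) loss: -mean(log pi(a|s) * R).

    Minimizing this raises the log-probability of actions that led to
    high returns. A real implementation evaluates the same expression
    on framework tensors so it can be differentiated.
    """
    return -sum(lp * r for lp, r in zip(log_probs, returns)) / len(log_probs)

# Toy batch of two sampled actions and their discounted returns.
log_probs = [math.log(0.5), math.log(0.25)]
returns = [2.0, 1.0]
loss = pg_loss(log_probs, returns)  # ≈ 1.3863
```

Because the formula is framework-agnostic, only the tensor plumbing differs between the TF and torch policy files.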


Feature Compatibility Matrix
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Available Algorithms - Overview
sven1977 (author):

Changed the title to something simpler. Added transformer support links to respective algos.

ericl (reviewer):

Can we split doc changes into a separate PR?

sven1977 (author):

done, separate PR now

@@ -0,0 +1,16 @@
Policy Gradient (PG)
sven1977 (author):

Minimal overview of the algorithm in a README file.

ericl (reviewer):

This seems to mostly duplicate the documentation. Can we instead provide a hyperlink to the doc page?

sven1977 (author):

done

@@ -0,0 +1,8 @@
# Experimental.
sven1977 (author):

A jsonschema dict to check for type/bounds/etc. violations in our configs.
This will not stay here for PG (because PG does not add any own keys to COMMON_CONFIG), it's just here for discussion. For other algos, though, it'll look like this.
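To illustrate the kind of type/bounds validation being proposed, here is a dependency-free sketch of the idea using a hand-rolled checker in place of the jsonschema library; the keys and bounds shown are illustrative, not the actual RLlib schema:

```python
# Hypothetical schema fragment for a few COMMON_CONFIG-style keys.
CONFIG_SCHEMA = {
    "num_workers": {"type": int, "minimum": 0},
    "lr": {"type": float, "minimum": 0.0},
}

def validate_config(config):
    """Raise ValueError on type or bounds violations, mirroring what a
    jsonschema check against a per-algo schema file would do."""
    for key, rule in CONFIG_SCHEMA.items():
        if key not in config:
            continue
        value = config[key]
        if not isinstance(value, rule["type"]):
            raise ValueError(
                f"{key}: expected {rule['type'].__name__}, got {type(value).__name__}")
        if value < rule["minimum"]:
            raise ValueError(f"{key}: must be >= {rule['minimum']}")

validate_config({"num_workers": 2, "lr": 0.0004})  # passes silently
```

The reviewer's point below is that this extra layer mostly catches errors (e.g. a negative worker count) that users rarely make in practice.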

ericl (reviewer):

IMO schema is not needed for config-- the few cases it helps are catching things like negative workers, which isn't something that users are confused by typically. On the flip side, it adds a lot of boilerplate that distracts from the actual config.

sven1977 (author):

Yeah, but that's why I separated the schema from the actual config (which is unchanged and still in pg.py). The user won't even see this unless she checks out this extra schema file.

ericl (reviewer):

Please remove this for now, as we haven't decided whether to use schema. It's confusing to have this file exist, even if separate.

sven1977 (author):

done

@@ -1,9 +1,32 @@
"""
sven1977 (author):

Add a comment header to each main algo file, citing the papers.

ericl (reviewer):

Looks great.



def get_policy_class(config):
def get_policy_class(config: TrainerConfigDict) -> Optional[Type[Policy]]:
sven1977 (author):

Add docstrings and type annotations consistently to all agent files.
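As an illustration of the annotation style in the hunk above, a self-contained sketch; the Policy and PGTFPolicy classes here are stand-ins for RLlib's real ones, and the torch branch is simplified:

```python
from typing import Optional, Type

class Policy:  # stand-in for ray.rllib.policy.Policy
    pass

class PGTFPolicy(Policy):  # stand-in for the real TF policy class
    pass

def get_policy_class(config: dict) -> Optional[Type[Policy]]:
    """Return the Policy class to use, given the trainer config.

    Returning None tells the trainer builder to fall back to its
    preset default_policy (illustrative behavior, not RLlib's exact
    dispatch logic).
    """
    if config.get("framework") == "torch":
        return None  # caller would substitute the torch policy here
    return PGTFPolicy
```

The annotated signature makes the return contract (a Policy subclass or None) visible without reading the function body.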

return PGTFPolicy


# Build a child class of `Trainer`, which uses the framework specific Policy
sven1977 (author):

Explain what these calls to build_... actually do.
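The build_... helpers referenced here are class factories: they take a name, a default config, and a set of hooks, and return a newly assembled trainer class. A toy sketch of that pattern under stated assumptions (this is not RLlib's actual trainer_template implementation):

```python
def build_trainer(name, default_config, get_policy_class):
    """Assemble and return a Trainer-like class wired up with the given
    pieces, mirroring the factory style of RLlib's trainer_template."""
    class Trainer:
        def __init__(self, config=None):
            # User config overrides the algorithm's defaults.
            self.config = {**default_config, **(config or {})}
            # Pick the framework-specific policy class for this config.
            self.policy_cls = get_policy_class(self.config)

    Trainer.__name__ = name
    return Trainer

# Hypothetical usage, loosely modeled on the PG setup in this PR.
PGTrainer = build_trainer(
    name="PG",
    default_config={"lr": 0.0004},
    get_policy_class=lambda cfg: "PGTFPolicy",  # placeholder hook
)
trainer = PGTrainer({"lr": 0.001})
```

Spelling this out in a comment, as the author does here, tells readers that PGTrainer is generated code rather than a hand-written class.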

from ray.rllib.agents.ppo.appo_tf_policy import AsyncPPOTFPolicy
from ray.rllib.agents.ppo.ppo import UpdateKL
from ray.rllib.agents.trainer import with_base_config
from ray.rllib.agents.trainer import merge_trainer_configs
sven1977 (author):

simplify

def _init(self,
config: TrainerConfigDict,
env_creator: Callable[[EnvConfigDict], EnvType]):
# Validate config via jsonschema, if one was provided.
sven1977 (author):

The actual jsonschema check.

sven1977 (author):

removed again.

ericl (reviewer) left a comment:

I like the comments and improved organization of pg.py, but can we drop the json schema for now? I think it's a net negative for readability. Also, it would be great to not change the docs in this PR (should change all at once later).




**Paper:** `Policy Gradient Methods for Reinforcement Learning with Function Approximation <https://papers.nips.cc/paper/1713-policy-gradient-methods-for-reinforcement-learning-with-function-approximation.pdf>`__ `[implementation] <https://github.com/ray-project/ray/blob/master/rllib/agents/pg/pg.py>`__

**Implementation:** We include a `vanilla policy gradients implementation <https://github.com/ray-project/ray/blob/master/rllib/agents/pg/pg.py>`__ as an example algorithm.
ericl (reviewer):

I think the original formatting is better; this is too verbose.

sven1977 (author):

done


[2] - Simple Statistical Gradient-Following Algorithms for Connectionist
Reinforcement Learning.
Williams - College of Computer Science - Northeastern University - 1992
http://www-anw.cs.umass.edu/~barto/courses/cs687/williams92simple.pdf
ericl (reviewer):

Should we add "This file defines the distributed trainer for policy gradients; see pg_policy.py for the definition of the policy loss."?

sven1977 (author):

done

sven1977 (author):

done

def pg_loss_stats(
policy: Policy,
train_batch: SampleBatch) -> Dict[str, TensorType]:

ericl (reviewer):

Stray newline.

ericl (reviewer):

Please remove the stray newline.

sven1977 (author):

done

sven1977 (author):

done

@ericl ericl added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Aug 14, 2020
…_cleanup_pg

Conflicts:
	rllib/agents/trainer_template.py
	rllib/policy/tf_policy_template.py
ericl (reviewer) left a comment:

Looks good, just a couple minor requests still.

Papers
------

[1] [Policy Gradient Methods for Reinforcement Learning with Function Approximation](https://papers.nips.cc/paper/1713-policy-gradient-methods-for-reinforcement-learning-with-function-approximation.pdf)
ericl (reviewer):

Please remove this section (duplicates docs).

sven1977 (author):

done


"""
Implementation of a vanilla policy gradients algorithm. Based on:

[1] - Policy Gradient Methods for Reinforcement Learning with Function
ericl (reviewer):

Please remove this section in favor of a link to the docs.

sven1977 (author):

done


@sven1977 sven1977 removed the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Aug 18, 2020
ericl commented Aug 18, 2020:

I think you missed the comments about removing the paper citations in favor of just the doc link -- any reason not to do that?

@ericl ericl added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Aug 18, 2020
@sven1977 sven1977 added @reviewer-action-required and removed @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. labels Aug 19, 2020
ericl commented Aug 19, 2020:

_pickle.PicklingError: Can't pickle typing.Union[str, typing.Any, NoneType]: it's not the same object as typing.Union

@ericl ericl added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Aug 19, 2020
ericl commented Aug 19, 2020:

Note that the absence of "author-action-required" implies reviewer action required, no need to tag.

@sven1977 sven1977 merged commit d14b501 into ray-project:master Aug 20, 2020
@sven1977 sven1977 deleted the code_cleanup_pg branch August 21, 2020 07:47