
[RLlib] First attempt at cleaning up algo code in RLlib: PG.#10115

Merged

sven1977 merged 24 commits into ray-project:master from sven1977:code_cleanup_pg on Aug 20, 2020

Conversation

sven1977 (author) commented Aug 14, 2020:

A common piece of user feedback to RLlib developers is that its learning curve is quite steep. This PR starts a clean-up and code-clarification process, using the PG algorithm as the first example.

  • adds experimental jsonschema checking.
  • adds lots of comments to the algo code.
  • adds a README.md to the PG directory, linking to the correct doc section(s).
  • minor other cleanups (docstrings, type annotations, etc.).

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/latest/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failure rates at https://ray-travis-tracker.herokuapp.com/.
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested (please justify below)

@sven1977 sven1977 requested a review from ericl August 14, 2020 14:57

def pg_torch_loss(policy, model, dist_class, train_batch):
"""The basic policy gradients loss."""
def pg_torch_loss(
sven1977 (author):

This is framework-agnostic, so it shouldn't live in either of the two framework-specific policy files.
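For readers following along, the loss under discussion is the plain REINFORCE objective, -E[log pi(a|s) * R]. A dependency-free sketch of that computation (the function name and the toy batch are illustrative, not RLlib's API):

```python
import math

def pg_loss(log_probs, returns):
    """Vanilla policy-gradient (REINFORCE) loss: -mean(log pi(a|s) * R).

    Minimizing this raises the log-probability of actions that led to
    high returns. A real implementation evaluates the same expression
    on framework tensors so it can be differentiated.
    """
    return -sum(lp * r for lp, r in zip(log_probs, returns)) / len(log_probs)

# Toy batch of two sampled actions and their discounted returns.
log_probs = [math.log(0.5), math.log(0.25)]
returns = [2.0, 1.0]
loss = pg_loss(log_probs, returns)  # ≈ 1.3863
```

Because the formula is framework-agnostic, only the tensor plumbing differs between the TF and torch policy files.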


Feature Compatibility Matrix
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Available Algorithms - Overview
sven1977 (author):

Changed the title to something simpler. Added transformer support links to respective algos.

ericl (reviewer):

Can we split doc changes into a separate PR?

sven1977 (author):

done, separate PR now

@@ -0,0 +1,16 @@
Policy Gradient (PG)
sven1977 (author):

Minimal overview of the algorithm in a README file.

ericl (reviewer):

This seems to mostly duplicate the documentation. Can we instead provide a hyperlink to the doc page?

sven1977 (author):

done

@@ -0,0 +1,8 @@
# Experimental.
sven1977 (author):

A jsonschema dict to check for type/bounds/etc. violations in our configs.
This will not stay here for PG (because PG does not add any own keys to COMMON_CONFIG), it's just here for discussion. For other algos, though, it'll look like this.
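To illustrate the kind of type/bounds validation being proposed, here is a dependency-free sketch of the idea using a hand-rolled checker in place of the jsonschema library; the keys and bounds shown are illustrative, not the actual RLlib schema:

```python
# Hypothetical schema fragment for a few COMMON_CONFIG-style keys.
CONFIG_SCHEMA = {
    "num_workers": {"type": int, "minimum": 0},
    "lr": {"type": float, "minimum": 0.0},
}

def validate_config(config):
    """Raise ValueError on type or bounds violations, mirroring what a
    jsonschema check against a per-algo schema file would do."""
    for key, rule in CONFIG_SCHEMA.items():
        if key not in config:
            continue
        value = config[key]
        if not isinstance(value, rule["type"]):
            raise ValueError(
                f"{key}: expected {rule['type'].__name__}, got {type(value).__name__}")
        if value < rule["minimum"]:
            raise ValueError(f"{key}: must be >= {rule['minimum']}")

validate_config({"num_workers": 2, "lr": 0.0004})  # passes silently
```

The reviewer's point below is that this extra layer mostly catches errors (e.g. a negative worker count) that users rarely make in practice.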

ericl (reviewer):

IMO schema is not needed for config-- the few cases it helps are catching things like negative workers, which isn't something that users are confused by typically. On the flip side, it adds a lot of boilerplate that distracts from the actual config.

sven1977 (author):

Yeah, but that's why I separated the schema from the actual config (which is unchanged and still in pg.py). The user won't even see this unless she checks out this extra schema file.

ericl (reviewer):

Please remove this for now, as we haven't decided whether to use schema. It's confusing to have this file exist, even if separate.

sven1977 (author):

done

@@ -1,9 +1,32 @@
"""
sven1977 (author):

Add a comment header to each main algo file, citing the papers.

ericl (reviewer):

Looks great.



def get_policy_class(config):
def get_policy_class(config: TrainerConfigDict) -> Optional[Type[Policy]]:
sven1977 (author):

Add docstrings and type annotations consistently to all agent files.
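As an illustration of the annotation style in the hunk above, a self-contained sketch; the Policy and PGTFPolicy classes here are stand-ins for RLlib's real ones, and the torch branch is simplified:

```python
from typing import Optional, Type

class Policy:  # stand-in for ray.rllib.policy.Policy
    pass

class PGTFPolicy(Policy):  # stand-in for the real TF policy class
    pass

def get_policy_class(config: dict) -> Optional[Type[Policy]]:
    """Return the Policy class to use, given the trainer config.

    Returning None tells the trainer builder to fall back to its
    preset default_policy (illustrative behavior, not RLlib's exact
    dispatch logic).
    """
    if config.get("framework") == "torch":
        return None  # caller would substitute the torch policy here
    return PGTFPolicy
```

The annotated signature makes the return contract (a Policy subclass or None) visible without reading the function body.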

return PGTFPolicy


# Build a child class of `Trainer`, which uses the framework specific Policy
sven1977 (author):

Explain what these calls to build_... actually do.
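The build_... helpers referenced here are class factories: they take a name, a default config, and a set of hooks, and return a newly assembled trainer class. A toy sketch of that pattern under stated assumptions (this is not RLlib's actual trainer_template implementation):

```python
def build_trainer(name, default_config, get_policy_class):
    """Assemble and return a Trainer-like class wired up with the given
    pieces, mirroring the factory style of RLlib's trainer_template."""
    class Trainer:
        def __init__(self, config=None):
            # User config overrides the algorithm's defaults.
            self.config = {**default_config, **(config or {})}
            # Pick the framework-specific policy class for this config.
            self.policy_cls = get_policy_class(self.config)

    Trainer.__name__ = name
    return Trainer

# Hypothetical usage, loosely modeled on the PG setup in this PR.
PGTrainer = build_trainer(
    name="PG",
    default_config={"lr": 0.0004},
    get_policy_class=lambda cfg: "PGTFPolicy",  # placeholder hook
)
trainer = PGTrainer({"lr": 0.001})
```

Spelling this out in a comment, as the author does here, tells readers that PGTrainer is generated code rather than a hand-written class.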

from ray.rllib.agents.ppo.appo_tf_policy import AsyncPPOTFPolicy
from ray.rllib.agents.ppo.ppo import UpdateKL
from ray.rllib.agents.trainer import with_base_config
from ray.rllib.agents.trainer import merge_trainer_configs
sven1977 (author):

simplify

def _init(self,
config: TrainerConfigDict,
env_creator: Callable[[EnvConfigDict], EnvType]):
# Validate config via jsonschema, if one was provided.
sven1977 (author):

The actual jsonschema check.

sven1977 (author):

removed again.

ericl (reviewer) left a comment:

I like the comments and improved organization of pg.py, but can we drop the json schema for now? I think it's a net negative for readability. Also, it would be great to not change the docs in this PR (should change all at once later).




**Paper:** `Policy Gradient Methods for Reinforcement Learning with Function Approximation <https://papers.nips.cc/paper/1713-policy-gradient-methods-for-reinforcement-learning-with-function-approximation.pdf>`__ `[implementation] <https://github.com/ray-project/ray/blob/master/rllib/agents/pg/pg.py>`__

**Implementation:** We include a `vanilla policy gradients implementation <https://github.com/ray-project/ray/blob/master/rllib/agents/pg/pg.py>`__ as an example algorithm.
ericl (reviewer):

I think the original formatting is better; this is too verbose.

sven1977 (author):

done


[2] - Simple Statistical Gradient-Following Algorithms for Connectionist
Reinforcement Learning.
Williams - College of Computer Science - Northeastern University - 1992
http://www-anw.cs.umass.edu/~barto/courses/cs687/williams92simple.pdf
ericl (reviewer):

Should we add "This file defines the distributed trainer for policy gradients; see pg_policy.py for the definition of the policy loss."?

sven1977 (author):

done

sven1977 (author):

done

def pg_loss_stats(
policy: Policy,
train_batch: SampleBatch) -> Dict[str, TensorType]:

ericl (reviewer):

Stray newline.

ericl (reviewer):

Please remove the stray newline.

sven1977 (author):

done

sven1977 (author):

done

@ericl ericl added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Aug 14, 2020
…_cleanup_pg

Conflicts:
	rllib/agents/trainer_template.py
	rllib/policy/tf_policy_template.py
ericl (reviewer) left a comment:

Looks good, just a couple minor requests still.

Papers
------

[1] [Policy Gradient Methods for Reinforcement Learning with Function Approximation](https://papers.nips.cc/paper/1713-policy-gradient-methods-for-reinforcement-learning-with-function-approximation.pdf)
ericl (reviewer):

Please remove this section (duplicates docs).

sven1977 (author):

done


"""
Implementation of a vanilla policy gradients algorithm. Based on:

[1] - Policy Gradient Methods for Reinforcement Learning with Function
ericl (reviewer):

Please remove this section in favor of a link to the docs.

sven1977 (author):

done


@sven1977 sven1977 removed the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Aug 18, 2020
ericl commented Aug 18, 2020:

I think you missed the comments about removing the paper citations in favor of just the doc link -- any reason not to do that?

@ericl ericl added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Aug 18, 2020
@sven1977 sven1977 added @reviewer-action-required and removed @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. labels Aug 19, 2020
ericl commented Aug 19, 2020:

_pickle.PicklingError: Can't pickle typing.Union[str, typing.Any, NoneType]: it's not the same object as typing.Union

@ericl ericl added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Aug 19, 2020
ericl commented Aug 19, 2020:

Note that the absence of "author-action-required" implies reviewer action required, no need to tag.

@sven1977 sven1977 merged commit d14b501 into ray-project:master Aug 20, 2020
@sven1977 sven1977 deleted the code_cleanup_pg branch August 21, 2020 07:47