[RLlib] First attempt at cleaning up algo code in RLlib: PG. #10115
sven1977 merged 24 commits into ray-project:master from
Conversation
def pg_torch_loss(policy, model, dist_class, train_batch):
    """The basic policy gradients loss."""
def pg_torch_loss(
This is framework-agnostic, so it shouldn't be located in either of the two policy files.
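For context on why the loss is framework-agnostic: vanilla PG minimizes the negative log-prob-weighted advantage, which is plain array math. A minimal, hypothetical sketch (not RLlib's actual helper; the names are illustrative):

```python
# Hypothetical sketch of a shared, framework-agnostic PG loss helper.
# Vanilla policy gradients minimize -E[log pi(a|s) * advantage], which
# needs no TF- or Torch-specific ops in its pure form.
def pg_loss(log_probs, advantages):
    """Return the negative mean of log-prob-weighted advantages."""
    assert len(log_probs) == len(advantages) and log_probs
    return -sum(lp * adv for lp, adv in zip(log_probs, advantages)) / len(log_probs)
```

In RLlib itself the inputs come from `dist_class` and `train_batch` as framework tensors, but the math is the same, which is the point of the comment above.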
Feature Compatibility Matrix
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Available Algorithms - Overview
Changed the title to something simpler. Added transformer support links to respective algos.
Can we split doc changes into a separate PR?
Done, this is a separate PR now.
@@ -0,0 +1,16 @@
Policy Gradient (PG)
Minimal overview of the algo in a README file.
This seems to mostly duplicate the documentation. Can we instead provide a hyperlink to the doc page?
rllib/agents/pg/config_schema.py
Outdated
@@ -0,0 +1,8 @@
# Experimental.
A jsonschema dict to check for type/bounds/etc. violations in our configs.
This will not stay here for PG (because PG does not add any keys of its own to COMMON_CONFIG); it's just here for discussion. For other algos, though, it'll look like this.
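To make the discussion concrete, here is a hand-rolled sketch of the kind of type/bounds check such a schema dict would provide. The key names, bounds, and helper below are illustrative, not RLlib's actual schema, and a real implementation would use the `jsonschema` package rather than this toy validator:

```python
# Illustrative schema dict: per-key type and lower-bound constraints.
CONFIG_SCHEMA = {
    "num_workers": {"type": int, "minimum": 0},
    "lr": {"type": float, "minimum": 0.0},
}

def check_config(config, schema=CONFIG_SCHEMA):
    """Collect type/bounds violations, in the spirit of jsonschema validation."""
    errors = []
    for key, rules in schema.items():
        if key not in config:
            continue  # only validate keys the user actually set
        value = config[key]
        if not isinstance(value, rules["type"]):
            errors.append(f"{key}: expected {rules['type'].__name__}")
        elif "minimum" in rules and value < rules["minimum"]:
            errors.append(f"{key}: must be >= {rules['minimum']}")
    return errors
```

The separate-file design discussed below means a user who never opens the schema file only ever sees the plain config dict.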
IMO a schema is not needed for the config: the few cases it helps with are catching things like a negative number of workers, which isn't something users are typically confused by. On the flip side, it adds a lot of boilerplate that distracts from the actual config.
Yeah, but that's why I separated the schema from the actual config (which is unchanged and still in pg.py). The user won't even see this unless she checks out this extra schema file.
Please remove this for now, as we haven't decided whether to use schema. It's confusing to have this file exist, even if separate.
@@ -1,9 +1,32 @@
"""
Added a comment header to each main algo file, citing the papers.
def get_policy_class(config):
def get_policy_class(config: TrainerConfigDict) -> Optional[Type[Policy]]:
Added docstrings and type annotations consistently to all agent files.
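A sketch of that annotation style, with stand-in types: the `Policy` class and the `TrainerConfigDict` alias below are simplified placeholders for the real RLlib types, and the framework dispatch shown is an assumption for illustration:

```python
from typing import Any, Dict, Optional, Type

TrainerConfigDict = Dict[str, Any]  # simplified stand-in for RLlib's alias

class Policy:  # stand-in for ray.rllib.policy.Policy
    pass

class PGTorchPolicy(Policy):  # stand-in for the real torch policy class
    pass

def get_policy_class(config: TrainerConfigDict) -> Optional[Type[Policy]]:
    """Return the framework-specific Policy class, or None for the default."""
    if config.get("framework") == "torch":
        return PGTorchPolicy
    return None  # None means: fall back to the default (TF) policy
```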
    return PGTFPolicy
# Build a child class of `Trainer`, which uses the framework specific Policy
Explain what these calls to build_... actually do.
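Roughly, a `build_trainer(...)`-style call assembles a Trainer class from plug-in pieces. A heavily simplified sketch under assumed names (not RLlib's actual signature, which also wires in an execution plan, validation hooks, and more):

```python
# Simplified sketch: assemble a Trainer class from a default config and a
# function that picks the Policy class for the chosen framework.
def build_trainer(name, default_config, get_policy_class):
    class BuiltTrainer:
        def __init__(self, config=None):
            # User-supplied keys override the algorithm's defaults.
            self.config = {**default_config, **(config or {})}
            self.policy_class = get_policy_class(self.config)

    BuiltTrainer.__name__ = name
    return BuiltTrainer
```

The appeal of this pattern is that an algo file like pg.py reduces to declarative pieces (config, policy chooser) plus one assembling call.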
rllib/agents/ppo/appo.py
Outdated
from ray.rllib.agents.ppo.appo_tf_policy import AsyncPPOTFPolicy
from ray.rllib.agents.ppo.ppo import UpdateKL
from ray.rllib.agents.trainer import with_base_config
from ray.rllib.agents.trainer import merge_trainer_configs
rllib/agents/trainer_template.py
Outdated
def _init(self,
          config: TrainerConfigDict,
          env_creator: Callable[[EnvConfigDict], EnvType]):
    # Validate config via jsonschema, if one was provided.
The actual jsonschema check.
ericl left a comment:
I like the comments and improved organization of pg.py, but can we drop the json schema for now? I think it's a net negative for readability. Also, it would be great to not change the docs in this PR (should change all at once later).
doc/source/rllib-algorithms.rst
Outdated
**Paper:** `Policy Gradient Methods for Reinforcement Learning with Function Approximation <https://papers.nips.cc/paper/1713-policy-gradient-methods-for-reinforcement-learning-with-function-approximation.pdf>`__ `[implementation] <https://github.com/ray-project/ray/blob/master/rllib/agents/pg/pg.py>`__
**Implementation:** We include a `vanilla policy gradients implementation <https://github.com/ray-project/ray/blob/master/rllib/agents/pg/pg.py>`__ as an example algorithm.
I think the original formatting is better; this is too verbose.
@@ -1,9 +1,32 @@
"""
rllib/agents/pg/pg.py
Outdated
[2] - Simple Statistical Gradient-Following Algorithms for Connectionist
    Reinforcement Learning.
    Williams - College of Computer Science - Northeastern University - 1992
    http://www-anw.cs.umass.edu/~barto/courses/cs687/williams92simple.pdf
Should we add "This file defines the distributed trainer for policy gradients; see pg_policy.py for the definition of the policy loss"?
def pg_loss_stats(
        policy: Policy,
        train_batch: SampleBatch) -> Dict[str, TensorType]:

Please remove the stray newline.
…_cleanup_pg
Conflicts:
    rllib/agents/trainer_template.py
    rllib/policy/tf_policy_template.py
ericl left a comment:
Looks good, just a couple minor requests still.
rllib/agents/pg/README.md
Outdated
Papers
------

[1] [Policy Gradient Methods for Reinforcement Learning with Function Approximation](https://papers.nips.cc/paper/1713-policy-gradient-methods-for-reinforcement-learning-with-function-approximation.pdf)
Please remove this section (duplicates docs).
rllib/agents/pg/pg.py
Outdated
"""
Implementation of a vanilla policy gradients algorithm. Based on:

[1] - Policy Gradient Methods for Reinforcement Learning with Function
Please remove this section in favor of a link to the docs.
I think you missed the comments about removing the paper citations in favor of just the doc link; any reason not to do that?
Note that the absence of "author-action-required" implies reviewer action required, no need to tag.
A common piece of user feedback to RLlib developers is that the library's learning curve is quite steep. This PR aims to start a clean-up and code-clarification process, using the PG algorithm as the example.
Related issue number
Checks
scripts/format.sh to lint the changes in this PR.