Skip to content

[rllib] Enable performance metrics reporting for RLlib pipelines, add A3C#7299

Merged
ericl merged 52 commits intoray-project:masterfrom
ericl:rllib-iter2
Feb 29, 2020
Merged

[rllib] Enable performance metrics reporting for RLlib pipelines, add A3C#7299
ericl merged 52 commits intoray-project:masterfrom
ericl:rllib-iter2

Conversation

@ericl
Copy link
Copy Markdown
Contributor

@ericl ericl commented Feb 24, 2020

Why are these changes needed?

This adds a MetricsContext concept that is used for recording and retrieving internal performance metrics of RLlib pipelines. The context is shared across all pipeline operators. Operators can record counters, info values, and timer values via the context.

This also adds support for A3C and the AsyncGradients() operator, which uses the metrics context in a couple new ways.

@ericl ericl requested a review from sven1977 February 24, 2020 21:07
import time


class _Timer:
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class is just moved from rllib/utils to be accessible to ray/util.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@AmplabJenkins
Copy link
Copy Markdown

Can one of the admins verify this patch?

@AmplabJenkins
Copy link
Copy Markdown

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/22340/
Test FAILed.

@AmplabJenkins
Copy link
Copy Markdown

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/22338/
Test PASSed.

@AmplabJenkins
Copy link
Copy Markdown

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/22368/
Test FAILed.

@AmplabJenkins
Copy link
Copy Markdown

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/22418/
Test FAILed.

@ericl ericl added the tests-ok The tagger certifies test failures are unrelated and assumes personal liability. label Feb 26, 2020
@ericl
Copy link
Copy Markdown
Contributor Author

ericl commented Feb 27, 2020

Ping @sven1977 , this should be reviewed first before the other PR.

@AmplabJenkins
Copy link
Copy Markdown

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/22513/
Test FAILed.

Copy link
Copy Markdown
Contributor

@sven1977 sven1977 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just the docstring and the "optimizer" question.

from_actors, ParallelIteratorWorker, LocalIterator


def test_metrics(ray_start_regular_shared):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use Unittest.TestCase?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better to use pytest style, since it has better reporting behaviour under bazel (capturing the stdout).

@@ -0,0 +1,19 @@
"""Experimental pipeline-based impl; run this with --run='A3C_pl'"""
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice.

if hasattr(self, "workers"):
self.workers.stop()
if hasattr(self, "optimizer"):
if hasattr(self, "optimizer") and self.optimizer:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering about these things. If Trainer is not sure to have a prop "optimizer" or not, should this prop not be part of the Trainer class (and then sometimes be None)? I know we do this in many places, so not urgent, but we should think about fixing these.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably the right solution is to make it a required prop of a subclass of Trainer.

I'll leave it alone for now, since we can probably remove .optimizer entirely once the pipeline implementation becomes the default.

@ericl ericl merged commit 3c6b94f into ray-project:master Feb 29, 2020
ffbin pushed a commit to antgroup/ant-ray that referenced this pull request Mar 20, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

tests-ok The tagger certifies test failures are unrelated and assumes personal liability.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants