[RLlib] Preparatory PR for multi-agent, multi-GPU learning agent (alpha-star style) #02 (#21649)
Conversation
gjoliver
left a comment
overall looks great. a few comments, thanks.
buffer = self.replay_buffers[policy_id]
# If buffer empty or no new samples to mix with replayed ones,
# return None.
if len(buffer) == 0 or len(buffer.last_added_batches) == 0:
may be safer to check:
if len(buffer) == 0 and self.replay_ratio > 0.0:
    return None
if len(buffer.last_added_batches) == 0 and self.replay_ratio < 1.0:
    return None
In other words, if we need to have replay samples and the buffer is empty, or we actually need to have newly added samples, but there isn't any, return None.
Not sure, actually.
If len(buffer) == 0, then self.last_added_batches should also be empty:
everything in self.last_added_batches is guaranteed to already be part of the buffer.
What we should check, though, is whether replay_ratio == 1.0 (in that case, we do NOT care about buffer.last_added_batches, because 100% of the returned samples are replayed (older) ones).
I'll fix this.
yeah, that's pretty much what I meant. thanks.
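To make the agreed-upon check concrete, here is a minimal, self-contained sketch. `FakeMixInBuffer` and `sample_or_none` are hypothetical stand-ins for illustration, not the actual RLlib classes; the logic follows the discussion above (empty buffer implies no new batches either, and replay_ratio == 1.0 makes last_added_batches irrelevant):

```python
import random


class FakeMixInBuffer:
    """Hypothetical stand-in for the mixin replay buffer (illustration only)."""

    def __init__(self):
        self.batches = []             # everything ever added (the "buffer")
        self.last_added_batches = []  # new batches since the last sample call

    def add(self, batch):
        # Added batches always land in both containers, so
        # len(buffer) == 0 implies last_added_batches is empty, too.
        self.batches.append(batch)
        self.last_added_batches.append(batch)

    def __len__(self):
        return len(self.batches)

    def replay(self):
        return random.choice(self.batches)


def sample_or_none(buffer, replay_ratio):
    """Return None only when sampling is truly impossible for the ratio."""
    # Empty buffer: nothing at all to return.
    if len(buffer) == 0:
        return None
    # 100% replay: we do NOT care about last_added_batches.
    if replay_ratio == 1.0:
        return buffer.replay()
    # Mixing needed, but no new samples arrived since the last call.
    if len(buffer.last_added_batches) == 0:
        return None
    new = buffer.last_added_batches
    buffer.last_added_batches = []
    return new  # (the actual mixing with replayed batches would follow here)
```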
# replay ratio = old / [old + new]
num_new = len(output_batches)
num_old = 0
while random.random() > num_old / (num_old + num_new):
we are not looking at self.replay_ratio here?
can you explain a bit how this while loop is statistically connected to self.replay_ratio?
sorry, I am still confused. old / [old + new] is the current replay_ratio, right?
The logic is that we stop adding replayed samples once random.random() is less than the current replay_ratio, which by gut feeling should result in an expected ratio of 0.5?
I feel like at the very least, self.replay_ratio should get involved in the math here, something like:
expected_replay_batches = self.replay_ratio * num_new
while random.random() < expected_replay_batches:
    output_batches.append(buffer.replay())
    expected_replay_batches -= 1.0
This has been fixed. There is also a test case now that verifies that different given ratios are being respected.
The randomness is necessary here to allow for "odd" ratios. E.g., if we assume that there is always just one new batch and the replay ratio is, say, 0.33, then only every 2nd returned batch should have 1 additional old batch in it.
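The stochastic-rounding behavior can be checked empirically. The sketch below is not the exact RLlib code, but implements the same idea: convert the ratio (old / [old + new]) into a proportion (old / new) and draw the number of old batches so that it is correct in expectation, which is what permits "odd" ratios like 0.33 with a single new batch:

```python
import random


def num_replay_batches(num_new, replay_ratio):
    """Draw how many old (replayed) batches to mix in so that, in
    expectation, old / (old + new) == replay_ratio."""
    # Convert ratio = old / (old + new) into proportion = old / new.
    replay_proportion = replay_ratio / (1.0 - replay_ratio)
    num_old = 0
    f = num_new * replay_proportion
    # Stochastic rounding: E[num_old] == num_new * replay_proportion.
    # For f <= 1 this is a Bernoulli(f) draw; for larger f, whole old
    # batches are added deterministically plus one fractional draw.
    while random.random() < f:
        num_old += 1
        f -= 1.0
    return num_old
```

Averaged over many calls with one new batch per call and replay_ratio = 1/3, the measured old/(old+new) composition converges to 1/3, matching the "every 2nd batch gets one old batch" intuition above.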
rllib/execution/parallel_requests.py
Outdated
`remote_fn()`, which will be applied to the actor(s) instead.

Args:
    trainer: The Trainer object that we run the sampling for.
pass in remote_requests_in_flight, instead of the entire trainer?
sorry, just double checking, it seems like trainer is still the parameter?
Nope, this had been fixed. Could you take another look?
@ExperimentalAPI
def asynchronous_parallel_requests(
this function looks really familiar. are we not replacing some existing logic with this util func call somewhere?
Yeah, sorry, moved it into a new module for better clarity: this function may not only be used to collect SampleBatches from a RolloutWorker, but works generically on any set (and any types!) of ray remote actors.
I removed the old code (not used yet anywhere anyways) in replay_ops.py.
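The pattern behind such a utility is generic: keep up to n requests in flight per worker and harvest whatever finished within a timeout. The following is a rough, stand-alone analogy using `concurrent.futures` threads instead of ray remote actors; all names (`async_parallel_requests`, `in_flight`, etc.) are hypothetical and this is not the RLlib API:

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def async_parallel_requests(executor, in_flight, workers, fn,
                            max_in_flight_per_worker=2, timeout=1.0):
    """Keep up to `max_in_flight_per_worker` pending calls per worker and
    return {worker: [result, ...]} for whatever finished within `timeout`."""
    # Launch new requests for workers that still have capacity.
    for w in workers:
        pending = in_flight.setdefault(w, set())
        while len(pending) < max_in_flight_per_worker:
            pending.add(executor.submit(fn, w))
    # Wait for at least one result (or until the timeout expires).
    all_pending = set().union(*in_flight.values())
    done, _ = wait(all_pending, timeout=timeout, return_when=FIRST_COMPLETED)
    # Harvest finished futures, grouped by worker.
    results = {}
    for w, pending in in_flight.items():
        finished = pending & done
        pending -= finished
        for fut in finished:
            results.setdefault(w, []).append(fut.result())
    return results
```

Callers invoke this repeatedly inside their training loop; slow workers simply deliver their results on a later call, which is what keeps the overall loop asynchronous.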
rllib/policy/policy.py
Outdated
| """ | ||
| # Sample a batch from the given replay actor. | ||
| # For better performance, make sure the replay actor is co-located | ||
| # with this policy (on the same node). |
maybe this comment needs to be updated?
where are we making sure the replay_actor is co-located with the policy?
The driver (execution plan/Trainer.setup) needs to make sure of that. But it's not hard-required, just better for performance as we don't have to send the batch across the wire, then.
I see. Can you update the comment a little bit and say:
"For better performance, the trainer will try to schedule replay actors co-located with this policy"
something like that? thanks.
I'm not sure this comment should be here (we don't know for sure what the Trainer will do; the policy has no influence on its Trainer). We should simply clarify that it would be better if they WERE on the same node, but that it's not a hard requirement, and that the Trainer (which creates both buffer and policy) needs to take care of this, not the policy itself.
I'll fix.
gjoliver
left a comment
couple of minor comments, one math question. thanks.
gjoliver
left a comment
sorry, still have 2 comments.
# Mix buffer's last added batches with older replayed batches.
with self.replay_timer:
    output_batches = self.last_added_batches[policy_id].copy()
    self.last_added_batches[policy_id].clear()
since you clear right after copy, why not:
output_batches = self.last_added_batches[policy_id]
self.last_added_batches[policy_id] = []
?
do you think this is a good idea? this is the only comment I have left.
Great catch! Indeed, it saves the copy. Done. :)
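For illustration, the difference between the two variants on toy data (hypothetical names, not the actual buffer code): the copy-then-clear version walks the list twice, while reassignment just hands the existing list over and installs a fresh empty one.

```python
# Variant 1 (original): copy, then clear -- two passes over the list.
last_added_batches = {"pol1": ["batch_a", "batch_b"]}
output_batches = last_added_batches["pol1"].copy()
last_added_batches["pol1"].clear()

# Variant 2 (suggested): take over the existing list, install a fresh one.
last_added_batches = {"pol1": ["batch_a", "batch_b"]}
output_batches = last_added_batches["pol1"]
last_added_batches["pol1"] = []
```

Both variants leave `output_batches` holding the old contents and the dict entry empty; only the second avoids the O(n) copy.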
On Thu, Jan 27, 2022 at 7:52 AM, Sven Mika commented on this pull request, in rllib/execution/buffers/mixin_replay_buffer.py (#21649):
with self.replay_timer:
    output_batches = self.last_added_batches[policy_id].copy()
    self.last_added_batches[policy_id].clear()

# No replay desired -> Return here.
if self.replay_ratio == 0.0:
    return SampleBatch.concat_samples(output_batches)
# Only replay desired -> Return a (replayed) sample from the
# buffer.
elif self.replay_ratio == 1.0:
    return buffer.replay()

# Replay ratio = old / [old + new]
# Replay proportion: old / new
num_new = len(output_batches)
replay_proportion = self.replay_proportion
I think it's correct. That's how we also did it in the existing mixin
implementation (in execution/replay_ops.py).
Also the new test cases show that this logic is ok. We test for different
replay ratios and measure the average batch composition (old vs new
samples).
oh, I get it, yeah, we are doing the same replay_ratio to replay_proportion
computation above now.
ok, never mind, I missed that part, my bad.
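For reference, the ratio-to-proportion conversion mentioned above follows directly from the two definitions in the snippet (replay ratio = old / [old + new], replay proportion = old / new). A one-liner sketch, assuming replay_ratio < 1.0 (hypothetical helper name):

```python
def to_replay_proportion(replay_ratio):
    """Solve old/(old+new) == r for old/new: proportion = r / (1 - r)."""
    assert 0.0 <= replay_ratio < 1.0
    return replay_ratio / (1.0 - replay_ratio)
```

E.g., a replay ratio of 0.75 (three old batches per new one) corresponds to a proportion of 3.0.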
# By default, use Glorot unform initializer.
if initializer is None:
-   initializer = flax.nn.initializers.xavier_uniform()
+   initializer = nn.initializers.xavier_uniform()
@sven1977 I'm seeing AttributeError: module 'flax' has no attribute 'nn' on 1.11.0 release branch. Is this line cherry-pickable, or does it need to be cherry picked into 1.11.0?
Yes, this line is cherry-pickable. Let's also fix the comment:
Fixed code:
# By default, use Glorot uniform initializer.
if initializer is None:
initializer = nn.initializers.xavier_uniform()
Preparatory PR for multi-agent, multi-GPU learning agent (alpha-star style) #2.

- Adds the `asynchronous_parallel_requests` utility to rllib/execution/parallel_requests.py. Allows sending (up to n in-flight) parallel remote requests to a set of actors and collecting and returning those results that are available (given some timeout).
- Makes the `Policy` class usable as a remote ray actor, adding e.g. `get_host()` and also a (preliminary) `learn_on_batch_from_replay_buffer` method.

Why are these changes needed?
Related issue number
Checks

- I've run scripts/format.sh to lint the changes in this PR.