[RLlib] Decentralized multi-agent learning; PR #21421
sven1977 merged 12 commits into ray-project:master from
Conversation
…ntralized_multi_agent_learning_01
Conflicts:
- rllib/agents/trainer.py
- rllib/evaluation/rollout_worker.py
- rllib/policy/policy.py
gjoliver
left a comment
a few questions and comments. thanks.

```python
    "worker_side_prioritization": True,
    "min_iter_time_s": 30,
},
_allow_unknown_configs=True,
```

We are using Trainer's merge utility. It requires that you set this to True if the second config (APEX-DDPG's) contains new keys.
Otherwise, it would complain about the new key (e.g. ) not being found in the first config (DDPG's).
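The merge behavior discussed above can be sketched with a toy deep-merge helper. This is an illustrative stand-in, not RLlib's actual merge utility; the function and config names are hypothetical:

```python
def merge_configs(base, overrides, allow_unknown_configs=False):
    """Deep-merge `overrides` into a copy of `base`.

    Raises a KeyError for keys not present in `base`, unless
    `allow_unknown_configs` is True (mirroring the flag discussed above).
    """
    merged = dict(base)
    for key, value in overrides.items():
        if key not in base:
            if not allow_unknown_configs:
                raise KeyError("Unknown config key: {}".format(key))
            merged[key] = value
        elif isinstance(base[key], dict) and isinstance(value, dict):
            # Recurse into nested config dicts.
            merged[key] = merge_configs(base[key], value, allow_unknown_configs)
        else:
            merged[key] = value
    return merged

# Second config (APEX-DDPG-style) adds a key the first (DDPG-style) lacks:
ddpg = {"min_iter_time_s": 10}
apex_ddpg = {"min_iter_time_s": 30, "worker_side_prioritization": True}
merged = merge_configs(ddpg, apex_ddpg, allow_unknown_configs=True)
```

Without `allow_unknown_configs=True`, the same call raises on the unknown `worker_side_prioritization` key, which is exactly why the quoted diff passes the flag.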
rllib/agents/dqn/apex.py
Outdated
| "num_replay_buffer_shards"] | ||
| replay_actors = create_colocated(ReplayActor, [ | ||
|
|
||
| args = [ |
There was a problem hiding this comment.
name this replay_actor_args may be clearer.
rllib/utils/actors.py
Outdated
```diff
-def split_colocated(actors):
-    localhost = platform.node()
+def split_colocated(actors, node=None):
```

do we really need to allow the node param to be None?
the only user of this util provides the node parameter when calling this.
also not sure if it's intuitive that split_colocated would split based on the node of the first actor if the node param is not specified.
like this behavior feels a bit random?
Fixed this. Should behave backward-compatibly now, with node="localhost" being the default (same behavior as before, where the node arg didn't exist and we always tried to locate on localhost).
Also added docstrings.
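The backward-compatible behavior described above can be sketched roughly as follows. This is a hypothetical simplification for illustration: the real utility works on Ray actor handles and their node IPs, whereas here each "actor" is assumed to expose a plain `.node` attribute:

```python
import platform


def split_colocated(actors, node="localhost"):
    """Split `actors` into (on_node, not_on_node).

    node="localhost" reproduces the old behavior of always splitting
    against the local machine; pass an explicit node name to split
    against a different node, or None to use the first actor's node.
    """
    if node == "localhost":
        node = platform.node()
    elif node is None:
        node = actors[0].node
    co_located = [a for a in actors if a.node == node]
    non_co_located = [a for a in actors if a.node != node]
    return co_located, non_co_located
```

Defaulting to `"localhost"` (rather than `None`) keeps old call sites working unchanged while still allowing the new explicit-node use case.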
```python
def try_create_colocated(cls, args, count, kwargs=None, node=None):
    kwargs = kwargs or {}
```

why don't we make {} the default value for the argument?
Dangerous, as python will keep that {} dict around, so if you change it in one function call (add key/values to it), the next time you call the function w/o providing the arg, the function will use the altered dict (the one with the added key/value pair).
oh right, I forgot {} is banned as defaults actually. thanks.
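The mutable-default pitfall discussed above can be demonstrated in a few lines (illustrative names, not from the PR):

```python
def append_bad(item, items=[]):  # mutable default: created ONCE, shared across calls!
    items.append(item)
    return items


def append_good(item, items=None):  # the idiomatic fix, as in the quoted code
    if items is None:
        items = []
    items.append(item)
    return items


append_bad("a")
print(append_bad("b"))   # ['a', 'b'] -- the shared default list was mutated
print(append_good("b"))  # ['b']
```

Each call to `append_good` without the argument gets a fresh list, while `append_bad` keeps accumulating into the single default dict/list object Python created at function-definition time.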
rllib/utils/actors.py
Outdated
```python
    return ok


def try_create_colocated(cls, args, count, kwargs=None, node=None):
```
Can we add a comment here about node being None? Seems it's a pretty important detail that if node is None, it means we don't care which node these actors are colocated to, as long as they are together.
Yes, I'll add (better) docstrings to all these.
```python
# Whether all shards of the replay buffer must be co-located
# with the learner process (running the execution plan).
# If False, replay shards may be created on different node(s).
"replay_buffer_shards_colocated_with_driver": True,
```
actually I wonder, why do they need to be on the same node?
For APEX, the local node is the learner, so data (once in the buffer shards) never has to travel again. I think that's the sole intention here.
I see I see. to be honest, this doesn't feel like a requirement to me, more like an optimization.
since we don't have a viability guarantee from Ray core, if it were up to me, I would do this as a best-effort thing:
try to colocate everything, and if that fails, schedule the remaining RB shards anywhere.
then we don't need the while loop, and this scheduling can finish in at most 2 steps.
it is obviously too big of a change. maybe just add a note/TODO somewhere?
as written, I am a little worried a stack may fail with a mysterious error message like "fail to schedule RB actors" while there are enough CPUs, just a small head node.
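The best-effort scheme suggested above could look roughly like this. It is a pure-Python sketch with a hypothetical `create_actor` factory standing in for actual Ray actor creation, not real scheduling code:

```python
def schedule_replay_shards(create_actor, count, node):
    """Best-effort colocation: try once to place all shards on `node`.

    Shards that happen to land on another node are kept rather than
    killed and retried, so scheduling finishes in a single pass instead
    of a retry loop. `create_actor(node)` is a hypothetical factory that
    returns (actor, actual_node) -- where the actor really ended up.
    """
    colocated, elsewhere = [], []
    for _ in range(count):
        actor, actual_node = create_actor(node)
        (colocated if actual_node == node else elsewhere).append(actor)
    return colocated, elsewhere
```

Under this policy a small head node degrades gracefully (some shards land elsewhere) instead of failing the whole stack with a scheduling error.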
Just a note: This is nothing new that I introduced here for APEX. We have always forced all replay shards to be located on the driver. This change actually allows users (via setting this new flag to False) to relax this constraint.
I can add a comment to explain this more. ...
rllib/utils/actors.py
Outdated
```python
# Maps types to lists of already co-located actors.
ok = [[] for _ in range(len(actor_specs))]
attempt = 1
while attempt < max_attempts:
```
if these actors don't fit on the same node the first time, why would they fit when we try the second time?
this is a case where we need PACK scheduling it seems.
We do use "PACK" strategy by default, so this should be ok. But it's a good question: Could still be that ray places the actor on a different node (bundle), no? And then we have to try again. I would love to use a ray tool to force placing an actor on a given node, but I don't think this exists.
yeah, I would love to show core folks this as an example use case when this PR is in.
rllib/execution/rollout_ops.py
Outdated
```python
assert len(remote_kwargs) == len(actors)

# Create a map inside Trainer instance that maps actorss to sets of open
# requests (object refs). This way, we keep track, of which actorss have
```
are you intentionally writing actorss here and above?
or are they typos?
:D definitely typos
rllib/execution/rollout_ops.py
Outdated
```python
# already been sent how many requests
# (`max_remote_requests_in_flight_per_actor` arg).
if not hasattr(trainer, "_remote_requests_in_flight"):
    trainer._remote_requests_in_flight = defaultdict(set)
```
initialize this in trainer's init() or setup()?
it is confusing if some other instances are creating _ private members of another class instance.
also, instead of passing the entire trainer, why not pass the _remote_requests_in_flight dict into this function? so this rollout op doesn't have access to everything that we have.
+1 Makes all sense. I'll clean up.
done
This is only needed in the next PR (where we introduce the AlphaStar agent). In there, I'll set this property up in setup(). 👍
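The decoupling suggested above (pass only the in-flight map, not the whole trainer) can be sketched like this. Function and variable names are hypothetical stand-ins, not RLlib's actual API, and a plain object stands in for a Ray object ref:

```python
from collections import defaultdict


def asynchronous_parallel_requests(remote_requests_in_flight, actors,
                                   max_remote_requests_in_flight_per_actor=2):
    """Launch at most N concurrent requests per actor.

    `remote_requests_in_flight` maps each actor to its set of open
    request handles. The caller (e.g. a Trainer, in its setup()) owns
    and passes in this dict, so this op never reaches into the caller's
    other state.
    """
    started = []
    for actor in actors:
        in_flight = remote_requests_in_flight[actor]
        if len(in_flight) < max_remote_requests_in_flight_per_actor:
            req = object()  # stand-in for e.g. actor.sample.remote()
            in_flight.add(req)
            started.append(req)
    return started


# Owned by the caller, created once in its own initialization:
in_flight = defaultdict(set)
```

With the cap at 2, a saturated actor simply gets no new request until one of its in-flight refs is resolved and removed from its set.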
rllib/execution/rollout_ops.py
Outdated
```python
# Return None if nothing ready after the timeout.
if not ready:
    return
```

should we return [] or at least None?
return is the same as return None, no? I can add the None to make it more explicit.
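For reference, a bare `return` does evaluate to `None` (illustrative function name, not from the PR):

```python
def fetch_ready(ready):
    # A bare `return` and `return None` are identical in Python;
    # writing the explicit None only makes the intent clearer.
    if not ready:
        return  # same as: return None
    return ready


assert fetch_ready([]) is None
assert fetch_ready([1, 2]) == [1, 2]
```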
gjoliver
left a comment
thanks for all the updates, looks great now.
one more minor suggestion to delete the util func that is not used anymore.
also thanks for digging into the logic for colocated actor scheduling. it's not pretty for sure.
```python
co_located, non_co_located = split_colocated(actors, node=node)
logger.info("Got {} colocated actors of {}".format(len(co_located), count))
for a in non_co_located:
    a.__ray_terminate__.remote()
```
yeah I double checked with clark, they have an open issue to make ray_terminate a public api.
```python
@Deprecated(error=False)
def drop_colocated(actors: List[ActorHandle]) -> List[ActorHandle]:
```

delete this util? it's not used anywhere.
Maybe some user is using it somewhere :D
I marked it @Deprecated.
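The `@Deprecated(error=False)` pattern used above (warn instead of raise) can be sketched with a minimal decorator. This is an illustrative sketch of the pattern, not RLlib's actual `Deprecated` implementation:

```python
import warnings
from functools import wraps


def Deprecated(error=False):
    """Warn on every call to the wrapped function; raise if error=True."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            msg = "{} is deprecated.".format(func.__name__)
            if error:
                raise DeprecationWarning(msg)
            warnings.warn(msg, DeprecationWarning, stacklevel=2)
            return func(*args, **kwargs)
        return wrapper
    return decorator


@Deprecated(error=False)
def drop_colocated(actors):
    # Old behavior kept intact; callers just see a DeprecationWarning.
    return actors
```

Keeping `error=False` for at least one release gives any external users of the util time to migrate before it is removed.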
Decentralized multi-agent learning; preparatory PR.

- `synchronous_parallel_sampling`: already used by `PGTrainer`.
- `create_colocated` via the new and more generic `create_colocated_actors` utility function. This allows users to co-locate any (different) types of actors on the same node.

Why are these changes needed?

Related issue number

Checks
- I've run `scripts/format.sh` to lint the changes in this PR.