[RLlib] Preparatory PR for multi-agent multi-GPU learner (alpha-star style) #3 (#21652)
Conversation
avnishn
left a comment
Minor comments. LMK what you think :)
rllib/agents/ars/ars.py
self.validate_config(config)

# Generate `self.env_creator` callable to create an env instance.
self._get_env_creator_from_env_id(self._env_id)
For whatever reason, this function isn't available on master, and I also didn't find its definition in this PR diff, only when checking out this branch. Strange :(
Could be a mistake by me when splitting my local branch (which contained more changes).
rllib/agents/es/es.py
self.validate_config(config)

# Generate `self.env_creator` callable to create an env instance.
self._get_env_creator_from_env_id(self._env_id)
Could we change the name of this function to `_set_env_creator_from_env_id`? This function sets the `self.env_creator` attribute, but it doesn't return anything. Alternatively, the function could return the env creator and we set the attribute ourselves at the call site:
self.env_creator = self._get_env_creator_from_env_id(self._env_id)
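A minimal sketch of the second option (not actual RLlib code; the class and stubbed creator here are illustrative only): the `_get_...` name then matches what the method actually does, since it returns a value instead of mutating state.

```python
# Hypothetical sketch contrasting the naming options discussed above.
class Trainer:
    def __init__(self, env_id):
        self._env_id = env_id
        # Option 2: the method returns the creator and the caller assigns
        # it, so the `_get_` prefix matches the method's behavior.
        self.env_creator = self._get_env_creator_from_env_id(self._env_id)

    def _get_env_creator_from_env_id(self, env_id):
        # Return a callable that builds an env instance (stubbed out here).
        def creator(env_config=None):
            return {"env_id": env_id, "env_config": env_config}

        return creator
```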
Thanks for the catch. I'll check. ...
…ntralized_multi_agent_learning_03
gjoliver
left a comment
Very nice cleanup PR.
A couple of minor questions.
On a high level, maybe break this into multiple PRs in the future, so that simple changes like config param renaming can be merged much faster.
Thanks.
policy_mapping_fn=policy_mapping_fn,
policies_to_train=policies_to_train,
)
worker.add_policy(**kwargs)
Now that this is a 1-line statement, why bother with the inline fn?
Maybe cleaner to just call worker.add_policy(**kwargs) directly below.
We need to pass a function/callable to foreach_worker() below anyway, so it's better to share that code.
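A small self-contained sketch of the point being made here, using hypothetical stub classes (`StubWorker`, `StubWorkerSet`, and `add_policy_to_all` are illustrative, not RLlib's actual classes): defining the closure once lets the same body serve both the local call and `foreach_worker()`.

```python
# Hypothetical stubs illustrating why a shared closure is useful.
class StubWorker:
    def __init__(self):
        self.policies = {}

    def add_policy(self, policy_id, **kwargs):
        self.policies[policy_id] = kwargs


class StubWorkerSet:
    def __init__(self, num_remote):
        self._local = StubWorker()
        self._remote = [StubWorker() for _ in range(num_remote)]

    def foreach_worker(self, fn):
        # Apply `fn` to the local worker and all remote workers.
        return [fn(w) for w in [self._local] + self._remote]


def add_policy_to_all(workers, policy_id, **kwargs):
    # One shared closure: the (now 1-line) body is reused for every worker
    # instead of being duplicated at each call site.
    def fn(worker):
        worker.add_policy(policy_id, **kwargs)

    workers.foreach_worker(fn)
```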
if workers:
    workers.stop()
# Stop all optimizers.
if hasattr(self, "optimizer") and self.optimizer:
This seemed like really outdated code.
Trainers no longer have a self.optimizer; only ES and ARS do, and those two optimizers do NOT have a stop() method, so this would actually produce errors here.
If users want their Trainers to have a self.optimizer, they can just override Trainer.cleanup() and implement the necessary logic there.
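A minimal sketch of that override pattern, with stand-in classes (`MyOptimizer` and the stubbed base `Trainer.cleanup()` are hypothetical, not RLlib internals): the subclass stops its own optimizer first, then delegates to the base cleanup.

```python
# Hypothetical sketch of overriding Trainer.cleanup() for a trainer
# that owns a custom optimizer with a stop() method.
class Trainer:
    def cleanup(self):
        # Base cleanup (stopping workers etc.), stubbed for illustration.
        self.cleaned_up = True


class MyOptimizer:
    def __init__(self):
        self.stopped = False

    def stop(self):
        self.stopped = True


class MyTrainer(Trainer):
    def __init__(self):
        self.optimizer = MyOptimizer()

    def cleanup(self):
        # Stop our own optimizer first, then run the base cleanup.
        if getattr(self, "optimizer", None) is not None:
            self.optimizer.stop()
        super().cleanup()
```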
…ntralized_multi_agent_learning_03
Preparatory PR for multi-agent multi-GPU learner (alpha-star style) #3
- Use `min_time_s_per_reporting` instead of `min_iter_time_s` (deprecated previously).
- Use `Trainer.setup()` instead of `Trainer._init()` (deprecated previously) and rename `self._workers` into `self.workers` to match all other algos in RLlib.

Why are these changes needed?
Related issue number
Checks
I've run `scripts/format.sh` to lint the changes in this PR.