[RLlib] New ConnectorV2 API #06: Changes in SingleAgentEpisode & SingleAgentEnvRunner. #42296
Merged
sven1977 merged 14 commits into ray-project:master on Jan 12, 2024
sven1977
commented
Jan 10, 2024
    # Close our env object via gymnasium's API.
    self.env.close()

    # TODO (sven): Replace by default "to-env" connector.
Contributor
Author
Not necessary anymore here in EnvRunner.
This is default ModuleToEnv connector behavior now.
…runner_support_connectors_06_small_changes_on_env_runner_and_episode
kouroshHakha
approved these changes
Jan 11, 2024
Comment on lines +164 to +172
    # TODO (sven): Convert data to proper tensor formats, depending on framework
    # used by the RLModule. We cannot do this right now as the RLModule does NOT
    # know its own device. Only the Learner knows the device. Also, on the
    # EnvRunner side, we assume that it's always the CPU (even though one could
    # imagine a GPU-based EnvRunner + RLModule for sampling).
    # if rl_module.framework == "torch":
    #     data = convert_to_torch_tensor(data, device=??)
    # elif rl_module.framework == "tf2":
    #     data =
Contributor
Author
It's a TODO on an open question, with the possible code solution commented out. I'll leave this in. We need to unify this behavior (numpy to tensor) across all connector types in the near future to avoid confusing users.
The blocker right now is that an RLModule does not know its own device: only Learners know whether they run on GPU or CPU, and EnvRunners assume they are always on the CPU. Thus, connectors have no means to perform this conversion step properly.
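A minimal sketch of the conversion step discussed above, assuming the caller (not the RLModule) supplies the target device. The names `convert_nested`, `make_torch_converter`, and `make_fake_converter` are hypothetical and not part of RLlib; only `torch.from_numpy(...).to(device)` is a real PyTorch call, and it is imported lazily so that the rest of the sketch runs without a deep learning framework installed.

```python
import numpy as np


def convert_nested(data, to_tensor):
    """Apply `to_tensor` to every numpy array in a (possibly nested) dict."""
    if isinstance(data, dict):
        return {key: convert_nested(value, to_tensor) for key, value in data.items()}
    if isinstance(data, np.ndarray):
        return to_tensor(data)
    # Leave non-array leaves (e.g. Python scalars, strings) untouched.
    return data


def make_torch_converter(device="cpu"):
    """Build a numpy -> torch converter for an explicitly given device."""
    import torch  # Imported lazily so the helpers above work without torch.

    def to_tensor(array):
        return torch.from_numpy(array).to(device)

    return to_tensor


def make_fake_converter(device):
    """Hypothetical stand-in: tags each array with the device it would go to."""
    return lambda array: (device, array.astype(np.float32))


batch = {"obs": np.zeros((2, 4)), "state_in": {"h": np.ones((2, 8))}, "t": 3}
converted = convert_nested(batch, make_fake_converter("cpu"))
```

The point of the sketch is that the device has to be passed in from outside. Until the RLModule (or the EnvRunner) can report its device, a connector has nothing to put in that argument.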
    rl_module=self.module,
    episodes=self._episodes,
    explore=explore,
    # persistent_data=None,  # TODO
Contributor
Is there a TODO here? What is the todo exactly?
    data=to_env,
    episodes=self._episodes,
    explore=explore,
    # persistent_data=None,  # TODO
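To make the two connector calls in the snippets above easier to picture, here is a toy sketch of one sampling step: an env-to-module pipeline builds the batch, the module computes outputs, and a module-to-env pipeline turns them into actions. Everything here (`ToyPipeline`, `add_latest_obs`, `pick_actions`, `toy_module`, and the dict-based episodes) is a hypothetical stand-in, not RLlib's ConnectorV2 API; only the keyword names `rl_module`, `data`, `episodes`, and `explore` are taken from the diff.

```python
import numpy as np


class ToyPipeline:
    """Hypothetical connector pipeline: a list of callables run in order."""

    def __init__(self, *pieces):
        self.pieces = list(pieces)

    def __call__(self, *, rl_module, data, episodes, explore):
        for piece in self.pieces:
            data = piece(
                rl_module=rl_module, data=data, episodes=episodes, explore=explore
            )
        return data


def add_latest_obs(*, rl_module, data, episodes, explore):
    # Env-to-module piece: batch the most recent observation of each episode.
    data["obs"] = np.stack([ep["observations"][-1] for ep in episodes])
    return data


def pick_actions(*, rl_module, data, episodes, explore):
    # Module-to-env piece: greedy actions when not exploring, random otherwise.
    logits = data["logits"]
    if explore:
        rng = np.random.default_rng(0)
        data["actions"] = rng.integers(0, logits.shape[1], size=logits.shape[0])
    else:
        data["actions"] = logits.argmax(axis=1)
    return data


def toy_module(batch):
    # Stand-in for an RLModule forward pass: logits are a linear map of obs.
    weights = np.array([[1.0, 0.0], [0.0, 1.0]])
    return {"logits": batch["obs"] @ weights}


env_to_module = ToyPipeline(add_latest_obs)
module_to_env = ToyPipeline(pick_actions)

episodes = [
    {"observations": [np.array([0.2, 0.9])]},
    {"observations": [np.array([0.7, 0.1])]},
]
to_module = env_to_module(
    rl_module=toy_module, data={}, episodes=episodes, explore=False
)
to_env = module_to_env(
    rl_module=toy_module, data=toy_module(to_module), episodes=episodes, explore=False
)
actions = to_env["actions"]
```

With this split, the runner itself only steps the environment and calls the two pipelines. Batching and action selection live in connector pieces that a user can replace.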
vickytsang pushed a commit to ROCm/ray that referenced this pull request on Jan 12, 2024
…ode & SingleAgentEnvRunner. (ray-project#42296)
This PR adds some changes to SingleAgentEpisode & SingleAgentEnvRunner:
- SingleAgentEnvRunner now utilizes the user-configured EnvToModule and ModuleToEnv connector pipelines. Hence, SingleAgentEnvRunner does NOT anymore:
- Add set APIs to SingleAgentEpisode, such that custom connectors are able to manipulate an episode's data, e.g. for observation framestacking, reward clipping, etc.
- New set API had to also be supported then by InfiniteLookbackBuffer, which sits at the core of all episode classes.
- Updated test cases and added new ones for set APIs.
Why are these changes needed?
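As a rough illustration of why a set API on the underlying buffer enables use cases like reward clipping, the sketch below uses a simplified, hypothetical `ToyLookbackBuffer`. It is not RLlib's InfiniteLookbackBuffer, and the method names and signatures are assumptions made for this example. Logical index 0 refers to the first item after the lookback region, so a connector can overwrite the episode's own data without touching the lookback.

```python
import numpy as np


class ToyLookbackBuffer:
    """Hypothetical, simplified buffer: the first `lookback` items precede index 0."""

    def __init__(self, data, lookback=0):
        self.data = list(data)
        self.lookback = lookback

    def __len__(self):
        # Lookback items do not count towards the length.
        return len(self.data) - self.lookback

    def get(self, index):
        return self.data[self._physical(index)]

    def set(self, new_value, at_index):
        self.data[self._physical(at_index)] = new_value

    def _physical(self, index):
        # Translate a logical index (0 = first item after the lookback) into a
        # position in the underlying list; negative indices count from the end.
        if index < 0:
            physical = len(self.data) + index
        else:
            physical = index + self.lookback
        if not 0 <= physical < len(self.data):
            raise IndexError(index)
        return physical


def clip_rewards(rewards, low=-1.0, high=1.0):
    """Overwrite each (non-lookback) reward in place with its clipped value."""
    for i in range(len(rewards)):
        rewards.set(float(np.clip(rewards.get(i), low, high)), at_index=i)


rewards = ToyLookbackBuffer([0.5, 3.0, -7.0, 0.2], lookback=1)
clip_rewards(rewards)
```

In this toy version the lookback entry (`0.5`) is left as-is, while the three rewards that belong to the current chunk are clipped to the range `[-1.0, 1.0]`.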
Related issue number
Checks