[runtime env] runtime env inheritance refactor by SongGuyang · Pull Request #22244 · ray-project/ray

SongGuyang · 2022-02-09T10:14:32Z

Why are these changes needed?

Runtime Environments has been GA since Ray 1.6.0. The latest doc is here: https://docs.ray.io/en/master/ray-core/handling-dependencies.html#runtime-environments. Currently, we support the following inheritance behavior (copied from the doc):

- The runtime_env["env_vars"] field will be merged with the runtime_env["env_vars"] field of the parent. This allows for environment variables set in the parent’s runtime environment to be automatically propagated to the child, even if new environment variables are set in the child’s runtime environment.
- Every other field in the runtime_env will be overridden by the child, not merged. For example, if runtime_env["py_modules"] is specified, it will replace the runtime_env["py_modules"] field of the parent.
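
To make the two rules above concrete, here is a minimal sketch that models them with plain dicts. The function name `merge_runtime_env_old` is hypothetical and this is not Ray's implementation. It also assumes that the child's value wins when both sides set the same environment variable.

```python
from typing import Any, Dict, Optional

RuntimeEnv = Dict[str, Any]


def merge_runtime_env_old(parent: RuntimeEnv, child: Optional[RuntimeEnv]) -> RuntimeEnv:
    """Hypothetical model of the pre-refactor inheritance rules (not Ray code)."""
    if not child:
        # Nothing specified on the child: it simply sees the parent's env.
        return dict(parent)

    result = dict(parent)
    # Every field except env_vars is replaced wholesale by the child's value.
    for key, value in child.items():
        if key != "env_vars":
            result[key] = value

    # env_vars is merged key by key; assumption: the child wins on conflicts.
    merged_vars = {**parent.get("env_vars", {}), **child.get("env_vars", {})}
    if merged_vars:
        result["env_vars"] = merged_vars
    return result


parent = {"pip": ["requests"], "env_vars": {"A": "a", "B": "b"}}
child = {"pip": ["torch"], "env_vars": {"B": "new", "C": "c"}}
effective = merge_runtime_env_old(parent, child)
```

In this model the child's `pip` list replaces the parent's, while `env_vars` ends up containing keys from both sides, which is the mix of semantics the description calls confusing.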

We think this runtime env merging logic is too complex and confusing, because users can't know the final runtime env before their jobs run.

This PR refactors Runtime Environments inheritance and changes its behavior. The new behavior is:

- If no runtime env option is given when an actor is created, it inherits the parent runtime env.
- Otherwise, the specified runtime env is used directly, with no merging.
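
The new rule can be sketched in a few lines with plain dicts. The function name `resolve_runtime_env_new` is hypothetical, and treating "no runtime env option" as `None` is an assumption made for illustration, not something the PR text specifies.

```python
from typing import Any, Dict, Optional

RuntimeEnv = Dict[str, Any]


def resolve_runtime_env_new(parent: RuntimeEnv, child: Optional[RuntimeEnv]) -> RuntimeEnv:
    """Hypothetical model of the proposed rule (not Ray code)."""
    if child is None:
        # No runtime env option given: inherit the parent's env unchanged.
        return dict(parent)
    # A runtime env was given: use it as-is, with no merging of any field.
    return dict(child)


parent = {"pip": ["requests"], "env_vars": {"A": "a"}}

inherited = resolve_runtime_env_new(parent, None)
overridden = resolve_runtime_env_new(parent, {"pip": ["torch"]})
```

Under this model `overridden` contains only the `pip` field, so the parent's `env_vars` are no longer propagated unless the caller copies them over explicitly.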

This PR also adds a new API, ray.runtime_env.get_current_runtime_env(), which returns the parent runtime env as a dict that you can modify yourself. For example:
Actor.options(runtime_env=ray.runtime_env.get_current_runtime_env().update({"X": "Y"}))
This new API can also be used in Ray Client.
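
One caveat when adapting the one-liner above: for a plain Python dict, `update()` mutates in place and returns `None`, so its return value cannot be passed along as the new runtime env. The sketch below uses an ordinary dict as a stand-in for whatever `get_current_runtime_env()` returns, and `with_overrides` is a hypothetical helper, not part of Ray.

```python
import copy
from typing import Any, Dict

RuntimeEnv = Dict[str, Any]


def with_overrides(current: RuntimeEnv, overrides: RuntimeEnv) -> RuntimeEnv:
    """Hypothetical helper: return a modified copy, leaving `current` untouched."""
    updated = copy.deepcopy(current)
    updated.update(overrides)
    return updated


# Stand-in for the dict returned by get_current_runtime_env() (assumption).
current_env = {"pip": ["requests"], "env_vars": {"A": "a"}}

# dict.update() returns None, so this value is not a usable runtime env.
result_of_update = dict(current_env).update({"X": "Y"})

# Build the child env explicitly, keeping the parent's env_vars and adding one.
child_env = with_overrides(
    current_env,
    {"env_vars": {**current_env["env_vars"], "B": "b"}},
)
```

The resulting `child_env` could then be passed as the `runtime_env` option, while `current_env` stays unchanged.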

Related issue number

#21818

Checks

I've run scripts/format.sh to lint the changes in this PR.
I've included any doc changes needed for https://docs.ray.io/en/master/.
I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
Testing Strategy
- Unit tests
- Release tests
- This PR is not tested :(

SongGuyang · 2022-02-15T04:05:05Z

@architkulkarni @edoakes @rkooo567 This PR is ready for review.

rkooo567 · 2022-02-15T08:20:41Z

Please update the description!

SongGuyang · 2022-02-15T08:54:10Z

> Please update the description!

Done!

architkulkarni

Looks great, thanks for this PR! Just had a few minor questions.

architkulkarni · 2022-02-15T20:02:00Z

doc/source/ray-core/handling-dependencies.rst

-  {"pip": ["torch", "ray[serve]"],
-  "env_vars": {"B": "new", "C", "c"}}
+  # Child updates `runtime_env`
+  Actor.options(runtime_env=ray.get_current_runtime_env().update({"env_vars": {"A": "a", "B": "b"}}))


Thanks for adding this example! I think we should add a full entry for get_current_runtime_env in an API reference somewhere. It could be on this page, or in the API reference for https://docs.ray.io/en/latest/package-ref.html#runtime-context-apis, not sure which is best.

Thanks for the advice. I have added an API doc:

And added the link ref here:

architkulkarni · 2022-02-15T20:04:05Z

python/ray/tests/test_client.py

    import ray

-    if use_client:
+    if not use_client:


Good catch :)

architkulkarni · 2022-02-15T20:10:49Z

python/ray/util/client/api.py

        return ClientWorkerPropertyAPI(self.worker).build_runtime_context()

+    def get_current_runtime_env(self):
+        """Get the runtime env of the current client/driver.


How does this differ from the get_current_runtime_env in runtime_env.py? Does the other one return the runtime env of the current task/actor/job, while this one only returns the runtime env installed by the Ray client server (the job-level runtime env)?

Yes, I think you are right!

architkulkarni · 2022-02-15T20:15:52Z

src/ray/core_worker/core_worker.cc

+    RAY_LOG(WARNING) << "Runtime env already exists and the parent runtime env is "
+                     << parent_serialized_runtime_env << ". It will be overrode by "
+                     << serialized_runtime_env << ".";


Suggested change

-    RAY_LOG(WARNING) << "Runtime env already exists and the parent runtime env is "
-                     << parent_serialized_runtime_env << ". It will be overrode by "
-                     << serialized_runtime_env << ".";
+    RAY_LOG(WARNING) << "Runtime env already exists and the parent runtime env is "
+                     << parent_serialized_runtime_env << ". It will be overridden by "
+                     << serialized_runtime_env << ".";

Why is this a WARNING? Isn't it expected normal behavior?

I added this WARNING log because this PR changes the API behavior. It follows Edward's advice in #21818 (comment). Maybe we can change the log level after a few releases? I can add a TODO here.

Ah makes sense! Yeah, we'll need some way of remembering to change the log level in the future. A TODO sounds like a good solution.

Actually, when you merge this PR could you make a quick issue for this and add it to the runtime_env backlog milestone? I think that will help prevent it from being lost.

rkooo567

I think it is pretty close to merge. Just a few comments about documentation.

rkooo567 · 2022-02-16T14:47:57Z

doc/source/ray-core/handling-dependencies.rst

@@ -349,27 +349,21 @@ Inheritance

 The runtime environment is inheritable, so it will apply to all tasks/actors within a job and all child tasks/actors of a task or actor once set, unless it is overridden.


Is this sentence still applicable?

Yes, actors/tasks with unspecified runtime envs will still inherit the env of their parent. Maybe we can clarify by changing "unless it is overridden." -> "unless it is overridden by explicitly specifying a runtime environment for the child task/actor."

rkooo567 · 2022-02-16T14:50:19Z

doc/source/ray-core/handling-dependencies.rst

 The runtime environment is inheritable, so it will apply to all tasks/actors within a job and all child tasks/actors of a task or actor once set, unless it is overridden.

-If an actor or task specifies a new ``runtime_env``, it will override the parent’s ``runtime_env`` (i.e., the parent actor/task's ``runtime_env``, or the job's ``runtime_env`` if there is no parent actor or task) as follows:
+If an actor or task specifies a new ``runtime_env``, it will override the parent’s ``runtime_env`` (i.e., the parent actor/task's ``runtime_env``, or the job's ``runtime_env`` if there is no parent actor or task).


This part of the doc is confusing. Can we rewrite in this format?

By default, all actors and tasks inherit the parent's runtime_env. -- here, show an example code -- However, if you specify runtime_env for task/actor, it will override the parents' runtime env -- show an example -- If you'd like to still use parent's runtime environment, -- show an example --

This seems a lot easier to understand!

Btw, I don't much like the phrase "parent's runtime env". Is it possible to just show it with ray.init as an example? Or do we explain the meaning of "parent" before this section?

Add ray.init and rename to "current runtime"

rkooo567 · 2022-02-16T14:51:40Z

python/ray/runtime_env.py

+    if _runtime_env is None:
+        _runtime_env = dict(ray.get_runtime_context().runtime_env)
+
+    return _runtime_env


Why do we need this global thing? Can't we just return dict(ray.get_runtime_context().runtime_env)?

It's just a small optimization, the same as for the runtime context. The current runtime env is immutable.

What's the optimization? Can we just not do this? I want to avoid doing unnecessary optimization
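
The trade-off discussed here can be illustrated with a small sketch. It does not use Ray: `_fetch_runtime_env` is a hypothetical stand-in for `ray.get_runtime_context().runtime_env`, and the two getters only mirror the cached and uncached shapes under discussion. One possible downside of the cached form is that every caller receives the same dict object, so a caller that modifies it changes what later callers see.

```python
from typing import Any, Dict, Optional

_SOURCE: Dict[str, Any] = {"env_vars": {"A": "a"}}


def _fetch_runtime_env() -> Dict[str, Any]:
    # Hypothetical stand-in for ray.get_runtime_context().runtime_env.
    return _SOURCE


_cached_env: Optional[Dict[str, Any]] = None


def get_env_cached() -> Dict[str, Any]:
    """Module-level cache: the same dict object is returned on every call."""
    global _cached_env
    if _cached_env is None:
        _cached_env = dict(_fetch_runtime_env())
    return _cached_env


def get_env_uncached() -> Dict[str, Any]:
    """No cache: a fresh top-level copy is returned on every call."""
    return dict(_fetch_runtime_env())


# A caller edits the cached dict; the edit is visible to the next caller.
first = get_env_cached()
first["pip"] = ["torch"]
second = get_env_cached()

# With the uncached getter, a top-level edit stays local to that caller.
fresh = get_env_uncached()
fresh["pip"] = ["torch"]
again = get_env_uncached()
```

Note that `dict(...)` is a shallow copy in both variants, so nested values such as `env_vars` are still shared with the source.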

rkooo567 · 2022-02-16T14:52:07Z

python/ray/tests/test_runtime_env.py

+    ["ray start --head --ray-client-server-port 25553"],
+    indirect=True,
+)
+@pytest.mark.parametrize("use_client", [False, True])


Do we have tests to verify the inheritance now?

test_e2e_complex can cover this.

rkooo567 · 2022-02-16T14:55:25Z

src/ray/core_worker/core_worker.cc

+std::string CoreWorker::OverrideTaskOrActorRuntimeEnv(
+    const std::string &serialized_runtime_env,
+    std::vector<std::string> *runtime_env_uris) {
+  std::shared_ptr<rpc::RuntimeEnv> parent_runtime_env;


Why is it a shared pointer? Can we just make it rpc::RuntimeEnv? Also, this API seems dangerous (worker_context_.GetCurrentRuntimeEnv()). The return value should be immutable, but now we can mutate it.

It's to avoid an object copy.
I can change the return type to std::shared_ptr<const rpc::RuntimeEnv>, thanks.

modified return type sounds good to me!

rkooo567 · 2022-02-16T14:57:16Z

src/ray/core_worker/core_worker.cc

+    // TODO(SongGuyang): We add this warning log because of the change of API behavior.
+    // Refer to https://github.com/ray-project/ray/issues/21818.
+    // Modify this log level to `INFO` or `DEBUG` after a few release versions.
+    RAY_LOG(WARNING) << "Runtime env already exists and the parent runtime env is "


This warning is not user-facing, so I am not sure if this will do anything... Maybe just removing it?

What do you mean by not user-facing? Do you mean it won't be streamed to the driver and will only show up in the log files?

Yes! This log will just spam core log files and won’t be delivered to users in a clear way

I see, thanks! Is there a recommended way to stream a warning to the driver from C++?

We can use log level error to do that. But i feel like it is not a good idea cuz it can become extremely spammy

Yep, the WARNING log can't be streamed to driver in my test. Maybe we can keep this log for debug?

Debug sounds good

rkooo567

I am fine with merging the PR, but I'd like to avoid having global stuff.

rkooo567 · 2022-02-19T01:09:59Z

Lmk when it is ready to merge!

SongGuyang · 2022-02-21T08:10:13Z

@rkooo567 Ready to merge!

edoakes · 2022-02-23T16:05:03Z

Hey @SongGuyang I just noticed this while reviewing the other PR, but why do we need to add a new get_current_runtime_env function instead of just using the get_runtime_context? I think we should remove this and only support getting it through runtime context to standardize the API.

SongGuyang · 2022-02-23T17:15:11Z

@edoakes We will support a strongly typed API for runtime env, and we will add new APIs to "ray.runtime_env" anyway. So I want to unify all the runtime env APIs under one code path. In addition, we also need to add a new API like get_current_runtime_env_class (not sure if it's the best name) for the strongly typed one. This is my thinking. But if you insist on using runtime context, I will move this.

runtime env inheritance refactor

1eed5e3

SongGuyang requested a review from raulchen February 9, 2022 10:14

add test

e4de00c

SongGuyang changed the title ~~[WIP][runtime env] runtime env inheritance refactor~~ [runtime env] runtime env inheritance refactor Feb 9, 2022

SongGuyang assigned architkulkarni and edoakes Feb 9, 2022

rkooo567 self-assigned this Feb 9, 2022

SongGuyang changed the title ~~[runtime env] runtime env inheritance refactor~~ [WIP][runtime env] runtime env inheritance refactor Feb 9, 2022

SongGuyang added 3 commits February 14, 2022 18:24

support client api

8b365b0

add new file

26a8dec

add warning log

8c70646

SongGuyang changed the title ~~[WIP][runtime env] runtime env inheritance refactor~~ [runtime env] runtime env inheritance refactor Feb 15, 2022

SongGuyang added 2 commits February 15, 2022 13:10

fix lint

8e7544c

fix test

4f3bf19

architkulkarni approved these changes Feb 15, 2022

SongGuyang added 2 commits February 16, 2022 17:58

address comments

f5a809f

add doc

f9b3711

rkooo567 reviewed Feb 16, 2022

address comments

b684922

rkooo567 approved these changes Feb 18, 2022

address comments

68c343f

Merge branch 'master' into dev_runtime_env_inherit

8ee7906

raulchen merged commit 5783cdb into ray-project:master Feb 21, 2022

raulchen deleted the dev_runtime_env_inherit branch February 21, 2022 10:13

architkulkarni mentioned this pull request Feb 23, 2022

[runtime env] Add test that both APIs for getting the current runtime env agree #22602

Closed

6 tasks

xwjiang2010 added a commit to xwjiang2010/ray that referenced this pull request Feb 24, 2022

Revert "[runtime env] runtime env inheritance refactor (ray-project#22244)"

1bc04b4

This reverts commit 5783cdb.

matthewdeng mentioned this pull request Feb 24, 2022

[release test] train_pytorch_linear_test hangs since Monday 2/21 #22595

Closed

2 tasks

edoakes pushed a commit that referenced this pull request Feb 25, 2022

Revert "[runtime env] runtime env inheritance refactor (#22244)" (#22626)

d4a1bc7

Breaks train_torch_linear_test.py.

simonsays1980 pushed a commit to simonsays1980/ray that referenced this pull request Feb 27, 2022

Revert "[runtime env] runtime env inheritance refactor (ray-project#22244)" (ray-project#22626)

ebce453

Breaks train_torch_linear_test.py.

SongGuyang mentioned this pull request May 6, 2022

[runtime env] runtime env inheritance refactor #24538

Merged
