[runtime env] runtime env inheritance refactor#22244
[runtime env] runtime env inheritance refactor#22244raulchen merged 12 commits intoray-project:masterfrom
Conversation
|
@architkulkarni @edoakes @rkooo567 Current PR is ready to review. |
|
Please update the description! |
Done! |
architkulkarni
left a comment
There was a problem hiding this comment.
Looks great, thanks for this PR! Just had a few minor questions.
| {"pip": ["torch", "ray[serve]"], | ||
| "env_vars": {"B": "new", "C", "c"}} | ||
| # Child updates `runtime_env` | ||
| Actor.options(runtime_env=ray.get_current_runtime_env().update({"env_vars": {"A": "a", "B": "b"}})) |
There was a problem hiding this comment.
Thanks for adding this example! I think we should add a full entry for get_current_runtime_env in an API reference somewhere. It could be on this page, or in the API reference for https://docs.ray.io/en/latest/package-ref.html#runtime-context-apis, not sure which is best.
| import ray | ||
|
|
||
| if use_client: | ||
| if not use_client: |
| return ClientWorkerPropertyAPI(self.worker).build_runtime_context() | ||
|
|
||
| def get_current_runtime_env(self): | ||
| """Get the runtime env of the current client/driver. |
There was a problem hiding this comment.
How does this differ from the get_current_runtime_env in runtime_env.py? Does the other one return the runtime env of the current task/actor/job, while this one only returns the runtime env installed by the Ray client server (the job-level runtime env)?
There was a problem hiding this comment.
Yes, I think you are right!
| RAY_LOG(WARNING) << "Runtime env already exists and the parent runtime env is " | ||
| << parent_serialized_runtime_env << ". It will be overrode by " | ||
| << serialized_runtime_env << "."; |
There was a problem hiding this comment.
| RAY_LOG(WARNING) << "Runtime env already exists and the parent runtime env is " | |
| << parent_serialized_runtime_env << ". It will be overrode by " | |
| << serialized_runtime_env << "."; | |
| RAY_LOG(WARNING) << "Runtime env already exists and the parent runtime env is " | |
| << parent_serialized_runtime_env << ". It will be overridden by " | |
| << serialized_runtime_env << "."; |
Why is this a WARNING? Isn't it expected normal behavior?
There was a problem hiding this comment.
I add this WARNING log because current PR already changed the API behavior. It is from the advise of Edward #21818 (comment). Maybe we can change this log level After a period of time? I can add a TODO here.
There was a problem hiding this comment.
Ah makes sense! Yeah, we'll need some way of remembering to change the log level in the future. A TODO sounds like a good solution.
There was a problem hiding this comment.
Actually, when you merge this PR could you make a quick issue for this and add it to the runtime_env backlog milestone? I think that will help prevent it from being lost.
rkooo567
left a comment
There was a problem hiding this comment.
I think it is pretty close to merge. Just a few comments about documentation .
| @@ -349,27 +349,21 @@ Inheritance | |||
|
|
|||
| The runtime environment is inheritable, so it will apply to all tasks/actors within a job and all child tasks/actors of a task or actor once set, unless it is overridden. | |||
There was a problem hiding this comment.
Is this sentence still applicable?
There was a problem hiding this comment.
Yes, actor/tasks with unspecified runtime envs will still inherit the env of their parent. Maybe we can clarify by changing "unless it is overriden." -> "unless it is overridden by explicitly specifying a runtime environment for the child task/actor."
| The runtime environment is inheritable, so it will apply to all tasks/actors within a job and all child tasks/actors of a task or actor once set, unless it is overridden. | ||
|
|
||
| If an actor or task specifies a new ``runtime_env``, it will override the parent’s ``runtime_env`` (i.e., the parent actor/task's ``runtime_env``, or the job's ``runtime_env`` if there is no parent actor or task) as follows: | ||
| If an actor or task specifies a new ``runtime_env``, it will override the parent’s ``runtime_env`` (i.e., the parent actor/task's ``runtime_env``, or the job's ``runtime_env`` if there is no parent actor or task). |
There was a problem hiding this comment.
This part of the doc is confusing. Can we rewrite in this format?
By default, all actors and tasks inherit the parent's runtime_env.
-- here, show an example code --
However, if you specify runtime_env for task/actor, it will override the parents' runtime env
-- show an example --
If you'd like to still use parent's runtime environment,
-- show an example --
There was a problem hiding this comment.
This seems a lot easier to understand!
Btw, I don't like much about parent's runtime env. Is it possible to just show with ray.init as an example? Or do we explain the meaning of "parent" before this section?
There was a problem hiding this comment.
Add ray.init and rename to "current runtime"
python/ray/runtime_env.py
Outdated
| if _runtime_env is None: | ||
| _runtime_env = dict(ray.get_runtime_context().runtime_env) | ||
|
|
||
| return _runtime_env |
There was a problem hiding this comment.
Why do we need this global thing? Can't we just return dict(ray.get_runtime_context().runtime_env)?
There was a problem hiding this comment.
It's just a little optimization as runtime context. Current runtime env is immutable.
There was a problem hiding this comment.
What's the optimization? Can we just not do this? I want to avoid doing unnecessary optimization
| ["ray start --head --ray-client-server-port 25553"], | ||
| indirect=True, | ||
| ) | ||
| @pytest.mark.parametrize("use_client", [False, True]) |
There was a problem hiding this comment.
Do we have tests to verify the inheritance now?
There was a problem hiding this comment.
test_e2e_complex can cover this.
src/ray/core_worker/core_worker.cc
Outdated
| std::string CoreWorker::OverrideTaskOrActorRuntimeEnv( | ||
| const std::string &serialized_runtime_env, | ||
| std::vector<std::string> *runtime_env_uris) { | ||
| std::shared_ptr<rpc::RuntimeEnv> parent_runtime_env; |
There was a problem hiding this comment.
Why is it a shared pointer? Can we just make it rpc::RuntimeEnv? Also, this API seems dangerous (worker_context_.GetCurrentRuntimeEnv()). The return value should be immutable, but now we can mutate it.
There was a problem hiding this comment.
Try to avoid object copy.
I can modify the return type to std::shared_ptr<const rpc::RuntimeEnv>, thanks.
There was a problem hiding this comment.
modified return type sounds good to me!
| // TODO(SongGuyang): We add this warning log because of the change of API behavior. | ||
| // Refer to https://github.com/ray-project/ray/issues/21818. | ||
| // Modify this log level to `INFO` or `DEBUG` after a few release versions. | ||
| RAY_LOG(WARNING) << "Runtime env already exists and the parent runtime env is " |
There was a problem hiding this comment.
This warning is not user-facing, so I am not sure if this will do anything... Maybe just removing it?
There was a problem hiding this comment.
What do you mean by not user-facing? Do you mean it won't be streamed to the driver and will only show up in the log files?
There was a problem hiding this comment.
Yes! This log will just spam core log files and won’t be delivered to users in a clear way
There was a problem hiding this comment.
I see, thanks! Is there a recommended way to stream a warning to the driver from C++?
There was a problem hiding this comment.
We can use log level error to do that. But i feel like it is not a good idea cuz it can become extremely spammy
There was a problem hiding this comment.
Yep, the WARNING log can't be streamed to driver in my test. Maybe we can keep this log for debug?
rkooo567
left a comment
There was a problem hiding this comment.
I am fine with merging the PR, but I'd like to avoid having global stuff.
|
Lmk when it is ready to merge! |
|
@rkooo567 Ready to merge! |
|
Hey @SongGuyang I just noticed this while reviewing the other PR, but why do we need to add a new |
|
@edoakes We will support a strong-typed API for runtime env and we will add new APIs to "ray.runtime_env" anyway. So, I want to unify all the APIs about runtime env to one code path. In addition, we also need to add a new API like |
Runtime Environments is already GA in Ray 1.6.0. The latest doc is [here](https://docs.ray.io/en/master/ray-core/handling-dependencies.html#runtime-environments). And now, we already supported a [inheritance](https://docs.ray.io/en/master/ray-core/handling-dependencies.html#inheritance) behavior as follows (copied from the doc): - The runtime_env["env_vars"] field will be merged with the runtime_env["env_vars"] field of the parent. This allows for environment variables set in the parent’s runtime environment to be automatically propagated to the child, even if new environment variables are set in the child’s runtime environment. - Every other field in the runtime_env will be overridden by the child, not merged. For example, if runtime_env["py_modules"] is specified, it will replace the runtime_env["py_modules"] field of the parent. We think this runtime env merging logic is so complex and confusing to users because users can't know the final runtime env before the jobs are run. Current PR tries to do a refactor and change the behavior of Runtime Environments inheritance. Here is the new behavior: - **If there is no runtime env option when we create actor, inherit the parent runtime env.** - **Otherwise, use the optional runtime env directly and don't do the merging.** Add a new API named `ray.runtime_env.get_current_runtime_env()` to get the parent runtime env and modify this dict by yourself. Like: ```Actor.options(runtime_env=ray.runtime_env.get_current_runtime_env().update({"X": "Y"}))``` This new API also can be used in ray client.
…2244)" (ray-project#22626) Breaks train_torch_linear_test.py.
Runtime Environments is already GA in Ray 1.6.0. The latest doc is [here](https://docs.ray.io/en/master/ray-core/handling-dependencies.html#runtime-environments). And now, we already supported a [inheritance](https://docs.ray.io/en/master/ray-core/handling-dependencies.html#inheritance) behavior as follows (copied from the doc): - The runtime_env["env_vars"] field will be merged with the runtime_env["env_vars"] field of the parent. This allows for environment variables set in the parent’s runtime environment to be automatically propagated to the child, even if new environment variables are set in the child’s runtime environment. - Every other field in the runtime_env will be overridden by the child, not merged. For example, if runtime_env["py_modules"] is specified, it will replace the runtime_env["py_modules"] field of the parent. We think this runtime env merging logic is so complex and confusing to users because users can't know the final runtime env before the jobs are run. Current PR tries to do a refactor and change the behavior of Runtime Environments inheritance. Here is the new behavior: - **If there is no runtime env option when we create actor, inherit the parent runtime env.** - **Otherwise, use the optional runtime env directly and don't do the merging.** Add a new API named `ray.runtime_env.get_current_runtime_env()` to get the parent runtime env and modify this dict by yourself. Like: ```Actor.options(runtime_env=ray.runtime_env.get_current_runtime_env().update({"X": "Y"}))``` This new API also can be used in ray client.
* [runtime env] runtime env inheritance refactor (#22244) Runtime Environments is already GA in Ray 1.6.0. The latest doc is [here](https://docs.ray.io/en/master/ray-core/handling-dependencies.html#runtime-environments). And now, we already supported a [inheritance](https://docs.ray.io/en/master/ray-core/handling-dependencies.html#inheritance) behavior as follows (copied from the doc): - The runtime_env["env_vars"] field will be merged with the runtime_env["env_vars"] field of the parent. This allows for environment variables set in the parent’s runtime environment to be automatically propagated to the child, even if new environment variables are set in the child’s runtime environment. - Every other field in the runtime_env will be overridden by the child, not merged. For example, if runtime_env["py_modules"] is specified, it will replace the runtime_env["py_modules"] field of the parent. We think this runtime env merging logic is so complex and confusing to users because users can't know the final runtime env before the jobs are run. Current PR tries to do a refactor and change the behavior of Runtime Environments inheritance. Here is the new behavior: - **If there is no runtime env option when we create actor, inherit the parent runtime env.** - **Otherwise, use the optional runtime env directly and don't do the merging.** Add a new API named `ray.runtime_env.get_current_runtime_env()` to get the parent runtime env and modify this dict by yourself. Like: ```Actor.options(runtime_env=ray.runtime_env.get_current_runtime_env().update({"X": "Y"}))``` This new API also can be used in ray client.




Why are these changes needed?
Runtime Environments is already GA in Ray 1.6.0. The latest doc is here. And now, we already supported a inheritance behavior as follows (copied from the doc):
We think this runtime env merging logic is so complex and confusing to users because users can't know the final runtime env before the jobs are run.
Current PR tries to do a refactor and change the behavior of Runtime Environments inheritance. Here is the new behavior:
Add a new API named
ray.runtime_env.get_current_runtime_env()to get the parent runtime env and modify this dict by yourself. Like:Actor.options(runtime_env=ray.runtime_env.get_current_runtime_env().update({"X": "Y"}))This new API also can be used in ray client.
Related issue number
#21818
Checks
scripts/format.shto lint the changes in this PR.