[runtime env] runtime env inheritance refactor#24538
[runtime env] runtime env inheritance refactor#24538raulchen merged 14 commits intoray-project:masterfrom
Conversation
Runtime Environments is already GA in Ray 1.6.0. The latest doc is [here](https://docs.ray.io/en/master/ray-core/handling-dependencies.html#runtime-environments). And now, we already supported a [inheritance](https://docs.ray.io/en/master/ray-core/handling-dependencies.html#inheritance) behavior as follows (copied from the doc): - The runtime_env["env_vars"] field will be merged with the runtime_env["env_vars"] field of the parent. This allows for environment variables set in the parent’s runtime environment to be automatically propagated to the child, even if new environment variables are set in the child’s runtime environment. - Every other field in the runtime_env will be overridden by the child, not merged. For example, if runtime_env["py_modules"] is specified, it will replace the runtime_env["py_modules"] field of the parent. We think this runtime env merging logic is so complex and confusing to users because users can't know the final runtime env before the jobs are run. Current PR tries to do a refactor and change the behavior of Runtime Environments inheritance. Here is the new behavior: - **If there is no runtime env option when we create actor, inherit the parent runtime env.** - **Otherwise, use the optional runtime env directly and don't do the merging.** Add a new API named `ray.runtime_env.get_current_runtime_env()` to get the parent runtime env and modify this dict by yourself. Like: ```Actor.options(runtime_env=ray.runtime_env.get_current_runtime_env().update({"X": "Y"}))``` This new API also can be used in ray client.
|
Hey @xwjiang2010, we will make this change again. Do you already have one UT for the issue #22595? |
|
@edoakes @rkooo567 @architkulkarni please help to check if we can make this change for 2.0. |
|
cc @edoakes Sorry that i forgot some context. did it pass the API review btw? Should we add an api-review-required tag? |
This was the API review issue: #21818 There wasn't any pushback. We also asked for opinions on OSS Slack and I think I remember there were a few thumbs up and no pushback (unfortunately I think the thread is lost because we have limited history.) |
There was a problem hiding this comment.
The two PRs mentioned were already approved, so this PR should be approved too. It includes the warning message for users in the case where the behavior is different. The Ray 1.13 branch cut has already happened, so if I understand correctly, it's safe to merge this now and this will only appear in Ray 2.0.
I'll try to run train_pytorch_linear_test directly on this PR, since now we can run release tests for PRs on buildkite. (running here)
edoakes
left a comment
There was a problem hiding this comment.
According to @zhe-thoughts we should wait for the branch-1 to be cut before merging breaking changes, so holding for that.
|
Btw, cc @jjyao this will be one of API changes from 2.0. We should probably track it? |
rkooo567
left a comment
There was a problem hiding this comment.
Approving the PR assuming I already approved the same PR before!
| } | ||
| } | ||
|
|
||
| public void testPerActorEnvVars() { |
There was a problem hiding this comment.
Due to there is no case that one process holds multiple workers, it could be simplified to:
public void testPerActorEnvVars() {
try {
Ray.init();
{
RuntimeEnv runtimeEnv =
new RuntimeEnv.Builder()
.addEnvVar("KEY1", "A")
.addEnvVar("KEY2", "B")
.addEnvVar("KEY1", "C")
.build();
ActorHandle<A> actor1 = Ray.actor(A::new).setRuntimeEnv(runtimeEnv).remote();
String val = actor1.task(A::getEnv, "KEY1").remote().get();
Assert.assertEquals(val, "C");
val = actor1.task(A::getEnv, "KEY2").remote().get();
Assert.assertEquals(val, "B");
}
{
/// Because we didn't set them for actor2 , all should be null.
ActorHandle<A> actor2 = Ray.actor(A::new).remote();
String val = actor2.task(A::getEnv, "KEY1").remote().get();
Assert.assertNull(val);
val = actor2.task(A::getEnv, "KEY2").remote().get();
Assert.assertNull(val);
}
} finally {
Ray.shutdown();
}
}There was a problem hiding this comment.
Let's merge this PR after addressing this? @jovany-wang we can just change the code this way here?
There was a problem hiding this comment.
This change had been committed at 867ef68
So feel free to merge this PR
|
We will merge this PR because the "ray-1.x" branch has been cut. |
|
This PR has caused a P0 regression in master for Tune: #25484 Thinking about the new behavior, it's also fundamentally inconsistent with how Jobs works. When the user specifies a working_dir implicitly via the jobs API, it's very surprising that overriding env_vars also deletes the working dir. The user might not even know about the working dir concept if they're using only jobs. I think we need to change this to either:
Basically, the user should have to explicitly set |
Why are these changes needed?
Recreate for the reverted PR for 2.0.
#22244
#22620
Runtime Environments is already GA in Ray 1.6.0. The latest doc is here. And now, we already supported a inheritance behavior as follows (copied from the doc):
We think this runtime env merging logic is so complex and confusing to users because users can't know the final runtime env before the jobs are run.
Current PR tries to do a refactor and change the behavior of Runtime Environments inheritance. Here is the new behavior:
You could get the parent runtime env and modify this dict by yourself. Like:
Actor.options(runtime_env=ray.get_runtime_context().runtime_env.update({"X": "Y"}))Related issue number
#21818