Skip to content

[runtime env] runtime env inheritance refactor#24538

Merged
raulchen merged 14 commits intoray-project:masterfrom
antgroup:dev_runtime_env_inherit
May 20, 2022
Merged

[runtime env] runtime env inheritance refactor#24538
raulchen merged 14 commits intoray-project:masterfrom
antgroup:dev_runtime_env_inherit

Conversation

@SongGuyang
Copy link
Copy Markdown
Contributor

@SongGuyang SongGuyang commented May 6, 2022

Why are these changes needed?

Recreate for the reverted PR for 2.0.
#22244
#22620

Runtime Environments is already GA in Ray 1.6.0. The latest doc is here. And now, we already supported a inheritance behavior as follows (copied from the doc):

  • The runtime_env["env_vars"] field will be merged with the runtime_env["env_vars"] field of the parent. This allows for environment variables set in the parent’s runtime environment to be automatically propagated to the child, even if new environment variables are set in the child’s runtime environment.
  • Every other field in the runtime_env will be overridden by the child, not merged. For example, if runtime_env["py_modules"] is specified, it will replace the runtime_env["py_modules"] field of the parent.

We think this runtime env merging logic is so complex and confusing to users because users can't know the final runtime env before the jobs are run.

Current PR tries to do a refactor and change the behavior of Runtime Environments inheritance. Here is the new behavior:

  • If there is no runtime env option when we create actor, inherit the parent runtime env.
  • Otherwise, use the optional runtime env directly and don't do the merging.

You could get the parent runtime env and modify this dict by yourself. Like:
Actor.options(runtime_env=ray.get_runtime_context().runtime_env.update({"X": "Y"}))

Related issue number

#21818

SongGuyang added 5 commits May 6, 2022 15:53
Runtime Environments is already GA in Ray 1.6.0. The latest doc is [here](https://docs.ray.io/en/master/ray-core/handling-dependencies.html#runtime-environments). And now, we already supported a [inheritance](https://docs.ray.io/en/master/ray-core/handling-dependencies.html#inheritance) behavior as follows (copied from the doc):
- The runtime_env["env_vars"] field will be merged with the runtime_env["env_vars"] field of the parent. This allows for environment variables set in the parent’s runtime environment to be automatically propagated to the child, even if new environment variables are set in the child’s runtime environment.
- Every other field in the runtime_env will be overridden by the child, not merged. For example, if runtime_env["py_modules"] is specified, it will replace the runtime_env["py_modules"] field of the parent.

We think this runtime env merging logic is so complex and confusing to users because users can't know the final runtime env before the jobs are run.

Current PR tries to do a refactor and change the behavior of Runtime Environments inheritance. Here is the new behavior:
- **If there is no runtime env option when we create actor, inherit the parent runtime env.**
- **Otherwise, use the optional runtime env directly and don't do the merging.**

Add a new API named `ray.runtime_env.get_current_runtime_env()` to get the parent runtime env and modify this dict by yourself. Like:
```Actor.options(runtime_env=ray.runtime_env.get_current_runtime_env().update({"X": "Y"}))```
This new API also can be used in ray client.
@SongGuyang SongGuyang requested a review from edoakes May 6, 2022 08:10
@SongGuyang SongGuyang changed the title [runtime env] URI reference refactor [runtime env] runtime env inheritance refactor May 6, 2022
@SongGuyang
Copy link
Copy Markdown
Contributor Author

Hey @xwjiang2010, we will make this change again. Do you already have one UT for the issue #22595?

@SongGuyang
Copy link
Copy Markdown
Contributor Author

@edoakes @rkooo567 @architkulkarni please help to check if we can make this change for 2.0.

@rkooo567
Copy link
Copy Markdown
Contributor

rkooo567 commented May 6, 2022

cc @edoakes Sorry that i forgot some context. did it pass the API review btw? Should we add an api-review-required tag?

@architkulkarni
Copy link
Copy Markdown
Contributor

cc @edoakes Sorry that i forgot some context. did it pass the API review btw? Should we add an api-review-required tag?

This was the API review issue: #21818 There wasn't any pushback. We also asked for opinions on OSS Slack and I think I remember there were a few thumbs up and no pushback (unfortunately I think the thread is lost because we have limited history.)

Copy link
Copy Markdown
Contributor

@architkulkarni architkulkarni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The two PRs mentioned were already approved, so this PR should be approved too. It includes the warning message for users in the case where the behavior is different. The Ray 1.13 branch cut has already happened, so if I understand correctly, it's safe to merge this now and this will only appear in Ray 2.0.

I'll try to run train_pytorch_linear_test directly on this PR, since now we can run release tests for PRs on buildkite. (running here)

Copy link
Copy Markdown
Collaborator

@edoakes edoakes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to @zhe-thoughts we should wait for the branch-1 to be cut before merging breaking changes, so holding for that.

@rkooo567
Copy link
Copy Markdown
Contributor

Btw, cc @jjyao this will be one of API changes from 2.0. We should probably track it?

Copy link
Copy Markdown
Contributor

@rkooo567 rkooo567 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving the PR assuming I already approved the same PR before!

}
}

public void testPerActorEnvVars() {
Copy link
Copy Markdown
Contributor

@jovany-wang jovany-wang May 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Due to there is no case that one process holds multiple workers, it could be simplified to:

  public void testPerActorEnvVars() {
    try {
      Ray.init();

      {
        RuntimeEnv runtimeEnv =
            new RuntimeEnv.Builder()
                .addEnvVar("KEY1", "A")
                .addEnvVar("KEY2", "B")
                .addEnvVar("KEY1", "C")
                .build();

        ActorHandle<A> actor1 = Ray.actor(A::new).setRuntimeEnv(runtimeEnv).remote();
        String val = actor1.task(A::getEnv, "KEY1").remote().get();
        Assert.assertEquals(val, "C");
        val = actor1.task(A::getEnv, "KEY2").remote().get();
        Assert.assertEquals(val, "B");
      }

      {
        /// Because we didn't set them for actor2 , all should be null.
        ActorHandle<A> actor2 = Ray.actor(A::new).remote();
        String val = actor2.task(A::getEnv, "KEY1").remote().get();
        Assert.assertNull(val);
        val = actor2.task(A::getEnv, "KEY2").remote().get();
        Assert.assertNull(val);
      }

    } finally {
      Ray.shutdown();
    }
  }

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's merge this PR after addressing this? @jovany-wang we can just change the code this way here?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change had been committed at 867ef68

So feel free to merge this PR

@SongGuyang
Copy link
Copy Markdown
Contributor Author

We will merge this PR because the "ray-1.x" branch has been cut.

@raulchen raulchen merged commit eb2692c into ray-project:master May 20, 2022
@raulchen raulchen deleted the dev_runtime_env_inherit branch May 20, 2022 02:53
@ericl
Copy link
Copy Markdown
Contributor

ericl commented Jun 4, 2022

This PR has caused a P0 regression in master for Tune: #25484

Thinking about the new behavior, it's also fundamentally inconsistent with how Jobs works. When the user specifies a working_dir implicitly via the jobs API, it's very surprising that overriding env_vars also deletes the working dir. The user might not even know about the working dir concept if they're using only jobs.

I think we need to change this to either:

  1. Raise an exception if any fields are dropped (such as working_dir), or
  2. Revert to the previous behavior.

Basically, the user should have to explicitly set {"working_dir": None} to unset the working_dir. Maybe we can also provide a utility to make it easier to generate an updated runtime_env, such as ray.get_runtime_env().with_updates({"env_vars": ...}).

ericl added a commit to ericl/ray that referenced this pull request Jun 5, 2022
ericl added a commit that referenced this pull request Jun 5, 2022
…)" (#25487)

This reverts commit eb2692c.

This is a temporary mitigation for #25484
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants