[Core] Vendor setproctitle#53471
Conversation
Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
python/setup.py
Outdated
| ) | ||
| # Vendor setproctitle which is a C extension by | ||
| # copying the so file to the ray/private/thirdparty/ folder. | ||
| subprocess.check_call( |
There was a problem hiding this comment.
Actually why not just declare setproctitle as a default dependency of Ray? What's the drawback of it? https://discuss.python.org/t/can-vendoring-dependencies-in-a-build-be-officially-supported
Case 1: I maintain a package which is a building-block for many of our internal applications and some of our customers’ applications, so maximal compatibility is a goal. This leads to us not wanting to use many libraries, as there are conflicts this would or could introduce with the downstream applications. e.g. I don’t want to use jsonschema because we have applications which use specific versions of it.
That makes sense, but unless you depend on packages with very constrained dependencies, or you yourself depend on a very constrained version of that package, it should not be an issue.
The example you gave, jsonschema, does not have any runtime dependencies, so unless you need a very constrained version of it, depending on it should not really cause any issues.
setproctitle doesn't have dependencies so dpending on it should be just fine?
There was a problem hiding this comment.
Even if I don’t need a constrained version of it today, any time it does a breaking change (major release if you use semver), I need to consider it. jsonschema recently released 4.0, so using it would put us in the position of having to support jsonschema 3.x and 4.x in the same codebase.
well, this can be a reason for vendoring.
There was a problem hiding this comment.
The main reason is that, if one of those dependencies has a security
vulnerability, a new or patched version of that dependency needs to
be re-vendored and a new version of the “immutable artifact”
published/installed even if your actual project has no new changes
at all. SBoM efforts go some way toward alleviating this concern,
but it’s still an added challenge and possible delay for the end
user who is otherwise left with a steaming pile of compromised
systems. Multiply it by all of the different things you’ve installed
each of which contains vendored dependencies and the situation can
quickly become unmanageable.
This is the reason for not vendoring.
There was a problem hiding this comment.
Less vendoring is better in my opinion. conda-forge has been quite successful in patching ray to declare a regular dependency without vendoring.
python/setup.py
Outdated
| "-m", | ||
| "pip", | ||
| "install", | ||
| "-q", |
python/setup.py
Outdated
| # Vendor setproctitle which is a C extension by | ||
| # copying the so file to the ray/private/thirdparty/ folder. |
There was a problem hiding this comment.
consider put it under not os.getenv("SKIP_THIRDPARTY_INSTALL_CONDA_FORGE"): ? conda forge does not really like these kind of vendoring afaik.
cc @mattip
we really just use setproctitle and getproctitle these two functions. we can put a stub file that redirects to
you can put a stub or wrapper file that redirects to either ray._private.thirdparty or external setproctitle
|
The north star for Ray Core ( The ideal would be to just set the process title in the C++ code / integrate it properly into the build system, is that an option people are considering for this? |
This test now fails because how |
|
hmm.. do we want ray users to rely on |
|
feels that ray core should provide an api of because or we cannot vendor |
Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
| @@ -1,7 +1,7 @@ | |||
| """ | |||
| This script ensures python files conform to ray's import ordering rules. | |||
| In particular, we make sure psutil and setproctitle is imported _after_ | |||
There was a problem hiding this comment.
No need to enforce the import order for setproctitle since we vendor it and doesn't rely on sys.path anymore.
| @ray.remote | ||
| def f(x): | ||
| assert setproctitle.getproctitle() == "ray::special_f" | ||
| assert psutil.Process().cmdline()[0] == "ray::special_f" |
There was a problem hiding this comment.
This is better checking since it actually makes sure the process title is actually changed underneath while getproctitle() just returns the cached title previously set by setproctitle(). In other words, the previous check only checks that setproctitle() was called without checking if the actual process title was changed or not.
| class Foo: | ||
| def method(self, name): | ||
| assert setproctitle.getproctitle() == f"ray::{name}" | ||
| assert psutil.Process().cmdline()[0] == f"ray::{name}" |
There was a problem hiding this comment.
Same reason as above, checking psutil.Process().cmdline() is better than getproctitle()
src/ray/thirdparty/BUILD.bazel
Outdated
| ray_cc_library( | ||
| name = "setproctitle", | ||
| srcs = glob(["setproctitle/spt*.c"]) + select({ | ||
| "@platforms//os:macos": ["setproctitle/darwin_set_process_name.c"], | ||
| "//conditions:default": [], | ||
| }), | ||
| hdrs = glob(["setproctitle/spt*.h"]) + ["setproctitle/c.h"] + select({ | ||
| "@platforms//os:macos": ["setproctitle/darwin_set_process_name.h"], | ||
| "//conditions:default": [], | ||
| }), | ||
| deps = ["@local_config_python//:python_headers"], | ||
| local_defines = select({ | ||
| "@platforms//os:linux": ["HAVE_SYS_PRCTL_H"], | ||
| "@platforms//os:macos": ["__darwin__"], | ||
| "//conditions:default": [], | ||
| }), | ||
| ) |
There was a problem hiding this comment.
This is basically what setproctitle.setup.py does.
Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com> Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com> Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
**Problem** All versions of setproctitle after 1.2.3 seem to introduce significant expense when forking. This in turn results in very slow or crashing runs of ray jobs on macOs, particularly when spawning many jobs. **Solution** Looking at the code, it seems as if versions after 1.2.3 make quite a lot of changes on Darwin in order to make the process renames work properly with Activity monitor and other MacOs utilities. This... is not 'especially' important to Ray. So downgrading doesn't cost us too much. Though of course it would have been vastly preferable to rely on a non-vendored latest version, but latest versions seem to have this issue. So we can downgrade for now and come back later potentially. ## Related issues fixes #59663 **Historic Context** #53471 vendored the dependency and made a slight logic tweak in the cython binding in setproctitle.pxi. This had the benefit of fixing the cmdline parse issue described in the PR but had the downside of upgrading the library version (which now included a set of Darwin tweaks which leads to the slowdown). After this PR, the state will be that the vendored version is now old enough to not contain the activity monitor tweaks for Darwin, as well as having the changes in setproctitle.pxi. Signed-off-by: ZacAttack <zac@anyscale.com>
**Problem** All versions of setproctitle after 1.2.3 seem to introduce significant expense when forking. This in turn results in very slow or crashing runs of ray jobs on macOs, particularly when spawning many jobs. **Solution** Looking at the code, it seems as if versions after 1.2.3 make quite a lot of changes on Darwin in order to make the process renames work properly with Activity monitor and other MacOs utilities. This... is not 'especially' important to Ray. So downgrading doesn't cost us too much. Though of course it would have been vastly preferable to rely on a non-vendored latest version, but latest versions seem to have this issue. So we can downgrade for now and come back later potentially. ## Related issues fixes ray-project#59663 **Historic Context** ray-project#53471 vendored the dependency and made a slight logic tweak in the cython binding in setproctitle.pxi. This had the benefit of fixing the cmdline parse issue described in the PR but had the downside of upgrading the library version (which now included a set of Darwin tweaks which leads to the slowdown). After this PR, the state will be that the vendored version is now old enough to not contain the activity monitor tweaks for Darwin, as well as having the changes in setproctitle.pxi. Signed-off-by: ZacAttack <zac@anyscale.com> Signed-off-by: jeffery4011 <jefferyshen1015@gmail.com>
**Problem** All versions of setproctitle after 1.2.3 seem to introduce significant expense when forking. This in turn results in very slow or crashing runs of ray jobs on macOs, particularly when spawning many jobs. **Solution** Looking at the code, it seems as if versions after 1.2.3 make quite a lot of changes on Darwin in order to make the process renames work properly with Activity monitor and other MacOs utilities. This... is not 'especially' important to Ray. So downgrading doesn't cost us too much. Though of course it would have been vastly preferable to rely on a non-vendored latest version, but latest versions seem to have this issue. So we can downgrade for now and come back later potentially. ## Related issues fixes ray-project#59663 **Historic Context** ray-project#53471 vendored the dependency and made a slight logic tweak in the cython binding in setproctitle.pxi. This had the benefit of fixing the cmdline parse issue described in the PR but had the downside of upgrading the library version (which now included a set of Darwin tweaks which leads to the slowdown). After this PR, the state will be that the vendored version is now old enough to not contain the activity monitor tweaks for Darwin, as well as having the changes in setproctitle.pxi. Signed-off-by: ZacAttack <zac@anyscale.com>
**Problem** All versions of setproctitle after 1.2.3 seem to introduce significant expense when forking. This in turn results in very slow or crashing runs of ray jobs on macOs, particularly when spawning many jobs. **Solution** Looking at the code, it seems as if versions after 1.2.3 make quite a lot of changes on Darwin in order to make the process renames work properly with Activity monitor and other MacOs utilities. This... is not 'especially' important to Ray. So downgrading doesn't cost us too much. Though of course it would have been vastly preferable to rely on a non-vendored latest version, but latest versions seem to have this issue. So we can downgrade for now and come back later potentially. ## Related issues fixes ray-project#59663 **Historic Context** ray-project#53471 vendored the dependency and made a slight logic tweak in the cython binding in setproctitle.pxi. This had the benefit of fixing the cmdline parse issue described in the PR but had the downside of upgrading the library version (which now included a set of Darwin tweaks which leads to the slowdown). After this PR, the state will be that the vendored version is now old enough to not contain the activity monitor tweaks for Darwin, as well as having the changes in setproctitle.pxi. Signed-off-by: ZacAttack <zac@anyscale.com> Signed-off-by: peterxcli <peterxcli@gmail.com>
**Problem** All versions of setproctitle after 1.2.3 seem to introduce significant expense when forking. This in turn results in very slow or crashing runs of ray jobs on macOs, particularly when spawning many jobs. **Solution** Looking at the code, it seems as if versions after 1.2.3 make quite a lot of changes on Darwin in order to make the process renames work properly with Activity monitor and other MacOs utilities. This... is not 'especially' important to Ray. So downgrading doesn't cost us too much. Though of course it would have been vastly preferable to rely on a non-vendored latest version, but latest versions seem to have this issue. So we can downgrade for now and come back later potentially. ## Related issues fixes ray-project#59663 **Historic Context** ray-project#53471 vendored the dependency and made a slight logic tweak in the cython binding in setproctitle.pxi. This had the benefit of fixing the cmdline parse issue described in the PR but had the downside of upgrading the library version (which now included a set of Darwin tweaks which leads to the slowdown). After this PR, the state will be that the vendored version is now old enough to not contain the activity monitor tweaks for Darwin, as well as having the changes in setproctitle.pxi. Signed-off-by: ZacAttack <zac@anyscale.com> Signed-off-by: peterxcli <peterxcli@gmail.com>
Why are these changes needed?
Vendor
setproctitleby copying its C source code and expose it to Python via Cython (the original C source code is exposed via Python C extension).The original
setproctitlehas a side effect when you import it due to dvarrazzo/py-setproctitle#114 (it will changepsutil.Process().cmdline()from ["python", "script.py", "--flag"] to ["python script.py --flag", "", ""]) and this PR avoids that by only initializing and changing process title whensetproctitle()is called.Related issue number
Checks
git commit -s) in this PR.scripts/format.shto lint the changes in this PR.method in Tune, I've added it in
doc/source/tune/api/under thecorresponding
.rstfile.