[WIP] Bulk executor initial implementation#30903
Conversation
Signed-off-by: Eric Liang <ekhliang@gmail.com>
Signed-off-by: Eric Liang <ekhliang@gmail.com>
Signed-off-by: Eric Liang <ekhliang@gmail.com>
Signed-off-by: Eric Liang <ekhliang@gmail.com>
Signed-off-by: Eric Liang <ekhliang@gmail.com>
Signed-off-by: Eric Liang <ekhliang@gmail.com>
Signed-off-by: Eric Liang <ekhliang@gmail.com>
Signed-off-by: Eric Liang <ekhliang@gmail.com>
Signed-off-by: Eric Liang <ekhliang@gmail.com>
|
Some progress: now passing a good portion of dataset tests: |
Signed-off-by: Eric Liang <ekhliang@gmail.com>
Signed-off-by: Eric Liang <ekhliang@gmail.com>
| ray_remote_args: Remote arguments for the Ray actors to be created. | ||
| pool_size: The size of the actor pool. | ||
| """ | ||
| if "num_cpus" not in ray_remote_args: |
There was a problem hiding this comment.
@ericl I reverted this, since it looks not compatible with the requirement that "num_gpus and num_cpus cannot be both specified" (user may want to run actors on gpus and then they won't be able tp specify cpus).
|
All CI tests are passing now. However, we seem to have a release test failure that may be relevant: https://buildkite.com/ray-project/release-tests-pr/builds/26265#0185e158-7979-4c94-be61-22da62d208ca |
It seems likely due to the lack of autoscaling for actor pool, as it had just one actor through the entire run, @clarkzinzow And since it's about the same issue as TODO in unit test, shall we note this as a debt to fix soon and merge this PR? @ericl |
|
Alright. Let's leave the TODO to fix this test (or we can fix this test by increasing the min pool size). |
…-project#31283) Add a utility class for tracing object allocation / freeing. This makes it a lot easier to debug memory allocation / freeing issues. This is split out from ray-project#30903 Signed-off-by: tmynn <hovhannes.tamoyan@gmail.com>
…oject#31305) Add the initial operator implementations. This is split out from ray-project#30903 Signed-off-by: tmynn <hovhannes.tamoyan@gmail.com>
…oject#31443) Add the basic bulk executor. This is split out from ray-project#30903 Signed-off-by: tmynn <hovhannes.tamoyan@gmail.com>
Why are these changes needed?
Initial implementation of ray-project/enhancements#18
Original prototype: https://github.com/ray-project/ray/pull/30222/files
TODO:
Merge plan: