[data] New executor [7/n] --- bare bones implementation of StreamingExecutor #31579
ericl merged 24 commits into ray-project:master
Conversation
stephanie-wang left a comment
Might be missing something, but actually I wonder if we really need the inputs_done interface? I think this is needed for all-to-all operations and to flush at the end of the job, right?
For all-to-all, I think we could move the logic inside the bulk_fn that's passed to the operator. For example, we can call the bulk_fn on each input added, and bulk_fn would only run the shuffle once it's received the expected number of inputs. This also better matches what the operator would look like in the windowed shuffle case.
For the flush case, we can track the done inputs in the op state as we are right now, but then only call op.flush() once per op instead of once per op input.
Fine leaving this cleanup for later, but it occurred to me that it may simplify the current code.
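A rough sketch of the accumulate-then-shuffle idea (illustrative only; the callable name, its signature, and the fixed expected-input count are assumptions, not the actual operator API):

```python
from typing import Any, List, Optional


class AccumulatingShuffleFn:
    """Illustrative bulk_fn: buffers inputs and only runs the shuffle once the
    expected number of inputs has arrived, removing the need for a separate
    inputs_done() signal."""

    def __init__(self, expected_num_inputs: int):
        self._expected_num_inputs = expected_num_inputs
        self._buffer: List[Any] = []

    def __call__(self, bundle: Any) -> Optional[List[Any]]:
        self._buffer.append(bundle)
        if len(self._buffer) < self._expected_num_inputs:
            # Still waiting for more inputs; emit nothing for this call.
            return None
        # All expected inputs received: run the all-to-all step over the buffer.
        return self._shuffle(self._buffer)

    def _shuffle(self, bundles: List[Any]) -> List[Any]:
        # Placeholder for the real shuffle logic.
        return list(bundles)
```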
    return selected


def dispatch_next_task(op_state: OpState) -> None:
Should this just be a method of OpState?
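For example, something along these lines (a sketch only; the real OpState has more fields, and the add_input signature used here is an assumption):

```python
from typing import Any, List


class OpState:
    """Sketch: per-operator runtime state with dispatch as a method."""

    def __init__(self, op: Any, num_inputs: int = 1):
        self.op = op
        # One input queue per upstream dependency (structure assumed).
        self.inqueues: List[List[Any]] = [[] for _ in range(num_inputs)]

    def dispatch_next_task(self) -> None:
        # Feed the first available input bundle to the operator, mirroring the
        # standalone dispatch_next_task() above.
        for i, inqueue in enumerate(self.inqueues):
            if inqueue:
                self.op.add_input(inqueue.pop(0), input_index=i)
                return
```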
def inputs_done(self, input_index: int) -> None:
    self._execution_state.inputs_done(input_index)
    self._inputs_done = True
Is there an assumption here that there is only one input (add a note?)?
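If so, a note or even an assertion would make it explicit; a sketch of what I mean (class and field names are placeholders):

```python
class SingleInputOperatorSketch:
    """Sketch only: documents the single-input assumption in inputs_done()."""

    def __init__(self, execution_state):
        self._execution_state = execution_state
        self._inputs_done = False

    def inputs_done(self, input_index: int) -> None:
        # NOTE: this operator assumes exactly one upstream input, so a single
        # inputs_done() call marks *all* inputs as finished.
        assert input_index == 0, "only a single upstream input is supported"
        self._execution_state.inputs_done(input_index)
        self._inputs_done = True
```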
    self._completed = True


def completed(self) -> bool:
    return self._completed
Why don't we need to check whether self.has_next() as we do in the map operator?
Done (moved to physical op class).
def completed(self) -> bool:
    return (
        self._inputs_done and len(self.get_work_refs()) == 0 and not self.has_next()
    )
Seems like this definition could be shared across the different operators?
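For instance, a shared definition could live on the physical operator base class (a sketch; the method names follow the snippet quoted above, everything else is assumed):

```python
from typing import Any, List


class PhysicalOperatorBaseSketch:
    """Sketch of a completed() definition shared by all operators."""

    def __init__(self):
        self._inputs_done = False

    def inputs_done(self, input_index: int) -> None:
        self._inputs_done = True

    def get_work_refs(self) -> List[Any]:
        # Outstanding task/actor futures; none by default.
        return []

    def has_next(self) -> bool:
        # Whether any output is buffered and ready to be taken.
        return False

    def completed(self) -> bool:
        # Shared definition: all inputs done, no in-flight work, no buffered output.
        return (
            self._inputs_done
            and len(self.get_work_refs()) == 0
            and not self.has_next()
        )
```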
op_state = OpState(o2)
o2.add_input = MagicMock()
op_state.inqueues[0].append("dummy1")
dispatch_next_task(op_state)
Suggest running this multiple times so we can test indices other than 0.
Added a test for multiple inputs, and a TODO for multiple indices.
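For reference, the multi-index version could look roughly like this (sketch only; it reuses the hypothetical OpState-with-a-dispatch-method form from the earlier sketch, so the constructor argument and call style are assumptions):

```python
from unittest.mock import MagicMock


def test_dispatch_multiple_input_indices():
    # Sketch: a two-input operator; check each bundle is forwarded with the
    # matching input index.
    op = MagicMock()
    op_state = OpState(op, num_inputs=2)
    for i in (0, 1):
        op_state.inqueues[i].append(f"dummy{i}")
        op_state.dispatch_next_task()
    op.add_input.assert_any_call("dummy0", input_index=0)
    op.add_input.assert_any_call("dummy1", input_index=1)
```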
Updated. Regarding removing the completion method, I don't think that's possible if we want to support operators with unknown output size in general. Currently, we have a method that returns an estimate of the number of outputs for the progress bar, but this is allowed to return None for operators that don't know their number of outputs at planning time.
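In other words, the estimate is roughly of this shape (sketch; the method and class names here are descriptive placeholders, not necessarily what the code uses):

```python
from typing import Optional


class OperatorProgressSketch:
    """Sketch: operators report an output-count estimate for the progress bar,
    which may be None when the count is unknown at planning time."""

    def num_outputs_estimate(self) -> Optional[int]:
        # Unknown output size (e.g. a filter-like operator): the executor
        # cannot rely on output counts to decide completion, which is why an
        # explicit completed() method is still needed.
        return None
```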
Hmm I see, yeah I was thinking we would just flush once enough outputs have been accumulated, but I guess it ends up being the same thing. It's more minor, but maybe we could at least change the signature from inputs_done(input_index) to inputs_done()?
Yup good point, I can't think of where an operator would need the index for the done signal. Removed.
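So the interface collapses to something like this (sketch):

```python
class PhysicalOperatorInterfaceSketch:
    """Sketch of the simplified interface after dropping the input index."""

    # Before: def inputs_done(self, input_index: int) -> None
    def inputs_done(self) -> None:
        # Called once, when all upstream inputs have finished sending data.
        ...
```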
clarkzinzow left a comment
LGTM overall, mostly questions and impl/comment/test nits.
ericl left a comment
All comments addressed, ptal.
Why are these changes needed?
Initial implementation of ray-project/enhancements#18, dependent on #30903
Streaming execution can be toggled with the following env var:
RAY_DATASET_USE_STREAMING_EXECUTOR=0|1 (see the usage sketch below).

Initial PR TODOs:
Future TODOs:
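For reference, a minimal way to flip the toggle mentioned above (sketch; the pipeline itself is illustrative, only the env var name comes from this PR):

```python
import os

# Opt in to the streaming executor before running any Dataset operations.
# (Setting RAY_DATASET_USE_STREAMING_EXECUTOR=1 in the shell before launching
# the script works as well.)
os.environ["RAY_DATASET_USE_STREAMING_EXECUTOR"] = "1"

import ray

# Any Dataset workload would do; this is just an illustrative pipeline.
ds = ray.data.range(1000).map_batches(lambda batch: batch)
print(ds.take(5))
```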