Log state machine events by crusaderky · Pull Request #6092 · dask/distributed

crusaderky · 2022-04-08T12:58:00Z

Partially closes Migrate ensure_computing transitions to new WorkerState event mechanism #5895

github-actions · 2022-04-08T15:19:51Z

Unit Test Results

16 files ±0 · 16 suites ±0 · 7h 12m 23s ⏱️ -26m 48s
2 734 tests +4: 2 653 ✔️ +5 · 80 💤 -1 · 0 ❌ -1 · 1 🔥 +1
21 758 runs +33: 20 685 ✔️ +40 · 1 072 💤 -7 · 0 ❌ -1 · 1 🔥 +1

For more details on these errors, see this check.

Results for commit 34274b7. ± Comparison against base commit bd3f47e.

♻️ This comment has been updated with latest results.

mrocklin · 2022-04-11T16:34:21Z

@sjperkins can I ask you to review this?

sjperkins

This PR adds logging of state machine events to the Worker. Each StateMachineEvent, stamped with the time it was handled (and, where applicable, stripped of the execution result), is added to a new Worker.stimulus_log attribute. StateMachineEvents can be converted to dictionaries and partly reconstructed from them. The goal is to support replaying events from the logs, as discussed e.g. in #5736 (comment).

I think StateMachineEvent.log could be renamed to something more descriptive.

sjperkins · 2022-04-12T10:32:13Z

distributed/worker_state_machine.py

+    def __init_subclass__(cls):
+        StateMachineEvent._classes[cls.__name__] = cls
+
+    def log(self, *, handled: float) -> StateMachineEvent:


I think this could be named something more descriptive. How about one of the following?

loggable_event?

to_loggable_event?
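The __init_subclass__ hook in the diff registers every subclass by name, which is what allows a dict recording the class name to be turned back into the right event type. A minimal sketch of that pattern, assuming plain dataclasses; TaskFinished, TaskCancelled, from_dict and the "cls" key are hypothetical names for illustration, not the actual event classes or dict layout.

```python
from dataclasses import dataclass
from typing import ClassVar


@dataclass
class StateMachineEvent:
    stimulus_id: str
    # Registry of concrete event classes keyed by class name; ClassVar keeps
    # it out of the dataclass fields (and out of __init__/__repr__)
    _classes: ClassVar[dict[str, type]] = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Runs once for every subclass, as soon as its class body is created
        StateMachineEvent._classes[cls.__name__] = cls

    @staticmethod
    def from_dict(d: dict) -> "StateMachineEvent":
        # Hypothetical layout: the concrete class name is stored under "cls"
        d = dict(d)
        cls = StateMachineEvent._classes[d.pop("cls")]
        return cls(**d)


@dataclass
class TaskFinished(StateMachineEvent):  # hypothetical event
    key: str


@dataclass
class TaskCancelled(StateMachineEvent):  # hypothetical event
    key: str


ev = StateMachineEvent.from_dict({"cls": "TaskFinished", "stimulus_id": "s1", "key": "x"})
print(ev)  # TaskFinished(stimulus_id='s1', key='x')
```

Only subclasses are registered; the base class itself never appears in the registry.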

sjperkins · 2022-04-12T10:32:30Z

distributed/worker_state_machine.py

    stimulus_id: str
+    #: timestamp of when the event was handled by the worker
+    # TODO switch to @dataclass(slots=True) and uncomment (requires Python >=3.10)
+    # handled: float | None = field(init=False, default=None)


I guess this is the reason for the __new__ method containing the self.handled = None assignment.

Yes. Clarified in the comment.
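Background on the TODO: before Python 3.10 added @dataclass(slots=True), a hand-written __slots__ cannot coexist with a class-level default of the same name (defining both raises ValueError at class creation). A workaround consistent with the reviewer's reading is to keep handled out of the dataclass fields, list it in __slots__, and set it in __new__. A minimal sketch under that assumption; ExampleEvent is a hypothetical subclass.

```python
from dataclasses import dataclass, fields


@dataclass
class StateMachineEvent:
    # Hand-written slots: "handled" is a slot but NOT a dataclass field
    __slots__ = ("stimulus_id", "handled")
    stimulus_id: str

    def __new__(cls, *args, **kwargs):
        # The slot cannot carry a class-level default, so give every
        # instance handled=None here, before __init__ runs
        self = object.__new__(cls)
        self.handled = None
        return self


@dataclass
class ExampleEvent(StateMachineEvent):  # hypothetical subclass
    __slots__ = ("key",)
    key: str


ev = ExampleEvent(stimulus_id="s1", key="x")
print(ev.handled)  # None
print([f.name for f in fields(ev)])  # ['stimulus_id', 'key']
```

Because handled is not a field, it stays out of the generated __init__, __repr__ and __eq__.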

sjperkins · 2022-04-12T10:34:44Z

distributed/worker_state_machine.py

+        self.handled = handled
+        return self
+
+    def _to_dict(self, *, exclude: Container[str] = ()) -> dict:


This dictionary conversion seems necessary because stimulus_log, which holds StateMachineEvent objects, has been added to Worker and must therefore be supported by Worker._to_dict.
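A plausible shape for such a conversion, sketched with dataclasses.fields: the dict records the class name so the event can be rebuilt later, and honours an exclude container like the one in the _to_dict(self, *, exclude=...) signature in the diff. ExampleEvent, its field list and the "cls" key are assumptions for illustration, not the actual implementation.

```python
from collections.abc import Container
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ExampleEvent:  # hypothetical event type
    stimulus_id: str
    key: str
    handled: Optional[float] = None

    def _to_dict(self, *, exclude: Container[str] = ()) -> dict:
        # Record the concrete class name so the event can be rebuilt later,
        # then copy every dataclass field not listed in `exclude`
        out = {"cls": type(self).__name__}
        for f in fields(self):
            if f.name not in exclude:
                out[f.name] = getattr(self, f.name)
        return out


ev = ExampleEvent(stimulus_id="s1", key="x", handled=123.0)
print(ev._to_dict())
# {'cls': 'ExampleEvent', 'stimulus_id': 's1', 'key': 'x', 'handled': 123.0}
print(ev._to_dict(exclude={"handled"}))
# {'cls': 'ExampleEvent', 'stimulus_id': 's1', 'key': 'x'}
```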

sjperkins · 2022-04-12T10:42:59Z

distributed/tests/test_worker.py

+
+    prev_handled = story[0].handled
+    for ev in story[1:]:
+        assert ev.handled > prev_handled


Can we always assume that this invariant holds?

There is a very tiny chance of getting two events in the same nanosecond. Changed to >=.
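With >=, the test asserts that handled timestamps are non-decreasing, which tolerates two events landing on the same clock tick. A sketch of the pairwise check as a standalone helper (assert_non_decreasing is a hypothetical name); note that each event must be compared with its immediate predecessor, so the loop has to advance the previous timestamp on every iteration, which the diff excerpt does not show.

```python
from types import SimpleNamespace


def assert_non_decreasing(story) -> None:
    # Compare each event with its predecessor; ">=" rather than ">" because
    # two events may be handled within the same clock tick
    for prev, ev in zip(story, story[1:]):
        assert ev.handled >= prev.handled, (prev, ev)


# Stand-in events carrying only a `handled` timestamp
story = [SimpleNamespace(handled=t) for t in (1.0, 1.0, 2.5)]
assert_non_decreasing(story)  # passes: equal neighbours are allowed
```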

sjperkins · 2022-04-12T10:52:41Z

distributed/worker_state_machine.py

+    def log(self, *, handled: float) -> StateMachineEvent:
+        out = copy(self)
+        out.handled = handled
+        out.value = None


I understand the execution result is discarded because of the potentially large size of the result, and possibly the complexity of serialising/deserialising the result?

Not discarding it would effectively turn worker.stimulus_log into a copy of worker.data, except that it would never lose any data!
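The override in the diff copies the event, stamps the copy with handled, and blanks value, so the live event keeps its result while the logged copy does not pin potentially large data in memory. A minimal sketch of that pattern; ResultEvent and its field list are hypothetical and reduced for illustration.

```python
from copy import copy
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ResultEvent:  # hypothetical event that carries a task result
    stimulus_id: str
    key: str
    value: Any
    handled: Optional[float] = None

    def log(self, *, handled: float) -> "ResultEvent":
        # Shallow copy: the original keeps its value for the worker's own use
        out = copy(self)
        out.handled = handled
        # Drop the (possibly large) result from the logged copy only
        out.value = None
        return out


ev = ResultEvent(stimulus_id="s1", key="x", value=b"\0" * 1_000_000)
entry = ev.log(handled=1.0)
print(entry.value, entry.handled)  # None 1.0
print(len(ev.value))  # 1000000
```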

sjperkins · 2022-04-12T10:56:38Z

distributed/worker_state_machine.py

+
+    def _after_from_dict(self) -> None:
+        self.value = None
+        self.type = None


I guess the execution result type is discarded here because it's merely a string representation at this point, and one would have to deal with serialising/deserialising types.

In any case, I think reconstructing the result of execution is non-trivial. How does this impact the replayability of events on the Worker (out of interest)?

The fields discarded on a serialization round-trip should be inconsequential for the purpose of rebuilding the state.
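The round trip can be sketched as: rebuild the event from its dict, then run an _after_from_dict hook that resets fields whose dict form cannot be faithfully restored (here the result and its type, which would survive only as a string). TaskResultEvent, from_dict and the field list are assumptions for illustration; only the hook name and the two reset fields come from the diff.

```python
from dataclasses import asdict, dataclass
from typing import Any


@dataclass
class TaskResultEvent:  # hypothetical event carrying a task result
    stimulus_id: str
    key: str
    nbytes: int
    value: Any
    type: Any

    @classmethod
    def from_dict(cls, d: dict) -> "TaskResultEvent":
        ev = cls(**d)
        ev._after_from_dict()
        return ev

    def _after_from_dict(self) -> None:
        # The dict form cannot faithfully restore these; blank them so that
        # rebuilding state never depends on them
        self.value = None
        self.type = None


original = TaskResultEvent("s1", "x", nbytes=8, value=3.14, type=float)
d = asdict(original)
d["type"] = str(d["type"])  # assume the type survives a dump only as a string
restored = TaskResultEvent.from_dict(d)
print(restored)
# TaskResultEvent(stimulus_id='s1', key='x', nbytes=8, value=None, type=None)
```

Fields such as stimulus_id, key and nbytes survive intact, which is what matters for rebuilding the state.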

crusaderky · 2022-04-12T23:08:55Z

All review comments have been addressed.

mrocklin · 2022-04-13T12:45:49Z

Thank you for the work, @crusaderky, and for the review, @sjperkins.

Log state machine events

0ac6c5d

crusaderky requested a review from fjetter April 8, 2022 12:58

Add to cluster dump

1ccecf7

crusaderky self-assigned this Apr 8, 2022

crusaderky mentioned this pull request Apr 8, 2022

Migrate ensure_computing transitions to new WorkerState event mechanism #5895 (closed, 3 tasks)

Merge branch 'main' into WMSM/log_events

55f0754

sjperkins self-requested a review April 12, 2022 10:35

sjperkins suggested changes Apr 12, 2022


crusaderky added 2 commits April 13, 2022 00:00

Merge branch 'main' into WMSM/log_events

f1d9bb6

Code review

34274b7

fjetter mentioned this pull request Apr 13, 2022

Support Stimulus ID's via argument passing #6095 (closed, 4 tasks)

sjperkins approved these changes Apr 13, 2022


mrocklin merged commit 6a3cbd3 into dask:main Apr 13, 2022

crusaderky deleted the WMSM/log_events branch April 13, 2022 12:48

Uh oh!

Conversation

crusaderky commented Apr 8, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions bot commented Apr 8, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Unit Test Results

Uh oh!

mrocklin commented Apr 11, 2022

Uh oh!

sjperkins left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

crusaderky commented Apr 12, 2022

Uh oh!

mrocklin commented Apr 13, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants
