WorkerState unit tests for resumed state (#6688)
Unit Test Results: 15 files ±0, 15 suites ±0, 6h 26m 2s ⏱️ (−5m 57s). Results for commit 33d846e; comparison against base commit ade4266. See the test report for an extended history of previous test failures, which is useful for diagnosing flaky tests.
This is ready for merge; remediation of the issue will follow in a separate PR.
- executing -> cancelled -> resumed -> executing
- executing -> long-running -> cancelled -> resumed -> long-running
- executing -> cancelled -> executing
- executing -> long-running -> cancelled -> long-running
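The resumed chains above hinge on a cancelled task remembering which run state it came from. A minimal, self-contained sketch of that idea (hypothetical `Task` class and method names for illustration only, not the actual `distributed` WorkerState API):

```python
class Task:
    """Toy model of the cancelled/resumed transition chains."""

    def __init__(self):
        self.state = "executing"
        self._prev_run_state = None
        self.history = ["executing"]

    def _set(self, state):
        self.state = state
        self.history.append(state)

    def secede(self):
        # executing -> long-running
        assert self.state == "executing"
        self._set("long-running")

    def cancel(self):
        # Remember the run state so resume() can restore it.
        assert self.state in ("executing", "long-running")
        self._prev_run_state = self.state
        self._set("cancelled")

    def resume(self):
        # cancelled -> resumed -> back to the prior run state
        assert self.state == "cancelled"
        self._set("resumed")
        self._set(self._prev_run_state)


t = Task()
t.cancel()
t.resume()
# -> ["executing", "cancelled", "resumed", "executing"]

t2 = Task()
t2.secede()
t2.cancel()
t2.resume()
# -> ["executing", "long-running", "cancelled", "resumed", "long-running"]
```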
When do we actually send the `SecedeEvent`?
In `ws_with_running_task`.
Never mind, the question was directed toward when we "decide" that the second version of the task should also secede(), but I think this makes sense to me after another look at the codebase.
```python
    ComputeTaskEvent.dummy("z", who_has={"x": [ws2]}, stimulus_id="s3"),
)
assert instructions == [
    GatherDep(worker=ws2, to_gather={"x"}, total_nbytes=1, stimulus_id="s1")
```
Does it make sense to fire off a second GatherDep for x if the previous one is already ongoing?
There is no second GatherDep in the instructions.
A task can be in flight from only a single worker at a time.
I overlooked that we're firing all instructions in a single call to `ws.handle_stimulus`; I thought only the last `ComputeTaskEvent` created the instruction.
```python
    FreeKeysEvent(keys=["y", "x"], stimulus_id="s2"),
    ComputeTaskEvent.dummy("x", stimulus_id="s3"),
    # Peer worker does not have the data
    GatherDepSuccessEvent(worker=ws2, total_nbytes=1, data={}, stimulus_id="s4"),
```
Shouldn't this be a `GatherDepFailureEvent`?
A task can get out of flight in one of the following ways:
- `GatherDepSuccessEvent`, and the task is in the data
- `GatherDepSuccessEvent`, but the task is not in the data (the remote worker says it doesn't have it)
- `GatherDepNetworkFailureEvent` (the remote worker is dead)
- `GatherDepBusyEvent` (the remote worker is busy)
- `GatherDepFailureEvent` (edge case; a generic error, e.g. failure to deserialize)
All but the first are a failure to gather the key. That's what I was explaining in the comment on the previous line.
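The distinction above can be sketched as a tiny classifier (hypothetical helper written for illustration, not part of the `distributed` API): only a success event that actually carries the key's data counts as a successful gather.

```python
def gather_outcome(event: str, key: str, data: dict) -> str:
    """Classify how a task leaves the 'flight' state.

    A GatherDepSuccessEvent is only a real success if the key is
    present in the returned data; every other case (success event
    without the key, network failure, busy worker, generic failure)
    fails the gather.
    """
    if event == "GatherDepSuccessEvent" and key in data:
        return "success"
    return "failure"


gather_outcome("GatherDepSuccessEvent", "x", {"x": 1})   # "success"
gather_outcome("GatherDepSuccessEvent", "x", {})         # "failure"
gather_outcome("GatherDepNetworkFailureEvent", "x", {})  # "failure"
```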
From the docstring *gather_dep later terminates with a failure*, I was under the impression that we were testing an actual `GatherDepFailureEvent`. Is there a way we can make this clearer? *gather_dep terminates successfully but does not have the data, resulting in a failure* might be an alternative that stresses the rather intricate difference between the successful event and the failing outcome.
Alternatively, I'd prefer just using anything but `GatherDepSuccessEvent` to avoid further confusion.
Co-authored-by: Hendrik Makait <hendrik.makait@gmail.com>
hendrikmakait left a comment:
LGTM, feel free to ignore my comment about a clearer docstring for `test_workerstate_flight_failure_to_executing`.
Partially closes #6689