Allow actor exceptions to propagate by martindurant · Pull Request #4232 · dask/distributed

martindurant · 2020-11-10T14:19:53Z

Follows on from #4225

martindurant · 2020-11-10T14:28:50Z

This seems to take more code than I would have expected, wrapping and unwrapping the results dict in various places.
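The wrapping and unwrapping described here is commonly done like this: the worker side catches the exception and returns it inside a status dict, and the caller side checks the status and re-raises. Below is a minimal sketch that assumes a hypothetical `{"status": ..., "result" / "exception": ...}` layout. The helpers `run_actor_method` and `unwrap` are illustrative names only, not the functions in `distributed/actor.py`.

```python
from typing import Any, Callable, Dict


def run_actor_method(func: Callable[..., Any], *args: Any, **kwargs: Any) -> Dict[str, Any]:
    """Hypothetical worker-side helper: call the method and capture any exception."""
    try:
        return {"status": "OK", "result": func(*args, **kwargs)}
    except Exception as exc:  # keep the exception instead of losing it on the worker
        return {"status": "error", "exception": exc}


def unwrap(response: Dict[str, Any]) -> Any:
    """Hypothetical caller-side helper: return the result or re-raise the exception."""
    if response["status"] == "OK":
        return response["result"]
    raise response["exception"]


class MyException(Exception):
    pass


class Counter:
    def __init__(self) -> None:
        self.n = 0

    def increment(self) -> int:
        self.n += 1
        return self.n

    def fail(self) -> None:
        raise MyException("raised inside the actor")


if __name__ == "__main__":
    counter = Counter()
    print(unwrap(run_actor_method(counter.increment)))  # prints 1
    try:
        unwrap(run_actor_method(counter.fail))
    except MyException as exc:
        print("propagated:", exc)
```

Each call site that sends or receives an actor result has to agree on this layout, which helps explain why the change touches several places.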

martindurant · 2020-11-10T18:14:16Z

distributed/worker.py

            executing={
                key: now - self.tasks[key].start_time
                for key in self.active_threads.values()
+                if key in self.tasks


This line is a little worrisome. Without it, you get a KeyError here during the heartbeat after a failed actor action, but I don't really know why. Should the "active thread" be cleaned up somewhere?
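Here is a toy reproduction of the symptom the guard hides. The heartbeat builds `executing` from `active_threads`, which maps thread ids to task keys. If a key stays in `active_threads` after its task has been removed from `tasks` (apparently what happens after a failed actor call), then `self.tasks[key]` raises `KeyError`. The sketch uses plain dicts and a hypothetical `TaskState` class in place of the worker's real state. It shows the symptom and the guard, not the root cause the comment asks about.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class TaskState:
    # Hypothetical stand-in for the worker's per-task state
    start_time: float


def executing_times(
    tasks: Dict[str, TaskState],
    active_threads: Dict[int, str],
    now: float,
    guard: bool = True,
) -> Dict[str, float]:
    """Elapsed time per executing key, mirroring the heartbeat comprehension."""
    return {
        key: now - tasks[key].start_time
        for key in active_threads.values()
        if not guard or key in tasks  # with the guard, skip keys whose task is gone
    }


tasks = {"x": TaskState(start_time=98.0)}
# "actor-call" is still listed in active_threads, but its task was already removed
active_threads = {101: "x", 102: "actor-call"}

print(executing_times(tasks, active_threads, now=100.0))  # {'x': 2.0}
try:
    executing_times(tasks, active_threads, now=100.0, guard=False)
except KeyError as exc:
    print("KeyError without the guard:", exc)
```

The guard keeps the heartbeat alive, but the stale entry in `active_threads` stays where it is. That is why the question of where it should be cleaned up still stands.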

martindurant · 2020-11-11T14:28:35Z

CI fails with an apparently unrelated timeout in test_broken_worker_during_computation, on py36 only.

martindurant · 2021-06-07T15:57:38Z

If this still passes, I would really like to see it merged! I may offer a "please object" timeframe.

jakirkham · 2021-06-07T18:55:30Z

@jrbourbeau would you be able to take a look or do you know who would be able to?

martindurant · 2021-06-08T12:58:10Z

OK, all passed except some unrelated and presumably flaky steal test in one run.
(Note: all the post warnings are still in the very long build log)

jrbourbeau

Thanks @martindurant!

distributed/actor.py

jrbourbeau · 2021-06-11T00:59:44Z

distributed/tests/test_actor.py

+        def prop(self):
+            raise MyException
+
+    with cluster(nworkers=2) as (cl, w):


Is there a reason to use cluster here instead of @gen_cluster like most of the other tests in this module?

It was to test the sync API, which is more typical for actors. I added an async version immediately below (could remove this one, if you like).


To be clear, the sync API is more common for everything. We prefer async tests because they are faster, easier on CI, and easier to debug. Adding sync tests as well is fine if we want to be extra careful. If the async tests pass, I usually have confidence that sync works just as well, unless I'm explicitly writing code to handle synchronization.

What's here is great though.
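Here is a small illustration of the point that the sync API is mostly a synchronization layer over the async one. It is a self-contained asyncio sketch with no dask involved. `ToyActorProxy` and `call_sync` are hypothetical names. A real distributed client typically runs its event loop in a background thread instead of calling `asyncio.run`, so this shows the shape of the argument rather than distributed's implementation.

```python
import asyncio
from typing import Any


class MyException(Exception):
    pass


class Broken:
    def method(self) -> None:
        raise MyException("from the actor")


class ToyActorProxy:
    """Hypothetical proxy: runs actor methods and lets their exceptions propagate."""

    def __init__(self, obj: Any) -> None:
        self._obj = obj

    async def call(self, name: str, *args: Any) -> Any:
        await asyncio.sleep(0)  # stand-in for the round trip to the worker
        return getattr(self._obj, name)(*args)  # exceptions reach the awaiting caller


def call_sync(proxy: ToyActorProxy, name: str, *args: Any) -> Any:
    """Sync wrapper: blocks on the coroutine and adds no error handling of its own."""
    return asyncio.run(proxy.call(name, *args))


async def async_check() -> bool:
    try:
        await ToyActorProxy(Broken()).call("method")
    except MyException:
        return True
    return False


def sync_check() -> bool:
    try:
        call_sync(ToyActorProxy(Broken()), "method")
    except MyException:
        return True
    return False


if __name__ == "__main__":
    print(asyncio.run(async_check()), sync_check())  # True True
```

Because `call_sync` only blocks on the coroutine, an exception that propagates in the async path also propagates in the sync path. A separate sync test mostly guards the blocking layer itself.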


I don't mind keeping it or not. Writing the sync version first probably reflects how I initially tested by hand. It was some time ago, so I can't remember if there was any other reason, given that the async version is effectively identical and works just fine.

jrbourbeau

Thanks for the updates @martindurant! Just a few small comments on test_actor.py; otherwise I think this is good to merge.

jrbourbeau · 2021-06-11T21:53:29Z

distributed/tests/test_actor.py

+from distributed.utils_test import (  # noqa: F401
+    async_wait_for,
+    cluster,
+    cluster_fixture,
+    gen_cluster,
+    loop,
+)


Following #4888, all pytest fixtures in distributed.utils_test are globally available. So there's no need for the cluster_fixture or loop imports (and hopefully we can drop # noqa: F401 too).

distributed/tests/test_actor.py

martindurant · 2021-06-14T13:04:50Z

Do we have a tracking issue for ongoing CI failures? The two here are not actor related, as far as I can tell.

mrocklin · 2021-06-14T13:30:02Z

There are a variety of issues, generally one per failure



distributed/tests/test_actor.py

jrbourbeau · 2021-06-14T15:50:44Z


Yeah, we keep track of these issues with a flaky label

Co-authored-by: James Bourbeau <jrbourbeau@users.noreply.github.com>

jrbourbeau

Thanks @martindurant! Will merge once CI finishes

Martin Durant added 6 commits November 6, 2020 17:22

Allow actors to call actors on the same worker

c845a6c

Make tests

9081029

simplify

8c1628c

Extra code path only for actors

23269b8

oops, rename

96a2e02

Allow actor exception to propagate

a3d0cff

martindurant force-pushed the actor_fail branch from e2483ad to a3d0cff Compare November 10, 2020 16:20

Merge branch 'master' into actor_fail

6eadd9e

martindurant marked this pull request as ready for review November 10, 2020 18:11

martindurant commented Nov 10, 2020


martindurant requested a review from mrocklin November 11, 2020 17:26

martindurant mentioned this pull request Nov 30, 2020

Recreate actor instances upon worker failure #4287

Draft

Merge branch 'master' into actor_fail

877b562

martindurant mentioned this pull request Dec 21, 2020

dask-streamz based on the actor interface python-streamz/streamz#369

Open

Base automatically changed from master to main March 8, 2021 19:04

jrbourbeau mentioned this pull request May 3, 2021

Actor w/ async method does not propagate exceptions, and hangs forever dask/dask#7626

Closed

Merge branch 'main' into actor_fail

0a079d1

jakirkham requested a review from jrbourbeau May 15, 2021 22:44

Martin Durant added 2 commits May 24, 2021 12:20

Merge branch 'main' into actor_fail

8242454

Merge branch 'main' into actor_fail

776e408

fix merge

a1a2620

jrbourbeau reviewed Jun 11, 2021


Martin Durant added 3 commits June 11, 2021 09:38

resolve comments

1e6e929

simplify test_actor::test_compute

95631d4

Merge branch 'main' into actor_fail

8c7d2ed

jrbourbeau reviewed Jun 11, 2021


Responses

16fd6d7

jrbourbeau reviewed Jun 14, 2021


distributed/tests/test_actor.py (outdated)

Update distributed/tests/test_actor.py

28b6460

Co-authored-by: James Bourbeau <jrbourbeau@users.noreply.github.com>

jrbourbeau approved these changes Jun 14, 2021


jrbourbeau merged commit 05c5621 into dask:main Jun 14, 2021

martindurant deleted the actor_fail branch June 14, 2021 18:29

Conversation

martindurant commented Nov 10, 2020 (edited by jrbourbeau)

martindurant commented Nov 10, 2020

martindurant Nov 10, 2020

martindurant commented Nov 11, 2020

martindurant commented Jun 7, 2021

jakirkham commented Jun 7, 2021

martindurant commented Jun 8, 2021

jrbourbeau left a comment

jrbourbeau Jun 11, 2021

martindurant Jun 11, 2021

mrocklin Jun 11, 2021

martindurant Jun 11, 2021

jrbourbeau left a comment

jrbourbeau Jun 11, 2021

martindurant commented Jun 14, 2021

mrocklin commented Jun 14, 2021 via email

jrbourbeau commented Jun 14, 2021

jrbourbeau left a comment

4 participants