Check exact equality for worker state#3483
Conversation
|
cc @StephanErb |
|
Thanks for picking this up @bnaul This looks good to me. I'll merge after either a +1 from @StephanErb or a couple days. |
|
My guess is that all of the test failures are intermittent. When we changed to spawn by default it unearthed a bunch of things. Squashing down a couple now. |
|
Well, this one is interesting @gen_cluster(timeout=1000, client=True)
def test_recompute_released_key(c, s, a, b):
x = c.submit(inc, 100)
result1 = yield x
xkey = x.key
del x
import gc
gc.collect()
yield gen.moment
> assert c.refcount[xkey] == 0
E assert 1 == 0
E -1
E +0 |
StephanErb
left a comment
There was a problem hiding this comment.
Thanks for doing the follow up. As you have requested my review I have looked into the code a bit deeper. Please see my questions below.
distributed/scheduler.py
Outdated
| return type(self) == type(other) and (self.name, self.host) == ( | ||
| other.name, | ||
| other.host, | ||
| ) |
There was a problem hiding this comment.
I am wondering if using name and host is correct in all cases:
nameis optional and could be Nonehostis not necessary unique as multiple workers could be running on the same machine
I am lacking necessary Dask context, so I am wondering why you have not picked address instead? It is used for indexing the WorkerState and the documentation says it is [t]his worker's unique key..
So I would assume, this here should work as expected:
def __hash__(self):
return hash(self.address)
def __eq__(self, other):
return type(self) == type(other) and self.address == other.address There was a problem hiding this comment.
I was just pulling items from the identity() dictionary, no personal reason to prefer one to the other. @TomAugspurger @mrocklin would you agree address makes more sense here?
|
Yes, address is probably the best thing to do in both cases.
…On Sun, Feb 16, 2020 at 1:23 PM Brett Naul ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In distributed/scheduler.py
<#3483 (comment)>:
> + return type(self) == type(other) and (self.name, self.host) == (
+ other.name,
+ other.host,
+ )
I was just pulling items from the identity() function, no personal reason
to prefer one to the other. @TomAugspurger
<https://github.com/TomAugspurger> @mrocklin <https://github.com/mrocklin>
would you agree address makes more sense here?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#3483?email_source=notifications&email_token=AACKZTHHWYYG6RFFP5E66PLRDGVFTA5CNFSM4KV3HXIKYY3PNVWWK3TUL52HS4DFWFIHK3DMKJSXC5LFON2FEZLWNFSXPKTDN5WW2ZLOORPWSZGOCVWHPWY#discussion_r379934436>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AACKZTAXOVLPZTZ4VHWKZBLRDGVFTANCNFSM4KV3HXIA>
.
|
4724d85 to
3cf2bab
Compare
|
thanks @StephanErb @mrocklin , updated w/ your suggestions |
Per #3321 (comment), I agree it makes more sense to check for exact equality instead of relying on the hash (just in case 🙂).
I could see an argument for wrapping this tuple in a property so it can be reused, do we think that's preferable? I'd originally wanted to use
identityforhashandeqbut that has a bit too much going on IMO.