Add config for handling obj normalization with missing __dask_tokenize__#7413
Conversation
dask/tests/test_base.py (outdated)

```python
from tlz import merge, partial, compose, curry
from unittest import mock
```
Not entirely sure which convention to employ here in terms of whether to import `mock` at this level or within `test_normalize_object` itself (just as has been done for `import warnings`; the latter has been imported within the test only, because its general utility is not as wide in this particular context).
Any feedback would be greatly appreciated.
Need to go through the rest of the codebase a bit more to get an idea of how best to document the newly proposed config.

I'm considering documenting the config as part of this section, because I haven't been able to find a more suitable one.
jrbourbeau left a comment:
Thanks for the PR @hristog!
We can probably scale the changes back here a bit to just support a boolean flag for whether or not we should raise an error. I suspect a relatively small group of users will utilize this setting, and when things are non-deterministic they'll want an error to be raised. We can always expand things in the future if needed.
Something like (I've not tested the snippet below):

```python
if callable(o):
    return normalize_function(o)
elif dask.config.get("tokenize.allow-random", True):
    return uuid.uuid4().hex
else:
    raise RuntimeError("...")
```

should work for that case.
dask/tests/test_base.py (outdated)

```python
assert normalize_token(i) is i


def test_normalize_object():
```
What do you think about something like (again, I've not tested the snippet below):

```python
with dask.config.set({"tokenize.allow-random": False}):
    with pytest.raises(RuntimeError, match="..."):
        tokenize(object())
```
Thanks for looking into this PR, @jrbourbeau! I've scaled down the functionality associated with the newly proposed config, as suggested, and updated its default value accordingly.

One tiny concern I've got about the naming (`allow-random`) is that there are other sources of non-determinism that aren't going to be impacted by the proposed logic. I suppose this could always be added, given actual demand, if GitHub issues are submitted requesting those to be supported at any point later on.

When you get a chance, please let me know what your thoughts are regarding the best possible way to document the newly introduced config.
Hi @jrbourbeau - I'm writing to check if you've had a chance to look into the further changes that I pushed in 41b135b, and whether you'd like any additional refinements on top of that.
jrbourbeau left a comment:
Thanks @hristog -- apologies for the delayed response.
> One tiny concern I've got about the naming (`allow-random`) is that - as mentioned in this comment of mine - there are other sources of non-determinism that aren't going to be impacted by the proposed logic.

This is a great point -- if you want to add a check for `tokenize.allow-random` in those locations too, that would be welcome.
dask/tests/test_base.py (outdated)

```python
assert normalize_token(i) is i


def test_normalize_object():
```
This test can be simplified a bit. In particular, we can probably just test that an informative error is raised when `tokenize.allow-random` is `False` and we attempt to tokenize an object which can't be deterministically hashed. See this comment https://github.com/dask/dask/pull/7413/files#r596499598 for an outline of a simplified test.
Hmm, my concern here is that, without tests proving that true negatives are handled as such, we may not have sufficient confidence (purely from viewing unit tests as documentation of expected functionality) that the newly introduced functionality doesn't break behavior it shouldn't affect. And, somehow, I don't feel comfortable relying on other tests, external to this one, to do that for us.
Of course, if you strongly believe we could make do without the extra set of tests, and just have one which makes assertions with respect to one aspect only, then I'll remove them (no issues about that). Just wanted to share my concerns.
> This is a great point -- if you want to add a check for `tokenize.allow-random` in those locations too, that would be welcome.
Hi @jrbourbeau - yes, I've opted to implement configurable behavior for those as well. The commits have already been added to this PR.
> This test can be simplified a bit.
I've made an attempt at simplifying the tests. Hopefully they look better from your point of view now. Please let me know if you'd like anything else changed.
Hi @jrbourbeau, I'm writing to check whether you've had a chance to look at the most recent updates. Please let me know if the tests should be further stripped down, despite the discussed concerns.
@jrbourbeau checking in here: it looks like @hristog updated the code according to your requests and CI is green. Do you think this is ready to go, or are there any other issues that should be addressed?
jsignell left a comment:
This is looking good; you just need a change to the config file to support this change.
- Rename config field to `tokenize.ensure-deterministic`
- Add config field to `dask.yaml` and `dask-schema.yaml`
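For reference, the two files might gain entries along these lines. This is a sketch: the exact wording of the description is an assumption, though a `tokenize.ensure-deterministic` key with a `false` default matches what the PR introduces.

```yaml
# dask.yaml -- user-facing default
tokenize:
  ensure-deterministic: false

# dask-schema.yaml -- validation schema
tokenize:
  type: object
  properties:
    ensure-deterministic:
      type: boolean
      description: |
        If true, raise an error instead of falling back to a random
        token when an object cannot be hashed deterministically.
```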
I've renamed the config field to `tokenize.ensure-deterministic`.
Thanks, @jcrist!
Thanks for the updates @jcrist! I pushed a small commit to avoid
Test failure is unrelated - merging. Thanks @hristog! |
Thanks, @jrbourbeau! Thanks to your commit, I can much better understand what you meant in your previous comments on this PR. |
- Closes `normalize_object` doesn't find `__dask_tokenize__` #6555
- Adds a `tokenize.allow-random` config, accepting `True` or `False`
- Adds `test_normalize_object` to `dask/tests/test_base.py`
- Passes `black dask` / `flake8 dask`