Skip to content

normalize_token does not raise warning for function with non-deterministic hash #9135

@LunarLanding

Description

@LunarLanding
import dask.config
from dask.base import tokenize
a,b = (
    lambda a: a,
    lambda a: a,
)
with dask.config.set({"tokenize.ensure-deterministic":True}):
    print(tokenize(a)==tokenize(b))
str(a),str(b)

Gives:

False
('<function <lambda> at 0x14a0d079fee0>',
 '<function <lambda> at 0x14a0ca90cc10>')

This is because of the last lines here, where the code gives up by using str(func):

dask/dask/base.py

Lines 1043 to 1069 in c60b1f7

def _normalize_function(func: Callable) -> tuple | str | bytes:
if isinstance(func, Compose):
first = getattr(func, "first", None)
funcs = reversed((first,) + func.funcs) if first else func.funcs
return tuple(normalize_function(f) for f in funcs)
elif isinstance(func, (partial, curry)):
args = tuple(normalize_token(i) for i in func.args)
if func.keywords:
kws = tuple(
(k, normalize_token(v)) for k, v in sorted(func.keywords.items())
)
else:
kws = None
return (normalize_function(func.func), args, kws)
else:
try:
result = pickle.dumps(func, protocol=4)
if b"__main__" not in result: # abort on dynamic functions
return result
except Exception:
pass
try:
import cloudpickle
return cloudpickle.dumps(func, protocol=4)
except Exception:
return str(func)

Metadata

Metadata

Assignees

Labels

bugSomething is brokencore

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions