[Bug Fix] Call custom optimizations once, with kwargs provided.#6382
Conversation
@clarkzinzow thanks for spending the time here and tracking this down. Looks like the second
@quasiben I don't think this is worth a test; double-calls of idempotent functions aren't something that devs usually test for. Such a bug is usually very obvious from the code and doesn't get past review, and if it's not obvious, that's a sign that the code is convoluted and hard to understand. E.g. if this function was refactored into something like

```python
def collections_to_dsk(collections, optimize_graph=True, **kwargs):
    """
    Convert many collections into a single dask graph, after optimization
    """
    optimizations = kwargs.pop("optimizations", None) or config.get("optimizations", [])
    if optimize_graph:
        groups = groupby(optimization_function, collections)
        opt_dsk_list = []
        for col_opt, val in groups.items():
            dsk, keys = _extract_graph_and_keys(val)
            opt_dsk = col_opt(dsk, keys, **kwargs)
            for opt in optimizations:
                opt_dsk = opt(opt_dsk, keys, **kwargs)
            opt_dsk_list.append(opt_dsk)
        dsk = merge(*map(ensure_dict, opt_dsk_list))
    else:
        dsk, _ = _extract_graph_and_keys(collections)
    return dsk
```

you'd have what is most likely a better-performing optimization pass that's easier to understand.
Ok. Thanks! Merging in.
This PR appears to have broken tests in dask/distributed.
Thanks for the ping @mrocklin, I'll take a look first thing in the morning.
…#6382)

* Call custom optimizations once, with kwargs provided.
* Actually retain collection-specific optimizations when custom optimizations are specified.
Fixes #6381.
Also noticed that collection-specific optimizations aren't being retained when custom optimizations are specified, which is fixed in the second commit.
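The two fixes above can be sketched side by side. This is a minimal illustration, not dask's real code: `run_optimizations_buggy`, `run_optimizations_fixed`, and the `collection_opt`/`custom_opt` callables are hypothetical stand-ins for the optimization pipeline.

```python
def run_optimizations_buggy(dsk, keys, collection_opt, custom_opts, **kwargs):
    # Bug pattern: custom optimizations *replace* the collection-specific
    # pass instead of running after it, and kwargs are not forwarded.
    if custom_opts:
        for opt in custom_opts:
            dsk = opt(dsk, keys)
        return dsk
    return collection_opt(dsk, keys, **kwargs)

def run_optimizations_fixed(dsk, keys, collection_opt, custom_opts, **kwargs):
    # Fixed pattern: run the collection-specific pass first, then each
    # custom optimization exactly once, forwarding kwargs to all of them.
    dsk = collection_opt(dsk, keys, **kwargs)
    for opt in custom_opts:
        dsk = opt(dsk, keys, **kwargs)
    return dsk

# Record which passes actually run and what kwargs they see.
calls = []

def collection_opt(dsk, keys, **kwargs):
    calls.append(("collection", kwargs.get("fuse")))
    return dsk

def custom_opt(dsk, keys, **kwargs):
    calls.append(("custom", kwargs.get("fuse")))
    return dsk

run_optimizations_buggy({"x": 1}, ["x"], collection_opt, [custom_opt], fuse=True)
print(calls)  # the collection-specific pass is skipped and kwargs are dropped

calls.clear()
run_optimizations_fixed({"x": 1}, ["x"], collection_opt, [custom_opt], fuse=True)
print(calls)  # both passes run exactly once, with kwargs
```

The recorded calls make both defects visible: the buggy version never runs the collection-specific optimization when custom ones are given, and the custom one never sees the keyword arguments.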