Skip to content

[Bug Fix] Call custom optimizations once, with kwargs provided.#6382

Merged
quasiben merged 2 commits intodask:masterfrom
clarkzinzow:fix/custom-optimizations-called-twice
Jul 9, 2020
Merged

[Bug Fix] Call custom optimizations once, with kwargs provided.#6382
quasiben merged 2 commits intodask:masterfrom
clarkzinzow:fix/custom-optimizations-called-twice

Conversation

@clarkzinzow
Copy link
Contributor

@clarkzinzow clarkzinzow commented Jul 9, 2020

Fixes #6381.

Also noticed that collection-specific optimizations aren't being retained when custom optimizations are specified, which is fixed in the second commit.

@quasiben
Copy link
Member

quasiben commented Jul 9, 2020

@clarkzinzow thanks for spending the time here and tracking this down. Looks like the second opt call bit us -- apologies for that. Do you think this is worth adding a test for to ensure we any future changes don't inadvertently result in multiple optimize calls ?

@clarkzinzow
Copy link
Contributor Author

@quasiben I don't think this is worth a test; double-calls of idempotent functions aren't something that devs usually test for. Such a bug is usually very obvious from the code and doesn't get past review, and if it's not obvious, that's a sign that the code is convoluted and hard to understand. E.g. if this function was refactored into something like

def collections_to_dsk(collections, optimize_graph=True, **kwargs):
    """
    Convert many collections into a single dask graph, after optimization
    """
    optimizations = kwargs.pop("optimizations", None) or config.get("optimizations", [])

    if optimize_graph:
        groups = groupby(optimization_function, collections)

        opt_dsk_list = []
        for col_opt, val in groups.items():
            dsk, keys = _extract_graph_and_keys(val)
            opt_dsk = col_opt(dsk, keys, **kwargs)
            for opt in optimizations:
               opt_dsk = opt(opt_dsk, keys, **kwargs)
            opt_dsk_list.append(opt_dsk)

        dsk = merge(*map(ensure_dict, opt_dsk_list))
    else:
        dsk, _ = _extract_graph_and_keys(collections)

    return dsk

you'd have what is most likely a better performing optimization pass that's easier to understand.

@quasiben
Copy link
Member

quasiben commented Jul 9, 2020

Ok. Thanks!

Merging in

@quasiben quasiben merged commit 4a11e06 into dask:master Jul 9, 2020
@clarkzinzow clarkzinzow deleted the fix/custom-optimizations-called-twice branch July 9, 2020 17:21
@mrocklin
Copy link
Member

This PR appears to have broken tests in dask/distributed.

@mrocklin
Copy link
Member

See dask/distributed#3958

@quasiben
Copy link
Member

Thanks for the ping @mrocklin i'll take a look first thing in the morning

kumarprabhu1988 pushed a commit to kumarprabhu1988/dask that referenced this pull request Oct 29, 2020
…#6382)

* Call custom optimizations once, with kwargs provided.

* Actually retain collection-specific optimizations when custom optimizations are specified.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Custom optimization functions are called twice, once without scheduler kwargs.

3 participants