Commit feb40fd

Clarify that bind() etc. regenerate the keys (#9385)
1 parent 0a7b58b commit feb40fd

1 file changed

Lines changed: 19 additions & 2 deletions

dask/graph_manipulation.py

@@ -271,10 +271,12 @@ def bind(
 -------
 Same as ``children``
 Dask collection or structure of dask collection equivalent to ``children``,
-which compute to the same values. All keys of ``children`` will be regenerated,
-up to and excluding the keys of ``omit``. Nodes immediately above ``omit``, or
+which compute to the same values. All nodes of ``children`` will be regenerated,
+up to and excluding the nodes of ``omit``. Nodes immediately above ``omit``, or
 the leaf nodes if the collections in ``omit`` are not found, are prevented from
 computing until all collections in ``parents`` have been fully computed.
+The keys of the regenerated nodes will be different from the original ones, so
+that they can be used within the same graph.
 """
 if seed is None:
 seed = uuid.uuid4().bytes
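A toy model may help make the renaming described in the ``bind()`` docstring concrete. This is a hypothetical sketch, not how dask implements ``bind()``: the graph is a plain dict of `key -> (func, *args)` tasks, and the helper names `toy_regenerate`, `_new_key` and `get`, as well as the hash-based renaming scheme, are assumptions made for illustration. It copies every node reachable from the roots, up to and excluding `omit`, under a new key derived from `seed`, so the copy can be merged into the original graph without key collisions. The blocking on ``parents`` is not modelled.

```python
import hashlib
import operator
from typing import Any, Dict, Hashable, Iterable, Set, Tuple

Graph = Dict[Hashable, Tuple[Any, ...]]


def _new_key(key: Hashable, seed: bytes) -> Hashable:
    # Hypothetical renaming scheme: hash the layer name together with the seed,
    # so every chunk of one layer gets the same new name and the result is
    # reproducible for a given seed.
    name = key[0] if isinstance(key, tuple) else key
    token = hashlib.sha1(seed + repr(name).encode()).hexdigest()[:8]
    new_name = f"{name}-{token}"
    return (new_name,) + key[1:] if isinstance(key, tuple) else new_name


def toy_regenerate(graph: Graph, roots: Iterable[Hashable],
                   omit: Set[Hashable], seed: bytes):
    """Copy every node reachable from `roots`, up to and excluding `omit`."""
    mapping: Dict[Hashable, Hashable] = {}
    stack = list(roots)
    while stack:
        key = stack.pop()
        if key in mapping or key in omit:
            continue
        mapping[key] = _new_key(key, seed)
        # Walk towards the leaves through arguments that are keys of the graph.
        stack.extend(arg for arg in graph[key][1:] if arg in graph)
    new_graph: Graph = {}
    for old_key, new_key in mapping.items():
        func, *args = graph[old_key]
        # Point at the regenerated dependency when there is one; omitted keys
        # and literal (hashable) arguments are left unchanged.
        new_graph[new_key] = (func, *(mapping.get(arg, arg) for arg in args))
    return new_graph, mapping


def get(graph: Graph, key: Hashable, cache: Dict[Hashable, Any]) -> Any:
    # Minimal recursive executor: each key is evaluated once per cache.
    if key not in cache:
        func, *args = graph[key]
        cache[key] = func(*(get(graph, a, cache) if a in graph else a for a in args))
    return cache[key]


graph: Graph = {
    ("x", 0): (operator.add, 1, 1),
    ("y", 0): (operator.add, ("x", 0), 1),
    ("z", 0): (operator.add, ("y", 0), 1),
}
new_graph, mapping = toy_regenerate(graph, [("z", 0)], omit={("x", 0)}, seed=b"seed")
merged = {**graph, **new_graph}  # no collisions: every regenerated key is new
print(sorted(mapping))  # [('y', 0), ('z', 0)]; ('x', 0) was omitted
print(get(merged, ("z", 0), {}), get(merged, mapping[("z", 0)], {}))  # 4 4
```

The regenerated `y` still reads the original `("x", 0)`, mirroring "up to and excluding the nodes of ``omit``".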
@@ -429,6 +431,17 @@ def clone(*collections, omit=None, seed: Hashable = None, assume_layers: bool =
 ('add-5', 0): (<function operator.add>, ('add-4', 0), 1),
 ('add-5', 1): (<function operator.add>, ('add-4', 1), 1)}
 
+The typical usage pattern for clone() is the following:
+
+>>> x = cheap_computation_with_large_output() # doctest: +SKIP
+>>> y = expensive_and_long_computation(x) # doctest: +SKIP
+>>> z = wrap_up(clone(x), y) # doctest: +SKIP
+
+In the above code, the chunks of x will be forgotten as soon as they are consumed by
+the chunks of y, and then they'll be regenerated from scratch at the very end of the
+computation. Without clone(), x would only be computed once and then kept in memory
+throughout the whole computation of y, needlessly consuming memory.
+
 Parameters
 ----------
 collections
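The trade-off described in the clone() usage note (recompute x rather than hold it) can be illustrated with a toy executor. This is a hypothetical sketch, not dask: the placeholder functions from the doctest are given trivial made-up bodies, the graphs are plain dicts, and the key `"x-cloned"` is an invented stand-in for whatever key clone() would generate. The executor caches every result and does not model memory release; it only shows that, because the cloned copy lives under a different key, x is computed twice, which is what lets a real scheduler drop the original chunks of x once y has consumed them.

```python
from collections import Counter
from typing import Any, Dict, Tuple

runs: Counter = Counter()  # how many times each task function actually ran


def cheap_computation_with_large_output() -> list:
    runs["x"] += 1
    return list(range(1000))  # stands in for a large intermediate result


def expensive_and_long_computation(x: list) -> int:
    return sum(x)


def wrap_up(x: list, y: int) -> int:
    return len(x) + y


def get(graph: Dict[str, Tuple[Any, ...]], key: str, cache: Dict[str, Any]) -> Any:
    # Toy executor: each key is computed at most once per cache, then reused.
    if key not in cache:
        func, *args = graph[key]
        cache[key] = func(*(get(graph, a, cache) if a in graph else a for a in args))
    return cache[key]


# wrap_up(x, y): wrap_up reuses the key "x", so x is computed once and a real
# scheduler would have to keep it in memory until wrap_up runs.
plain = {
    "x": (cheap_computation_with_large_output,),
    "y": (expensive_and_long_computation, "x"),
    "z": (wrap_up, "x", "y"),
}
# wrap_up(clone(x), y): wrap_up reads a copy of x under a different
# (hypothetical) key, so x is recomputed instead of being held.
cloned = {
    **plain,
    "x-cloned": (cheap_computation_with_large_output,),
    "z": (wrap_up, "x-cloned", "y"),
}

for label, graph in [("plain", plain), ("cloned", cloned)]:
    runs.clear()
    print(label, get(graph, "z", {}), runs["x"])  # plain: x ran once; cloned: twice
```

Both graphs produce the same result; the cloned one pays for a second computation of x in exchange for not pinning it in memory.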
@@ -446,6 +459,8 @@ def clone(*collections, omit=None, seed: Hashable = None, assume_layers: bool =
 Dask collections of the same type as the inputs, which compute to the same
 value, or nested structures equivalent to the inputs, where the original
 collections have been replaced.
+The keys of the regenerated nodes in the new collections will be different from
+the original ones, so that they can be used within the same graph.
 """
 out = bind(
 collections, parents=None, omit=omit, seed=seed, assume_layers=assume_layers
@@ -489,6 +504,8 @@ def wait_on(
 Dask collection of the same type as the input, which computes to the same value,
 or a nested structure equivalent to the input where the original collections
 have been replaced.
+The keys of the regenerated nodes of the new collections will be different from
+the original ones, so that they can be used within the same graph.
 """
 blocker = checkpoint(*collections, split_every=split_every)
