-
-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Closed
Labels
Description
In dask.array we allow user provided functions to get a special keyword, block_info with information about the array and where the block that they're operating in that array lives:
Lines 656 to 674 in 5b2ecf4
| Your block function get information about where it is in the array by | |
| accepting a special ``block_info`` keyword argument. | |
| >>> def func(block, block_info=None): | |
| ... pass | |
| This will receive the following information: | |
| >>> block_info # doctest: +SKIP | |
| {0: {'shape': (1000,), | |
| 'num-chunks': (10,), | |
| 'chunk-location': (4,), | |
| 'array-location': [(400, 500)]}} | |
| For each argument and keyword arguments that are dask arrays (the positions | |
| of which are the first index), you will receive the shape of the full | |
| array, the number of chunks of the full array in each dimension, the chunk | |
| location (for example the fourth chunk over in the first dimension), and | |
| the array location (for example the slice corresponding to ``40:50``). |
We might consider doing the same for dask.dataframe.map_partitions. We have a bit less information to provide, but can probably include:
- The number of partitions in the dataframe
- Which partition we're on specifically
This requires using the dask.base.has_keyword function to inspect the user provided function, and then manipulating the task graph a bit to add the extra keyword argument.
Inspired by this stack overflow question: https://stackoverflow.com/questions/51135739/how-to-modify-values-on-specific-dask-dataframe-partiton
Reactions are currently unavailable