-
-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Description
@jcrist did some work back in #755 on swapping getitem and elementwise operations. We didn't end up merging this work, because we were unable to come up with a solution that worked across the board without slowing down every dask array operations (by forcing broadcasting).
However, this is still a vital optimization for climate/weather data analysis use cases (where arrays are often currently loaded with one chunk/file, as described in pydata/xarray#1440), and would be extremely valuable for xarray (allowing us to remove our legacy system for deferred array computation). It would be great to pursue it in some more limited form.
Any of the following would solve xarray's use cases:
- Optimization that only works on elementwise operations with a single array argument, e.g., of the form
x + 1. - Optimization that only works on elementwise operations with pre-broadcast arguments. These would need some sort of special indication in the dask graph, likely via some custom API, e.g.,
da.swappable_elementwise(). - A new (optional) elementwise operation that always explicitly broadcasts, for cases when optimization is essential and some overhead is acceptable.
The optimization pass itself could also be optional, but we would always want to use it on dask.array objects wrapped with xarray.